Closed Mardak closed 10 years ago
It seems that the data store doesn't quite store/return things by most recent. I think it's just that it happens to get more recent items inserted before the others. For example currently:
https://headliner.mozillalabs.com/nytimes/mostpopular/Technology.json
Shows "url": "http://www.nytimes.com/2014/01/23/technology/personaltech/review-the-roomba-880-from-irobot.html?src=recmoz" then "url": "http://www.nytimes.com/2014/01/30/technology/personaltech/on-facebook-deciding-who-knows-youre-a-dog.html?src=recmoz"
And there's no explicit date/time field to sort by.
Here's an example of sub-optimal time ordering:
interests: '{"Programming":0.25,"Sports":0.25,"Autos":0.25,"Arts":1}'
List of suggested urls:
2014-01-30/technology/personaltech/on-facebook-deciding-who-knows-youre-a-dog.html
2014-01-27/sports/committing-to-play-for-a-college-then-starting-9th-grade.html
2014-01-27/automobiles/makers-pack-new-cars-with-technology-but-younger-buyers-shrug.html
2014-01-26/automobiles/autoreviews/the-ecstasy-of-excess-the-agony-of-the-sticker.html
2014-01-23/technology/personaltech/review-the-roomba-880-from-irobot.html
2013-12-05/sports/baseball/three-rings-erase-sting-of-losing-ellsbury.html
2013-11-28/arts/saul-leiter-photographer-with-a-palette-for-new-york-dies-at-89.html
2013-11-20/arts/monty-python-troupe-to-reunite-for-live-shows.html
2013-11-19/arts/syd-field-author-of-the-definitive-work-on-writing-screenplays-is-dead-at-77.html
2013-11-19/arts/barbara-park-author-of-junie-b-jones-series-dies-at-66.html
Note that all the Arts articles are pushed to the bottom, even though Arts has the highest interest weight.
This use case forced us to move back to randomization of interests.
We can make sure the articles are sorted by date. I don't understand why we need to randomize.
The articles are stored by receipt time: https://github.com/oyiptong/up-headliner/blob/master/up/headliner/data.py#L34
This is to mimic the behavior that occurs when articles are received.
The reasoning is that news items that make the "most popular" list are those that are gaining momentum. We are returning results by their relevance to popular opinion, not by the time the article was published.
e.g. a moving profile of Bill Clinton written in 1998 suddenly comes to light.
That's how the "Most Popular" API works. The newest articles to make the list are not necessarily the most recently published ones.
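To make the receipt-time behavior concrete, here is a minimal in-memory sketch. It is not the code in data.py; the class name `ReceiptOrderedStore` and its methods are hypothetical, and the only point is that ranking depends on when an article entered the list, not on when it was published.

```python
import itertools


class ReceiptOrderedStore:
    """Hypothetical store that ranks articles by the order they were received."""

    def __init__(self):
        self._sequence = itertools.count()
        self._received = {}  # url -> receipt sequence number

    def add(self, url):
        # Assumption: an article seen again keeps its first receipt position.
        if url not in self._received:
            self._received[url] = next(self._sequence)

    def latest(self, limit=10):
        # Most recently received first, regardless of publication date.
        ranked = sorted(self._received, key=self._received.get, reverse=True)
        return ranked[:limit]
```

With a store like this, an article published in 1998 that is added after a 2014 article would be returned ahead of it by `latest()`.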
Instead of just fetching articles and concatenating them by interest, we'll want to make sure they're sorted by time overall.
This is as opposed to the original suggestion Mardak/profile/issues/23 to randomize.
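A rough sketch of what sorting by time overall could look like, given that the feed has no explicit date field. It assumes the publication date can be read from a `/YYYY/MM/DD/` segment in the article URL, as in the nytimes.com URLs above; the function names `published_on` and `merge_by_date` are hypothetical and not part of up-headliner. Note that this orders by publication date, not by when an article entered the "most popular" list.

```python
import re
from datetime import date

# Assumption: article URLs embed the publication date as /YYYY/MM/DD/.
_DATE_IN_PATH = re.compile(r"/(\d{4})/(\d{2})/(\d{2})/")


def published_on(url):
    """Return the date embedded in the URL path, or date.min if none is found."""
    match = _DATE_IN_PATH.search(url)
    if match is None:
        return date.min
    year, month, day = map(int, match.groups())
    try:
        return date(year, month, day)
    except ValueError:
        # Digits matched the pattern but do not form a valid calendar date.
        return date.min


def merge_by_date(urls_by_interest):
    """Merge the per-interest URL lists into one list, newest first."""
    seen = set()
    merged = []
    for urls in urls_by_interest.values():
        for url in urls:
            if url not in seen:  # an article may appear under several interests
                seen.add(url)
                merged.append(url)
    return sorted(merged, key=published_on, reverse=True)
```

URLs without a recognizable date sort last, so they never displace dated articles.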