oyiptong / up-headliner

Headliner is a JSON API that returns personalized content obtained from providers
Mozilla Public License 2.0

Server does not always honor limit=N parameter #9

Closed: mzhilyaev closed this issue 10 years ago

mzhilyaev commented 10 years ago

The headliner server does not always honor the limit=10 specified in the request URL. For example, this request returns 62 articles:

curl --data-ascii '{"Programming":1,"Sports":1}' -H "Content-Type:application/json" "http://127.0.0.1:4355/nytimes/mostpopular/personalize?limit=10"

while this request returns 10 articles:

curl --data-ascii '{"Programming":1.0,"Sports":1.0}' -H "Content-Type:application/json" "http://127.0.0.1:4355/nytimes/mostpopular/personalize?limit=10"

The only difference is the rank format in the JSON POST data: integer ranks vs. float ranks.
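The JSON decoder preserves that distinction: a rank written as 1 decodes to a Python int, while 1.0 decodes to a float, so server code receives different numeric types for the two requests. A quick check with the standard json module, assuming the server decodes the body with an ordinary JSON parser like this one:

```python
import json

int_ranks = json.loads('{"Programming":1,"Sports":1}')
float_ranks = json.loads('{"Programming":1.0,"Sports":1.0}')

# Integer literals decode to int; literals with a decimal point decode to float.
print({k: type(v).__name__ for k, v in int_ranks.items()})    # {'Programming': 'int', 'Sports': 'int'}
print({k: type(v).__name__ for k, v in float_ranks.items()})  # {'Programming': 'float', 'Sports': 'float'}
```

The two dicts still compare equal (1 == 1.0), which is why the bug is easy to miss: only arithmetic that depends on the type, such as Python 2's int/int division, behaves differently.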

When the interests have equal ranks, the number of stories returned seems to depend on how many interests are submitted. For example, with 4 equally ranked interests the server returns only 8 articles:

curl --data-ascii '{"Programming":0.25,"Sports":0.25,"Autos":0.25,"Arts":0.25}' -H "Content-Type:application/json" "http://127.0.0.1:4355/nytimes/mostpopular/personalize?limit=10"

Mardak commented 10 years ago

I wonder if this is related to how we sometimes end up with way more than 20 articles. I remember seeing a default of 20 here: https://github.com/oyiptong/up-headliner/blob/master/up/headliner/content/nytimes/urls.py#L23

Mardak commented 10 years ago

The 4-equally-ranked case is actually caused by this line: https://github.com/oyiptong/up-headliner/blob/master/up/headliner/content/nytimes/urls.py#L56

It takes the floor, so each interest gets floor(0.25 * 10) = floor(2.5) = 2 articles, and with 4 interests that is only 8 articles.
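A small sketch of that flooring arithmetic, assuming each interest receives floor(limit * weight / weight_total) articles. The linked line may be written differently, and floor_allocation is a hypothetical name, not the project's function:

```python
import math

def floor_allocation(weights, limit):
    """Hypothetical: per-interest article counts, flooring each interest's share of limit."""
    total = sum(weights.values())
    return {name: math.floor(limit * w / total) for name, w in weights.items()}

weights = {"Programming": 0.25, "Sports": 0.25, "Autos": 0.25, "Arts": 0.25}
counts = floor_allocation(weights, 10)
print(counts)               # {'Programming': 2, 'Sports': 2, 'Autos': 2, 'Arts': 2}
print(sum(counts.values())) # 8, not the requested 10
```

Each interest loses its 0.5 fractional part, and the losses add up across interests.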

Mardak commented 10 years ago

Actually, on that same line, I believe integer weights make weight/weight_total an integer division that truncates to 0. If the multiplication by limit came first (limit * weight / weight_total), it would produce a non-zero integer. Still, given the previous comment, we might want something smarter so that several truncations don't add up to fewer items than requested.
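Both points can be sketched in Python 3, where // reproduces Python 2's truncating / on two ints. share_divide_first and share_multiply_first contrast the two orderings. largest_remainder_allocation shows one possible "smarter" scheme, the largest remainder (Hamilton) method: floor every share, then give the leftover slots to the interests with the largest fractional parts, so the counts always sum to limit. All names are hypothetical and this is not the project's code; it assumes positive weights.

```python
import math

def share_divide_first(weight, weight_total, limit):
    # Python 2 int/int semantics: weight // weight_total is 0 whenever weight < weight_total.
    return (weight // weight_total) * limit

def share_multiply_first(weight, weight_total, limit):
    # Multiplying first keeps the numerator large, so the floored result is usually non-zero.
    return (limit * weight) // weight_total

def largest_remainder_allocation(weights, limit):
    """Split limit across interests in proportion to weight; counts sum to limit."""
    total = sum(weights.values())
    exact = {name: limit * w / total for name, w in weights.items()}  # true division
    counts = {name: math.floor(q) for name, q in exact.items()}
    # Slots lost to flooring (guarded against tiny float overshoot).
    leftover = max(limit - sum(counts.values()), 0)
    # Stable sort: ties keep their original order, so earlier interests win ties.
    by_remainder = sorted(exact, key=lambda n: exact[n] - counts[n], reverse=True)
    for name in by_remainder[:leftover]:
        counts[name] += 1
    return counts

print(share_divide_first(1, 2, 10), share_multiply_first(1, 2, 10))  # 0 5

equal = {"Programming": 0.25, "Sports": 0.25, "Autos": 0.25, "Arts": 0.25}
print(largest_remainder_allocation(equal, 10))
# {'Programming': 3, 'Sports': 3, 'Autos': 2, 'Arts': 2}
```

Tie-breaking by insertion order is an arbitrary choice here; a real implementation might prefer higher-ranked interests or rotate ties between requests.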

Mardak commented 10 years ago

This line is what triggers the far-more-than-20-articles case: https://github.com/oyiptong/up-headliner/blob/master/up/headliner/content/nytimes/urls.py#L63

fetch is being called with a limit of 0 for every interest, and a limit of 0 or None results in +inf/-inf as the max/min limits, i.e. no effective limit at all.
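One common way a limit of 0 ends up meaning "unlimited" is a truthiness check that cannot tell 0 apart from None. The sketch below is hypothetical: fetch_truthy and fetch_explicit are illustrative names and do not reproduce the real fetch, which works with +inf/-inf bounds. It contrasts the pitfall with an explicit `is None` check:

```python
def fetch_truthy(articles, limit=None):
    # Pitfall: `not limit` is True for both None and 0, so a limit of 0 returns everything.
    if not limit:
        return list(articles)
    return list(articles)[:limit]

def fetch_explicit(articles, limit=None):
    # Only None means "no limit"; a limit of 0 returns nothing, as requested.
    if limit is None:
        return list(articles)
    return list(articles)[:limit]

stories = [f"story-{i}" for i in range(30)]
print(len(fetch_truthy(stories, 0)))    # 30
print(len(fetch_explicit(stories, 0)))  # 0
```

Fixing the check alone would still return too few articles when per-interest limits truncate to 0, so it pairs naturally with an allocation that never drops below the requested total.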