osome-iu / hoaxy-backend

Backend component for Hoaxy, a tool to visualize the spread of claims and fact checking
http://hoaxy.iuni.iu.edu/
GNU General Public License v3.0

GET Articles queries only return 100 results (Hoaxy API) #45

Closed. ogelin closed this issue 4 years ago

ogelin commented 4 years ago

When querying the API, requests for articles return at most 100 results (either the 100 most recent or the 100 most relevant).

I suggest adding a query param to specify how many results are required.

This discussion was opened in #28, but should be considered as a separate issue. The inability to obtain all articles corresponding to a query is extremely limiting to API users.

filmenczer commented 4 years ago

We understand this need. We can consider this request from two perspectives:

  1. The version of Hoaxy installed at IU (hoaxy.iuni.iu.edu) must have this limit so that our back-end does not get overwhelmed and is able to respond to queries from the front-end.

  2. In the open-source Hoaxy software (this repo), we could consider adding such a parameter, but then there would also have to be a system parameter to set the maximum value allowed (e.g., if the maximum is 100 and you ask for 1000, you would still only get 100, or an error).

The change described in (2) above would only be useful to someone who deploys their own version of Hoaxy on their servers. @ogelin, would it be a useful addition?
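A minimal sketch of the capped parameter described in (2), for illustration only. The names `resolve_limit`, `DEFAULT_RESULTS`, `MAX_RESULTS`, and `strict` are hypothetical and are not taken from the Hoaxy code base. It shows both behaviours mentioned above: silently clamping to the maximum, or rejecting the request with an error.

```python
# Hypothetical settings; these names do not come from the Hoaxy code base.
DEFAULT_RESULTS = 100  # the limit described in this thread
MAX_RESULTS = 100      # assumed system-wide cap chosen by whoever deploys the backend


def resolve_limit(requested, default=DEFAULT_RESULTS, maximum=MAX_RESULTS, strict=False):
    """Return how many results to serve for a client-requested count.

    requested: value of the (hypothetical) query parameter, or None if absent.
    strict:    if True, reject requests above the cap instead of clamping them.
    """
    if requested is None:
        # No parameter supplied: fall back to the default, never above the cap.
        return min(default, maximum)
    n = int(requested)  # raises ValueError for non-numeric input
    if n < 1:
        raise ValueError("requested number of results must be positive")
    if n > maximum:
        if strict:
            raise ValueError(f"at most {maximum} results can be requested")
        return maximum  # clamp: asking for 1000 still yields the cap
    return n


print(resolve_limit(None))    # 100
print(resolve_limit("1000"))  # 100 (clamped to the cap)
print(resolve_limit(50))      # 50
```

Someone running their own deployment could raise `MAX_RESULTS` to whatever their server can handle, while a shared installation could keep it low.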

ogelin commented 4 years ago

Option 2 is still pointless if it's impossible to get all the results. It would be best to at least increase the limit from 100 in both 1 and 2. 100 is very small. I doubt that raising it to 1000 or even 10000 would actually overwhelm the back end.

I would also recommend revisiting how the back end is hosted. Parallel computing would also help prevent the system from overloading.

glciampaglia commented 4 years ago

Breaking up the results into smaller subsets (i.e., pagination) would solve the issue, but it would entail a significant intervention on the backend. Unfortunately, the framework we use to implement the API (Flask) does not provide pagination by default. There are third-party extensions that do provide it (e.g., Flask-SQLAlchemy), but in the particular case of the articles endpoint the problem is that the articles are not fetched directly from the DB, but from Lucene (hence the need to limit the results). So I suspect that we would still have to write custom code for it. That's as far as I can tell, of course.

Regardless of whether we just add a parameter (per Fil's suggestion) or go for full-blown pagination, I think that giving the user some way to control the number of results is still a useful feature, so we should at least discuss whether we can implement something that addresses it in the long term.
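For illustration, the pagination idea above can be sketched in plain Python, independent of Flask and Lucene. The function `paginate` and its `page` and `per_page` arguments are hypothetical names, and `hits` stands in for whatever ranked list the search layer returns; this is not how Hoaxy is implemented.

```python
def paginate(hits, page=1, per_page=100):
    """Return one page of an already ranked list of search hits.

    hits:     ranked results from the search layer (a plain list in this sketch)
    page:     1-based page number requested by the client
    per_page: page size; a real service would also cap this value
    """
    if page < 1 or per_page < 1:
        raise ValueError("page and per_page must be positive")
    start = (page - 1) * per_page
    end = start + per_page
    return {
        "page": page,
        "per_page": per_page,
        "total": len(hits),
        "has_next": end < len(hits),
        "items": hits[start:end],  # empty list if the page is past the end
    }


# Stand-in data: 250 ranked hits, served 100 at a time.
hits = list(range(250))
third = paginate(hits, page=3)
print(len(third["items"]), third["has_next"])  # 50 False
```

Note that slicing only limits what is sent back to the client. To serve page `k`, the search layer still has to produce at least `k * per_page` ranked hits, so deep pages remain more expensive than the first one.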

ogelin commented 4 years ago

Thanks! Even just increasing the maximum value would already be helpful. I still find 100 surprisingly small.

filmenczer commented 4 years ago

We discussed the issue. We are unable to change the number of articles returned by the API on our own installation of Hoaxy.

@ogelin you are welcome to install Hoaxy on your own server and change the number of results returned. This is a parameter that you will find here: https://github.com/IUNetSci/hoaxy-backend/blob/master/hoaxy/ir/search.py#L148

Closing the issue.