monarch-initiative / biolink-api

API for linked biological knowledge
https://api.monarchinitiative.org/api/
BSD 3-Clause "New" or "Revised" License

Supporting pagination #265

Closed deepakunni3 closed 5 years ago

deepakunni3 commented 5 years ago

While we expose arguments for pagination, it is not clear how one actually uses them. It would be good to have some examples that make use of pagination.

There seems to be a performance benefit to hitting Solr with several smaller queries rather than one bulk query.

Also, when a query matches thousands of associations, Solr appears to spend a long time processing the request (and eventually times out), even if we specify a rows parameter.

Example:

https://api.monarchinitiative.org/api/bioentity/phenotype/FBcv:0001347?fetch_objects=true&unselect_evidence=false&exclude_automatic_assertions=false&use_compact_associations=false&get_association_counts=true&rows=100

The term FBcv:0001347 is a 'phenotype' concept. It has 22939 associated genes, 161977 associated genotypes, and 104001 associated variants.
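
To make the question concrete, here is a minimal sketch of what client-side pagination could look like. It assumes the endpoint accepts a `start` offset alongside `rows`; only `rows` appears in the example URL above, so `start` is an assumption, and `page_url`, `iter_pages` and `fetch_page` are hypothetical names used for illustration only. The fetch step is passed in as a callable so the paging logic can be tried without touching the server.

```python
from urllib.parse import urlencode

# Endpoint taken from the example URL in this issue.
BASE_URL = "https://api.monarchinitiative.org/api/bioentity/phenotype/FBcv:0001347"


def page_url(base_url, start, rows):
    """Build the URL for one page.

    `rows` is the page size; `start` is an ASSUMED offset parameter.
    """
    return base_url + "?" + urlencode({"start": start, "rows": rows})


def iter_pages(fetch_page, rows=100, max_pages=None):
    """Yield pages until a short or empty page signals the end.

    fetch_page(start, rows) is a caller-supplied (hypothetical) callable
    that returns the list of associations for that window.
    """
    start = 0
    pages = 0
    while max_pages is None or pages < max_pages:
        page = fetch_page(start, rows)
        if not page:
            break  # nothing left to read
        yield page
        pages += 1
        if len(page) < rows:
            break  # last, partially filled page
        start += rows
```

A real `fetch_page` would request `page_url(BASE_URL, start, rows)` and pull the association list out of the response; that part depends on the response shape and is left out here.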

Thinking out loud: Can we improve the behavior?

kennethbruskiewicz commented 5 years ago

I agree that this is critical.

We need pagination to support software architectures that stand a chance of taming this data glut, for example an architecture that interleaves multiple requests from a single vendor to parallelize the task.
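
As a rough sketch of the kind of client this would enable: once the total count is known, the page offsets can be computed up front and the requests issued concurrently. The names `page_offsets`, `fetch_all_parallel` and `fetch_page` are hypothetical, and the sketch assumes the API honours an offset plus `rows` window per request.

```python
from concurrent.futures import ThreadPoolExecutor


def page_offsets(total, rows):
    """Return the start offset of every page needed to cover `total` items."""
    return list(range(0, total, rows))


def fetch_all_parallel(fetch_page, total, rows=100, max_workers=4):
    """Fetch all pages concurrently and return the items in page order.

    fetch_page(start, rows) is a caller-supplied (hypothetical) callable
    that returns the list of items for that window.
    """
    offsets = page_offsets(total, rows)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the order of `offsets`,
        # regardless of which request finishes first.
        pages = pool.map(lambda start: fetch_page(start, rows), offsets)
        return [item for page in pages for item in page]
```

With 22939 associated genes and `rows=100`, this would issue 230 requests, at most `max_workers` of them in flight at once. Keeping `max_workers` small seems prudent given the timeouts described above.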