iobis / pyobis

OBIS Python client
https://iobis.github.io/pyobis
MIT License

incomplete fetch of checklist from `/checklist` endpoint #93

Closed ayushanand18 closed 1 year ago

ayushanand18 commented 1 year ago

Overview

When we generate an OBIS checklist with the pyobis.checklist list() function, it returns at most 10 records. This is because the OBIS API returns only 10 records per query by default. To fetch subsequent records, we need to pass a skip parameter giving the number of records already fetched.

For example, let us look at this request

https://api.obis.org/v3/checklist?size=10&skip=10&taxonid=1363

It fetches the next 10 records, skipping the first 10 that have already been fetched.
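As a minimal sketch of how the skip value relates to the page being requested, the helper below rebuilds the request above from a zero-based page number. The function name `checklist_page_url` and its `page` argument are hypothetical and are not part of pyobis; only the endpoint and the `size`, `skip`, and `taxonid` parameters come from the request shown above.

```python
from urllib.parse import urlencode

# Endpoint taken from the example request above.
BASE_URL = "https://api.obis.org/v3/checklist"


def checklist_page_url(taxonid, page, size=10):
    """Build the URL for the zero-based `page` of a checklist query.

    Hypothetical helper for illustration only; not a pyobis function.
    """
    # skip = number of records on all earlier pages
    params = {"size": size, "skip": page * size, "taxonid": taxonid}
    return f"{BASE_URL}?{urlencode(params)}"


# Second page (page=1) of taxonid 1363 reproduces the request above.
print(checklist_page_url(1363, page=1))
```

With `page=0` the skip value is 0, which corresponds to the default first page of 10 records.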

To reproduce

Run

from pyobis.checklist import ChecklistQuery
ChecklistQuery().list(taxonid=1363)["total"] # total number of records reported by the API
len(ChecklistQuery().list(taxonid=1363)["results"]) # number of records actually fetched (at most 10)

Note: This behavior is not mentioned in the documentation; I learned of it thanks to OBIS Mapper.

ayushanand18 commented 1 year ago

We need to add pagination here as well, similar to occurrences.search. I'm writing a patch for this.

ayushanand18 commented 1 year ago

An interesting finding, although I don't understand it. I queried the checklist for taxonid 1363. The total shows 2140, yet when I run this query I get zero results.

https://api.obis.org/v3/checklist?taxonid=1363&skip=2129&size=5000

output

{"total":2140,"results":[]}

This is weird and beyond my understanding. Please help.

7yl4r commented 1 year ago

I wonder if this is because Dorylaimina is an Order. The total could be a count of species within the order.

@pieterprovoost : can you shed some light on this?

pieterprovoost commented 1 year ago

@7yl4r Elasticsearch approximates the cardinality for better performance, so it's best to paginate until the result set is empty. In this case there are 2129 taxa. See https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-metrics-cardinality-aggregation.html#_precision_control.
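A minimal sketch of the "paginate until the result set is empty" approach is shown below. It is not the pyobis implementation: `fetch_all`, `fetch_page`, and `fake_fetch` are hypothetical names, and `fake_fetch` is a local stand-in for the API so the loop can be run without network access. The only assumption about the response is the `{"total": ..., "results": [...]}` shape seen in the output above.

```python
def fetch_all(fetch_page, size=10):
    """Collect records page by page until an empty page is returned.

    `fetch_page(skip, size)` is a hypothetical callable that returns the
    parsed JSON of one request: a dict with "total" and "results" keys.
    """
    records = []
    while True:
        # Skip exactly as many records as have been collected so far.
        page = fetch_page(skip=len(records), size=size)
        results = page.get("results", [])
        if not results:
            # Empty page: stop here, regardless of what "total" claims.
            break
        records.extend(results)
    return records


def fake_fetch(skip, size):
    """Stand-in for the API: 23 real records, over-estimated total of 25."""
    data = list(range(23))
    return {"total": 25, "results": data[skip:skip + size]}


all_records = fetch_all(fake_fetch)
print(len(all_records))  # 23, not the reported total of 25
```

The loop never compares the number of collected records against `total`, so an approximate count cannot cause it to stop early or to keep requesting pages that do not exist.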

ayushanand18 commented 1 year ago

Noted, thanks @pieterprovoost for the info!