Closed MathewBiddle closed 1 year ago
FWIW, I think this is only an issue when someone is trying to return more than the limit. For example,
pyobis.occurrences.search(datasetid = 'c606c47a-3892-4645-9521-630c9085e59f', fields = fields).execute()
works as expected.
Fetching: [████████████████████████████████████████████████████████████████████████████████████████████████████] 2341/2341
Fetched 2341 records.
Thanks for raising this issue.
When the number of records exceeds the 10k threshold, the package paginates in the background. To paginate, we need the occurrence id (the id field), because it uniquely identifies each record and the API requires it to fetch the next batch of records. When we pass a fields parameter, the API returns only those attributes in the JSON response, but id is still required to paginate. This is why the process throws an error about a missing id. The simple workaround is to include id in the fields list as well.
Something like this could work:
from pyobis import occurrences
institution_id = 23070
# 'id' is included so the package can paginate past the 10k limit
fields = ['decimalLatitude', 'decimalLongitude', 'id']
occurrences.search(instituteid=institution_id, fields=fields).execute()
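To illustrate why the missing id only bites on large result sets, here is a minimal sketch of id-based pagination. It is not the package's actual implementation: `fetch_all`, `make_fake_api`, and the small page size are hypothetical names and values chosen for illustration, and the fake API only mimics the behaviour described above (return just the requested fields, continue after the last seen id).

```python
from typing import Callable, List, Optional


def fetch_all(fetch_page: Callable[[Optional[str]], List[dict]], page_size: int) -> List[dict]:
    """Collect all records by repeatedly requesting the page after the last seen id."""
    records: List[dict] = []
    after: Optional[str] = None
    while True:
        page = fetch_page(after)
        records.extend(page)
        if len(page) < page_size:
            # A short (or empty) page means there is nothing left to fetch.
            break
        # The cursor for the next request is the id of the last record.
        # This raises KeyError when 'id' was not among the requested fields.
        after = page[-1]["id"]
    return records


def make_fake_api(total: int, page_size: int, fields: List[str]) -> Callable[[Optional[str]], List[dict]]:
    """Hypothetical stand-in for the remote API: returns only the requested fields."""
    rows = [
        {"id": f"rec-{i:05d}", "decimalLatitude": float(i % 90), "decimalLongitude": float(i % 180)}
        for i in range(total)
    ]
    index = {row["id"]: i for i, row in enumerate(rows)}

    def fetch_page(after: Optional[str]) -> List[dict]:
        start = 0 if after is None else index[after] + 1
        return [{key: row[key] for key in fields} for row in rows[start:start + page_size]]

    return fetch_page


coords = ["decimalLatitude", "decimalLongitude"]

# With 'id' requested, all pages are fetched.
with_id = fetch_all(make_fake_api(25, 10, coords + ["id"]), page_size=10)

# Without 'id', a result that fits in one page still works ...
small = fetch_all(make_fake_api(5, 10, coords), page_size=10)

# ... but a multi-page result fails when the cursor is looked up.
try:
    fetch_all(make_fake_api(25, 10, coords), page_size=10)
    failed_without_id = False
except KeyError:
    failed_without_id = True
```

In this sketch the single-page query succeeds without id, which matches the observation that the 2341-record dataset query works while the larger institution query does not.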
Thanks for digging into this and figuring out a quick solution! It seems to be working as expected now.
Two thoughts:
- Would it be reasonable to add to the package a mechanism that automatically includes 'id' when using the fields parameter?
This seems reasonable to me unless there is a case where you would want to exclude id. I think the only use case for that is getting a single page of results; if that is wanted, it should be implemented differently. I am renaming this issue so we can use it to track this as a feature request.
- We should, at least, update the docs to highlight this. Not sure where to add it, however.
Yes, I think the docs need to be updated with all the fixes we have seen so far.
- Would it be reasonable to add to the package a mechanism that automatically includes 'id' when using the fields parameter?
I think we should, but I have one observation regarding coordinate queries in particular. When we query for only decimalLongitude or decimalLatitude, only unique values are returned, and these would rarely exceed 10k for any taxon. Even with 20k occurrence records, only a dozen or so unique coordinates are returned. So adding id there would cause high network overhead.
Maybe we can introduce a paginate=False parameter, so that id is not added and the query stays fast. By default it would be set to True and id would be added automatically. Thoughts?
Yep, that sounds even better.
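The proposal above could look roughly like the following. This is only a sketch of the idea discussed in this thread, not code from the package: `resolve_fields` is a hypothetical helper name, and `paginate` is the proposed (not yet existing) parameter.

```python
from typing import List, Optional, Sequence


def resolve_fields(fields: Optional[Sequence[str]], paginate: bool = True) -> Optional[List[str]]:
    """Return the field list to send, adding 'id' when pagination is enabled."""
    if fields is None:
        # No restriction requested; the full record is assumed to include id.
        return None
    resolved = list(fields)  # copy so the caller's list is not modified
    if paginate and "id" not in resolved:
        resolved.append("id")
    return resolved


coords = ["decimalLatitude", "decimalLongitude"]

default_fields = resolve_fields(coords)                       # 'id' appended
single_page_fields = resolve_fields(coords, paginate=False)   # left as requested
already_has_id = resolve_fields(coords + ["id"])              # no duplicate 'id'
```

Whether the added id should be dropped again from the returned records is a separate design choice that this sketch leaves open.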
EDIT: The request works now.
~Digging into this issue today, and the API returned no records even though the total is still a very large count :)~
The URL I pinged: https://api.obis.org/v3/occurrence?fields=%5B%27decimalLatitude%27%2C+%27decimalLongitude%27%5D&instituteid=23070
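For reference, the query string in that URL can be decoded with the standard library to see exactly what was sent. This only inspects the URL quoted above; it makes no claim about which format the API expects for fields.

```python
from urllib.parse import parse_qs, urlsplit

url = (
    "https://api.obis.org/v3/occurrence"
    "?fields=%5B%27decimalLatitude%27%2C+%27decimalLongitude%27%5D&instituteid=23070"
)

# parse_qs percent-decodes each value and maps every key to a list of values.
params = parse_qs(urlsplit(url).query)
fields_value = params["fields"][0]

print(fields_value)               # ['decimalLatitude', 'decimalLongitude']
print(params["instituteid"][0])   # 23070
```

The decoded fields value is the string form of a Python list, brackets and quotes included.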
I'm trying to query for records associated with an institution and return only the latitude/longitude pairs. Unfortunately, it looks like outdf is missing the key id, but I'm wondering if this should be datasetid or something along those lines?
Code snippet:
returns