Closed hmorzaria closed 8 years ago
This is a consequence of changes to the Elasticsearch API. Elasticsearch is where the data is stored; our API is layered on top of it, and the ridigbio package in turn uses our API.
The issue that is open on our API is here:
https://github.com/iDigBio/idigbio-search-api/issues/18
The R package throwing an error while parsing JSON is not great behavior, but it happens because the error message returned by the API is not JSON. To fix that, I'd need to add another layer of checks on the responses; doing so would make it clearer what is going on and where the issue comes from.
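For illustration only, the extra layer of checks described above might look something like the sketch below. The helper name `parse_api_response` and its arguments are hypothetical and are not part of ridigbio; the default parser assumes the jsonlite package is installed.

```r
# Hypothetical helper, not part of ridigbio: refuse to parse a response
# as JSON unless the server says it is JSON.
parse_api_response <- function(body, content_type, status,
                               parser = jsonlite::fromJSON) {
  is_json <- grepl("application/json", content_type, fixed = TRUE)
  if (!is_json) {
    # Surface the start of the raw body so the upstream error is visible
    # instead of an opaque JSON parsing failure.
    snippet <- substr(body, 1, 200)
    stop(sprintf("API returned a non-JSON response (HTTP %s, %s): %s",
                 status, content_type, snippet),
         call. = FALSE)
  }
  parser(body)
}
```

With a check like this, an HTML or plain-text error page coming back from the server would produce a message naming the HTTP status and showing the start of the body, rather than a parse error.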
I saw that issue "Elasticsearch error when trying to pull windows larger than 10k records" (#18) in the API is now closed. The fix, increasing the limit to 100k, did not resolve this issue.
In the case of a polygon search, my workaround is to recursively reduce the area being searched until no error is returned.
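A rough sketch of that kind of recursive subdivision is below. It is not the code actually used here: it assumes a rectangular bounding box rather than an arbitrary polygon, and `split_search`, `count_fn`, and `fetch_fn` are hypothetical names. In practice the two callbacks could wrap `idig_count_records()` and `idig_search_records()` with a spatial query for the given box.

```r
# Hypothetical sketch: split a bounding box into quadrants until each
# piece holds no more than `limit` records, then fetch each piece.
# `box` is list(west, east, south, north); `count_fn(box)` returns a
# record count and `fetch_fn(box)` returns the records (both assumed).
split_search <- function(box, count_fn, fetch_fn,
                         limit = 100000, max_depth = 20) {
  n <- count_fn(box)
  if (n == 0) return(list())
  if (n <= limit) return(list(fetch_fn(box)))
  if (max_depth <= 0) {
    stop("Area cannot be split further but still exceeds the limit",
         call. = FALSE)
  }
  mid_lon <- (box$west + box$east) / 2
  mid_lat <- (box$south + box$north) / 2
  quadrants <- list(
    list(west = box$west, east = mid_lon,  south = box$south, north = mid_lat),
    list(west = mid_lon,  east = box$east, south = box$south, north = mid_lat),
    list(west = box$west, east = mid_lon,  south = mid_lat,   north = box$north),
    list(west = mid_lon,  east = box$east, south = mid_lat,   north = box$north)
  )
  # Recurse into each quadrant and concatenate the resulting lists.
  do.call(c, lapply(quadrants, split_search,
                    count_fn = count_fn, fetch_fn = fetch_fn,
                    limit = limit, max_depth = max_depth - 1))
}
```

The result is a list of per-area results that can be combined with `do.call(rbind, ...)`. Records lying exactly on a shared edge may be returned by two neighbouring boxes, depending on how the spatial query treats its boundaries, so removing duplicates afterwards is probably wise.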
The 100k record limit for the API, ridigbio, and the Python client is unfortunately going to remain in place, perhaps permanently. Removing it would require us to rewrite significant parts of the API code, and we can't commit to a timeline for doing that.
Have you tried working with the download API? https://www.idigbio.org/wiki/index.php/IDigBio_Download_API
There is currently no interface in ridigbio that wraps the download API, but we are in the process of changing the way downloads are generated, so adding one will be much easier for us in the future.
So your workaround of limiting the spatial extent to something that returns fewer than 100k records is something you will have to keep doing when using R.
I'll leave this open until I push a package update that removes the max_items parameter and issues a human-readable error when there are more than 100k results.
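Until such an update is available, a similar guard can be approximated on the user side. The sketch below is only an illustration, not the planned package change; `check_result_size` is a hypothetical helper, and the 100,000 default simply mirrors the limit discussed in this thread.

```r
# Hypothetical user-side guard: fail early with a readable message when
# a query matches more records than the API will return.
check_result_size <- function(n_results, limit = 100000) {
  if (n_results > limit) {
    stop(sprintf(
      "Query matches %s records, but at most %s can be retrieved. Narrow the query and try again.",
      format(n_results, big.mark = ",", scientific = FALSE),
      format(limit, big.mark = ",", scientific = FALSE)
    ), call. = FALSE)
  }
  invisible(TRUE)
}

# Possible usage with ridigbio (not run here):
# n <- idig_count_records(rq = list(country = "mexico"))
# check_result_size(n)
```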
Looping in @sckott, author of spocc, so he knows about this too.
Thanks for the heads up. I'll have a look in spocc and see if I need to change anything.
I am trying to download all records from a country, in this case Mexico, specifying the total number of records as max_items:
rec_count <- try(idig_count_records(rq=list(country=eachcountry, geopoint=list(type="exists")), fields=c("scientificname", "geopoint")))
df1 <- try(idig_search_records(rq=list(country=eachcountry, geopoint=list(type="exists")), fields=c("scientificname", "geopoint"), max_items = 1306922))
df1 <- try(idig_search_records(rq=list(country=eachcountry, geopoint=list(type="exists")), fields=c("scientificname", "geopoint"), max_items = 1306923))
df1 <- try(idig_search_records(rq=list(country=eachcountry, geopoint=list(type="exists")), fields=c("scientificname", "geopoint"), limit = 5000, offset = 45000))
head(df1)
               scientificname geopoint.lon geopoint.lat
1      echinocereus maritimus    -115.7390     30.04060
2 atlapetes pileatus pileatus    -101.6837     19.51897
df1 <- try(idig_search_records(rq=list(country=eachcountry, geopoint=list(type="exists")), fields=c("scientificname", "geopoint"), limit = 5000, offset = 50000))