Closed: sckott closed this issue 4 years ago
I can reproduce but am not sure beyond that. Crossing 180/-180 seems likely the issue. I'll note that searching for "pollen" on https://www.pangaea.de/ and setting the bounding box there also prevents this record from being returned. So I think this might be an API issue, not a pangaear issue.
Reversing the order of the longitudes and setting count = 60 finds it:
pg_search(query = "pollen", bbox = c(-171.700000, 42.320000, 51.840000, 74.550000), count = 60)
But, then most of the datasets that are pulled together with it are from the west.
Otherwise, even a set of queries ("sediment pollen fossil", "sediment pollen", "fossil pollen", and "pollen"), each with count = 500 and offsets 0, 500, ...2500, cannot find the dataset.
So it seems to be a 180/-180 issue.
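For anyone who wants to repeat the paging check described above, here is a minimal sketch. `find_doi()` is a hypothetical helper, not part of pangaear. It assumes the search function accepts `query`, `bbox`, `count`, and `offset` (as `pg_search()` does) and returns a data frame with a `doi` column; the DOI format in that column is also an assumption.

```r
# Hypothetical helper (not part of pangaear): page through several queries
# and report the first query/offset whose results contain the target DOI.
# `search_fn` is passed in so the loop can be tried with a stub; in practice
# it would be pangaear::pg_search.
find_doi <- function(target_doi, queries, bbox, search_fn,
                     offsets = seq(0, 2500, by = 500), count = 500) {
  for (q in queries) {
    for (off in offsets) {
      res <- search_fn(query = q, bbox = bbox, count = count, offset = off)
      # An empty page means this query has no further results.
      if (NROW(res) == 0) break
      # Assumption: results carry a `doi` column.
      if (target_doi %in% res$doi) return(list(query = q, offset = off))
    }
  }
  NULL  # not found in any page of any query
}

# Possible usage (needs network access and the pangaear package):
# find_doi("10.1594/PANGAEA.898389",
#          c("sediment pollen fossil", "sediment pollen", "fossil pollen", "pollen"),
#          bbox = c(-171.7, 42.32, 51.84, 74.55),
#          search_fn = pangaear::pg_search)
```

A `NULL` result only says the DOI was not in the pages that were fetched, not that the dataset is missing from PANGAEA.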
Thanks @karawoo. I've also tried searching with longitudes in the 0-360 range to see if that works, but it doesn't. It's not clear whether the bbox in the "Coverage" section of a dataset has to be completely encompassed by the search bbox or not. I imagine it does, since this search doesn't find that dataset?
@kbh022 that bbox c(-171.700000, 42.320000, 51.840000, 74.550000)
does seem more correct, since it should be minlon, minlat, maxlon, maxlat.
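To make the ordering explicit, here is a small sketch that labels a bbox as minlon, minlat, maxlon, maxlat and flags the wrap-around case. `describe_bbox()` is a hypothetical helper, not a pangaear function, and reading "minlon greater than maxlon" as "crosses 180/-180" is an assumption of this sketch rather than documented behaviour.

```r
# Hypothetical helper (not part of pangaear): name the four bbox values
# and run basic range checks.
describe_bbox <- function(bbox) {
  stopifnot(is.numeric(bbox), length(bbox) == 4)
  names(bbox) <- c("minlon", "minlat", "maxlon", "maxlat")
  stopifnot(
    all(abs(bbox[c("minlon", "maxlon")]) <= 180),
    all(abs(bbox[c("minlat", "maxlat")]) <= 90),
    bbox[["minlat"]] <= bbox[["maxlat"]]
  )
  # Assumption: a west edge larger than the east edge means the box
  # wraps across the 180/-180 meridian.
  list(bbox = bbox,
       crosses_dateline = bbox[["minlon"]] > bbox[["maxlon"]])
}

# The reversed box from this thread does not wrap ...
describe_bbox(c(-171.7, 42.32, 51.84, 74.55))$crosses_dateline
# ... while the same longitudes in the other order do.
describe_bbox(c(51.84, 42.32, -171.7, 74.55))$crosses_dateline
```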
Hi, this looks like an issue on PANGAEA's side. It happens for datasets whose bounding box crosses the date line. Our code handles queries that include the date line correctly, but the combination is broken.
I will work on a fix.
The SOAP API of PANGAEA allows 3 types of searches: intersection, fully included, and mean only. The website only offers the first variant, so the bounding boxes of dataset and query need to overlap for a match. The score is ranked by the distance between the center of the search box and the mean point of the dataset. This scores datasets that overlap more with a higher factor. This is why you see a different order if you invert the box. It then matches (because of this bug), but the score gets very low (as it's far outside the inverted box).
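The intersection variant can be illustrated with a short geometric sketch. This is not PANGAEA's implementation (the search runs on their server); `lon_intervals()` and `bbox_intersects()` are hypothetical helpers, boxes are given as minlon, minlat, maxlon, maxlat, and a box with minlon greater than maxlon is assumed to wrap across the date line. The dataset box in the example is made up for illustration.

```r
# Split a longitude range into one or two non-wrapping intervals.
lon_intervals <- function(minlon, maxlon) {
  if (minlon <= maxlon) {
    list(c(minlon, maxlon))
  } else {
    # Wraps across 180/-180: treat as two ordinary intervals.
    list(c(minlon, 180), c(-180, maxlon))
  }
}

# TRUE if two bounding boxes overlap, allowing either to cross the date line.
bbox_intersects <- function(a, b) {
  lat_overlap <- a[2] <= b[4] && b[2] <= a[4]
  if (!lat_overlap) return(FALSE)
  for (ia in lon_intervals(a[1], a[3])) {
    for (ib in lon_intervals(b[1], b[3])) {
      if (ia[1] <= ib[2] && ib[1] <= ia[2]) return(TRUE)
    }
  }
  FALSE
}

# Made-up dataset box that straddles the date line (170E to 175W).
dataset <- c(170, 60, -175, 70)
bbox_intersects(c(51.84, 42.32, -171.7, 74.55), dataset)  # wrapping query box
bbox_intersects(c(-171.7, 42.32, 51.84, 74.55), dataset)  # reversed query box
```

Splitting a wrapping box into two ordinary intervals keeps the overlap test itself unchanged, which is why this form is easy to reason about.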
Thanks for this information @uschindler and for working on a fix. I'll add to the package documentation how the bounding box search is done (w/ intersection)
while you're here, curious if the Data Warehouse downloads are available programmatically, or only in the web interface?
The fix does not seem easy. It affects all datasets which cross date line.
Your second question: the data warehouse is only available to logged-in users, so you need a login token. But this will be available soon: users will be able to create an API token (like on GitHub) that can be used to download datasets on behalf of a user. This makes it possible to create and share scripts, such as a pangaear or pangaeapy script, without including a username and password.
The API for the data warehouse is included in our SOAP API; it's not available via REST yet.
Okay, will look out for the warehouse token update
Just an update: we can't easily fix the dateline issue at the moment, as this is a bug in the underlying Elasticsearch engine which is not yet fixed: https://github.com/elastic/elasticsearch/issues/22564
We may change to polygons, but that slows searches down.
Thanks for the update.
Hi, the issue was fixed on PANGAEA's API. Searching for datasets with bounding boxes crossing the date line is now fully supported. Precision for search is 5km or 2.5% of size of shape (if large).
great, thanks very much!
working now, closing
Great! Thanks a lot!
Kuber
On Wed, Jan 22, 2020 at 5:48 PM Scott Chamberlain notifications@github.com wrote:
working now, closing
Someone raised an issue that some datasets are hard to get in results, e.g., https://doi.pangaea.de/10.1594/PANGAEA.898389
I think it's because they cross 180/-180, but not sure. e.g.,
With that bbox, it should find the dataset above, but it does not.
If you just remove the bbox search, it does find the dataset.