Closed alexgleith closed 2 weeks ago
Note that one possible workaround is using pathrows for querying, but then I need a lookuptable for geometry -> pathrow.
Pathrow query looks like this:
PR_ID_RIGHT = "073072"
PR_ID_LEFT = "074072"
query_left = {
"landsat:wrs_row": {"eq": PR_ID_LEFT[3:]},
"landsat:wrs_path": {"eq": PR_ID_LEFT[:3]}
}
query_right = {
"landsat:wrs_row": {"eq": PR_ID_RIGHT[3:]},
"landsat:wrs_path": {"eq": PR_ID_RIGHT[:3]}
}
items_left_pr = list(
client.search(
collections=["landsat-c2-l2"],
query=query_left,
datetime=datetime
).items()
)
items_right_pr = list(
client.search(
collections=["landsat-c2-l2"],
query=query_right,
datetime=datetime
).items()
)
print(len(items_left_pr), len(items_right_pr))
Results in 6, 6
so 12 scenes returned, which is what we want!
Another note that the USGS splits their geometry at the antimeridian, so that the queries work as expected.
Evidence in this very wide image!
Curiously, they return 16
scenes, which is more than the 12
I could get on the MSPC.
usgs_client = Client.open("https://landsatlook.usgs.gov/stac-server/")
usgs_items = list(usgs_client.search(collections=["landsat-c2l2-sr"], datetime=datetime, bbox=bbox).items())
print(len(usgs_items))
16
The triangle of missing data in the below image is caused by the inability retrieve these items from the STAC API.
I'll have to check on which verison of stactools-packages/landsat we're on. It's possible we're on a version from before https://github.com/stactools-packages/landsat/pull/30 and friends.
Unfortunately just searching for the desired pathrows via metadata (landsat:wrs_path
/ landsat:wrs_row
) without any geometry in the query usually times out.
Absent fixes to the underlying stac items, one workaround for this (at least for landsat) is to:
This should also accommodate the bad geometry for some stac items, which is in another issue I can't find right now. It also might be slightly liberal in some areas, based on the actual bounding boxes, so it might make sense to include the path/row filtering if you want strict results.
Unfortunately just searching for the desired pathrows via metadata (landsat:wrs_path / landsat:wrs_row) without any geometry in the query usually times out.
You're right. Querying for a year times out after 30 seconds... so this isn't a viable option either!
from pystac_client import Client
catalog = "https://planetarycomputer.microsoft.com/api/stac/v1"
client = Client.open(catalog)
PR_ID_LEFT = "074072"
query_left = {
"landsat:wrs_row": {"eq": PR_ID_LEFT[3:]},
"landsat:wrs_path": {"eq": PR_ID_LEFT[:3]}
}
items_left_pr = list(
client.search(
collections=["landsat-c2-l2"],
query=query_left,
datetime="2023"
).items()
)
print(f"Found {len(items_left_pr)} items for path/row {PR_ID_LEFT}")
APIError: The request exceeded the maximum allowed time, please try again. If the issue persists, please contact planetarycomputer@microsoft.com.
Querying for a year times out after 30 seconds
Your best bet for now might be to break the query down into smaller time intervals:
import pandas as pd
from pystac_client import Client
catalog = "https://planetarycomputer.microsoft.com/api/stac/v1"
client = Client.open(catalog)
PR_ID_LEFT = "074072"
query_left = {
"landsat:wrs_row": {"eq": PR_ID_LEFT[3:]},
"landsat:wrs_path": {"eq": PR_ID_LEFT[:3]}
}
starts = pd.date_range("2023", periods=12, freq="MS", tz="UTC")
items = []
for start, stop in zip(starts, starts[1:]):
print(start)
items_left_pr = list(
client.search(
collections=["landsat-c2-l2"],
query=query_left,
bbox=[-180, -90, 180, 90],
datetime=[start, stop]
).items()
)
items.extend(items_left_pr)
That finished for me in ~55 seconds.
@mmcfarland can you remind me: if the property_index_type
field in pgstac.queryables
is null, does that mean there isn't an index on that property?
"id" "name" "collection_ids" "definition" "property_path" "property_wrapper" "property_index_type"
97 "landsat:wrs_path" "{landsat-8-c2-l2,landsat-c2-l1,landsat-c2-l2}" "{""type"": ""string"", ""title"": ""WRS Path""}"
98 "landsat:wrs_row" "{landsat-8-c2-l2,landsat-c2-l1,landsat-c2-l2}" "{""type"": ""string"", ""title"": ""WRS Row""}"
99 "landsat:wrs_type" "{landsat-8-c2-l2,landsat-c2-l1,landsat-c2-l2}" "{""type"": ""string""}"
Lack of an index on those fields means that querying on them isn't going to be too effective, the database will still have to scan everything before filtering.
Your best bet for now might be to break the query down into smaller time intervals:
Understood. We're trying to build a framework to do this kind of thing across the Pacific to develop a lot of products, so these kinds of workarounds are really not ideal. And if we're falling back to pathrow queries, then further breaking them down into small datetime ranges is an extra compromise. And it doesn't feel right, since I can currently query for 40 years of Landsat and get thousands of items returned without issue using a geometry query.
If they can have an index put on them, that would be great.
Matt confirmed that those fields are currently unindexed. We'll look into adding them.
So, there are STAC documents that have values in their geometry that I can't actually query against. Clearly this is a problem!
Getting back to this, which I missed the first time: I think the item's geometry is incorrect:
"bbox": [
-180.67074549,
-18.39115293,
-178.55398222,
-16.30174058
]
We'll need to check on the expected value there, but it probably shouldn't be less that -180? I'm not sure. https://github.com/stactools-packages/landsat/issues/66 looks perhaps similar, but not identitcal.
We'll need to check on the expected value there
Did this end up getting fixed @TomAugspurger? We have a workaround, so I think we're ok, but it would be nice if it was resolved.
Agreed this would be good to resolve, but we haven't gotten to it yet.
From: Alex Leith @.> Sent: Thursday, April 4, 2024 8:01 PM To: microsoft/PlanetaryComputer @.> Cc: Mention @.>; Comment @.>; Subscribed @.***> Subject: Re: [microsoft/PlanetaryComputer] Antimeridian-crossing Items and Searching For Them (Issue #296)
We'll need to check on the expected value there
Did this end up getting fixed @TomAugspurgerhttps://github.com/TomAugspurger? We have a workaround, so I think we're ok, but it would be nice if it was resolved.
— Reply to this email directly, view it on GitHubhttps://github.com/microsoft/PlanetaryComputer/issues/296#issuecomment-2038561071 or unsubscribehttps://github.com/notifications/unsubscribe-auth/AAKAOIXUGAMRRKZ5DIFGN43Y3XZVRBFKMF2HI4TJMJ2XIZLTSOBKK5TBNR2WLJDUOJ2WLJDOMFWWLO3UNBZGKYLEL5YGC4TUNFRWS4DBNZ2F6YLDORUXM2LUPGBKK5TBNR2WLJDUOJ2WLJDOMFWWLLTXMF2GG2C7MFRXI2LWNF2HTAVFOZQWY5LFUVUXG43VMWSG4YLNMWVXI2DSMVQWIX3UPFYGLLDTOVRGUZLDORPXI6LQMWWES43TOVSUG33NNVSW45FGORXXA2LDOOJIFJDUPFYGLKTSMVYG643JORXXE6NFOZQWY5LFVEZTCNZUGAYTKMZUQKSHI6LQMWSWS43TOVS2K5TBNR2WLKRSGAYDQOJVGUYTANVHORZGSZ3HMVZKMY3SMVQXIZI. You are receiving this email because you were mentioned.
Triage notifications on the go with GitHub Mobile for iOShttps://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Androidhttps://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.
We got the index added in landsat:wrs_path
and landsat:wrs_row
added (finally). The example which timed out earlier is quick now:
In [25]: from pystac_client import Client
...:
...: catalog = "https://planetarycomputer.microsoft.com/api/stac/v1"
...: client = Client.open(catalog)
...:
...: PR_ID_LEFT = "074072"
...:
...: query_left = {
...: "landsat:wrs_row": {"eq": PR_ID_LEFT[3:]},
...: "landsat:wrs_path": {"eq": PR_ID_LEFT[:3]}
...: }
...:
...: items_left_pr = list(
...: client.search(
...: collections=["landsat-c2-l2"],
...: query=query_left,
...: datetime="2023"
...: ).items()
...: )
...:
...: %time print(f"Found {len(items_left_pr)} items for path/row {PR_ID_LEFT}")
Found 45 items for path/row 074072
CPU times: user 72 us, sys: 3 us, total: 75 us
Wall time: 79.4 us
Thanks, @TomAugspurger, this is really nice!
I think this should stay open though, as the original issue is with some of the geometries/bounding boxes.
(Great to meet you the other day!)
Closed
due to inactivity, feel free to reopen if you would like to continue this discussion.
I have a complicated antimeridian issue. I'll try to outline it below.
Say I have a query like this:
Note that this bounding box is close to the
180
antimeridian, but doesn't cross it.This query returns
6
items, but, it should return12
, as there are items that cover the right hand side of the box, but their geometry doesn't overlap, because it looks more like this:Note that that item is this one.
Also note that that item has a bounding box of:
[-180.67074549, -18.39115293, -178.55398222, -16.30174058]
.Now, I thought I might be clever and put a little hack in when I'm searching close to the 180 degree antimeridian. But! If I do a query with values of less than
-180
then I get an exception back from the STAC API:So, there are STAC documents that have values in their geometry that I can't actually query against. Clearly this is a problem!
Any suggestions on how to workaround this?