microsoft / PlanetaryComputer

Issues, discussions, and information about the Microsoft Planetary Computer
https://planetarycomputer.microsoft.com/
MIT License
185 stars 9 forks source link

Antimeridian-crossing Items and Searching For Them #296

Closed alexgleith closed 2 weeks ago

alexgleith commented 1 year ago

I have a complicated antimeridian issue. I'll try to outline it below.

Say I have a query like this:

datetime = "2022-09/2022-10"
collections = ["landsat-c2-l2"]
bbox = [179.1, -17.7, 179.95, -17.0]

Note that this bounding box is close to the 180 antimeridian, but doesn't cross it.

This query returns 6 items, but, it should return 12, as there are items that cover the right hand side of the box, but their geometry doesn't overlap, because it looks more like this:

{
  'type': 'Polygon',
   'coordinates': [[
     [-178.5539822, -16.6580789],
     [-178.9381784, -18.3911529],
     [-180, -18.1699399],
     [-180.6707455, -18.0302011],
     [-180.270388, -16.3017406],
     [-180, -16.3578751],
     [-178.5539822, -16.6580789]]
  ]
}

Note that that item is this one.

Also note that that item has a bounding box of: [-180.67074549, -18.39115293, -178.55398222, -16.30174058].

Now, I thought I might be clever and put a little hack in when I'm searching close to the 180 degree antimeridian. But! If I do a query with values of less than -180 then I get an exception back from the STAC API:

APIError: 1 validation error for Request
body -> bbox
  Bounding box must be within (-180, -90, 180, 90) (type=value_error)

So, there are STAC documents that have values in their geometry that I can't actually query against. Clearly this is a problem!

Any suggestions on how to workaround this?

alexgleith commented 1 year ago

Note that one possible workaround is using pathrows for querying, but then I need a lookuptable for geometry -> pathrow.

Pathrow query looks like this:


PR_ID_RIGHT = "073072"
PR_ID_LEFT = "074072"

query_left = {
    "landsat:wrs_row": {"eq": PR_ID_LEFT[3:]},
    "landsat:wrs_path": {"eq": PR_ID_LEFT[:3]}
}
query_right = {
    "landsat:wrs_row": {"eq": PR_ID_RIGHT[3:]},
    "landsat:wrs_path": {"eq": PR_ID_RIGHT[:3]}
}

items_left_pr = list(
    client.search(
        collections=["landsat-c2-l2"],
        query=query_left,
        datetime=datetime
    ).items()
)
items_right_pr = list(
    client.search(
        collections=["landsat-c2-l2"],
        query=query_right,
        datetime=datetime
    ).items()
)
print(len(items_left_pr), len(items_right_pr))

Results in 6, 6 so 12 scenes returned, which is what we want!

alexgleith commented 1 year ago

Another note that the USGS splits their geometry at the antimeridian, so that the queries work as expected.

Evidence in this very wide image!

image

Curiously, they return 16 scenes, which is more than the 12 I could get on the MSPC.

usgs_client = Client.open("https://landsatlook.usgs.gov/stac-server/")
usgs_items = list(usgs_client.search(collections=["landsat-c2l2-sr"], datetime=datetime, bbox=bbox).items())

print(len(usgs_items))

16

alexgleith commented 1 year ago

The triangle of missing data in the below image is caused by the inability retrieve these items from the STAC API.

image

TomAugspurger commented 1 year ago

I'll have to check on which verison of stactools-packages/landsat we're on. It's possible we're on a version from before https://github.com/stactools-packages/landsat/pull/30 and friends.

jessjaco commented 11 months ago

Unfortunately just searching for the desired pathrows via metadata (landsat:wrs_path / landsat:wrs_row) without any geometry in the query usually times out.

Absent fixes to the underlying stac items, one workaround for this (at least for landsat) is to:

  1. Determine the landsat pathrows which intersect the (antimeridian crossing) region of interest. This is an exercise in and of itself!
  2. Split the bounding box of those pathrows on the antimeridian.
  3. Search on the split bboxes separately, and combine the results.

This should also accommodate the bad geometry for some stac items, which is in another issue I can't find right now. It also might be slightly liberal in some areas, based on the actual bounding boxes, so it might make sense to include the path/row filtering if you want strict results.

alexgleith commented 10 months ago

Unfortunately just searching for the desired pathrows via metadata (landsat:wrs_path / landsat:wrs_row) without any geometry in the query usually times out.

You're right. Querying for a year times out after 30 seconds... so this isn't a viable option either!

from pystac_client import Client

catalog = "https://planetarycomputer.microsoft.com/api/stac/v1"
client = Client.open(catalog)

PR_ID_LEFT = "074072"

query_left = {
    "landsat:wrs_row": {"eq": PR_ID_LEFT[3:]},
    "landsat:wrs_path": {"eq": PR_ID_LEFT[:3]}
}

items_left_pr = list(
    client.search(
        collections=["landsat-c2-l2"],
        query=query_left,
        datetime="2023"
    ).items()
)

print(f"Found {len(items_left_pr)} items for path/row {PR_ID_LEFT}")

APIError: The request exceeded the maximum allowed time, please try again. If the issue persists, please contact planetarycomputer@microsoft.com.

TomAugspurger commented 10 months ago

Querying for a year times out after 30 seconds

Your best bet for now might be to break the query down into smaller time intervals:

import pandas as pd
from pystac_client import Client

catalog = "https://planetarycomputer.microsoft.com/api/stac/v1"
client = Client.open(catalog)

PR_ID_LEFT = "074072"

query_left = {
    "landsat:wrs_row": {"eq": PR_ID_LEFT[3:]},
    "landsat:wrs_path": {"eq": PR_ID_LEFT[:3]}
}

starts = pd.date_range("2023", periods=12, freq="MS", tz="UTC")

items = []
for start, stop in zip(starts, starts[1:]):
    print(start)
    items_left_pr = list(
        client.search(
            collections=["landsat-c2-l2"],
            query=query_left,
            bbox=[-180, -90, 180, 90],
            datetime=[start, stop]
        ).items()
    )
    items.extend(items_left_pr)

That finished for me in ~55 seconds.


@mmcfarland can you remind me: if the property_index_type field in pgstac.queryables is null, does that mean there isn't an index on that property?

"id"    "name"  "collection_ids"    "definition"    "property_path" "property_wrapper"  "property_index_type"
97  "landsat:wrs_path"  "{landsat-8-c2-l2,landsat-c2-l1,landsat-c2-l2}" "{""type"": ""string"", ""title"": ""WRS Path""}"           
98  "landsat:wrs_row"   "{landsat-8-c2-l2,landsat-c2-l1,landsat-c2-l2}" "{""type"": ""string"", ""title"": ""WRS Row""}"            
99  "landsat:wrs_type"  "{landsat-8-c2-l2,landsat-c2-l1,landsat-c2-l2}" "{""type"": ""string""}"            

Lack of an index on those fields means that querying on them isn't going to be too effective, the database will still have to scan everything before filtering.

alexgleith commented 10 months ago

Your best bet for now might be to break the query down into smaller time intervals:

Understood. We're trying to build a framework to do this kind of thing across the Pacific to develop a lot of products, so these kinds of workarounds are really not ideal. And if we're falling back to pathrow queries, then further breaking them down into small datetime ranges is an extra compromise. And it doesn't feel right, since I can currently query for 40 years of Landsat and get thousands of items returned without issue using a geometry query.

If they can have an index put on them, that would be great.

TomAugspurger commented 10 months ago

Matt confirmed that those fields are currently unindexed. We'll look into adding them.

So, there are STAC documents that have values in their geometry that I can't actually query against. Clearly this is a problem!

Getting back to this, which I missed the first time: I think the item's geometry is incorrect:

    "bbox": [
        -180.67074549,
        -18.39115293,
        -178.55398222,
        -16.30174058
  ]

We'll need to check on the expected value there, but it probably shouldn't be less that -180? I'm not sure. https://github.com/stactools-packages/landsat/issues/66 looks perhaps similar, but not identitcal.

alexgleith commented 7 months ago

We'll need to check on the expected value there

Did this end up getting fixed @TomAugspurger? We have a workaround, so I think we're ok, but it would be nice if it was resolved.

TomAugspurger commented 7 months ago

Agreed this would be good to resolve, but we haven't gotten to it yet.


From: Alex Leith @.> Sent: Thursday, April 4, 2024 8:01 PM To: microsoft/PlanetaryComputer @.> Cc: Mention @.>; Comment @.>; Subscribed @.***> Subject: Re: [microsoft/PlanetaryComputer] Antimeridian-crossing Items and Searching For Them (Issue #296)

We'll need to check on the expected value there

Did this end up getting fixed @TomAugspurgerhttps://github.com/TomAugspurger? We have a workaround, so I think we're ok, but it would be nice if it was resolved.

— Reply to this email directly, view it on GitHubhttps://github.com/microsoft/PlanetaryComputer/issues/296#issuecomment-2038561071 or unsubscribehttps://github.com/notifications/unsubscribe-auth/AAKAOIXUGAMRRKZ5DIFGN43Y3XZVRBFKMF2HI4TJMJ2XIZLTSOBKK5TBNR2WLJDUOJ2WLJDOMFWWLO3UNBZGKYLEL5YGC4TUNFRWS4DBNZ2F6YLDORUXM2LUPGBKK5TBNR2WLJDUOJ2WLJDOMFWWLLTXMF2GG2C7MFRXI2LWNF2HTAVFOZQWY5LFUVUXG43VMWSG4YLNMWVXI2DSMVQWIX3UPFYGLLDTOVRGUZLDORPXI6LQMWWES43TOVSUG33NNVSW45FGORXXA2LDOOJIFJDUPFYGLKTSMVYG643JORXXE6NFOZQWY5LFVEZTCNZUGAYTKMZUQKSHI6LQMWSWS43TOVS2K5TBNR2WLKRSGAYDQOJVGUYTANVHORZGSZ3HMVZKMY3SMVQXIZI. You are receiving this email because you were mentioned.

Triage notifications on the go with GitHub Mobile for iOShttps://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Androidhttps://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

TomAugspurger commented 6 months ago

We got the index added in landsat:wrs_path and landsat:wrs_row added (finally). The example which timed out earlier is quick now:

In [25]: from pystac_client import Client
    ...:
    ...: catalog = "https://planetarycomputer.microsoft.com/api/stac/v1"
    ...: client = Client.open(catalog)
    ...:
    ...: PR_ID_LEFT = "074072"
    ...:
    ...: query_left = {
    ...:     "landsat:wrs_row": {"eq": PR_ID_LEFT[3:]},
    ...:     "landsat:wrs_path": {"eq": PR_ID_LEFT[:3]}
    ...: }
    ...:
    ...: items_left_pr = list(
    ...:     client.search(
    ...:         collections=["landsat-c2-l2"],
    ...:         query=query_left,
    ...:         datetime="2023"
    ...:     ).items()
    ...: )
    ...:
    ...: %time print(f"Found {len(items_left_pr)} items for path/row {PR_ID_LEFT}")
Found 45 items for path/row 074072
CPU times: user 72 us, sys: 3 us, total: 75 us
Wall time: 79.4 us
alexgleith commented 6 months ago

Thanks, @TomAugspurger, this is really nice!

I think this should stay open though, as the original issue is with some of the geometries/bounding boxes.

(Great to meet you the other day!)

ghidalgo3 commented 2 weeks ago

Closed due to inactivity, feel free to reopen if you would like to continue this discussion.