nasa / cmr-stac


`eo:cloud_cover` query parameter returns empty search #239

Open KennSmithDS opened 2 years ago

KennSmithDS commented 2 years ago

It seems that the eo:cloud_cover query parameter does not correctly filter STAC Items in the CMR STAC API. If the query parameter is included, pystac_client.Client.search() returns 0 items, even though the catalog contains valid items with this property:

from pystac_client import Client
from shapely.geometry import Point  # assumed: Point below is shapely's Point

cmr_earthdata_api = 'https://cmr.earthdata.nasa.gov/stac/LPCLOUD'
cmr_earthdata_client = Client.open(url=cmr_earthdata_api)

search_results = cmr_earthdata_client.search(
    collections=['HLSL30.v2.0'],
    datetime='2021-02-01/2021-03-01',
    intersects=Point(-73.97, 40.78),
    query=["eo:cloud_cover<20"]
)

# ItemSearch has no len(); fetch the items first, as in the later snippets
print(len(search_results.get_all_items()))  # shows 0 for no results returned from API

If we modify the code snippet above slightly to comment out the query=["eo:cloud_cover<20"] then the search returns 2 valid items which can be seen to have the appropriate eo:cloud_cover metadata property:

...
search_results = cmr_earthdata_client.search(
    collections=['HLSL30.v2.0'],
    datetime='2021-02-01/2021-03-01',
    intersects=Point(-73.97, 40.78)
)

cmr_items = search_results.get_all_items()

for item in cmr_items:
    print(item.id)
    print(item.properties)

Without the eo:cloud_cover query parameter, the search returns the following:

HLS.L30.T18TWL.2021039T153324.v2.0
{'datetime': '2021-02-08T15:33:24.028Z', 'start_datetime': '2021-02-08T15:33:24.028Z', 'end_datetime': '2021-02-08T15:33:47.911Z', 'eo:cloud_cover': 6}
HLS.L30.T18TWL.2021055T153318.v2.0
{'datetime': '2021-02-24T15:33:18.868Z', 'start_datetime': '2021-02-24T15:33:18.868Z', 'end_datetime': '2021-02-24T15:33:42.759Z', 'eo:cloud_cover': 97}
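Since the unfiltered items do carry eo:cloud_cover, one possible stopgap is to filter on the client after the search returns. The following is only a sketch of that idea, not something proposed in this thread: the helper name filter_by_cloud_cover is hypothetical, and plain dicts with the ids and cloud cover values shown above stand in for pystac Items so that it runs without network access.

```python
# Ids and cloud cover values copied from the output above; plain dicts stand in
# for pystac Items so the sketch runs offline.
items = [
    {"id": "HLS.L30.T18TWL.2021039T153324.v2.0", "properties": {"eo:cloud_cover": 6}},
    {"id": "HLS.L30.T18TWL.2021055T153318.v2.0", "properties": {"eo:cloud_cover": 97}},
]


def filter_by_cloud_cover(items, max_cloud_cover):
    """Keep items whose eo:cloud_cover is strictly below max_cloud_cover.

    Hypothetical helper: items without the property are dropped.
    """
    kept = []
    for item in items:
        cloud_cover = item["properties"].get("eo:cloud_cover")
        if cloud_cover is not None and cloud_cover < max_cloud_cover:
            kept.append(item)
    return kept


clear_items = filter_by_cloud_cover(items, 20)
print([item["id"] for item in clear_items])
```

With real pystac Items the lookup would be item.properties rather than item["properties"]. This downloads every matching item before filtering, so it is only reasonable for small result sets.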

rbavery commented 2 years ago

Here's a notebook that demonstrates the problem and shows that it does not occur when using the eo:cloud_cover query parameter with AWS Earth Search:

https://notebooksharing.space/view/7e63f879ff1bad1d8a838e568cdcd67f6a5f17b17a7394ab99dd8f531f89f5fa#displayOptions=

rbavery commented 2 years ago

after discussion with @sharkinsspatial it looks like there's some tricky stuff going on

AWS Earth Search supports the query extension ~a non standard, out of spec way to filter eo:cloud_cover for some reason. query=["eo:cloud_cover<20"] shouldn't work for any stac catalog~

this is the STAC spec way to filter without the query extension

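# client, collection, and point are assumed to be defined earlier (as in the linked notebook)
search = client.search(
    collections=[collection],
    intersects=point,
    # RFC 3339 timestamps need a "T" between the date and the time
    datetime='2020-03-20T00:00:00Z/2020-03-30T00:00:00Z',
    max_items=10,
    query={
        "eo:cloud_cover": {
            "lt": 20
        }
    }
)
len(search.get_all_items())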

however this is still broken for CMR, noted in this issue

@jaybarra sorry to ping but is there a timeline for fixing #206 ? We are trying to show the public a modern way to access NASA data via the CMR STAC and would like to show a solution that allows them to filter by cloud cover and other eo: properties https://carpentries-incubator.github.io/geospatial-python/05-access-data/#solution-2

matthewhanson commented 2 years ago

@rbavery Note that the alternate syntax for query isn't a feature of the Earth-Search API, but is actually a feature of pystac-client. See the docs here: https://pystac-client.readthedocs.io/en/latest/usage.html#query-extension

pystac-client converts the shortcut syntax into STAC Query JSON, so it would work for any API that supports the Query extension.
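To make that conversion concrete, here is a rough sketch of how a shortcut string such as "eo:cloud_cover<20" could be turned into Query extension JSON. This is not pystac-client's actual implementation: the function name shortcut_to_query and the symbol table are assumptions for illustration, with the right-hand names (eq, neq, lt, lte, gt, gte) taken from the Query extension's comparison operators.

```python
import json
import re

# Assumed symbol-to-operator table; the right-hand names are Query extension operators.
OPERATORS = {">=": "gte", "<=": "lte", "<>": "neq", "=": "eq", ">": "gt", "<": "lt"}

# Two-character symbols come first so "<=" is not read as "<" followed by "=5".
PATTERN = re.compile(r"\s*(.+?)\s*(>=|<=|<>|=|>|<)\s*(.+?)\s*$")


def shortcut_to_query(expressions):
    """Convert strings like 'eo:cloud_cover<20' into a Query extension dict.

    Hypothetical helper, written only to illustrate the idea.
    """
    query = {}
    for expression in expressions:
        match = PATTERN.match(expression)
        if match is None:
            raise ValueError(f"cannot parse query expression: {expression!r}")
        prop, symbol, raw_value = match.groups()
        try:
            value = json.loads(raw_value)  # numbers and booleans
        except ValueError:
            value = raw_value  # anything else is kept as a string
        query.setdefault(prop, {})[OPERATORS[symbol]] = value
    return query


print(shortcut_to_query(["eo:cloud_cover<20"]))
```

The result for the expression from the original report has the same shape as the dict form of query shown earlier in the thread, which is why both spellings end up as the same request.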

earthdata-github-robot commented 1 year ago

(Comment from Alicia Aleman):

Query syntax is incorrect.

william-valencia commented 5 days ago

I've seen other intersects queries in the form of:

intersects=dict(type="Point", coordinates=[-73.97, 40.78]),

However, https://bugs.earthdata.nasa.gov/browse/CMR-10200 would need to be fixed before this works.
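For reference, the dict form and a shapely Point describe the same GeoJSON geometry, because shapely geometries expose it through the __geo_interface__ property. The sketch below illustrates that equivalence only and says nothing about how CMR handles the request: as_geojson_geometry and PointLike are hypothetical names, and PointLike merely stands in for shapely's Point so the example runs without shapely installed.

```python
def as_geojson_geometry(geometry):
    """Return a GeoJSON-style dict for a dict or a __geo_interface__ object.

    Hypothetical helper, for illustration only.
    """
    if isinstance(geometry, dict):
        return geometry
    geo_interface = getattr(geometry, "__geo_interface__", None)
    if geo_interface is not None:
        return dict(geo_interface)
    raise TypeError(f"unsupported geometry type: {type(geometry).__name__}")


class PointLike:
    """Minimal stand-in for shapely.geometry.Point (assumption: same interface)."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    @property
    def __geo_interface__(self):
        return {"type": "Point", "coordinates": (self.x, self.y)}


from_dict = as_geojson_geometry(dict(type="Point", coordinates=[-73.97, 40.78]))
from_object = as_geojson_geometry(PointLike(-73.97, 40.78))
print(from_dict["type"], from_object["type"])
```

Passing the dict directly avoids the extra shapely import in the original snippet.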