nsidc / earthaccess

Python Library for NASA Earthdata APIs
https://earthaccess.readthedocs.io/
MIT License

Seasonal/recurrent searches #488

Open betolink opened 6 months ago

betolink commented 6 months ago

A common search pattern is a seasonal search, e.g. Landsat scenes from July for the last 10 years. CMR supports this (although it is not well documented), and exposing it would let us run such searches without for loops.

results = earthaccess.search_data(
    short_name=["HLSL30"],
    point=(-82.19,27.91),
    cloud_cover=(0,20),
    temporal=("2014", "2024", (182, 212)), # this is not being passed to CMR but this is the current notation 
)

This would return all HLS scenes from July for the last 10 years with a maximum cloud cover of 20%. I think this is very useful. However, the (182, 212) day-of-year range is not straightforward to calculate. Maybe we need to parse a date instead, and use it with caution, as leap years will have a different offset.
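To make the leap-year offset concrete, here is a minimal sketch using only the standard library. `day_of_year` is a hypothetical helper for illustration, not part of earthaccess. It shows that July maps to days 182 to 212 in a common year and shifts by one day in a leap year.

```python
from datetime import date


def day_of_year(year: int, month: int, day: int) -> int:
    """Return the 1-based day of year for a calendar date (hypothetical helper)."""
    return date(year, month, day).timetuple().tm_yday


# July in a common year vs. a leap year
july_2023 = (day_of_year(2023, 7, 1), day_of_year(2023, 7, 31))
july_2024 = (day_of_year(2024, 7, 1), day_of_year(2024, 7, 31))
print(july_2023)  # (182, 212)
print(july_2024)  # (183, 213)
```

A fixed (182, 212) window applied to a leap year therefore covers June 30 through July 30 rather than July 1 through July 31.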

Rapsodia86 commented 6 months ago

Because of the leap years, would it be possible to use mm-dd?

results = earthaccess.search_data(
    short_name=["HLSL30"],
    point=(-82.19,27.91),
    cloud_cover=(0,20),
    temporal=("2014", "2024", ("06-30","07-30")), # this is not being passed to CMR but this is the current notation 
)
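As a rough illustration of how an mm-dd window stays exact across leap years, the sketch below expands it into one explicit date range per year. `seasonal_ranges` is a hypothetical helper, not an earthaccess function, and it assumes the window does not cross a year boundary. Searching with these ranges would still need a loop, so this is a fallback rather than the single-request search proposed above.

```python
from datetime import date


def seasonal_ranges(start_year, end_year, start_mmdd, end_mmdd):
    """Build one (start, end) ISO date pair per year for a month-day window.

    Hypothetical helper, not part of earthaccess. Assumes the window does
    not cross a year boundary. A "02-29" bound raises ValueError in
    common years because that date does not exist there.
    """
    start_month, start_day = (int(part) for part in start_mmdd.split("-"))
    end_month, end_day = (int(part) for part in end_mmdd.split("-"))
    ranges = []
    for year in range(start_year, end_year + 1):
        start = date(year, start_month, start_day)
        end = date(year, end_month, end_day)
        ranges.append((start.isoformat(), end.isoformat()))
    return ranges


ranges = seasonal_ranges(2014, 2024, "06-30", "07-30")
print(len(ranges))  # 11
print(ranges[0])    # ('2014-06-30', '2014-07-30')
```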

And that brings me to another thing! I do not know how consistent you want to be with the results from the portal, but it is also important to set hh:mm:ss in the temporal search. The first query below gives the same results as the portal. Whether the end date is included, or the search runs only up to that date without including it, may be confusing for earthaccess users. Perhaps that could be an additional setting, or is setting the exact time the quickest way? Anyway, I assume that is something to specify in the documentation.

>>> granules = earthaccess.search_data(
...  short_name="ECO_L2T_LSTE",
...  temporal = ("2023-01-01", "2024-01-01-23:59:59"),
...  point =(-83.08301,42.34026),
...  count=-1,
...  version="002"
... )
Granules found: 409
>>>
>>> granules = earthaccess.search_data(
...  short_name="ECO_L2T_LSTE",
...  temporal = ("2023-01-01", "2024-01-01"),
...  point =(-83.08301,42.34026),
...  count=-1,
...  version="002"
... )
Granules found: 406
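
One possible way to make a date-only end bound inclusive is to pad it to the end of the day before searching. This is a sketch under assumptions: `inclusive_end` is a hypothetical helper, not part of earthaccess, and the `T` separator follows ISO 8601 rather than the `-` separator used in the first query above. It also assumes that a date-only end is treated as the start of that day, which is what the 409 vs. 406 counts suggest.

```python
import re

# Matches a bare calendar date such as "2024-01-01" with no time component
DATE_ONLY = re.compile(r"^\d{4}-\d{2}-\d{2}$")


def inclusive_end(end: str) -> str:
    """Append an end-of-day time to a date-only string; leave others unchanged."""
    if DATE_ONLY.match(end):
        return f"{end}T23:59:59"
    return end


print(inclusive_end("2024-01-01"))           # 2024-01-01T23:59:59
print(inclusive_end("2024-01-01T12:00:00"))  # 2024-01-01T12:00:00
```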

betolink commented 6 months ago

I think we should try to follow the conventions from the portal. Actually, I think this behavior (bug) was already reported by @amfriesz in #190.

Rapsodia86 commented 6 months ago

Ok, that is exactly the thing! Sorry, I should have checked all the issues first, but it just came to mind while I was writing the comment, since I had been exploring the search parameters yesterday :)

betolink commented 6 months ago

No worries! I think we should fix that and implement the recurrent search even if we are off by a day in leap years. In the case of COGs we can have a very streamlined workflow with xarray:

  1. Seasonal search
    results = earthaccess.search_data(
        short_name=["HLSL30"],
        point=(-82.19,27.91),
        cloud_cover=(0,20),
        temporal=("2014", "2024", ("06-30","07-30")),
    )
  2. Open and load granules (filtering by band, see #428)
    fo = earthaccess.open(results, bands=["B01", "B02"])
    ds = xr.open_mfdataset(fo, engine="rioxarray")
  3. Efficient operations (subset, sampling) with no services in between!
    summer_mean = ds.clip(polygon).mean("time")