nsidc / earthaccess

Python Library for NASA Earthdata APIs
https://earthaccess.readthedocs.io/
MIT License

Seasonal/recurrent searches #488

Open betolink opened 9 months ago

betolink commented 9 months ago

A common search pattern is a seasonal search, e.g. Landsat scenes from July for the last 10 years. CMR supports this (although it is not well documented), and it would let us search without having to use for loops.

results = earthaccess.search_data(
    short_name=["HLSL30"],
    point=(-82.19,27.91),
    cloud_cover=(0,20),
    temporal=("2014", "2024", (182, 212)), # this is not being passed to CMR but this is the current notation 
)

This would return all HLS scenes from July for the last 10 years with a maximum cloud cover of 20%. I think this is very useful; however, the (182, 212) range is not straightforward to calculate. Maybe we need to parse a date, and use it with caution, since leap years shift the day-of-year offset by one.
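To make the leap-year offset concrete, here is a minimal sketch that uses only the Python standard library. The helper name `day_of_year` is made up for illustration and is not part of earthaccess.

```python
from datetime import date


def day_of_year(year: int, month: int, day: int) -> int:
    """Return the 1-based ordinal day of the year for a calendar date."""
    return date(year, month, day).timetuple().tm_yday


# July 1 and July 31 in a non-leap year (2023) and in a leap year (2024)
july_2023 = (day_of_year(2023, 7, 1), day_of_year(2023, 7, 31))
july_2024 = (day_of_year(2024, 7, 1), day_of_year(2024, 7, 31))

print(july_2023)  # (182, 212)
print(july_2024)  # (183, 213)
```

So (182, 212) covers exactly July in non-leap years, while in leap years the same range runs from June 30 to July 30.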

Rapsodia86 commented 9 months ago

Because of the leap years, would it be possible to use mm-dd?

results = earthaccess.search_data(
    short_name=["HLSL30"],
    point=(-82.19,27.91),
    cloud_cover=(0,20),
    temporal=("2014", "2024", ("06-30","07-30")), # this is not being passed to CMR but this is the current notation 
)

And that brings me to another thing! I do not know how consistent you want to be with the results from the portal, but it is important to also set hh:mm:ss in the temporal search. The first query below gives the same results as the portal. Whether the end date is included, or the range only runs up to that date without including it, may be confusing for earthaccess users. Perhaps that could be an additional setting, or is setting the exact time the quickest way? Anyway, I assume that is something to specify in the documentation.

>>> granules = earthaccess.search_data(
...  short_name="ECO_L2T_LSTE",
...  temporal = ("2023-01-01", "2024-01-01-23:59:59"),
...  point =(-83.08301,42.34026),
...  count=-1,
...  version="002"
... )
Granules found: 409
>>>
>>> granules = earthaccess.search_data(
...  short_name="ECO_L2T_LSTE",
...  temporal = ("2023-01-01", "2024-01-01"),
...  point =(-83.08301,42.34026),
...  count=-1,
...  version="002"
... )
Granules found: 406
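
One possible way to make a date-only end bound inclusive is to expand it to the last second of that day before sending the query. The sketch below is only an illustration: `end_of_day` is a hypothetical helper, not earthaccess behavior, and it assumes the date is meant as UTC.

```python
from datetime import datetime, time


def end_of_day(date_str: str) -> str:
    """Expand a date-only string (YYYY-MM-DD) to the last second of that day."""
    day = datetime.strptime(date_str, "%Y-%m-%d").date()
    return datetime.combine(day, time(23, 59, 59)).strftime("%Y-%m-%dT%H:%M:%SZ")


print(end_of_day("2024-01-01"))  # 2024-01-01T23:59:59Z
```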

betolink commented 9 months ago

I think we should try to follow the conventions from the portal. Actually, I think this behavior (bug) was already reported by @amfriesz in #190.

Rapsodia86 commented 9 months ago

Ok, that is exactly the thing! Sorry, I should have checked all the issues first, but it just came to my mind while I was writing the comment, since I had been exploring the search parameters yesterday :)

betolink commented 9 months ago

No worries! I think we should fix that and implement the recurrent search even if we are off by a day in leap years. In the case of COGs we can have a very streamlined workflow with xarray:

  1. Seasonal search
    results = earthaccess.search_data(
        short_name=["HLSL30"],
        point=(-82.19,27.91),
        cloud_cover=(0,20),
        temporal=("2014", "2024", ("06-30","07-30")),
    )
  2. Open and load granules (filtering by band see #428)
    fo = earthaccess.open(results, bands=["B01", "B02"])
    ds = xr.open_mfdataset(fo, engine="rioxarray")
  3. Efficient operations (subset, sampling) with no services in between!
    summer_mean = ds.clip(polygon).mean("time")

chuckwondo commented 2 months ago

A common search pattern is a seasonal search, e.g. Landsat scenes from July for the last 10 years. CMR supports this (although it is not well documented), and it would let us search without having to use for loops.

results = earthaccess.search_data(
    short_name=["HLSL30"],
    point=(-82.19,27.91),
    cloud_cover=(0,20),
    temporal=("2014", "2024", (182, 212)), # this is not being passed to CMR but this is the current notation 
)

This would return all HLS scenes from July for the last 10 years with a maximum cloud cover of 20%. I think this is very useful; however, the (182, 212) range is not straightforward to calculate. Maybe we need to parse a date, and use it with caution, since leap years shift the day-of-year offset by one.

For reference, this CMR temporal range feature is documented (you have to look very closely) under Temporal Range searches.

Specifically, it is shown in this easily-missed example at the end of the list of examples:

2000-01-01T00:00:00.000Z,2023-01-31T23:59:59.999Z,1,31 - matches data between the Julian days 1 to 31 from 2000-01-01T00:00:00.000Z to 2023-01-31T23:59:59.999Z.

It can also be seen in this example under the section "Find collections with temporal" (unlinked sub-heading):

curl "https://cmr.earthdata.nasa.gov/search/collections?temporal\[\]=2000-01-01T10:00:00Z,2010-03-10T12:00:00Z,30,60&temporal\[\]=2000-01-01T10:00:00Z,,30&temporal\[\]=2000-01-01T10:00:00Z,2010-03-10T12:00:00Z"

The first two values of the parameter together define the temporal bounds. See under Temporal Range searches for different ways of specifying the temporal bounds including ISO 8601.

For temporal range search, the default is inclusive on the range boundaries. This can be changed by specifying exclude_boundary option with options[temporal][exclude_boundary]=true. This option has no impact on periodic temporal searches.

and again under the section "Finding granules with temporal" (again, unlinked):

curl "https://cmr.earthdata.nasa.gov/search/granules?collection_concept_id=C1234567-PODAAC&temporal\[\]=2000-01-01T10:00:00Z,2010-03-10T12:00:00Z,30,60&temporal\[\]=2000-01-01T10:00:00Z,,30&temporal\[\]=2000-01-01T10:00:00Z,2010-03-10T12:00:00Z"

The first two values of the parameter together define the temporal bounds. See under Temporal Range searches for different ways of specifying the temporal bounds including ISO 8601.

For temporal range search, the default is inclusive on the range boundaries. This can be changed by specifying exclude_boundary option with options[temporal][exclude_boundary]=true. This option has no impact on periodic temporal searches.
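
As a rough sketch, the comma-separated value used in the `temporal[]` examples above could be assembled like this. The function name `periodic_temporal` is hypothetical, and the end-day-only form (an empty third value) is an assumption inferred from the pattern of the other examples, not something confirmed here.

```python
from typing import Optional


def periodic_temporal(
    start: str,
    end: str,
    start_day: Optional[int] = None,
    end_day: Optional[int] = None,
) -> str:
    """Build a CMR-style temporal value: start,end[,start_day[,end_day]]."""
    parts = [start, end]
    if start_day is not None or end_day is not None:
        # An omitted start day is left empty so the end day stays in 4th position
        # (assumed form, see the note above).
        parts.append("" if start_day is None else str(start_day))
        if end_day is not None:
            parts.append(str(end_day))
    return ",".join(parts)


# Reproduces the first two temporal[] values from the curl examples
print(periodic_temporal("2000-01-01T10:00:00Z", "2010-03-10T12:00:00Z", 30, 60))
# 2000-01-01T10:00:00Z,2010-03-10T12:00:00Z,30,60
print(periodic_temporal("2000-01-01T10:00:00Z", "", 30))
# 2000-01-01T10:00:00Z,,30
```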

Unfortunately, when spanning multiple years with one or more leap years in the range, there appears to be no way to deal with the day offset for days of the year on or after the leap days because the CMR simply expects Julian (ordinal) days as the 3rd and 4th values in the range.

In other words, using @betolink's example, day 182 is July 1 in non-leap years, but June 30 in leap years. Since you cannot tell the CMR to use 182 or 183 depending on whether a year is a leap year, there seems to be no convenient way to deal with this. If you need very specific dates, you would likely have to widen your day range and then filter the query results.
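
The widen-then-filter idea could look roughly like the sketch below. Both helpers are hypothetical, the timestamps stand in for granule start times (how to extract those from real results is left out), and the window is assumed not to include 02-29 or to wrap around the end of the year.

```python
from datetime import date, datetime


def widened_day_range(start_mmdd: str, end_mmdd: str, years) -> tuple:
    """Smallest ordinal-day range that covers the MM-DD window in every given year."""
    def ordinal(year: int, mmdd: str) -> int:
        month, day = map(int, mmdd.split("-"))
        return date(year, month, day).timetuple().tm_yday

    first = min(ordinal(y, start_mmdd) for y in years)
    last = max(ordinal(y, end_mmdd) for y in years)
    return first, last


def in_window(ts: datetime, start_mmdd: str, end_mmdd: str) -> bool:
    """True if the timestamp's month-day lies inside the MM-DD window."""
    # Zero-padded MM-DD strings compare correctly as plain strings.
    return start_mmdd <= ts.strftime("%m-%d") <= end_mmdd


print(widened_day_range("07-01", "07-31", range(2014, 2025)))  # (182, 213)

# Stand-ins for granule start times returned by the widened query
timestamps = [
    datetime(2016, 6, 30, 15, 0),  # day 182 of a leap year, outside July
    datetime(2016, 7, 1, 15, 0),
    datetime(2015, 8, 1, 15, 0),   # day 213 of a non-leap year, outside July
]
kept = [t for t in timestamps if in_window(t, "07-01", "07-31")]
print(kept)  # [datetime.datetime(2016, 7, 1, 15, 0)]
```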

Regardless, if we really want to be complete with what the CMR supports, in addition to being able to specify both Julian dates, we must also be able to specify only one of the Julian days (start or end), which is what the examples above seem to show, and which the following working examples show:

Notice that the only difference between these 2 examples is that the first one starts with Julian day 10, and the second one ends with Julian day 10.

Regardless, I agree that being able to specify the Julian days as MM-DD values would be helpful, but we should support both formats because there may very well be cases where a user is given the Julian days to use, not the MM-DD values, which would require the reverse conversion if only MM-DD format were supported.

Finally, if we stick with the tuple format (a more specific structure might be more helpful, but that's for another discussion), I suggest that we do not specify the Julian days as a nested tuple, but rather at the top level of the tuple, e.g.: ("2014", "2024", 182, 212) or ("2014", "2024", "07-01", "07-30"), or similar.
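
A minimal sketch of how a flat tuple accepting both formats might be normalized is shown below. `normalize_temporal`, `to_ordinal_day` and the choice of a non-leap reference year for MM-DD conversion are all assumptions for illustration, not an existing earthaccess API.

```python
from datetime import date

# Non-leap year used to convert MM-DD strings; an arbitrary choice for this sketch
REFERENCE_YEAR = 2023


def to_ordinal_day(value) -> int:
    """Accept an ordinal day (int) or an 'MM-DD' string and return an ordinal day."""
    if isinstance(value, int):
        if not 1 <= value <= 366:
            raise ValueError(f"day of year out of range: {value}")
        return value
    month, day = map(int, value.split("-"))
    return date(REFERENCE_YEAR, month, day).timetuple().tm_yday


def normalize_temporal(temporal: tuple) -> tuple:
    """Turn (start, end[, day1[, day2]]) into a tuple with integer ordinal days."""
    start, end, *days = temporal
    if len(days) > 2:
        raise ValueError("expected at most two day values")
    return (start, end, *(to_ordinal_day(d) for d in days))


print(normalize_temporal(("2014", "2024", 182, 212)))          # ('2014', '2024', 182, 212)
print(normalize_temporal(("2014", "2024", "07-01", "07-30")))  # ('2014', '2024', 182, 211)
print(normalize_temporal(("2014", "2024")))                    # ('2014', '2024')
```

Using a fixed reference year keeps the conversion simple but inherits the off-by-one in leap years discussed above.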