Closed scottyhq closed 2 weeks ago
I wonder if this is because the items have start and end datetimes?
I think the API is picking up any item whose time range covers your datetime (https://github.com/stac-utils/pgstac/issues/5), while https://e4ftl01.cr.usgs.gov/MOTA/MCD43A4.061/2001.01.01/ is giving just the items whose end_datetime equals 2001-01-01 (or maybe it's the start_datetime)?
In [18]: search = catalog.search(
    ...:     collections=["modis-43A4-061"],
    ...:     datetime='2001-01-01',
    ...:     query={"end_datetime": {"eq": "2001-01-16T23:59:59.999999Z"}}
    ...: )
    ...: items2 = search.item_collection()
In [19]: len(items2)
Out[19]: 311
I'm not sure why it's 311 instead of 299, but that's at least much closer.
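A rough way to see why a one-day query can return items spanning about two weeks of nominal dates is to model the matching rule guessed at above as a plain interval-intersection test. This is only an illustration in standard-library Python, not pgstac's actual implementation. The helper names (`item_window`, `day_interval`, `intersects`, `matching_nominal_dates`) are made up here, the window bounds are taken from the start/end values seen in this thread, and expanding a bare date to the full UTC day is an assumption.

```python
from datetime import date, datetime, time, timedelta, timezone


def item_window(nominal: date):
    """Return (start, end) of the 16-day window for a nominal date.

    Assumption: start is nominal - 8 days at 00:00 UTC and end is one
    microsecond before nominal + 8 days, matching the values in this thread.
    """
    midnight = datetime.combine(nominal, time.min, tzinfo=timezone.utc)
    start = midnight - timedelta(days=8)
    end = midnight + timedelta(days=8) - timedelta(microseconds=1)
    return start, end


def day_interval(day: date):
    """Assumption: a bare date in the query means the whole UTC day."""
    start = datetime.combine(day, time.min, tzinfo=timezone.utc)
    return start, start + timedelta(days=1) - timedelta(microseconds=1)


def intersects(a_start, a_end, b_start, b_end) -> bool:
    """True when the closed intervals [a_start, a_end] and [b_start, b_end] overlap."""
    return a_start <= b_end and a_end >= b_start


def matching_nominal_dates(query_day: date, candidates):
    """Nominal dates whose item window overlaps the queried day."""
    q_start, q_end = day_interval(query_day)
    return [d for d in candidates if intersects(*item_window(d), q_start, q_end)]


# Hypothetical daily nominal dates around the query date.
candidates = [date(2000, 12, 15) + timedelta(days=i) for i in range(40)]
hits = matching_nominal_dates(date(2001, 1, 1), candidates)
print(hits[0], hits[-1], len(hits))
```

Under these assumptions a query for 2001-01-01 matches 16 nominal dates, 2000-12-25 through 2001-01-09, which is in line with the roughly +/- 1 week spread reported in this issue.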
Thanks for looking @TomAugspurger! In my haste I didn't fully consult the user guide https://www.umb.edu/spectralmass/v006/mcd43a4-nbar-product/ which does clearly state:
Unlike the earlier reprocessed versions (where the date of the product signifies the first day of the retrieval period), and the Direct Broadcast version (where the date signifies the last day of the retrieval period), the date associated with each daily V006 and V006.1 retrieval is the center of the moving 16 day input window.
But I still find the API behavior counterintuitive. If datetime, start_datetime, and end_datetime all exist in the metadata, I'd expect a query on datetime to consider only that field? A workaround to home in on a nominal date is to fully specify a +/- 8 day window and query on both start and end:
date = pd.to_datetime('2001-01-01')
start = (date - pd.Timedelta(days=8)).isoformat(timespec='microseconds')+'Z'
end = (date + pd.Timedelta(days=8) - pd.Timedelta(seconds=1)).isoformat()+'.999999Z'
print(start, end)
# 2000-12-24T00:00:00.000000Z 2001-01-08T23:59:59.999999Z
search = catalog.search(
    collections=["modis-43A4-061"],
    query={
        "start_datetime": {"eq": start},
        "end_datetime": {"eq": end},
    },
)
items = search.item_collection()
print(len(items))
# 299
gf = gpd.GeoDataFrame.from_features(items.to_dict(), crs="epsg:4326")
print(gf.datetime.unique())
# ['2001-01-01T00:00:00Z']
It does make it a bit awkward to specify the exact search you want :/ I believe this behavior comes from the STAC API spec though, so not much we can do about it.
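If the server-side query stays awkward, another option is to run the broader search and filter on the nominal datetime client-side. The sketch below is a minimal example on plain GeoJSON-style feature dicts. The function names and the sample features are hypothetical, and it assumes each feature carries an RFC 3339 `properties.datetime` string ending in `Z`.

```python
from datetime import date, datetime


def nominal_date(feature: dict) -> date:
    """Calendar date (UTC) of a feature's `properties.datetime` value."""
    ts = feature["properties"]["datetime"]
    # Replace the trailing "Z" so datetime.fromisoformat accepts it on older Pythons.
    return datetime.fromisoformat(ts.replace("Z", "+00:00")).date()


def filter_by_nominal_date(features, day: date):
    """Keep only the features whose nominal datetime falls on `day`."""
    return [f for f in features if nominal_date(f) == day]


# Hypothetical features standing in for search results.
features = [
    {"id": "a", "properties": {"datetime": "2001-01-01T00:00:00Z"}},
    {"id": "b", "properties": {"datetime": "2001-01-01T23:59:59.999500Z"}},
    {"id": "c", "properties": {"datetime": "2000-12-28T00:00:00Z"}},
]
kept = filter_by_nominal_date(features, date(2001, 1, 1))
print([f["id"] for f in kept])
```

With real results, something like `items.to_dict()["features"]` should provide dicts of this shape, though that is worth checking against the pystac version in use. Note that this compares calendar dates only, so both timestamp variants for 2001-01-01 are kept, and any duplicates would still need to be removed separately.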
Closed due to inactivity; feel free to reopen if you would like to continue this discussion.
I expect this search to only return acquisitions from 2001-01-01 (299 according to https://e4ftl01.cr.usgs.gov/MOTA/MCD43A4.061/2001.01.01/). It's strange that such a wide range of dates is returned. I'm guessing there might be duplicate items from different collection updates for a single date ('2001-01-01T23:59:59.999500Z' vs '2001-01-01T00:00:00Z'), but I also don't know why this search appears to return items about +/- 1 week from the specified date...