Closed robbibt closed 4 months ago
For reference, doing a similar search on either RadiantEarth or Microsoft Planetary Computer's STAC APIs successfully returns all relevant datasets with no restrictive limit:
import pystac_client

# Either public STAC API can be used; swap the comment to try the other one
# catalogue = "https://planetarycomputer.microsoft.com/api/stac/v1"
catalogue = "https://earth-search.aws.element84.com/v1"
client = pystac_client.Client.open(catalogue)

# Search for items in the collection
collections = ["sentinel-2-l2a"]
query = client.search(
    collections=collections,
    bbox=[146.04, -34.30, 146.05, -34.28],
    datetime="2023-12-01/2024-02-28",
)

# Page through the results and list the datetime of every matching item
# (items() supersedes the deprecated get_items())
[i.properties["datetime"] for i in query.items()]
A user on LinkedIn and @alexgleith have encountered a possible bug in our Explorer STAC search API (see link here).
If you do a super simple query of DEA's Sentinel-2 data from December 2023 to Feb 2024, you only get back data up to January 17, despite the data definitely existing:
It seems that by default, the query is only returning the first 20 items from the query. To get any extra data, the user has to manually provide a high limit, e.g.:
This isn't typical behavior for STAC loading: normally, when using pystac_client, it will automatically follow "next" page links to provide the user with all datasets matching their query - the user definitely isn't limited to a tiny amount like 20.

It looks to me that Explorer might be using the DEFAULT_PAGE_SIZE of 20 to define the absolute limit of datasets returned. This doesn't appear to follow the correct STAC API approach (see Slack conversation here and STAC API docs here). I can see this line, which seems like it might be the source of the issue - it seems to use DEFAULT_PAGE_SIZE if no limit is provided: https://github.com/opendatacube/datacube-explorer/blob/3cdcf98a7394eb85566609a4f9cbf6f22009b722/cubedash/_stac.py#L433

As it is, I think the current functionality is confusing to our users - they will naturally expect to get back all items matching their query (at least up to some sensibly high limit, definitely not 20), and only getting back half the time series is pretty unexpected.
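To make the expected behaviour concrete, here is a rough, self-contained sketch of how I understand paging is meant to work: the limit only sets the page size, and the client keeps following "next" links until there are none left. Everything here is hypothetical and for illustration only (search_page, iter_items, and the offset-based "next" link are made up; real STAC APIs return an href in the link, and this is not Explorer's actual code). The only value taken from the issue is the default page size of 20.

```python
# Page size assumed from the issue text; not read from Explorer itself
DEFAULT_PAGE_SIZE = 20


def search_page(items, limit=None, offset=0):
    """Hypothetical server handler: `limit` sets the page size, not the total."""
    page_size = limit if limit is not None else DEFAULT_PAGE_SIZE
    next_offset = offset + page_size
    response = {"features": items[offset:next_offset], "links": []}
    # Advertise a "next" link whenever more matching items remain
    # (simplified: a real STAC API would put an href here, not an offset)
    if next_offset < len(items):
        response["links"].append({"rel": "next", "offset": next_offset})
    return response


def iter_items(items, limit=None):
    """Hypothetical client: keep requesting pages until no "next" link is returned."""
    offset = 0
    while offset is not None:
        response = search_page(items, limit=limit, offset=offset)
        yield from response["features"]
        next_links = [link for link in response["links"] if link["rel"] == "next"]
        offset = next_links[0]["offset"] if next_links else None


# 45 made-up items standing in for the datasets matching a query
matching = [f"item-{n}" for n in range(45)]

first_page = search_page(matching)
all_items = list(iter_items(matching))
print(len(first_page["features"]), len(all_items))  # prints "20 45"
```

In this sketch a single request still returns only 20 items, but a client that follows the "next" links ends up with all 45, without the user ever having to pass a high limit by hand.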