stac-utils / pystac-client

Python client for searching STAC APIs
https://pystac-client.readthedocs.io

Enhancement: Conditionally returning each collection from catalog.search as a nested list if multiple are found #750

Closed ben-epoch-blue closed 5 days ago

ben-epoch-blue commented 5 days ago

I am trying to search multiple collections at once and assign the results to separate variables:

a, b, c = catalog.search(collections=["datasetA", "datasetB", "datasetC"], bbox=bbox).item_collection()

However, when the bbox intersects multiple images in a collection, multiple items are returned for that collection. The output I would like is this:

a, b, c = [[a1, a2, a3], [b1], [c1, c2]]

But the current output is a flattened list which cannot be predictably unpacked:

a, b, c = a1, a2, a3, b1, c1, c2 --> Error

gadomski commented 5 days ago

This is pretty easy to do with the current API:

item_collections = []
for collection in ("datasetA", "datasetB", "datasetC"):
    item_collections.append(catalog.search(collections=collection, bbox=bbox).item_collection())

Client.search is meant to represent a single interaction with the /search endpoint of a STAC API, and the output of pystac-client (a single flattened list) directly reflects the output of a spec-compliant STAC API server: https://github.com/radiantearth/stac-api-spec/tree/release/v1.0.0/item-search#response. Specifically:

The response to a request (GET or POST) to the search endpoint must always be an ItemCollection object - a valid GeoJSON FeatureCollection that consists entirely of STAC Item objects.

Note that it does not allow for a list of item collections, which is what you're describing.
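If the per-collection loop above is needed in several places, it could be wrapped in a small helper. This is only a sketch: search_by_collection is a hypothetical name, not part of pystac-client, and it assumes client behaves like pystac_client.Client (search(...) returns an object with an item_collection() method).

```python
from typing import Any, Dict, Sequence


def search_by_collection(
    client: Any, collection_ids: Sequence[str], **search_kwargs: Any
) -> Dict[str, Any]:
    """Run one search per collection and key the results by collection ID.

    Hypothetical helper: `client` is assumed to behave like
    pystac_client.Client, i.e. `client.search(...).item_collection()` works.
    """
    results = {}
    for collection_id in collection_ids:
        # One /search request per collection keeps the results separate.
        search = client.search(collections=[collection_id], **search_kwargs)
        results[collection_id] = search.item_collection()
    return results
```

Because dicts preserve insertion order, a, b, c = search_by_collection(catalog, ["datasetA", "datasetB", "datasetC"], bbox=bbox).values() would unpack in the order requested, at the cost of one request per collection.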

ben-epoch-blue commented 5 days ago

This is the output of a search on multiple datasets:

source = catalog.search(collections=["nasadem", 'esa-worldcover', "jrc-gsw"], bbox=Point([..., ...]).buffer(900*30/113200, cap_style=3).bounds).item_collection()
type "FeatureCollection"
features[] 7 items
  0
    type "Feature"
    stac_version "1.0.0"
    id "ESA_WorldCover_10m_2021_v200_N03E099"
    properties
    geometry
    links[] 5 items
    assets
    bbox[] 4 items
    stac_extensions[] 4 items
    collection "esa-worldcover"
  1
    type "Feature"
    stac_version "1.0.0"
    id "ESA_WorldCover_10m_2021_v200_N00E099"
    properties
    geometry
    links[] 5 items
    assets
    bbox[] 4 items
    stac_extensions[] 4 items
    collection "esa-worldcover"
  2
    type "Feature"
    stac_version "1.0.0"
    id "ESA_WorldCover_10m_2020_v100_N03E099"
    properties
    geometry
    links[] 5 items
    assets
    bbox[] 4 items
    stac_extensions[] 4 items
    collection "esa-worldcover"
  ...

There are 4 images for ESA WorldCover, 2 images for NASADEM, and 1 image for JRC-GSW. All 3 datasets are output into a single FeatureCollection rather than 3 distinct FeatureCollection objects. Can you confirm whether this is intended?

If so, is there a way to differentiate between the different datasets returned from the search?

TomAugspurger commented 5 days ago

Yes, that's intended.

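You can use itertools.groupby to group by collection. The items have to be sorted by the same key first, because groupby only groups consecutive items:

import itertools

# `search` is the object returned by catalog.search(...)
items = sorted(search.item_collection(), key=lambda x: x.collection_id)
by_collection = {k: list(v) for k, v in itertools.groupby(items, key=lambda x: x.collection_id)}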
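To get back to the a, b, c = ... unpacking from the original question, the grouped items also need a predictable order and an entry for every requested collection, even one with no matches. A minimal sketch under those assumptions follows; split_by_collection is a hypothetical helper, and it relies only on each item exposing a collection_id attribute, as pystac Items do.

```python
from collections import defaultdict
from typing import Any, Iterable, List, Sequence


def split_by_collection(
    items: Iterable[Any], collection_ids: Sequence[str]
) -> List[List[Any]]:
    """Return one list of items per requested collection, in request order.

    Hypothetical helper: collections with no matching items yield an empty
    list, so the result always has len(collection_ids) entries.
    """
    buckets = defaultdict(list)
    for item in items:
        # Bucket each item under the collection it belongs to.
        buckets[item.collection_id].append(item)
    return [buckets.get(collection_id, []) for collection_id in collection_ids]
```

With the search from earlier in the thread, nasadem, worldcover, gsw = split_by_collection(source, ["nasadem", "esa-worldcover", "jrc-gsw"]) would then unpack cleanly while still using a single request.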