stac-utils / pystac-client

Python client for searching STAC APIs
https://pystac-client.readthedocs.io

Support for STAC API - Collection Search #722

Closed: hrodmn closed this issue 2 weeks ago

hrodmn commented 2 months ago

It can be difficult for a user to identify which collection they want to query from a STAC before they begin searching for items. I have been thinking a lot about improving the ergonomics of collection discovery lately while working on a tool for federated collection discovery. Most of the code in that project is just a mechanism for crawling through the collections returned by the /collections endpoint and checking to see if they match the provided search criteria.

The STAC API - Collection Search extension is intended to provide an API endpoint for filtering collections based on some criteria. It is not widely implemented yet, but it significantly enriches the collection discovery process when paired with a client application like this STAC Browser example.
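As a rough illustration of what "filtering collections" means on the wire: a server implementing the extension accepts the search terms as query parameters on `/collections` (for example `bbox`, `datetime`, `limit`, and, where the free-text conformance class is supported, `q`). The sketch below only builds such a GET URL with the standard library. The helper name is hypothetical, not pystac-client API, and which parameters a given server actually honors is an assumption to check against its conformance classes.

```python
from __future__ import annotations

from urllib.parse import urlencode


def collection_search_url(
    api_root: str,
    bbox: tuple[float, float, float, float] | None = None,
    datetime: str | None = None,
    q: str | None = None,
    limit: int | None = None,
) -> str:
    """Build a GET URL for /collections with collection-search query parameters.

    Hypothetical helper, not part of pystac-client. Parameter names follow the
    Collection Search extension draft; a given server may support only a subset.
    """
    params: dict[str, str] = {}
    if bbox is not None:
        # bbox is sent as comma-separated minx,miny,maxx,maxy
        params["bbox"] = ",".join(str(v) for v in bbox)
    if datetime is not None:
        # RFC 3339 instant or interval, e.g. "2020-01-01T00:00:00Z/.."
        params["datetime"] = datetime
    if q is not None:
        # free-text search, only meaningful if the server supports it
        params["q"] = q
    if limit is not None:
        params["limit"] = str(limit)
    base = api_root.rstrip("/") + "/collections"
    query = urlencode(params)  # percent-encodes commas, colons and slashes
    return f"{base}?{query}" if query else base
```

For example, `collection_search_url("https://example.com/stac", bbox=(-10, 40, 5, 50), limit=10)` yields `https://example.com/stac/collections?bbox=-10%2C40%2C5%2C50&limit=10`; `example.com` is a placeholder, not a real STAC API.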

What needs to happen to add a collection_search method to the pystac_client.Client?

gadomski commented 2 months ago

While you're correct that there is an extension for collection search, it's a bit out of date (e.g. it references v1.0.0-rc.1 of the STAC API spec, is itself at v1.0.0-rc.1, and has pilot maturity). I see from https://github.com/stac-api-extensions/collection-search/commit/4ad94f2b73b8a240d32328574367f3b2073fcd05 that there are two implementations, which helps: if they're public, they would give us APIs to write tests against.

I think the TODOs to include collection search in pystac-client would be:

@m-mohr you've touched collection search stuff more than I have, any additional thoughts?

hrodmn commented 2 months ago

The only public implementation that I am aware of is https://emc.spacebel.be/

There is already a collection_search function in pgstac, and I am working on https://github.com/stac-utils/stac-fastapi-pgstac/pull/136. Once that's stable I intend to deploy it to some public APIs that Development Seed maintains.

m-mohr commented 2 months ago

Collection Search is here to stay: stac-fastapi has an implementation, and so does STAC Browser.

Good point that rc.1 of the API is referenced. Please open an issue for it (have to run). Thanks.

gadomski commented 2 months ago

@m-mohr issue: https://github.com/stac-api-extensions/collection-search/issues/16

hrodmn commented 1 month ago

:wave: @gadomski - I would like to get started on this feature sometime soon!

Since most STAC APIs will not have the collection-search extension enabled until it is fully implemented in the common STAC API frameworks and the updates are deployed, what would you think about adding collection filtering capability to pystac-client in the meantime?

I hacked together a system for filtering results from the /collections endpoint in the federated collection discovery repo. It is not pretty, but it makes it possible to perform a collection search by iterating through the pages returned by /collections and keeping the collections that overlap with the search terms.

gadomski commented 1 month ago

> Since most STAC APIs will not have the collection-search extension enabled until it is fully implemented in the common STAC API frameworks and the updates are deployed, what would you think about adding collection filtering capability to pystac-client in the meantime?

I think it makes sense, maybe with a warning so the user knows that they're doing things "the hard way" (i.e. client-side).
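One way to decide when to fall back, and to warn the user when that happens, is to inspect the `conformsTo` list from the API landing page. The sketch below assumes the extension's conformance URI ends in `/collection-search` (as in `https://api.stacspec.org/v1.0.0-rc.1/collection-search`); that pattern, the function names, and the `ClientSideSearchWarning` category are all hypothetical, not existing pystac-client API.

```python
from __future__ import annotations

import warnings
from typing import Iterable


class ClientSideSearchWarning(UserWarning):
    """Hypothetical warning category for collection searches done client-side."""


def supports_collection_search(conforms_to: Iterable[str]) -> bool:
    # assumed URI shape, e.g. "https://api.stacspec.org/v1.0.0-rc.1/collection-search"
    return any(uri.rstrip("/").endswith("/collection-search") for uri in conforms_to)


def choose_search_mode(conforms_to: Iterable[str]) -> str:
    """Return "server" if collection search is advertised, else warn and return "client"."""
    if supports_collection_search(conforms_to):
        return "server"
    warnings.warn(
        "This API does not advertise the Collection Search extension; "
        "collections will be fetched from /collections and filtered "
        "client-side, which can be slow for large catalogs.",
        ClientSideSearchWarning,
        stacklevel=2,
    )
    return "client"
```

A dedicated warning subclass lets users silence the "hard way" notice with `warnings.filterwarnings` without hiding unrelated warnings.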

m-mohr commented 1 month ago

... also if it's paginated, users should be made aware that the result is probably incomplete...
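To act on this, a client-side crawl can follow the `rel="next"` links of `/collections` and warn explicitly when it stops before the last page. The sketch below assumes GET-style `next` links with an `href`, and takes a hypothetical `fetch` callable (for instance one wrapping an HTTP GET that returns parsed JSON) so it can run without a network; the `max_pages` cap and its default are illustrative choices, not anything pystac-client defines.

```python
from __future__ import annotations

import warnings
from typing import Any, Callable, Iterator, Optional


def iter_collections(
    fetch: Callable[[str], dict[str, Any]],
    first_url: str,
    max_pages: int = 50,
) -> Iterator[dict[str, Any]]:
    """Yield collections across pages by following rel="next" links."""
    url: Optional[str] = first_url
    pages = 0
    while url is not None:
        if pages >= max_pages:
            # more pages remain, so any client-side filter result is partial
            warnings.warn(
                f"Stopped after {max_pages} pages of /collections; "
                "client-side results are probably incomplete."
            )
            return
        page = fetch(url)
        pages += 1
        yield from page.get("collections", [])
        # the crawl ends when a page has no "next" link
        url = next(
            (link.get("href") for link in page.get("links", []) if link.get("rel") == "next"),
            None,
        )
```

Raising `max_pages` trades request count for completeness; the warning makes the trade-off visible rather than silently returning a truncated result.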