intake / intake-stac

Intake interface to STAC data catalogs
https://intake-stac.readthedocs.io/en/latest/
BSD 2-Clause "Simplified" License

Add a `StacCollection.search` method #95

Open TomAugspurger opened 3 years ago

TomAugspurger commented 3 years ago

Purely as a convenience, it'd be nice to have a StacCollection.search method that uses pystac-client to search an endpoint with a specific collection.

cat = intake.open_stac_catalog("/path/to/catalog")
collection = cat["my-collection"]
collection.search(bbox=bbox)

The .search method would use pystac-client

  1. Find the link with "rel": "search" and use its href as the endpoint.
  2. Pass collections=[self.id] to limit the search to just that collection.
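A rough sketch of those two steps, using only plain dictionaries so it runs without a live endpoint. The helper names (find_search_href, collection_search_request) and the example.com URLs are hypothetical and not part of intake-stac or pystac-client; the sketch also assumes the "rel": "search" link is present on whatever STAC document is passed in.

```python
from typing import Any, Dict, Optional, Tuple


def find_search_href(stac_doc: Dict[str, Any]) -> Optional[str]:
    """Return the href of the first link with rel == "search", or None."""
    for link in stac_doc.get("links", []):
        if link.get("rel") == "search":
            return link.get("href")
    return None


def collection_search_request(
    stac_doc: Dict[str, Any], collection_id: str, **search_kwargs: Any
) -> Tuple[str, Dict[str, Any]]:
    """Hypothetical helper: resolve the endpoint and pin the search to one collection."""
    href = find_search_href(stac_doc)
    if href is None:
        # Static catalogs have no API endpoint, so there is nothing to query.
        raise ValueError('no link with "rel": "search" found')
    # Any user-supplied "collections" value is overridden on purpose.
    params = dict(search_kwargs, collections=[collection_id])
    return href, params


# Made-up STAC document, for illustration only.
doc = {
    "id": "my-collection",
    "links": [
        {"rel": "self", "href": "https://example.com/stac/collections/my-collection"},
        {"rel": "search", "href": "https://example.com/stac/search"},
    ],
}
endpoint, params = collection_search_request(doc, doc["id"], bbox=[-10, 40, 0, 50])

# With pystac-client, the equivalent call would be roughly:
#   pystac_client.Client.open(api_root_url).search(**params)
```

Here the explicit ValueError stands in for whatever error handling the real method would choose when no search link is available.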

I see now that intake's base Catalog class apparently defines a search method, which appears to do some kind of text-based search on the items. I suspect that most STAC users would expect search to behave like STAC search.

jhamman commented 3 years ago

This sounds great @TomAugspurger! I personally don't see any good reason to avoid overriding the base class search method but we should ask @martindurant for his thoughts.

martindurant commented 3 years ago

Please do make specialised versions of search(); the one in Catalog is super-simplistic and only meant to be a fallback when nothing better is available.

scottyhq commented 3 years ago

> The .search method would use pystac-client

@TomAugspurger I really like this idea, but it would need some docs / error handling for cases where no "rel": "search" link exists.

I'm thinking of the case of a static catalog/collection, where there is no API endpoint. For that case we could:

  1. Stick with the default intake.search(), which I think just does some string pattern matching.

  2. Or implement a simple 'local api' search with the same keywords to filter by bbox and datetime (e.g. geopandas operations on the static catalog represented as a GeoDataFrame, see https://github.com/intake/intake-stac/issues/36). @matthewhanson probably has some ideas on performance here, and things would likely be quite slow if someone tries this on a really big catalog.
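To make option 2 concrete, here is a minimal pure-Python sketch of a local bbox and datetime filter over item dictionaries. It is not the geopandas approach from the linked issue: it is a linear scan, assumes 2D bboxes in [west, south, east, north] order that do not cross the antimeridian, and assumes every item has a "datetime" property. The function names and sample items are made up for illustration.

```python
from datetime import datetime, timezone


def bbox_intersects(a, b):
    """True if two 2D bboxes [west, south, east, north] overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def parse_dt(value):
    """Parse an RFC 3339 timestamp such as 2021-06-01T00:00:00Z."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def local_search(items, bbox=None, start=None, end=None):
    """Hypothetical local search: keep items matching bbox and [start, end]."""
    results = []
    for item in items:
        if bbox is not None and not bbox_intersects(item["bbox"], bbox):
            continue
        when = parse_dt(item["properties"]["datetime"])
        if start is not None and when < start:
            continue
        if end is not None and when > end:
            continue
        results.append(item)
    return results


# Made-up items standing in for a static catalog.
items = [
    {"id": "a", "bbox": [-5, 42, -4, 43], "properties": {"datetime": "2021-06-01T00:00:00Z"}},
    {"id": "b", "bbox": [20, 10, 21, 11], "properties": {"datetime": "2021-06-01T00:00:00Z"}},
    {"id": "c", "bbox": [-5, 42, -4, 43], "properties": {"datetime": "2019-01-01T00:00:00Z"}},
]
hits = local_search(
    items,
    bbox=[-10, 40, 0, 50],
    start=datetime(2021, 1, 1, tzinfo=timezone.utc),
)
```

Because every item is visited once, the cost grows linearly with catalog size, which is the performance concern raised above for really big catalogs.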

matthewhanson commented 3 years ago

I do rather like the idea of being able to do a search on a static catalog, but that seems like it should be implemented in pystac-client as well (which is so named because it's a client for both static catalogs and APIs).

Currently pystac-client will raise an APIError if there is no rel=search link.

jsignell commented 12 months ago

I have done a bit of work putting together an implementation of pystac-client style search for static catalogs. This work lives in https://github.com/jsignell/stac-static, and it could be possible to delegate search to that library. The main limitation is that it depends on having a GeoDataFrame version of the STAC metadata, though.

martindurant commented 11 months ago

> The main limitation is that it depends on having a geodataframe version of the stac metadata

I am not in a good position to reckon how much of a limitation that is.