stac-utils / pystac-client

Python client for searching STAC APIs
https://pystac-client.readthedocs.io

Federated search #42

Open matthewhanson opened 3 years ago

matthewhanson commented 3 years ago

A big advantage of STAC is being able to use data from multiple sources. It would be a nice feature to be able to search multiple STAC endpoints and combine the results into a single FeatureCollection.

gadomski commented 1 year ago

I have questions. First, would this be enough to support your use case, @matthewhanson?

import pystac_client
from pystac_client import Client

client_a = Client.open("http://stac-api-a.test")
client_b = Client.open("http://stac-api-b.test")

search_a = client_a.search(collections=["foo"], datetime="2023-06-07")
search_b = client_b.search(collections=["bar"], datetime="2023-06-07")

items = search_a.item_collection()
items.extend(search_b.item_collection())

If that's enough, then we just need to add an .extend() method to ItemCollection in pystac.
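Until such a method exists, one way to approximate the combination is to work on plain item dictionaries (for example, what items_as_dicts() yields) and build the FeatureCollection by hand. The sketch below is only an illustration: combine_items is a hypothetical helper name, not part of pystac or pystac-client, and the item IDs are made up.

```python
from itertools import chain
from typing import Any, Dict, Iterable, List

Item = Dict[str, Any]


def combine_items(*sources: Iterable[Item]) -> Dict[str, Any]:
    """Concatenate item dicts from several searches into one FeatureCollection.

    Hypothetical helper: items are appended source by source, so any
    ordering across sources is not preserved.
    """
    features: List[Item] = list(chain.from_iterable(sources))
    return {"type": "FeatureCollection", "features": features}


# Stand-ins for search_a.items_as_dicts() and search_b.items_as_dicts().
items_a = [{"type": "Feature", "id": "a-1", "collection": "foo"}]
items_b = [{"type": "Feature", "id": "b-1", "collection": "bar"}]

combined = combine_items(items_a, items_b)
print([feature["id"] for feature in combined["features"]])
```

Because the helper accepts any iterable, it would also accept lazy generators, at the cost of pulling every page from each API before returning.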

If that's not enough, I'm at a bit of a loss. Each STAC API tends to be so different that it doesn't seem realistic to, e.g., use the same collection IDs across clients. If you want to re-use the same set of parameters, it's pretty trivial to do this:

query = {
   "datetime": "2023-06-07",
   "bbox": [-73.21, 43.99, -73.12, 44.05],
}
items = client_a.search(collections=["foo"], **query).item_collection()
items.extend(client_b.search(collections=["bar"], **query).item_collection())

@matthewhanson, can you sketch out what you had in mind, if it's more than what I've described?

bitner commented 1 year ago

The important thing here would be to ensure that if an order was specified in the search that the results would be interleaved based on that order.
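To make the difference concrete, here is a minimal sketch comparing plain concatenation with an order-preserving merge. It assumes each API has already returned its results sorted by datetime descending; the item dicts are invented for illustration.

```python
import heapq

# Hypothetical results from two APIs, each already sorted by datetime descending.
results_a = [
    {"id": "a-2", "properties": {"datetime": "2023-06-07T12:00:00Z"}},
    {"id": "a-1", "properties": {"datetime": "2023-06-07T08:00:00Z"}},
]
results_b = [
    {"id": "b-2", "properties": {"datetime": "2023-06-07T10:00:00Z"}},
    {"id": "b-1", "properties": {"datetime": "2023-06-07T06:00:00Z"}},
]


def by_datetime(item):
    # ISO 8601 strings in the same format and time zone compare chronologically.
    return item["properties"]["datetime"]


# Plain concatenation keeps each source's block together and breaks the order.
concatenated = results_a + results_b

# heapq.merge lazily interleaves inputs that are already sorted;
# reverse=True tells it the inputs are in descending order.
interleaved = list(heapq.merge(results_a, results_b, key=by_datetime, reverse=True))

print([item["id"] for item in concatenated])
print([item["id"] for item in interleaved])
```

The merge only works if every source honors the same sort; an API that ignores sortby would silently produce a wrong interleaving.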

bitner commented 1 year ago

Quick and dirty proof of concept for a federated search that merges records according to their sortby settings.

from pystac_client import Client
import morecantile
import heapq
from functools import reduce, cmp_to_key

dot_get = lambda p, d: reduce(dict.get, p.split('.'), d)

def ogc_sort_func(sorts, a, b, depth=0):
    # Comparator for cmp_to_key: negative if a sorts before b, positive if after.
    if depth >= len(sorts):
        # Every sort field compared equal, so treat the two items as tied.
        return 0
    sort = sorts[depth]
    field = sort.get('field')
    direction = sort.get('direction','asc')
    # Sign flip: 1 when the direction starts with 'd' (desc), -1 for ascending.
    desc = 1 if direction.lower()[0] == 'd' else -1
    av = dot_get(field,a)
    bv = dot_get(field,b)
    if (av is None and bv is None) or av == bv:
        # Tie on this field: move on to the next sort field.
        return ogc_sort_func(sorts, a, b, depth=depth+1)
    elif av is None:
        out = -1
    elif bv is None:
        out = 1
    elif av < bv:
        out = 1
    else:
        out = -1
    return desc * out

tms = morecantile.tms.get("WebMercatorQuad")
x, y, z = tms.tile(-93,45,5)
bbox = list(tms.bounds(morecantile.Tile(x, y, z)))
print(bbox)

sortby = [{"field":"properties.datetime","direction":"desc"},{"field":"id","direction":"desc"}]
datetime=["2020-10-10","2020-10-10T18:00:00Z"]
catalog = Client.open('https://planetarycomputer.microsoft.com/api/stac/v1')
results = catalog.search(
    limit=100,
    max_items=1000,
    bbox=bbox,
    collections=["naip"],
    datetime=datetime,
    sortby=sortby
)
a=results.items_as_dicts()

results = catalog.search(
    limit=100,
    max_items=1000,
    bbox=bbox,
    datetime=datetime,
    collections=["landsat-c2-l2"],
    sortby=sortby
)

b=results.items_as_dicts()

results = catalog.search(
    limit=100,
    max_items=1000,
    bbox=bbox,
    datetime=datetime,
    collections=["sentinel-2-l2a"],
    sortby=sortby
)

c=results.items_as_dicts()

keyfunc = lambda l, r: ogc_sort_func(sortby, l, r)

print('merging')
g=heapq.merge(a,b,c, key=cmp_to_key(keyfunc))

print('cycling')
# zip with range stops after 100 rows, or earlier if the merge runs out of items.
for _, row in zip(range(100), g):
    print(dot_get('properties.datetime', row), row.get('id'),row.get('collection') )

bitner commented 1 year ago

For that, I did the sorting just on the items as dicts, but if we were to actually implement this, you could use Items as classes and either create a new subclass or monkeypatch a __lt__ method onto it.
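As a rough sketch of the class-based idea, the wrapper below gives each item a comparison method (`__lt__`) driven by the sortby rules, so heapq.merge needs no key function. SortableItem is a hypothetical name and wraps plain dicts rather than pystac Items; it also assumes every sort field is present and not None.

```python
import heapq
from typing import Any, Dict, List


def dot_get(path: str, data: Dict[str, Any]) -> Any:
    """Look up a dotted path such as 'properties.datetime' in nested dicts."""
    for key in path.split("."):
        data = data[key]
    return data


class SortableItem:
    """Hypothetical wrapper that orders an item dict by a list of sortby rules."""

    def __init__(self, item: Dict[str, Any], sortby: List[Dict[str, str]]):
        self.item = item
        self.sortby = sortby

    def __lt__(self, other: "SortableItem") -> bool:
        for rule in self.sortby:
            mine = dot_get(rule["field"], self.item)
            theirs = dot_get(rule["field"], other.item)
            if mine == theirs:
                continue  # tie on this field, try the next rule
            descending = rule.get("direction", "asc").lower().startswith("d")
            return mine > theirs if descending else mine < theirs
        return False  # tied on every rule


sortby = [
    {"field": "properties.datetime", "direction": "desc"},
    {"field": "id", "direction": "desc"},
]
# Made-up results, each list already sorted according to sortby.
source_a = [
    {"id": "a-2", "properties": {"datetime": "2020-10-10T17:00:00Z"}},
    {"id": "a-1", "properties": {"datetime": "2020-10-10T15:00:00Z"}},
]
source_b = [
    {"id": "b-2", "properties": {"datetime": "2020-10-10T17:00:00Z"}},
    {"id": "b-1", "properties": {"datetime": "2020-10-10T16:00:00Z"}},
]

wrapped = [[SortableItem(item, sortby) for item in src] for src in (source_a, source_b)]
merged_ids = [w.item["id"] for w in heapq.merge(*wrapped)]
print(merged_ids)
```

Wrapping instead of monkeypatching keeps the ordering local to one search, which matters because different searches may use different sortby settings.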