stac-utils / pystac-client

Python client for searching STAC APIs
https://pystac-client.readthedocs.io
Other
161 stars 48 forks source link

Enhancement: Return a single large image instead of many individual items #748

Closed ben-epoch-blue closed 5 days ago

ben-epoch-blue commented 6 days ago

I am trying to query a large area which returns multiple items instead of a single item, so I have to post-process it by stitching together each component image to form the whole image

Search Code (0-2 seconds) catalog.search(collections=["esa-worldcover"], bbox=bbox, limit=1, max_items=30).item_collection()

Merge Code (2.5 minute run-time for just 4 images at 30 meter resolution; EPSG4326) xx = stac.load( landcover, resolution=30/111320, patch_url=pc.sign )

gadomski commented 6 days ago

🤔 are you proposing to merge the items into a single item, and keeping all of the assets? Or are you proposing merging those assets as well? If it's the second, that's beyond the scope of this repo — we don't do any raster processing here.

ben-epoch-blue commented 5 days ago

My use case is merging together various rasters to ultimately perform a cumulative cost function on so it is necessary for me to combine the fragments returned by the pystac-client

My proposal is to have a parameter that tells the system to combine multiple images when applicable, into a reference to a single data array, instead of having several smaller ones

ben-epoch-blue commented 5 days ago

Here is the code I'm currently using for anyone curious. Pretty messy but rioxarray supports windowed reads

from rioxarray.merge import merge_arrays

catalog = pystac_client.Client.open("https://planetarycomputer.microsoft.com/api/stac/v1", modifier=planetary_computer.sign_inplace)
landcover_sources = catalog.search(collections=["esa-worldcover"], bbox=bbox, datetime='2021').item_collection()
landcover_xr = [rioxarray.open_rasterio(item.assets["map"].href)[0].rio.clip_box(*bbox)[:-1, :-1].rio.reproject('EPSG:4326', resolution=(30/111320, 30/111320)) for item in landcover_sources]
landcover_rasters = merge_arrays(landcover_xr)

plt.imshow(landcover_rasters, cmap=plt.cm.gray.reversed(), vmin=0, vmax=100)
gadomski commented 5 days ago

Thanks for the examples! Since you're working with the raster data (rather than just the STAC metadata) a library like stackstac or odc-stac is what you might be looking for!