Closed jsignell closed 1 year ago
IMO the community is still unsure of which (if either) is The One™. If you do integrate/switch, I would love to hear your thoughts on the comparison between the two w.r.t. ease of integration, ease of use, etc.
Yeah, I read through https://github.com/opendatacube/odc-stac/issues/54 and came out the other end thinking that odc-stac probably has more of a future. I'll see if I can come up with any ideas around how to make odc-stac more ergonomic.
Maybe have both? Currently, stackstac produces an xarray.DataArray whereas odc-stac produces an xarray.Dataset. An xr.DataArray is suited for 2D data + bands, whereas an xr.Dataset is suited for multi-dimensional datasets (e.g. climate model outputs), so slightly different use cases.
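For readers less familiar with the two containers, here is a minimal sketch of how they relate. The toy array, its shape, and the band names are made up for illustration; only `DataArray.to_dataset` and `Dataset.to_array` are real xarray methods.

```python
import numpy as np
import xarray as xr

# Toy stand-in for a stacked raster: 2 bands on a 3x3 grid (illustrative values only).
da = xr.DataArray(
    np.zeros((2, 3, 3)),
    dims=("band", "y", "x"),
    coords={"band": ["red", "nir"]},
)

# DataArray -> Dataset: each entry along "band" becomes its own data variable.
ds = da.to_dataset(dim="band")
band_names = list(ds.data_vars)

# Dataset -> DataArray: the data variables are stacked along a new leading "band" dimension.
back = ds.to_array(dim="band")
```

The round trip only works this cleanly when every variable shares the same dimensions and dtype, which is part of why the two return types serve slightly different use cases.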
With xpystac=0.0.1, you have xr.open_dataset(item_collection, ...) using stackstac in the backend. But realistically, you could swap stackstac for odc-stac to remove the .to_dataset call here:
In addition, you could register xr.open_dataarray() to use stackstac instead. Of course, this might need some documentation to be clear that STAC ItemCollections passed to xr.open_dataarray() are stacked using stackstac.stack while those passed to xr.open_dataset() are stacked with odc.stac.load.
Oh that is an interesting idea. I wonder if that would feel surprising to the user.
I just stumbled upon this discussion and wanted to add to @weiji14's comment, that a major difference is also the parsing of STAC metadata to Xarray, which in my opinion is an important difference to consider. Quoting from https://github.com/opendatacube/odc-stac/issues/54#issuecomment-1103313511 :
Access to the original STAC metadata
odc-stac doesn't really expose any of that, and there is a fundamental design choice that makes it impossible to do in a general case, but we can certainly add it for special case data loading in the future.
stackstac exposes all the metadata fields in the returned xarray, combined with delayed computation enabled by Dask this can be very handy as you can leverage all the xarray conveniences to filter out unwanted data.
Here is an example of what it can look like in practice with a dataset created from https://github.com/SAR-ARD/S1_NRB :
Users can then easily filter the array based on the parsed STAC Item properties:
ds_filtered = ds.where((ds['sat:relative_orbit'] == 44), drop=True)
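To make the filtering pattern concrete without needing real Sentinel-1 data, here is a small self-contained sketch. The array contents and orbit numbers are invented; the assumption is only that the STAC property ends up as a coordinate along the time dimension, as described above for stackstac.

```python
import numpy as np
import xarray as xr

# Toy stand-in for a stacked array: 4 time steps on a 2x2 grid.
# "sat:relative_orbit" is attached as a non-dimension coordinate along "time";
# the orbit numbers are made-up illustration values.
da = xr.DataArray(
    np.arange(16, dtype="float64").reshape(4, 2, 2),
    dims=("time", "y", "x"),
    coords={
        "time": np.arange(4),
        "sat:relative_orbit": ("time", [44, 117, 44, 95]),
    },
)

# Keep only the time steps acquired on relative orbit 44.
# drop=True removes the time steps where the condition is False
# instead of leaving them in place filled with NaN.
filtered = da.where(da["sat:relative_orbit"] == 44, drop=True)
kept_times = filtered["time"].values.tolist()
```

Because the condition only touches a coordinate, this kind of filter can be applied before any pixel data is computed when the array is Dask-backed.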
I am working a lot with local, static STAC Catalogs without using an API or database to do the querying beforehand. @weiji14's suggestion is interesting and could be a bridge between both libraries. I don't think there is a shift to one or the other and I also don't think there will be The One™ anytime soon. I think it's best to not press forward too fast with #26.
Thank you for commenting! I had reached a similar decision last week and updated #26 to make the stacking library configurable as suggested by @weiji14. I just renamed the PR to indicate that change in functionality.
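One way a configurable stacking library could be wired up is sketched below. This is not necessarily what #26 does: the `STACKERS` mapping, the option names, and `resolve_stacker` are hypothetical; only the function paths stackstac.stack and odc.stac.load come from the discussion above.

```python
import importlib
from typing import Any, Callable

# Hypothetical mapping from a user-facing option value to "module:function".
# The option names are assumptions made for this sketch.
STACKERS = {
    "stackstac": "stackstac:stack",
    "odc-stac": "odc.stac:load",
}


def resolve_stacker(name: str) -> Callable[..., Any]:
    """Return the stacking function for `name`, importing its library on demand."""
    try:
        module_name, func_name = STACKERS[name].split(":")
    except KeyError:
        raise ValueError(
            f"unknown stacking library {name!r}; expected one of {sorted(STACKERS)}"
        ) from None
    # Importing lazily means neither library has to be installed
    # unless the user actually selects it.
    module = importlib.import_module(module_name)
    return getattr(module, func_name)
```

With something like this, a backend could accept a keyword argument naming the stacking library and call whatever `resolve_stacker` returns, so users who only install one of the two libraries never trigger an import of the other.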
It seems like there is a shift towards using odc-stac rather than stackstac. I'm wondering if that needs to be configurable somehow or if this library should just pick one.