opendatacube / odc-stac

Load STAC items into xarray Datasets.
Apache License 2.0

Dimension names from cube:dimensions #136

Open clausmichele opened 9 months ago

clausmichele commented 9 months ago

It would be nice if, when the datacube extension is present at Collection or Item level, the dimension names it provides were reflected in the returned xarray object. Currently, the dimension names are always the default ones:

Sample STAC Collection with datacube extension:

import json

import pystac_client

url = "https://stac.eurac.edu/collections/SENTINEL2_L2A_SAMPLE"

# Fetch the raw Collection JSON (here the datacube extension is only at Collection level)
stac_api = pystac_client.stac_api_io.StacApiIO()
stac_dict = json.loads(stac_api.read_text(url))

b_dim = t_dim = x_dim = y_dim = z_dim = None
# Classify each cube:dimensions entry by its type (and by axis for spatial ones)
for dim, spec in stac_dict.get("cube:dimensions", {}).items():
    if spec["type"] == "bands":
        b_dim = dim
    elif spec["type"] == "temporal":
        t_dim = dim
    elif spec["type"] == "spatial":
        axis = spec.get("axis")
        if axis == "x":
            x_dim = dim
        elif axis == "y":
            y_dim = dim
        elif axis == "z":
            z_dim = dim
print(b_dim, t_dim, x_dim, y_dim, z_dim)

>>> bands t x y None

Result from odc-stac:

import pystac_client
import odc.stac

catalog_url = "https://stac.eurac.edu/"
collection = "SENTINEL2_L2A_SAMPLE"

# Search the catalog for all Items of the sample Collection
catalog = pystac_client.Client.open(catalog_url)
query_params = {"collections": [collection]}
items = catalog.search(**query_params).item_collection()

# chunks={} returns lazy, Dask-backed arrays instead of loading pixels
data = odc.stac.load(items, chunks={})
print(data.dims)

>>> FrozenMappingWarningOnValuesAccess({'y': 86, 'x': 98, 'time': 12})

I understand that in the example above I'm passing STAC Items that do not contain the cube:dimensions field, which is provided only at Collection level. Would it make sense to add an option to use the dimension names defined in the STAC metadata itself?
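As a possible workaround in the meantime, the dimensions can be renamed after loading. A minimal sketch, assuming the Collection's cube:dimensions dict was fetched as in the first snippet; rename_map_from_cube_dims is a hypothetical helper, not part of odc-stac, and the sample cube_dims below is abridged to the fields the helper reads. Bands are skipped because, as the printed dims show, odc-stac returns each band as a separate data variable rather than along a dimension.

```python
from typing import AbstractSet, Any, Dict, Mapping

# Default names odc-stac gives spatial dimensions, per axis: x/y for projected
# data, longitude/latitude for geographic data (assumption based on this thread).
ODC_SPATIAL_DEFAULTS = {"x": ("x", "longitude"), "y": ("y", "latitude")}


def rename_map_from_cube_dims(
    cube_dims: Mapping[str, Mapping[str, Any]],
    existing_dims: AbstractSet[str],
) -> Dict[str, str]:
    """Map odc-stac default dimension names to names from cube:dimensions.

    Only dimensions present in existing_dims are mapped and identity renames
    are skipped. "bands" entries are ignored because odc-stac returns each
    band as its own data variable, not along a dimension.
    """
    mapping: Dict[str, str] = {}
    for name, spec in cube_dims.items():
        dim_type = spec.get("type")
        axis = spec.get("axis")
        if dim_type == "temporal":
            candidates = ("time",)
        elif dim_type == "spatial" and axis in ODC_SPATIAL_DEFAULTS:
            candidates = ODC_SPATIAL_DEFAULTS[axis]
        else:
            continue  # bands, z and other dimension types are not handled here
        for default in candidates:
            if default in existing_dims and default != name:
                mapping[default] = name
    return mapping


# Abridged version of the sample Collection's cube:dimensions
cube_dims = {
    "x": {"type": "spatial", "axis": "x"},
    "y": {"type": "spatial", "axis": "y"},
    "t": {"type": "temporal"},
    "bands": {"type": "bands"},
}
print(rename_map_from_cube_dims(cube_dims, {"y", "x", "time"}))  # {'time': 't'}
```

With the two snippets above, data.rename(rename_map_from_cube_dims(stac_dict["cube:dimensions"], set(data.dims))) would turn time into t; xarray's Dataset.rename renames a dimension together with its coordinate variable.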

Kirill888 commented 9 months ago

Support for the datacube extension would be cool, but it's not just about spatial dimension names. It's about dimensions other than time, x and y; about multiple variables stored in the same HDF/Zarr/NetCDF-like asset; about units (duplicating the raster extension); about the "data variable type", which seems to extend the STAC "role" to components of the asset being described; and about other metadata, such as a valid data range or a set of allowed values a data variable can hold, which I guess one would expect to be exposed as attributes.
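To make the metadata part concrete: the datacube extension's cube:variables entries can carry a unit, an extent (lower and upper bound, with null for an open bound), a list of allowed values, and a type ("data" or "auxiliary"). A minimal sketch of exposing one entry as attributes; variable_attrs is a hypothetical helper, "units" and "valid_range" borrow CF naming, and the other attribute names are made up for illustration. None of this reflects what odc-stac actually does.

```python
from typing import Any, Dict, Mapping


def variable_attrs(var: Mapping[str, Any]) -> Dict[str, Any]:
    """Turn one cube:variables entry into a dict of xarray-style attributes.

    "units" and "valid_range" follow CF naming; "description",
    "allowed_values" and "variable_type" are hypothetical names.
    """
    attrs: Dict[str, Any] = {}
    if "unit" in var:
        attrs["units"] = var["unit"]
    if "description" in var:
        attrs["description"] = var["description"]
    extent = var.get("extent")
    # extent is [min, max]; skip it if either bound is open (null/None)
    if extent and len(extent) == 2 and None not in extent:
        attrs["valid_range"] = list(extent)
    if "values" in var:
        attrs["allowed_values"] = list(var["values"])
    if "type" in var:
        attrs["variable_type"] = var["type"]  # "data" or "auxiliary"
    return attrs


# Hypothetical classification layer with a fixed set of allowed values
classes = {"dimensions": ["t", "y", "x"], "type": "data", "values": [0, 1, 2, 3]}
print(variable_attrs(classes))
# {'allowed_values': [0, 1, 2, 3], 'variable_type': 'data'}
```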

Not to mention that those "HDF-like" data sources often use geo-registration strategies that are hard to support, such as arrays of per-pixel locations, as opposed to a CRS plus a linear transform.
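To illustrate the contrast: with a CRS plus a linear (affine) transform, six numbers are enough to compute the location of any pixel, whereas geolocation arrays store one location per pixel, so there is no formula to evaluate and a loader needs extra work (for example resampling onto a regular grid). A minimal sketch, assuming the coefficient order of the Python affine package (x = a*col + b*row + c, y = d*col + e*row + f); both functions and the sample grid are illustrative, not odc-stac code.

```python
from typing import Sequence, Tuple

# (a, b, c, d, e, f) as in the affine package:
# x = a*col + b*row + c,  y = d*col + e*row + f
Affine6 = Tuple[float, float, float, float, float, float]


def center_from_transform(t: Affine6, col: int, row: int) -> Tuple[float, float]:
    """Pixel-centre coordinates computed from just six numbers."""
    a, b, c, d, e, f = t
    cx, cy = col + 0.5, row + 0.5  # shift from the top-left corner to the centre
    return a * cx + b * cy + c, d * cx + e * cy + f


def center_from_arrays(
    xs: Sequence[Sequence[float]], ys: Sequence[Sequence[float]], col: int, row: int
) -> Tuple[float, float]:
    """Geolocation arrays: a stored location per pixel, nothing to compute."""
    return xs[row][col], ys[row][col]


# Hypothetical north-up grid with 10 m pixels
transform = (10.0, 0.0, 600000.0, 0.0, -10.0, 5200000.0)
print(center_from_transform(transform, 0, 0))  # (600005.0, 5199995.0)
```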

And as far as spatial dimension names go, having custom names can be more of a pain than an advantage. I'm still annoyed that odc-stac uses longitude/latitude dimension names when data is in geographic coordinates and x/y when using projected coordinates (that's an opendatacube/datacube legacy); I should at least add an option to force x/y names regardless of the CRS being used.
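Without such an option, a similar effect can be had after loading. A minimal sketch; force_xy_rename is a hypothetical helper, and its result would be passed to xarray's Dataset.rename, e.g. data.rename(force_xy_rename(set(data.dims))).

```python
from typing import AbstractSet, Dict

# Names odc-stac uses for geographic CRSs, mapped to the projected-style names
GEOGRAPHIC_TO_XY = {"longitude": "x", "latitude": "y"}


def force_xy_rename(dims: AbstractSet[str]) -> Dict[str, str]:
    """Rename map turning longitude/latitude into x/y; empty if already x/y."""
    return {old: new for old, new in GEOGRAPHIC_TO_XY.items() if old in dims}


print(force_xy_rename({"time", "latitude", "longitude"}))  # {'longitude': 'x', 'latitude': 'y'}
print(force_xy_rename({"time", "y", "x"}))  # {}
```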