xcube-dev / xcube

xcube is a Python package for generating and exploiting data cubes powered by xarray, dask, and zarr.
https://xcube.readthedocs.io/
MIT License

xcube server STAC implementation, further information needed to access data #1020

Closed: konstntokas closed this issue 4 months ago

konstntokas commented 4 months ago

Is your feature request related to a problem? Please describe.
So far, the xcube server STAC implementation does not provide the data store parameters needed to access the data. A client will need this information to access the data linked by an item's asset in the STAC catalog. See xcube viewer's data access snippet below:

```python
from xcube.core.store import new_data_store

store = new_data_store(
    "s3",
    root="datasets",  # can also use "pyramids" here
    storage_options={
        "anon": True,
        "client_kwargs": {
            "endpoint_url": "http://localhost:8080/s3"
        }
    }
)
# store.list_data_ids()
dataset = store.open_data(data_id="zarr_file.zarr")
```

Describe the solution you'd like
Add a new field to the asset called something like "xcube:open_kwargs". We can take inspiration from the item https://planetarycomputer.microsoft.com/api/stac/v1/collections/era5-pds/items/era5-pds-1980-01-fc, which stores extra information for opening the data in "xarray:open_kwargs". Similarly, we could add:

```
xcube:open_kwargs = dict(
    root="datasets",
    endpoint_url="http://localhost:8080/s3"
)
```
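For illustration, a published item asset carrying such a field might then look roughly like the sketch below. This is only a sketch: the href, the media type, and the exact keys inside "xcube:open_kwargs" are assumptions, not part of the current xcube server output.

```python
# Hypothetical sketch of a STAC asset extended with "xcube:open_kwargs".
# The href, media type, and key names are assumptions for illustration only.
asset = {
    "href": "http://localhost:8080/s3/datasets/zarr_file.zarr",
    "type": "application/zarr",  # assumed media type
    "roles": ["data"],
    "xcube:open_kwargs": {
        "root": "datasets",
        "endpoint_url": "http://localhost:8080/s3",
    },
}
```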
forman commented 4 months ago

> Similarly, we could add

We should provide the parameters that allow users to access the data via the xcube data store framework. Therefore we need the information that goes into new_data_store() and open_data() in the snippet above: the data store identifier, the store parameters, and the data ID together with any open parameters.

If we stick to the datasets published by the same xcube Server instance that also provides the STAC API, we may boil it down to just the S3 API parameters.
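As a sketch of how a client could turn such asset metadata back into the data store calls from the first comment, assuming a hypothetical "xcube:open_kwargs" field on the asset (only new_data_store() and open_data() are existing xcube API here):

```python
from xcube.core.store import new_data_store

# Hypothetical asset metadata; in practice this would be read from the STAC item.
asset = {
    "href": "http://localhost:8080/s3/datasets/zarr_file.zarr",
    "xcube:open_kwargs": {
        "root": "datasets",
        "endpoint_url": "http://localhost:8080/s3",
    },
}

open_kwargs = asset["xcube:open_kwargs"]  # hypothetical field name
store = new_data_store(
    "s3",
    root=open_kwargs["root"],
    storage_options={
        "anon": True,
        "client_kwargs": {"endpoint_url": open_kwargs["endpoint_url"]},
    },
)
dataset = store.open_data(data_id="zarr_file.zarr")  # data ID would come from the asset href
```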

konstntokas commented 4 months ago

With PR #1029 I can now read Zarr, levels, GeoTIFF, and COG GeoTIFF files from the STAC catalog published by xcube server. Two questions remain:

  1. When starting the server, all files are published as Zarr; see xcube/webapi/ows/stac/controllers.py#L800. Why is that? It also means that the levels file is presented as a dataset instead of an mldataset (see the sketch after this list).
  2. When adding a NetCDF file to the server configuration, something goes wrong, and the viewer does not show the datasets either. I see in examples/serve/demo/config.yml that cube.nc is not assigned. Does this mean that the server cannot publish NetCDF files?
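Regarding question 1, the distinction matters on the client side because a .levels data ID can in principle be opened as a multi-level dataset rather than a plain dataset. A minimal sketch, assuming the "s3" store offers an "mldataset:levels:s3" opener and that a pyramid named zarr_file.levels is published (both are assumptions, not verified against the server):

```python
from xcube.core.store import new_data_store

store = new_data_store(
    "s3",
    root="pyramids",  # the pyramid endpoint, as in the snippet in the first comment
    storage_options={
        "anon": True,
        "client_kwargs": {"endpoint_url": "http://localhost:8080/s3"},
    },
)
# Check which openers are actually available for the data ID (IDs below are assumptions):
# print(store.get_data_opener_ids(data_id="zarr_file.levels"))
ml_dataset = store.open_data(
    data_id="zarr_file.levels",
    opener_id="mldataset:levels:s3",  # assumed opener ID for multi-level datasets
)
```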
konstntokas commented 4 months ago

PR #1029 has been converted to a draft. First, the MIME type and format extension of the dataset will be added to the asset. So far, all files are published as Zarr.

konstntokas commented 4 months ago

The xcube server publishes each dataset as .zarr and .levels on the s3/datasets and s3/pyramids endpoints, respectively. Two assets will be published, namely analytic (the asset as before) and analytic_multires, linking to the dataset and the multi-level dataset, respectively.
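For clarity, the resulting STAC item could then carry two assets along these lines. This is only a structural sketch of what is described above; the hrefs and media types are assumptions:

```python
# Sketch of the two assets described above; hrefs and media types are assumptions.
assets = {
    "analytic": {
        "href": "http://localhost:8080/s3/datasets/zarr_file.zarr",
        "type": "application/zarr",  # assumed media type
        "roles": ["data"],
    },
    "analytic_multires": {
        "href": "http://localhost:8080/s3/pyramids/zarr_file.levels",
        "type": "application/octet-stream",  # assumed; .levels has no standard media type
        "roles": ["data"],
    },
}
```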