mpiannucci closed this issue 1 year ago.
That definitely doesn't seem right to me, but I've got less experience with the depths of the Zarr router.
It also looks like forecast_reference_time is getting dropped after a single level is selected.
I'm guessing it's not by design, so feel free to start digging.
Comparing the .zattrs specified by the source dataset for one variable:
"temp\/.zattrs": "{\"_ARRAY_DIMENSIONS\":[\"ocean_time\",\"s_rho\",\"eta_rho\",\"xi_rho\"],\"cell_methods\":\"ocean_time: point\",\"coordinates\":\"lon_rho lat_rho s_rho ocean_time\",\"field\":\"temperature\",\"grid\":\"grid\",\"location\":\"face\",\"long_name\":\"potential temperature\",\"standard_name\":\"sea_water_potential_temperature\",\"time\":\"ocean_time\",\"units\":\"Celsius\"}",
to the one that the zarr router publishes:
"temp/.zattrs": {
"cell_methods": "ocean_time: point",
"field": "temperature",
"grid": "grid",
"location": "face",
"long_name": "potential temperature",
"standard_name": "sea_water_potential_temperature",
"time": "ocean_time",
"units": "Celsius",
"_ARRAY_DIMENSIONS": [
"ocean_time",
"s_rho",
"eta_rho",
"xi_rho"
]
},
So it looks like all of the array's metadata is returned except coordinates, for some reason.
The overall metadata is set up here:
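To make the comparison above mechanical, a small stdlib-only sketch can parse both versions of .zattrs and diff their keys. The kerchunk reference stores .zattrs as a JSON-encoded string, so it needs its own json.loads. Both dictionaries below are trimmed copies of the outputs shown above, and diff_zattrs is a hypothetical helper name, not part of xpublish or kerchunk.

```python
import json

# Value of the "temp/.zattrs" key in the kerchunk references: a JSON-encoded
# string (trimmed copy of the source dataset's entry above).
source_zattrs_str = (
    '{"_ARRAY_DIMENSIONS":["ocean_time","s_rho","eta_rho","xi_rho"],'
    '"coordinates":"lon_rho lat_rho s_rho ocean_time",'
    '"long_name":"potential temperature","units":"Celsius"}'
)

# The same variable's .zattrs as published by the Zarr router (trimmed copy).
published_zattrs = {
    "long_name": "potential temperature",
    "units": "Celsius",
    "_ARRAY_DIMENSIONS": ["ocean_time", "s_rho", "eta_rho", "xi_rho"],
}


def diff_zattrs(source: dict, published: dict) -> dict:
    """Hypothetical helper: keys missing from, or changed in, the published attrs."""
    missing = sorted(set(source) - set(published))
    changed = sorted(
        key for key in set(source) & set(published) if source[key] != published[key]
    )
    return {"missing": missing, "changed": changed}


result = diff_zattrs(json.loads(source_zattrs_str), published_zattrs)
print(result)  # {'missing': ['coordinates'], 'changed': []}
```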
But it looks like each variable is encoded here:
I believe Zarr v3 is also landing really soon, which will require a bit more restructuring (consolidated by default, removing some Python JSON parsing quirks from the spec), so if it's going to be a good bit of work, it may be worth waiting until v3 and making one push to get the router compliant. Or maybe we want to split off the V2 router into its own plugin...
Yeah, I was looking at the same thing. So the coordinates aren't included as attrs on the DataArray, and if I encode them into the attrs separately, that might fix it.
But it seems that create_zmetadata(dataset) encodes a Variable, not a DataArray, and Variables only have dims, no coords.
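A small sketch of why the attribute disappears, assuming a recent xarray: when xarray decodes CF conventions it promotes the variables named in a "coordinates" attribute to coordinates and moves the attribute string itself out of attrs and into encoding. The coordinates then live on the DataArray, while the underlying Variable only carries dims, data, attrs and encoding. The dataset here is a synthetic, made-up stand-in for the ROMS data, not the real one.

```python
import numpy as np
import xarray as xr

# Synthetic, undecoded stand-in for the ROMS data (values are made up):
# "temp" carries a CF "coordinates" attribute naming 2-D lon/lat variables.
raw = xr.Dataset(
    {
        "temp": (
            ("ocean_time", "eta_rho", "xi_rho"),
            np.zeros((1, 2, 3)),
            {"coordinates": "lon_rho lat_rho ocean_time", "units": "Celsius"},
        ),
        "lon_rho": (("eta_rho", "xi_rho"), np.zeros((2, 3))),
        "lat_rho": (("eta_rho", "xi_rho"), np.ones((2, 3))),
    },
    coords={"ocean_time": [0.0]},
)

# CF decoding promotes lon_rho/lat_rho to coordinates and moves the
# "coordinates" string out of attrs and into encoding.
ds = xr.decode_cf(raw)
temp = ds["temp"]

print("coordinates" in temp.attrs)       # False
print(temp.encoding["coordinates"])      # lon_rho lat_rho ocean_time
print(sorted(temp.coords))               # ['lat_rho', 'lon_rho', 'ocean_time']

# The underlying Variable only knows its dims; coordinates live on the DataArray.
print(temp.variable.dims)                # ('ocean_time', 'eta_rho', 'xi_rho')
print(hasattr(temp.variable, "coords"))  # False
```

So anything that builds .zattrs from Variable.attrs alone would be expected to lose the "coordinates" entry, which matches the published output above.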
It might be worth looking into xarray.backends.zarr.ZarrStore.store(), as that is what ds.to_zarr() ends up calling:
I got it working by encoding attributes from the DataArray instead of the Variable itself. I'm not sure how you'll feel about it, but I'll put up a PR and we can go from there.
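A minimal sketch of that idea, assuming the router builds each variable's .zattrs from a plain dict of attributes. This is not the code from the PR; variable_zattrs is a hypothetical name. It starts from the attrs the router already publishes, prefers the original CF string that decoding keeps in encoding, and otherwise falls back to listing the DataArray's non-dimension coordinates, sorted for a stable order.

```python
import numpy as np
import xarray as xr


def variable_zattrs(ds: xr.Dataset, name: str) -> dict:
    """Hypothetical sketch: build one variable's .zattrs from its DataArray.

    Variable.attrs alone appears to be what the router publishes today; the
    CF "coordinates" entry can only be recovered from the DataArray.
    """
    da = ds[name]
    attrs = dict(da.attrs)
    # Prefer the original CF string, which decoding keeps in .encoding.
    coords = da.encoding.get("coordinates")
    if coords is None:
        # Otherwise list the DataArray's non-dimension coordinates, sorted
        # so the output does not depend on set iteration order.
        non_dim = sorted(str(c) for c in da.coords if c not in da.dims)
        coords = " ".join(non_dim) or None
    if coords:
        attrs["coordinates"] = coords
    attrs["_ARRAY_DIMENSIONS"] = list(da.dims)
    return attrs


# Usage with a small synthetic dataset (made-up values, no encoding set).
ds = xr.Dataset(
    {"temp": (("ocean_time", "eta_rho", "xi_rho"), np.zeros((1, 2, 3)), {"units": "Celsius"})},
    coords={
        "ocean_time": [0.0],
        "lon_rho": (("eta_rho", "xi_rho"), np.zeros((2, 3))),
        "lat_rho": (("eta_rho", "xi_rho"), np.ones((2, 3))),
    },
)
print(variable_zattrs(ds, "temp"))
# {'units': 'Celsius', 'coordinates': 'lat_rho lon_rho', '_ARRAY_DIMENSIONS': ['ocean_time', 'eta_rho', 'xi_rho']}
```

The fallback only lists non-dimension coordinates, so it would not reproduce s_rho and ocean_time from the source string; the encoding path is what keeps the original value intact.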
When serving a kerchunked dataset from AWS, not all of the coordinates are transmitted when the same dataset is accessed through the Zarr router; in this case, specifically lat_rho and lon_rho. The dataset used in the examples below is public.
Accessed with just xarray:
Accessed with xpublish's zarr router:
As a check, I logged the coords of the same variable on the xpublish side, which gives the following:
So I'm not sure where, but somewhere along the line the lat_rho and lon_rho coords are dropped by the Zarr router. I'm not sure whether this is by design. I can look into it, but I wanted to raise it in case there is information I'm missing before I dig too deeply.
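The manual check above can be scripted. A minimal sketch, assuming both datasets are already open (for example, one through kerchunk directly and one from the xpublish Zarr endpoint; the URLs are not given here). missing_coords is a hypothetical helper, and the two datasets in the usage example are synthetic, with the router's behavior imitated via reset_coords.

```python
import numpy as np
import xarray as xr


def missing_coords(reference: xr.Dataset, served: xr.Dataset, var: str) -> list:
    """Hypothetical helper: coords of `var` in `reference` that `served` lacks."""
    return sorted(str(c) for c in set(reference[var].coords) - set(served[var].coords))


# Synthetic stand-in for the dataset opened directly (made-up values).
reference = xr.Dataset(
    {"temp": (("ocean_time", "eta_rho", "xi_rho"), np.zeros((1, 2, 3)))},
    coords={
        "ocean_time": [0.0],
        "lon_rho": (("eta_rho", "xi_rho"), np.zeros((2, 3))),
        "lat_rho": (("eta_rho", "xi_rho"), np.ones((2, 3))),
    },
)

# Imitate the reported symptom: the served copy has lost the 2-D coordinates.
served = reference.reset_coords(["lon_rho", "lat_rho"], drop=True)

print(missing_coords(reference, served, "temp"))  # ['lat_rho', 'lon_rho']
```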