Open nbren12 opened 2 years ago
@nbren12 The other way round would be useful too. Aren't there xarray extension packages around where this would fit into?
> Aren't there xarray extension packages around where this would fit into?
I'm not sure. Any suggestions? Just wondering if xarray has left the door open to this kind of contribution since it has `ds.info()`.
@nbren12 See https://docs.xarray.dev/en/stable/internals/how-to-add-new-backend.html for adding a new backend. That way we could have
xr.open_dataset('schema.cdl', engine="cdl")
- creates CDL using ds.info().
Great, this somehow went past me.
To be fair, `ds.info` is not 100% CDL, but it's darn close.
> To be fair, `ds.info` is not 100% CDL, but it's darn close.
I think making `ds.info` CDL compliant would be a great feature addition.
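For the generating direction, here is a minimal sketch of what CDL-style header output could look like when written from a plain description of a dataset. The function name `to_cdl` and the dict layout are assumptions made for illustration. This is not how `ds.info` is implemented, and the output has not been checked against `ncgen` here.

```python
def to_cdl(name, dimensions, variables):
    """Render a CDL header (no data section) from plain Python dicts.

    dimensions: dimension name -> length, or None for an unlimited dimension
    variables:  variable name -> {"dtype": str, "dims": tuple, "attrs": dict}
    Hypothetical layout, chosen only for this sketch.
    """
    lines = [f"netcdf {name} {{", "dimensions:"]
    for dim, size in dimensions.items():
        lines.append(f"\t{dim} = {'UNLIMITED' if size is None else size} ;")
    lines.append("variables:")
    for var, spec in variables.items():
        dims = ", ".join(spec["dims"])
        # Scalars are declared without parentheses.
        decl = f"{spec['dtype']} {var}({dims})" if dims else f"{spec['dtype']} {var}"
        lines.append(f"\t{decl} ;")
        for key, value in spec.get("attrs", {}).items():
            # Strings are quoted; escaping of embedded quotes is not handled.
            rendered = f'"{value}"' if isinstance(value, str) else value
            lines.append(f"\t\t{var}:{key} = {rendered} ;")
    lines.append("}")
    return "\n".join(lines)


cdl = to_cdl(
    "example",
    {"time": None, "x": 4},
    {"temperature": {"dtype": "float", "dims": ("time", "x"), "attrs": {"units": "K"}}},
)
print(cdl)
```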
> Describe alternatives you've considered
> Some kind of `schema` object that can be used to validate or generate an xarray Dataset, but does not contain any data.
You may be interested in xarray-schema then. We're actively working on / using this project and would be more than happy to think about how a cdl-like schema fits in there.
@jhamman We have a similar schema package https://github.com/ai2cm/fv3net/tree/master/external/synth, cool to see you confronting the same challenges and advertising your solutions more broadly. One problem we had is that our schema objects ended up being quite verbose: https://github.com/ai2cm/fv3net/blob/master/external/loaders/tests/test__batch/one_step_zarr_schema.json. CDL is a lot more concise.
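To make the "schema object without data" idea concrete, here is a rough standard-library sketch. The class names, fields, and the `validate` method are all hypothetical; this is not the API of xarray-schema or of the synth package linked above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VariableSchema:
    dtype: str   # e.g. "float32"
    dims: tuple  # dimension names, in order


@dataclass(frozen=True)
class DatasetSchema:
    sizes: dict      # dimension name -> expected length, or None for any length
    variables: dict  # variable name -> VariableSchema

    def validate(self, described):
        """Return a list of problems; an empty list means the description conforms.

        `described` maps variable name -> {"dtype": str, "dims": tuple, "shape": tuple}.
        """
        problems = []
        for name, expected in self.variables.items():
            actual = described.get(name)
            if actual is None:
                problems.append(f"missing variable {name!r}")
                continue
            if actual["dtype"] != expected.dtype:
                problems.append(f"{name}: dtype {actual['dtype']} != {expected.dtype}")
            if tuple(actual["dims"]) != expected.dims:
                problems.append(f"{name}: dims {tuple(actual['dims'])} != {expected.dims}")
                continue  # shapes cannot be compared if the dims differ
            for dim, length in zip(expected.dims, actual["shape"]):
                want = self.sizes.get(dim)
                if want is not None and want != length:
                    problems.append(f"{name}: {dim} has length {length}, expected {want}")
        return problems


schema = DatasetSchema(
    sizes={"time": None, "x": 4},
    variables={"temperature": VariableSchema("float32", ("time", "x"))},
)
good = {"temperature": {"dtype": "float32", "dims": ("time", "x"), "shape": (10, 4)}}
bad = {"temperature": {"dtype": "float64", "dims": ("time", "x"), "shape": (10, 5)}}
print(schema.validate(good))
print(schema.validate(bad))
```

With xarray, the `described` mapping could presumably be built from `ds.variables`, since each variable exposes `dtype`, `dims`, and `shape`. The schema itself never holds any array data.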
Is your feature request related to a problem?
No.
Describe the solution you'd like
It would be nice to load/generate xarray datasets from Common Data Language (CDL) descriptions. CDL is a DSL that defines a netCDF dataset, and is quite nice for testing. We use it to build mock datasets for e.g. integration testing of plotting routines/complex data analysis etc. CDL provides a concise format for storing the schema of this data. This schema can be used for validation or generation (using the CLI `ncgen`).
CDL is basically the format produced by `xarray.Dataset.info`. It looks like this:
I wrote a small pure Python parser for CDL last night and it seems to work! There are similar projects on GitHub. Sadly, these projects seem to be abandoned so it would be nice to attach to an effort like xarray.
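As an illustration of how small such a parser can start out, here is a sketch that reads only the `dimensions:` and `variables:` sections of a simple CDL header. It is not the parser mentioned above. The sample text, the function name `parse_cdl_header`, and the output layout are assumptions, and it ignores the `data:` section, groups, user-defined types, and several dimensions declared on one line.

```python
import re

# Hypothetical sample input, written for this sketch.
CDL_TEXT = """
netcdf example {
dimensions:
    time = UNLIMITED ;
    x = 4 ;
variables:
    float temperature(time, x) ;
        temperature:units = "K" ;
    double x(x) ;
}
"""

DIM_RE = re.compile(r"^(\w+)\s*=\s*(\w+)\s*;$")
VAR_RE = re.compile(r"^(\w+)\s+(\w+)\s*(?:\(([^)]*)\))?\s*;$")
ATTR_RE = re.compile(r"^(\w*):(\w+)\s*=\s*(.+?)\s*;$")


def parse_cdl_header(text):
    """Parse dimensions, variables, and attributes from a simple CDL header."""
    schema = {"dimensions": {}, "variables": {}, "attrs": {}}
    section = None
    for raw in text.splitlines():
        # Naive comment stripping: this would also cut "//" inside a quoted string.
        line = raw.split("//")[0].strip()
        if not line or line.startswith("netcdf ") or line == "}":
            continue
        if line in ("dimensions:", "variables:"):
            section = line[:-1]
            continue
        if line == "data:":
            break  # the data section is out of scope for this sketch
        if section == "dimensions":
            m = DIM_RE.match(line)
            if m:
                name, size = m.groups()
                # None stands for an unlimited dimension.
                schema["dimensions"][name] = None if size.upper() == "UNLIMITED" else int(size)
        elif section == "variables":
            m = ATTR_RE.match(line)
            if m:
                var, key, value = m.groups()
                # An empty variable name (":title = ...") marks a global attribute.
                target = schema["attrs"] if var == "" else schema["variables"][var]["attrs"]
                target[key] = value.strip('"')
                continue
            m = VAR_RE.match(line)
            if m:
                dtype, name, dims = m.groups()
                dim_names = tuple(d.strip() for d in dims.split(",")) if dims else ()
                schema["variables"][name] = {"dtype": dtype, "dims": dim_names, "attrs": {}}
    return schema


parsed = parse_cdl_header(CDL_TEXT)
print(parsed)
```

Attribute values are kept as strings here; a fuller parser would convert numeric values and lists according to CDL's typing rules.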
Describe alternatives you've considered
Some kind of `schema` object that can be used to validate or generate an xarray Dataset, but does not contain any data.
Additional context
No response