Open sfinkens opened 9 months ago
As an additional option for some of these fields, I wonder if we could store them as coordinate variables?
I'm not a huge fan of the "metadata" DataTree, but I also really don't have a good alternative except for maybe coordinate variables. Just so we have the whole picture, besides units can you think of other attributes that these metadata "things" need to keep track of?
In some sense this feels like a failing of the NetCDF model. Where we might want units or other information for attributes we can't do it. I feel like most NetCDF creators have leaned towards external documentation that describes the units of attributes. Not that I want to necessarily do that, but could that simplify this if we "forced" that all distances are in meters, all times are epoch nanoseconds, etc?
Indeed, coordinate variables would work, nice idea!
<xarray.Dataset>
Dimensions: (time: 1, y: 2, x: 2, tle_dim: 2)
Coordinates:
* time (time) int64 0
* y (y) int64 0 1
* x (x) int64 0 1
projection_longitude (time) int64 0
Attributes:
units: degrees east
nominal_start_time (time) int64 0
Attributes:
units: nanoseconds since 1970-01-01
two_line_elements (time, tle_dim) int64 1 2
Data variables:
ch1 (time, y, x) int64 1 2 3 4
Just so we have the whole picture, besides units can you think of other attributes that these metadata "things" need to keep track of?
Looking at the Metadata section in the documentation, there are also attributes for area definition and raw metadata. The area def can be stored as coordinate variable, too. And raw metadata is probably too special...
Not that I want to necessarily do that, but could that simplify this if we "forced" that all distances are in meters, all times are epoch nanoseconds, etc?
This would certainly help with units, but not with dimensions or encoding.
I think it's a good idea to improve the metadata storage indeed.
Coordinate variables are good for some things. For example, in rioxarray, projection information is store in a spatial_ref
coordinate variable, which I think is interesting. As we have it now, its info is equivalent to what we have in our area
attribute, but we could use the same idea for other info, eg we could have an orbital_ref
for everything related to orbital information.
But then comes the question of what really belongs as a coordinate variable. I see coordinate variables as variables that are related in one way or another to dimension/coordinates, hence eg calibration coefficients would not really fit there.
In the first example you provide, the gain and offset are aligned to the time dimension, is this intentional, or should they be dimensionless?
calibration coefficients would not really fit there
Good point.
In the first example you provide, the gain and offset are aligned to the time dimension, is this intentional, or should they be dimensionless?
I wanted to indicate that coefficients can be time-dependent, but it isn't strictly needed. xr.concat(datasets, dim="time")
also produces a timeseries for scalars.
Feature Request
Is your feature request related to a problem? Please describe.
We repeatedly face encoding/decoding problems in the CF writer, so I have been thinking a bit about how we store metadata and possible alternatives.
Pros
Currently we store metadata in dataset attributes. This is very convenient, because we can just attach any Python type (string, array, dict etc) to a dataset.
Cons
Over time attributes have become more complex: From a simple sensor name to orbital parameters, time parameters, calibration coefficients etc. Now we're experiencing some problems with that approach
Describe the solution you'd like
I think many attributes could be stored as Data Arrays or Datasets. Then we could leverage the full potential of the xarray data model combined with CF conventions. We would still be using datasets attributes, but mainly strings (such as
units
,calendar
etc).When we are moving towards data trees (https://github.com/pytroll/satpy/issues/2605), we could organize them in a "metadata" branch (either global or dataset-specific). For example:
Describe any changes to existing user workflow Users would have to adapt to the new data structure.
Additional context For units, we could use
pint
already.