Open rly opened 3 months ago
cc @sneakers-the-rat @cmungall
I'm looking at how to 1) add attributes to arrays and 2) support labeling arrays with other arrays.
For 1), I think we need to keep supporting classes that implement `linkml:NDArray` and allow those to be an axis in a `linkml:DataArray`. Alternatively, we could add attributes to `TemperatureDataset` and have them share a common prefix, e.g., `latitude_in_deg__precision`, but this is kinda ugly and relies on a naming convention. Note that NumPy arrays and other simple array formats do not allow attributes, but HDF5 (and netCDF4) datasets and Zarr arrays do. Attributes are also allowed in `xarray.DataArray`.
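To make the trade-off concrete, here is a rough plain-Python sketch of the two options. It is not LinkML code: `LatitudeSeries`, `split_prefixed_attributes`, and the `precision` value are hypothetical names and data made up for illustration, and the `__` separator is only inferred from the `latitude_in_deg__precision` example.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass
class LatitudeSeries:
    """Hypothetical wrapper class: array values plus free-form attributes."""
    values: List[float]
    attributes: Dict[str, Any] = field(default_factory=dict)


def split_prefixed_attributes(
    flat: Dict[str, Any], sep: str = "__"
) -> Tuple[Dict[str, Any], Dict[str, Dict[str, Any]]]:
    """Separate array slots from '<array><sep><attribute>' sibling slots."""
    arrays: Dict[str, Any] = {}
    attrs: Dict[str, Dict[str, Any]] = {}
    for name, value in flat.items():
        if sep in name:
            # The link between array and attribute exists only in the name.
            array_name, attr_name = name.split(sep, 1)
            attrs.setdefault(array_name, {})[attr_name] = value
        else:
            arrays[name] = value
    return arrays, attrs


# Option A: attributes live on a class that wraps the array.
wrapped = LatitudeSeries(values=[1.0, 2.0, 3.0], attributes={"precision": 0.01})

# Option B: attributes are siblings of the array, tied to it by prefix only.
flat = {
    "latitude_in_deg": [1.0, 2.0, 3.0],
    "latitude_in_deg__precision": 0.01,
}
arrays, attrs = split_prefixed_attributes(flat)
```

With option B, every consumer has to know the separator and re-derive the grouping, and any slot name that happens to contain `__` gets misread as an attribute. With option A the grouping is structural.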
For 2), if we allow a `linkml:NDArray` to be a labeled dimension of a `linkml:DataArray`, then because the `NDArray` class could contain multiple arrays, we need a way to identify the intended array within the class. We can keep doing that with `linkml:elements`. Alternatively, we could make `values` a special slot name for any class that implements `linkml:NDArray`. Or change the slot name in the example to `elements`. Or change `linkml:elements` to `linkml:values`. (In NWB, the Data class for arrays defines a required `data` field.)
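As a sketch of the difference between "mark the slot" and "reserve a slot name", here is a small Python analogy using dataclass field metadata. This is only an illustration of the lookup problem, not how LinkML resolves `implements`; `DaysSeries`, `quality_flags`, and both helper functions are hypothetical.

```python
from dataclasses import dataclass, field, fields
from typing import List

ELEMENTS = "linkml:elements"


@dataclass
class DaysSeries:
    """Hypothetical NDArray-style class that holds more than one array."""
    # Explicit marker: this is the array meant when the class is used as an axis.
    values: List[int] = field(metadata={"implements": ELEMENTS})
    quality_flags: List[int] = field(default_factory=list)


def elements_slot_by_marker(cls) -> str:
    """Find the single slot explicitly marked as implementing linkml:elements."""
    marked = [f.name for f in fields(cls) if f.metadata.get("implements") == ELEMENTS]
    if len(marked) != 1:
        raise ValueError(f"{cls.__name__}: expected exactly one marked slot, got {marked}")
    return marked[0]


def elements_slot_by_convention(cls, reserved: str = "values") -> str:
    """Alternative: rely on a reserved slot name instead of a marker."""
    if reserved not in {f.name for f in fields(cls)}:
        raise ValueError(f"{cls.__name__}: no slot named {reserved!r}")
    return reserved


series = DaysSeries(values=[1, 2, 3], quality_flags=[0, 0, 1])
by_marker = getattr(series, elements_slot_by_marker(DaysSeries))
by_convention = getattr(series, elements_slot_by_convention(DaysSeries))
```

Both lookups return the same array here. The marker version keeps working if the slot is renamed, while the convention version is shorter to declare but takes `values` away as a free slot name.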
ah yes, is it time for the second leg, the indexed array spec?
Can we get a few examples of the desired datasets we want to support with this? I think it might be helpful to have a few concrete test cases here so we can get a handle on the constraints we'll need to handle. Are the ones in `linkml-arrays` still current? From that we can generate a set of requirements and constraints that help inform these decisions ^ :)
This PR adds an alternate approach to specifying the `TemperatureDataset` defined in `tests/input/examples/schema_definition-native-array-1.yaml`. This approach uses classes that implement `linkml:NDArray` and have an attribute that implements `linkml:elements` as defined pre-1.7.0 release. This representation is necessary for adding additional attributes, e.g., user-specified units of measurement, conversion factor, precision/error, reference/zero point, or source, on the various arrays that make up a `TemperatureDataset`. Seeking feedback on this approach.

It also changes `y` -> `"y"` because `y` = True in YAML 1.1.
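For anyone wondering which other bare names hit the same problem: the YAML 1.1 bool type lists a fixed set of plain scalars that resolve to booleans. Below is a small stdlib-only check against that list. The `needs_quoting` helper and the `axis_names` list are hypothetical, and parsers differ in how much of the list they actually honor, so treat this as a conservative lint rather than a description of any particular loader.

```python
import re

# Plain scalars that the YAML 1.1 bool type resolves to true/false.
YAML11_BOOL = re.compile(
    r"y|Y|yes|Yes|YES|n|N|no|No|NO"
    r"|true|True|TRUE|false|False|FALSE"
    r"|on|On|ON|off|Off|OFF"
)


def needs_quoting(name: str) -> bool:
    """True if an unquoted `name` could be read as a boolean under YAML 1.1."""
    return YAML11_BOOL.fullmatch(name) is not None


# Hypothetical axis names; only the bare `y` collides with the bool forms.
axis_names = ["x", "y", "t"]
to_quote = [name for name in axis_names if needs_quoting(name)]
```

Quoting the key (`"y"`) sidesteps the ambiguity regardless of which YAML version the loader targets.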