linkml / linkml-model

Link Modeling Language (LinkML) model
https://linkml.github.io/linkml-model/docs/

Add array class example #190

Status: Open · rly opened 3 months ago

rly commented 3 months ago

This PR adds an alternative way to specify the TemperatureDataset defined in tests/input/examples/schema_definition-native-array-1.yaml. This approach uses classes that implement linkml:NDArray, each with an attribute that implements linkml:elements as defined before the 1.7.0 release. This representation is needed to attach attributes to the individual arrays that make up a TemperatureDataset, e.g., user-specified units of measurement, a conversion factor, precision/error, a reference/zero point, or a source. Seeking feedback on this approach.
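A rough Python sketch of the idea, using dataclasses as stand-ins for LinkML classes: each array becomes an object that carries its values plus the per-array metadata listed above. The names AnnotatedArray, unit, conversion, precision, reference and source, and the TemperatureDataset fields other than latitude_in_deg, are hypothetical illustrations rather than slots from the example schema, and values are kept one-dimensional for brevity.

```python
from dataclasses import dataclass
from typing import List, Optional


# Hypothetical stand-in for a class that implements linkml:NDArray.
# "values" plays the role of the attribute that implements linkml:elements;
# the other fields are the per-array attributes the PR wants to allow.
@dataclass
class AnnotatedArray:
    values: List[float]
    unit: Optional[str] = None
    conversion: float = 1.0            # multiply stored values by this factor
    precision: Optional[float] = None
    reference: Optional[float] = None  # reference/zero point
    source: Optional[str] = None

    def converted(self) -> List[float]:
        """Return the stored values scaled by the conversion factor."""
        return [v * self.conversion for v in self.values]


# Hypothetical TemperatureDataset built from annotated arrays
# instead of bare lists of numbers.
@dataclass
class TemperatureDataset:
    latitude_in_deg: AnnotatedArray
    longitude_in_deg: AnnotatedArray
    temperatures_in_K: AnnotatedArray
```

Because the metadata lives on each array object, adding another per-array attribute means adding one field to the array class rather than one prefixed attribute per array on the dataset.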

It also changes y to "y" (quoted), because under YAML 1.1 an unquoted y is parsed as the boolean True.
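To make the YAML point concrete, here is a small sketch that checks whether an unquoted plain scalar would resolve to a boolean. The YAML 1.1 pattern follows the bool type listed at yaml.org/type/bool.html, and the 1.2 pattern follows the 1.2 core schema; is_implicit_bool is a hypothetical helper, not part of any YAML library.

```python
import re

# Plain scalars that the YAML 1.1 bool type resolves to true/false.
YAML11_BOOL = re.compile(
    r"y|Y|yes|Yes|YES|n|N|no|No|NO"
    r"|true|True|TRUE|false|False|FALSE"
    r"|on|On|ON|off|Off|OFF"
)

# The YAML 1.2 core schema only keeps the true/false spellings.
YAML12_CORE_BOOL = re.compile(r"true|True|TRUE|false|False|FALSE")


def is_implicit_bool(scalar: str, version: str = "1.1") -> bool:
    """Return True if this *unquoted* scalar would be read as a boolean."""
    pattern = YAML11_BOOL if version == "1.1" else YAML12_CORE_BOOL
    return pattern.fullmatch(scalar) is not None
```

Quoting the scalar ("y") makes it a string under either version. Parsers differ in how closely they follow the 1.1 list, so quoting is the portable fix.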

rly commented 3 months ago

cc @sneakers-the-rat @cmungall

I'm looking at how to 1) add attributes to arrays and 2) support labeling arrays with other arrays.

For 1), I think we need to keep supporting classes that implement linkml:NDArray and allow those to be used as an axis in a linkml:DataArray. Alternatively, we could add the attributes directly to TemperatureDataset and give them a common prefix, e.g., latitude_in_deg__precision, but this is clunky and relies on a naming convention. Note that NumPy arrays and other simple array formats do not support attributes, but HDF5 (and netCDF4) datasets and Zarr arrays do, as does xarray.DataArray.
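For comparison, a minimal sketch of the prefix alternative: a consumer has to reverse the naming convention to recover which attributes belong to which array. The double-underscore separator follows the latitude_in_deg__precision example; group_prefixed_attributes and the "values" key are hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict

SEP = "__"  # hypothetical separator between array name and attribute name


def group_prefixed_attributes(flat: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
    """Split flat keys like 'latitude_in_deg__precision' into per-array dicts."""
    grouped: Dict[str, Dict[str, Any]] = defaultdict(dict)
    for key, value in flat.items():
        # partition splits at the FIRST occurrence of the separator
        array_name, sep, attr = key.partition(SEP)
        if sep:
            grouped[array_name][attr] = value
        else:
            grouped[key]["values"] = value  # an unprefixed key is the array itself
    return dict(grouped)
```

The convention breaks as soon as an array name itself contains the separator: a hypothetical sea__level_in_m array would be split into an array called sea with an attribute level_in_m, which is part of why the class-based approach is more robust.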

For 2), if we allow a linkml:NDArray to be a labeled dimension of a linkml:DataArray, then, because the NDArray class could contain multiple arrays, we need a way to identify the intended array within the class. We can keep doing that with linkml:elements. Alternatively, we could make values a reserved slot name for any class that implements linkml:NDArray, rename the slot in the example to elements, or rename linkml:elements to linkml:values. (In NWB, the Data class for arrays defines a required data field.)
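One way to compare these options is a small resolver over a class definition shaped like parsed schema YAML (a plain dict, which is an assumption of this sketch). It prefers an attribute that implements linkml:elements and otherwise falls back to a reserved slot name, with values as the hypothetical default. find_elements_slot is an illustrative helper, not part of linkml or linkml-runtime.

```python
from typing import Any, Dict, Optional

ELEMENTS = "linkml:elements"


def find_elements_slot(class_def: Dict[str, Any], fallback: Optional[str] = "values") -> str:
    """Return the name of the attribute that holds the array data.

    Prefer an attribute declaring `implements: [linkml:elements]`;
    otherwise fall back to a reserved slot name (hypothetical convention).
    """
    attributes = class_def.get("attributes") or {}
    matches = [
        name
        for name, attr in attributes.items()
        if ELEMENTS in ((attr or {}).get("implements") or [])
    ]
    if len(matches) == 1:
        return matches[0]
    if len(matches) > 1:
        raise ValueError(f"ambiguous: {matches} all implement {ELEMENTS}")
    if fallback is not None and fallback in attributes:
        return fallback
    raise ValueError("no attribute identifies the array data")
```

Requiring exactly one linkml:elements attribute keeps the lookup unambiguous when a class also carries metadata; the fallback models the "values as a special slot name" option.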

sneakers-the-rat commented 3 months ago

Ah yes, is it time for the second leg, the indexed array spec?

Can we get a few examples of the datasets we want to support with this? Having a few concrete test cases here would help us get a handle on the constraints we'll need to handle. Are the ones in linkml-arrays still current? From those we can derive a set of requirements and constraints to inform these decisions ^ :)