linkml / linkml-model

Link Modeling Language (LinkML) model
https://linkml.github.io/linkml-model/docs/

First pass at native NDArray support. #181

Closed: cmungall closed this 5 months ago

cmungall commented 5 months ago

See

This introduces first-class array support into LinkML.

A minimal example would be:

    attributes:
      temperature_matrix:
        range: float
        array_info:
          exact_dimensions: 3

The native serialization of this in JSON/YAML will be a LoLoL (list of lists of lists). Using linkml-xarrays, it will be possible to serialize to HDF5, Zarr, etc.
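
To make the LoLoL form concrete, here is a minimal pure-Python sketch that computes the shape of a nested list and checks it against exact_dimensions. The helper names lolol_shape and check_exact_dimensions are hypothetical; they are not part of LinkML or linkml-runtime.

```python
from typing import Any, Tuple


def lolol_shape(value: Any) -> Tuple[int, ...]:
    """Return the shape of a rectangular nested list (e.g. a LoLoL).

    Non-list values (scalars) have shape (). Raises ValueError if sibling
    sub-lists differ in shape, i.e. the array is ragged.
    """
    if not isinstance(value, list):
        return ()
    if not value:
        return (0,)
    child_shapes = [lolol_shape(item) for item in value]
    first = child_shapes[0]
    if any(shape != first for shape in child_shapes[1:]):
        raise ValueError("ragged nested list: sub-lists differ in shape")
    return (len(value),) + first


def check_exact_dimensions(value: Any, exact_dimensions: int) -> None:
    """Hypothetical check mirroring array_info.exact_dimensions."""
    ndim = len(lolol_shape(value))
    if ndim != exact_dimensions:
        raise ValueError(f"expected {exact_dimensions} dimensions, got {ndim}")


# A 2 x 2 x 2 temperature_matrix serialized as a list of lists of lists.
temperature_matrix = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[5.0, 6.0], [7.0, 8.0]],
]
check_exact_dimensions(temperature_matrix, 3)
print(lolol_shape(temperature_matrix))  # (2, 2, 2)
```

Rejecting ragged input matters here because a LoLoL in JSON/YAML can express ragged lists that no NDArray could hold.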

The corresponding nptyping type would be NDArray[Shape["*, *, *"], Float].
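
nptyping checks arrays against Shape expressions like the one above. To illustrate what such a check does, here is a small pure-Python sketch that supports only the subset of the syntax used in this thread: each comma-separated entry is * (any length) or a fixed integer, optionally followed by an axis label. It is not nptyping's implementation, and parse_shape_spec and shape_matches are hypothetical names.

```python
from typing import List, Optional, Tuple


def parse_shape_spec(spec: str) -> List[Tuple[Optional[int], Optional[str]]]:
    """Parse strings such as "*, *, *" or "* x, * y, 3 rgb".

    Returns one (fixed length or None, label or None) pair per axis.
    """
    axes = []
    for entry in spec.split(","):
        parts = entry.split()
        if not parts or len(parts) > 2:
            raise ValueError(f"cannot parse axis entry {entry!r}")
        length = None if parts[0] == "*" else int(parts[0])
        label = parts[1] if len(parts) == 2 else None
        axes.append((length, label))
    return axes


def shape_matches(shape: Tuple[int, ...], spec: str) -> bool:
    """True if `shape` has as many axes as `spec` and all fixed lengths agree."""
    axes = parse_shape_spec(spec)
    if len(shape) != len(axes):
        return False
    return all(length is None or length == size
               for size, (length, _label) in zip(shape, axes))


print(shape_matches((4, 5, 6), "*, *, *"))  # True
print(shape_matches((4, 5), "*, *, *"))     # False: wrong number of dimensions
```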

(Note: modelers will want the ability to use ctypes, but this is orthogonal.)

Note that this does not impose any metadata on the array. We are deferring the data model for what is equivalent to xarray DataArrays: for now these will be supported via implements, with first-class incorporation in a future version. This will allow binding between axes and other LinkML arrays.

Minimal metadata can be introduced by naming the axes:

    attributes:
      temperature_matrix:
        range: float
        array_info:
          exact_dimensions: 3
          dimensions:
            x:
            y:
            z:

The corresponding nptyping type would be NDArray[Shape["* x, * y, * z"], Float].
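
Naming the axes lets tools refer to lengths by name rather than by position, much as xarray's DataArray.sizes does. A minimal sketch, assuming the value is a non-empty, rectangular LoLoL and that the axis order is the order in which the names are declared under dimensions; named_sizes is a hypothetical helper, not a LinkML API.

```python
from typing import Any, Dict, List


def named_sizes(value: Any, dim_names: List[str]) -> Dict[str, int]:
    """Map each axis name to its length, walking the first element at each level.

    Assumes `value` is a non-empty, rectangular nested list whose nesting depth
    equals len(dim_names), with axes in declaration order.
    """
    sizes: Dict[str, int] = {}
    current = value
    for name in dim_names:
        if not isinstance(current, list) or not current:
            raise ValueError(f"cannot read a non-empty axis {name!r}")
        sizes[name] = len(current)
        current = current[0]
    if isinstance(current, list):
        raise ValueError("value has more dimensions than names")
    return sizes


# A 2 x 3 x 4 temperature_matrix (values are placeholders).
temperature_matrix = [[[0.0] * 4 for _ in range(3)] for _ in range(2)]
print(named_sizes(temperature_matrix, ["x", "y", "z"]))  # {'x': 2, 'y': 3, 'z': 4}
```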

The shape can be further constrained. Imagine an RGB matrix with coordinates x and y, and a length-3 r/g/b axis:

    attributes:
      rgb:
        range: float
        array_info:
          exact_dimensions: 3
          dimensions:
            x:
            y:
            rgb:
              exact_cardinality: 3
              description: r, g, b values
              annotations:
                names: "[red, green, blue]"

This corresponds to NDArray[Shape["* x, * y, 3 rgb"], Float].
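
The correspondence between array_info and the nptyping Shape strings in this thread is mechanical enough to sketch in code. This assumes the YAML has been loaded into plain Python dicts (a dimension written as "x:" with no body loads as None), that the key order under dimensions gives the axis order, and that only exact_dimensions and exact_cardinality are used. to_nptyping_shape is a hypothetical name, not an existing LinkML generator.

```python
from typing import Any, Dict


def to_nptyping_shape(array_info: Dict[str, Any]) -> str:
    """Build an nptyping-style Shape string from a parsed array_info block."""
    n = array_info["exact_dimensions"]
    dimensions = array_info.get("dimensions") or {}
    if dimensions and len(dimensions) != n:
        raise ValueError("number of named dimensions must equal exact_dimensions")
    if not dimensions:
        # Unnamed axes: one wildcard per dimension, e.g. "*, *, *".
        return ", ".join(["*"] * n)
    entries = []
    for name, info in dimensions.items():
        # "x:" with no body loads as None, so treat it as an empty mapping.
        size = (info or {}).get("exact_cardinality")
        entries.append(f"{'*' if size is None else size} {name}")
    return ", ".join(entries)


rgb_info = {
    "exact_dimensions": 3,
    "dimensions": {
        "x": None,
        "y": None,
        "rgb": {"exact_cardinality": 3, "description": "r, g, b values"},
    },
}
print(f'NDArray[Shape["{to_nptyping_shape(rgb_info)}"], Float]')
# NDArray[Shape["* x, * y, 3 rgb"], Float]
```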

For now, if you want to bind dimensions to additional metadata, this can be done via annotations:

    classes:
      TemperatureDataset:
        tree_root: true
        annotations:
          array_data_mapping:
            data: temperatures_in_K
            dims: [x, y, t]
            coords:
              latitude_in_deg: x
              longitude_in_deg: y
              time_in_d: t
        attributes:
          name:
            identifier: true
            range: string
          latitude_in_deg:
            required: true
            range: float
            multivalued: true
            unit:
              ucum_code: deg
            array_info:
              exact_dimensions: 1
          longitude_in_deg:
            required: true
            range: float
            multivalued: true
            unit:
              ucum_code: deg
            array_info:
              exact_dimensions: 1
          time_in_d:
            range: float
            multivalued: true
            implements:
              - linkml:elements
            required: true
            unit:
              ucum_code: d
            array_info:
              exact_dimensions: 1
          temperatures_in_K:
            range: float
            multivalued: true
            required: true
            unit:
              ucum_code: K
            array_info:
              exact_dimensions: 3

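
Until this binding becomes first class, a consumer has to interpret the array_data_mapping annotation itself. A minimal sketch of one such check, assuming an instance of TemperatureDataset and the annotation have both been loaded into plain dicts: each coordinate attribute's length must equal the data array's length along the axis it is bound to. The function names and this reading of the annotation are assumptions; LinkML does not define these semantics.

```python
from typing import Any, Dict, List


def axis_lengths(value: Any, ndim: int) -> List[int]:
    """Lengths along the first `ndim` axes of a non-empty, rectangular nested list."""
    lengths = []
    for _ in range(ndim):
        lengths.append(len(value))
        value = value[0]
    return lengths


def check_array_data_mapping(instance: Dict[str, Any],
                             mapping: Dict[str, Any]) -> List[str]:
    """Compare each coordinate attribute's length with the data axis it is bound to."""
    data_name = mapping["data"]
    dims = mapping["dims"]
    lengths = dict(zip(dims, axis_lengths(instance[data_name], len(dims))))
    problems = []
    for coord_attr, dim in mapping["coords"].items():
        n_coord = len(instance[coord_attr])
        if n_coord != lengths[dim]:
            problems.append(f"{coord_attr} has {n_coord} values, but axis {dim} "
                            f"of {data_name} has length {lengths[dim]}")
    return problems


# The annotation from the TemperatureDataset class above, as a parsed dict.
mapping = {
    "data": "temperatures_in_K",
    "dims": ["x", "y", "t"],
    "coords": {"latitude_in_deg": "x", "longitude_in_deg": "y", "time_in_d": "t"},
}
# Hypothetical instance: 2 latitudes x 3 longitudes x 2 time points.
dataset = {
    "name": "example",
    "latitude_in_deg": [10.0, 20.0],
    "longitude_in_deg": [30.0, 40.0, 50.0],
    "time_in_d": [0.0, 1.0],
    "temperatures_in_K": [[[280.0, 281.0] for _ in range(3)] for _ in range(2)],
}
print(check_array_data_mapping(dataset, mapping))  # []
```
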
rly commented 5 months ago

Notes:

Some useful validation checks:

  1. The number of items in dimensions_info must not exceed maximum_number_dimensions
  2. Each dimension_index must not exceed maximum_number_dimensions - 1
  3. Each dimension_index must be unique across dimensions_info items
  4. If array_info exists, then multivalued must be True (if we allow scalar arrays, this no longer always holds)
  5. minimum_number_dimensions <= maximum_number_dimensions
  6. Cannot have both exact_number_dimensions and minimum_number_dimensions, or both exact_number_dimensions and maximum_number_dimensions
  7. minimum_number_dimensions, exact_number_dimensions, maximum_number_dimensions all > 0
  8. minimum_cardinality <= maximum_cardinality
  9. Cannot have both exact_cardinality and minimum_cardinality, or both exact_cardinality and maximum_cardinality
  10. minimum_cardinality, exact_cardinality, maximum_cardinality all > 0
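
Most of these checks are simple enough to express directly. The sketch below assumes array_info has been parsed into a plain dict using the slot names minimum_number_dimensions, exact_number_dimensions, and maximum_number_dimensions, with dimensions_info as a list of dicts carrying dimension_index and the cardinality slots, and that multivalued is passed in separately. It is a sketch of the proposed rules, not an existing LinkML validator; all function names are hypothetical.

```python
from typing import Any, Dict, List


def _check_bounds(spec: Dict[str, Any], min_key: str, exact_key: str,
                  max_key: str, where: str) -> List[str]:
    """Rules shared by dimension counts (5-7) and cardinalities (8-10)."""
    errors = []
    lo, exact, hi = spec.get(min_key), spec.get(exact_key), spec.get(max_key)
    for key, val in ((min_key, lo), (exact_key, exact), (max_key, hi)):
        if val is not None and val <= 0:
            errors.append(f"{where}: {key} must be > 0")
    if lo is not None and hi is not None and lo > hi:
        errors.append(f"{where}: {min_key} must be <= {max_key}")
    if exact is not None and (lo is not None or hi is not None):
        errors.append(f"{where}: {exact_key} cannot be combined with "
                      f"{min_key} or {max_key}")
    return errors


def validate_array_info(array_info: Dict[str, Any], multivalued: bool) -> List[str]:
    """Return a list of rule violations; an empty list means array_info passes."""
    errors = _check_bounds(array_info, "minimum_number_dimensions",
                           "exact_number_dimensions", "maximum_number_dimensions",
                           "array_info")
    # Rule 4, assuming scalar (0-dimensional) arrays are not allowed.
    if not multivalued:
        errors.append("array_info: multivalued must be true")
    dims = array_info.get("dimensions_info") or []
    max_n = array_info.get("maximum_number_dimensions")
    # Rule 1: no more dimension descriptions than allowed dimensions.
    if max_n is not None and len(dims) > max_n:
        errors.append("array_info: too many dimensions_info items")
    seen = set()
    for i, dim in enumerate(dims):
        where = f"dimensions_info[{i}]"
        idx = dim.get("dimension_index")
        if idx is not None:
            # Rule 2: the "max - 1" bound implies 0-based indices.
            if max_n is not None and idx > max_n - 1:
                errors.append(f"{where}: dimension_index {idx} out of range")
            # Rule 3: each index may be described only once.
            if idx in seen:
                errors.append(f"{where}: duplicate dimension_index {idx}")
            seen.add(idx)
        errors.extend(_check_bounds(dim, "minimum_cardinality",
                                    "exact_cardinality", "maximum_cardinality",
                                    where))
    return errors


ok = {
    "maximum_number_dimensions": 3,
    "dimensions_info": [
        {"dimension_index": 0},
        {"dimension_index": 2, "exact_cardinality": 3},
    ],
}
print(validate_array_info(ok, multivalued=True))  # []

bad = {
    "minimum_number_dimensions": 4,  # rule 5: 4 > 3
    "maximum_number_dimensions": 3,
    "dimensions_info": [{"dimension_index": 3, "exact_cardinality": 0}],  # rules 2, 10
}
for problem in validate_array_info(bad, multivalued=True):
    print(problem)
```

Collecting every violation instead of stopping at the first one makes it easier to report all problems in a schema at once.
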
sneakers-the-rat commented 5 months ago

For posterity, here is an archived version of the matrix of tradeoffs, to be added to the consolidated docs later along with @rly's notes and examples :)

live: https://wiki.jon-e.net/LinkML_Arrays
archive: https://web.archive.org/web/20240207234701/https://wiki.jon-e.net/LinkML_Arrays