linkml / linkml-map

Mapping between LinkML schemas
https://linkml.io/linkml-map/

Create hierarchy with linkml-map #38

Open Duanoc opened 4 days ago

Duanoc commented 4 days ago

What is your feature request? I am currently testing linkml and linkml-map for exporting metadata from an electronic lab notebook (ELN) into a metadata catalogue. The metadata in the ELN are collected via forms that follow one schema. The metadata catalogue follows another schema, and I want to use linkml-map to transform the metadata from one schema into the other. What I have found so far is that linkml-map can be used to modify values, rename keys and flatten the structure. However, flattening is not reversible: an inverse mapping file cannot be created for the flattened slots. This brings me to my problem: how can I remove and create hierarchy, and how can I assign or rearrange slots into new classes? As an example, I would like to transform one of the following data.json documents into the other and back.

{
    "scientificMetadata": {
        "frequency": {
            "value": 2,
            "unit": "THz"
        }
    }
}
{
  "Has Frequency": "2000000000000 Hz"
}
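For reference, the intended forward direction can be sketched in plain Python outside linkml-map. This only illustrates the desired behaviour and is not linkml-map code; the `flatten_frequency` helper and the `TO_HZ` table are hypothetical, and THz is assumed to be the only source unit.

```python
from decimal import Decimal

# Hypothetical conversion table: factor from each source unit to Hz.
# Only THz occurs in this issue; other units are not handled here.
TO_HZ = {"THz": Decimal(10) ** 12}


def flatten_frequency(record: dict) -> dict:
    """Map the nested form to the flat form, expressing the value in Hz."""
    freq = record["scientificMetadata"]["frequency"]
    # Go through str() so the value is converted to Decimal exactly.
    hz = Decimal(str(freq["value"])) * TO_HZ[freq["unit"]]
    return {"Has Frequency": f"{hz} Hz"}


nested = {"scientificMetadata": {"frequency": {"value": 2, "unit": "THz"}}}
print(flatten_frequency(nested))  # {'Has Frequency': '2000000000000 Hz'}
```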

Can a mapping file be created for this case, and if so, how? The main problem I see is that flattened slots cannot be inverted: linkml_map.inference.inverter.NonInvertibleSpecification: Cannot invert expression scientificMetadata.frequency.value in slot derivation: Has Frequency
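A plausible intuition for this error (not a description of the inverter's internals): a path expression such as scientificMetadata.frequency.value is a projection that keeps the value and drops its sibling unit, so different source records can collapse to the same flat value. The sketch uses a hypothetical `get_path` helper and a hypothetical GHz record (GHz is not in the schema) to show the collision.

```python
def get_path(record: dict, path: str):
    """Follow a dotted path such as 'scientificMetadata.frequency.value'."""
    node = record
    for key in path.split("."):
        node = node[key]
    return node


PATH = "scientificMetadata.frequency.value"
a = {"scientificMetadata": {"frequency": {"value": 2, "unit": "THz"}}}
b = {"scientificMetadata": {"frequency": {"value": 2, "unit": "GHz"}}}  # hypothetical unit

# Both sources produce the same flat value, so the flat value alone
# cannot tell an inverse which 'unit' to restore.
print(get_path(a, PATH), get_path(b, PATH))  # 2 2
```

Any inverse therefore needs the missing unit from somewhere else, for example a fixed default.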

How important is this feature to you? Important: it's a blocker and I can't do work without it.

Additional context: It is similar to linkml/linkml#2076.

  • Define two separately maintained schemas (flat, structured, etc.)
  • Define linkml-map rules to convert between the two: https://linkml.io/linkml-map/

The only deviation I would take from Nico's approach is that we'd like linkml-map rules to dictate the second schema for you (let the rules manage the transform and remove the need to manually develop both the flat and the normalized schema).

Is it currently possible to use linkml-map to create a structured schema from a flattened schema?

cmungall commented 3 days ago

I moved this to linkml-map as it isn't considered core yet.

Yes, this should be possible, there are some examples here:

https://linkml.io/linkml-map/#examples/Tutorial/#unit-conversions

It looks like you have seen these and something in your spec isn't working - can you post a reproducible example?

Duanoc commented 1 day ago

Actually, unit conversion is not my main problem, but here is what I tested. The LinkML model:

id: https://hzdr.de/linkml/opendata/beamline_TELBE
name: TELBE
prefixes:
  linkml: https://w3id.org/linkml/
imports:
  - linkml:types
default_range: string
classes:
  PANDataset:
    tree_root: true
    attributes:
      scientificMetadata:
        range: ScientificMetadata
        required: true
  ScientificMetadata:
    attributes:
      frequency:
        range: FrequencyMeasurement
  FrequencyMeasurement:
    attributes:
      value:
        range: decimal
        unit:
          ucum_code: THz
      unit:
        range: FrequencyUnits

enums:
  FrequencyUnits:
    permissible_values:
      THz:
        description: currently defined in terahertz

My question here: how should multiple frequency units in an enum be handled together with ucum_code? The ucum_code would differ if another unit were chosen, which means the value of ucum_code would need to come from the FrequencyUnits enum. That does not work, because ucum_code only accepts a string.
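One workaround pattern, sketched outside LinkML, is to keep the unit-dependent information in a lookup table keyed by the enum's permissible values, so each unit carries its own UCUM code and conversion factor. The `FREQUENCY_UNITS` table and `to_hz` helper are hypothetical, GHz and MHz are illustrative additions to the single THz value in the schema, and this does not describe an existing LinkML feature.

```python
from decimal import Decimal

# Hypothetical lookup: FrequencyUnits permissible value -> (UCUM code, factor to Hz).
# Only THz is in the schema above; GHz and MHz are illustrative additions.
FREQUENCY_UNITS = {
    "THz": ("THz", Decimal(10) ** 12),
    "GHz": ("GHz", Decimal(10) ** 9),
    "MHz": ("MHz", Decimal(10) ** 6),
}


def to_hz(value, unit: str) -> Decimal:
    """Convert a value given in one of the enum units to Hz."""
    _ucum_code, factor = FREQUENCY_UNITS[unit]
    return Decimal(str(value)) * factor


print(to_hz(2, "THz"))  # 2000000000000
```

The design choice here is that the data's own unit field selects the row, so the schema-level ucum_code no longer has to vary.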

Using the data.json from above, the following linkml-map specification

class_derivations:
  PANDataset:
    name: PANDataset
    populated_from: PANDataset
    slot_derivations:
      scientificMetadata:
        range: ScientificMetadata
  ScientificMetadata:
    populated_from: ScientificMetadata
    slot_derivations:
      frequency:
        range: FrequencyMeasurement
  FrequencyMeasurement:
    populated_from: FrequencyMeasurement
    slot_derivations:
      value:
        name: "Has_FWKP:FrequencyTHz"
        populated_from: value
        unit_conversion:
          target_unit: Hz

works. I get:

WARNING:linkml_map.transformer.transformer:Unknown target range FrequencyMeasurement
WARNING:linkml_map.transformer.transformer:Unknown target range ScientificMetadata
scientificMetadata:
  frequency:
    Has_FWKP:FrequencyTHz: 2000.0000000000002

But if I replace populated_from: value with expr: value, I get {}. It seems that unit conversion does not work together with expr, which I used to flatten the structure (expr: scientificMetadata.frequency.value within scientificMetadata).

A workaround would be to use expr: value * 1000000000000.
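The workaround's arithmetic is easy to check, and it also shows what an inverse would have to do: divide by the same factor. A minimal sketch using Python's decimal module to match the schema's `range: decimal`; whether linkml-map's inverter can derive the division from such an expr is not established here.

```python
from decimal import Decimal

THZ_TO_HZ = 1000000000000  # 10**12, the factor used in the expr workaround

value_thz = Decimal("2")         # 'value' is declared with range: decimal
value_hz = value_thz * THZ_TO_HZ  # forward: THz -> Hz
back_thz = value_hz / THZ_TO_HZ   # inverse: Hz -> THz

print(value_hz)  # 2000000000000
print(back_thz)  # 2
```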

Duanoc commented 1 day ago

I was not sure where to put this issue, and there were no issues in linkml-map. That's why I put it here.

My actual problem is un-flattening the structure. Starting with the data.json from my question at the top, the LinkML model from the comment above and this linkml-map specification:

class_derivations:
  PANDataset:
    name: PANDataset
    populated_from: PANDataset
    slot_derivations:
      scientificMetadata:
        name: "Has_FWKP:FrequencyTHz"
        expr: scientificMetadata.frequency.value

I get Has_FWKP:FrequencyTHz: 2, which is what I need as a flat structure.

If I perform the inverse transformation with linkml-map invert -T mapping.yaml source-schema.yaml, I get the error described above: linkml_map.inference.inverter.NonInvertibleSpecification: Cannot invert expression scientificMetadata.frequency.value in slot derivation: Has_FWKP:FrequencyTHz

I get the same error if I use the target schema generated with linkml-map derive-schema -T mapping.yaml source-schema.yaml instead of source-schema.yaml.

The question is how can I transform

{
  "Has Frequency": "2000000000000"
}

into

{
    "scientificMetadata": {
        "frequency": {
            "value": 2,
            "unit": "THz"
        }
    }
}

using linkml-map?
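If linkml-map cannot invert such a derivation, the reverse direction can be written by hand outside it. A minimal plain-Python sketch, assuming the nested side always uses THz and the flat value is in Hz (with or without a trailing " Hz", since both forms appear in this thread); `unflatten_frequency` and `HZ_PER_UNIT` are hypothetical names, not linkml-map API.

```python
from decimal import Decimal

# Assumed factor from Hz to the target unit; THz is the only unit in this issue.
HZ_PER_UNIT = {"THz": Decimal(10) ** 12}


def unflatten_frequency(flat: dict, unit: str = "THz") -> dict:
    """Rebuild the nested form from the flat 'Has Frequency' value.

    The flat form does not record which unit the nested side used,
    so the target unit has to be supplied from outside (a default here).
    """
    parts = flat["Has Frequency"].split()
    if len(parts) == 2 and parts[1] != "Hz":
        raise ValueError(f"expected a value in Hz, got {flat['Has Frequency']!r}")
    value = Decimal(parts[0]) / HZ_PER_UNIT[unit]
    return {"scientificMetadata": {"frequency": {"value": value, "unit": unit}}}


print(unflatten_frequency({"Has Frequency": "2000000000000"}))
# {'scientificMetadata': {'frequency': {'value': Decimal('2'), 'unit': 'THz'}}}
```

The `unit` argument is where the information dropped by flattening has to be put back; serializing the result to JSON would additionally require converting the Decimal to a number or string.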