kg-construct / yarrrml-spec

A human-readable, text-based representation for declarative Linked Data generation rules.
https://w3id.org/kg-construct/yarrrml

Beyond mapping single triples #11

Open midorna opened 3 weeks ago

midorna commented 3 weeks ago

When sub-graphs for objects are to be generated with the current YARRRML specification, we need to define multiple mappings whose subjects are non-blank nodes (IRIs) just so that they can be referenced, even when these IRIs are not needed otherwise. Therefore, a shortcut notation is proposed that reduces the number of mappings that have to be specified.
Note: This proposal builds on the proposal for issue https://github.com/kg-construct/yarrrml-spec/issues/8. It uses and extends the functional notation shortcut suggested there.

Proposal

Use spom(<subject>, <predicate-objects>) as a template for subject-predicate-object mappings; this enables the creation of graphs from embedded templates.
If the <subject>'s IRI is not required, a blank node can be used instead, indicated by an underscore (_).
If spom is used for "simple" mappings, a new field, spo, would be required.
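
To make the intended semantics concrete, the sketch below models pom() and spom() as plain Python constructors. It is a minimal, hypothetical illustration, not part of the YARRRML specification or any processor: the SubGraph class and the _:b0, _:b1, ... blank-node labels are assumptions, an object built by a nested spom() is linked through its subject node, and a subject of "_" yields a fresh blank node. A single pom() is accepted where a list is expected, matching the way Example 2 uses it.

```python
import itertools
from dataclasses import dataclass, field

# Hypothetical sketch of the proposed notation; all names are illustrative.
_blank_ids = itertools.count()  # source of blank-node labels _:b0, _:b1, ...


@dataclass
class SubGraph:
    """Result of spom(): the subject node plus all triples of the sub-graph."""
    node: str
    triples: list = field(default_factory=list)


def pom(predicate, obj, datatype=None):
    """A predicate-object pair; a datatype marks the object as a typed literal."""
    return (predicate, obj, datatype)


def spom(subject, poms):
    """Build a sub-graph; "_" as subject requests a fresh blank node."""
    if isinstance(poms, tuple):  # a single pom() instead of a list
        poms = [poms]
    node = f"_:b{next(_blank_ids)}" if subject == "_" else subject
    graph = SubGraph(node)
    for predicate, obj, datatype in poms:
        if isinstance(obj, SubGraph):
            # Embedded spom(): link to its node and keep its triples.
            graph.triples.append((node, predicate, obj.node))
            graph.triples.extend(obj.triples)
        elif datatype is not None:
            graph.triples.append((node, predicate, f'"{obj}"^^{datatype}'))
        else:
            graph.triples.append((node, predicate, obj))
    return graph
```

For instance, spom("ex:BoundingBox-1", [pom("a", "my:BoundingBox"), pom("my:hasHeight", spom("_", pom("my:hasValue", "2.0", "xsd:float")))]) yields three triples, with the height value attached to a blank node rather than to a separately minted IRI.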

Example 1: current syntax

The following example only serves to introduce the notation.

mappings:
  person:
    s: $(person)
    po:
      - [foaf:firstName, $(firstname)]
      - [foaf:lastName, $(lastname)]

Example 1: proposed syntax

mappings:
  person:
    spo: spom($(person), 
              [pom(foaf:firstName, $(firstname)), 
               pom(foaf:lastName, $(lastname))])
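
Note that a YAML parser only sees the spo value as one opaque string: the multi-line plain scalar is folded into a single line, so the nesting of spom() and pom() is invisible at the YAML level. A processor supporting this proposal would therefore need its own small parser. The sketch below is one hypothetical way to write it, assuming that $(...) references never contain parentheses and that terms are delimited only by commas, parentheses, and brackets. Calls become (name, args) tuples and [...] becomes a Python list; tokenize and parse are illustrative names.

```python
import re

# Hypothetical parser for the proposed functional notation.
# A term is a run of ordinary characters and whole $(...) references, so the
# parentheses of a reference are never mistaken for a function call.
TOKEN = re.compile(r"\s*(?:(?P<term>(?:\$\([^)]*\)|[^\s(),\[\]$])+)|(?P<punct>[(),\[\]]))")


def tokenize(text):
    text, pos, tokens = text.strip(), 0, []
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected input at {pos}: {text[pos:]!r}")
        tokens.append(match.group("term") or match.group("punct"))
        pos = match.end()
    return tokens


def parse(text):
    """Parse functional notation into (name, args) tuples and lists."""
    tokens = tokenize(text)

    def items(i, closing):
        # Parse comma-separated expressions up to the closing token.
        result = []
        while tokens[i] != closing:
            value, i = expr(i)
            result.append(value)
            if tokens[i] == ",":
                i += 1
        return result, i + 1

    def expr(i):
        if tokens[i] == "[":
            return items(i + 1, "]")
        if i + 1 < len(tokens) and tokens[i + 1] == "(":
            args, end = items(i + 2, ")")
            return (tokens[i], args), end
        return tokens[i], i + 1

    tree, end = expr(0)
    if end != len(tokens):
        raise ValueError("unexpected trailing input")
    return tree
```

Applied to the folded spo string above, parse returns ("spom", ["$(person)", [("pom", ["foaf:firstName", "$(firstname)"]), ("pom", ["foaf:lastName", "$(lastname)"])]]), which a processor could then evaluate into triples.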

Example 2: current syntax

Now we map the values of bounding box objects, which have a height, a length and a width, each defined as a separate object.

mappings:
  BoundingBox:
    s: ex:BoundingBox-$(Id)
    po:
      - [a, my:BoundingBox]
      - [my:hasHeight, ex:BoxHeight-$(Height)~iri]
      - [my:hasLength, ex:BoxLength-$(Length)~iri]
      - [my:hasWidth, ex:BoxWidth-$(Width)~iri]

  BoxHeight:
    s: ex:BoxHeight-$(Height)
    po: [[my:hasValue, $(Height), xsd:float]]

  BoxLength:
    s: ex:BoxLength-$(Length)
    po: [[my:hasValue, $(Length), xsd:float]]

  BoxWidth:
    s: ex:BoxWidth-$(Width)
    po: [[my:hasValue, $(Width), xsd:float]]
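
The four mappings above are connected only through identical IRI templates: the object of my:hasHeight and the subject of BoxHeight are both built from ex:BoxHeight-$(Height). The sketch below is a simplified, hypothetical re-implementation of what these mappings produce for one source row; rows are assumed to be plain dicts of strings, and the fill helper is an assumption standing in for reference resolution. It also makes one side effect visible: because the dimension IRIs depend only on the value, two boxes with the same height share a single ex:BoxHeight-... node.

```python
import re

# Hypothetical sketch; fill() stands in for a processor's $(...) resolution.
REF = re.compile(r"\$\((\w+)\)")


def fill(template, row):
    """Replace each $(name) reference with the row's value for name."""
    return REF.sub(lambda m: str(row[m.group(1)]), template)


def box_triples(row):
    """Triples of the BoundingBox, BoxHeight, BoxLength and BoxWidth mappings."""
    box = fill("ex:BoundingBox-$(Id)", row)
    triples = [(box, "a", "my:BoundingBox")]
    for dim in ("Height", "Length", "Width"):
        # Same template as object in BoundingBox and as subject in Box<dim>.
        node = fill(f"ex:Box{dim}-$({dim})", row)
        triples.append((box, f"my:has{dim}", node))
        triples.append((node, "my:hasValue", f'"{row[dim]}"^^xsd:float'))
    return triples
```

For two rows with Id "1" and "2" that both have Height "2.0", both boxes point to the same object, ex:BoxHeight-2.0.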

Remarks
The objects of all three dimensions are instances of subclasses of qudt:QuantityValue, i.e., each of them has a value and a unit. In the example shown, the unit is always the same and is not specified in the mappings. Instead, the unit is defined in a subclass of quantitykind:Length in the ontology used, and this class is declared as the range of all three dimension properties of our bounding box. If the units vary in the data source, the respective unit should, of course, also be part of the mapping.

Example 2: proposed syntax

mappings:
  BoundingBox:
    s: ex:BoundingBox-$(Id)
    po: 
      - pom(a, my:BoundingBox)
      - pom(my:hasHeight, spom(_, pom(my:hasValue, $(Height), xsd:float)))
      - pom(my:hasLength, spom(_, pom(my:hasValue, $(Length), xsd:float)))
      - pom(my:hasWidth, spom(_, pom(my:hasValue, $(Width), xsd:float)))
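
By contrast, with spom(_, ...) every evaluation creates a fresh blank node, so each box gets its own three dimension nodes and no ex:BoxHeight-... IRIs are minted. The sketch below mirrors the proposed mapping for a list of rows, assuming rows are plain dicts of strings and blank nodes are labelled _:b0, _:b1, ... from a shared counter; both are hypothetical choices, not prescribed by the proposal.

```python
import itertools

# Hypothetical sketch of the proposed Example 2; names are illustrative.


def box_triples_blank(row, blank_ids):
    """Triples of the proposed BoundingBox mapping; one blank node per spom(_, ...)."""
    box = f"ex:BoundingBox-{row['Id']}"
    triples = [(box, "a", "my:BoundingBox")]
    for dim in ("Height", "Length", "Width"):
        node = f"_:b{next(blank_ids)}"  # fresh node, never shared between rows
        triples.append((box, f"my:has{dim}", node))
        triples.append((node, "my:hasValue", f'"{row[dim]}"^^xsd:float'))
    return triples


blank_ids = itertools.count()
rows = [
    {"Id": "1", "Height": "2.0", "Length": "3.0", "Width": "4.0"},
    {"Id": "2", "Height": "2.0", "Length": "5.0", "Width": "6.0"},
]
graph = [t for row in rows for t in box_triples_blank(row, blank_ids)]
```

Here the two boxes get distinct height nodes (_:b0 and _:b3) even though their heights are equal; the trade-off is that equal values are no longer merged into one node.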

bjdmeest commented 1 week ago

While I do understand the interest in nested mappings, I have the same reservations I just mentioned in https://github.com/kg-construct/yarrrml-spec/issues/8 (i.e., not adding too much functionality outside of YAML).

Would something like below be reasonable for you?

mappings:
  BoundingBox:
    s: ex:BoundingBox-$(Id)
    po: 
      - [a, my:BoundingBox]
      - p: my:hasHeight
        o:
          po: [my:hasValue, $(Height), xsd:float]
      - p: my:hasLength
        o: 
          po: [my:hasValue, $(Length), xsd:float]
      - p: my:hasWidth
        o:
          po: [my:hasValue, $(Width), xsd:float]
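
Because this nested form is plain YAML, a processor could support it without any new syntax: when an o value is itself a mapping with po, it generates a blank node and recurses. The sketch below assumes the YAML has already been loaded into Python dicts and lists (written out literally here to avoid a YAML dependency), treats all terms as plain strings, and wraps a single-list po such as [my:hasValue, $(Height), xsd:float] into a list of one predicate-object pair. The helpers fill and expand are hypothetical names.

```python
import itertools
import re

# Hypothetical sketch; not taken from any existing YARRRML processor.
REF = re.compile(r"\$\((\w+)\)")


def fill(template, row):
    """Replace each $(name) reference with the row's value for name."""
    return REF.sub(lambda m: str(row[m.group(1)]), template)


def expand(subject, po_list, row, blank_ids, out):
    """Append the triples of a po list to out, recursing into nested objects."""
    for entry in po_list:
        if isinstance(entry, list):  # shortcut form [p, o] or [p, o, datatype]
            predicate, obj, *datatype = entry
            value = fill(obj, row)
            out.append((subject, predicate, f'"{value}"^^{datatype[0]}' if datatype else value))
        elif isinstance(entry["o"], dict) and "po" in entry["o"]:
            node = f"_:b{next(blank_ids)}"  # nested mapping without s: blank node
            out.append((subject, entry["p"], node))
            nested = entry["o"]["po"]
            if nested and not isinstance(nested[0], list):
                nested = [nested]  # po: [p, o, dt] given as a single pair
            expand(node, nested, row, blank_ids, out)
        else:  # long form with a plain object
            out.append((subject, entry["p"], fill(entry["o"], row)))
    return out


# The BoundingBox mapping from the comment above, as a YAML loader would return it.
bounding_box = {
    "s": "ex:BoundingBox-$(Id)",
    "po": [
        ["a", "my:BoundingBox"],
        {"p": "my:hasHeight", "o": {"po": ["my:hasValue", "$(Height)", "xsd:float"]}},
        {"p": "my:hasLength", "o": {"po": ["my:hasValue", "$(Length)", "xsd:float"]}},
        {"p": "my:hasWidth", "o": {"po": ["my:hasValue", "$(Width)", "xsd:float"]}},
    ],
}

row = {"Id": "1", "Height": "2.0", "Length": "3.0", "Width": "4.0"}
triples = expand(fill(bounding_box["s"], row), bounding_box["po"], row, itertools.count(), [])
```

The result has the same shape as the proposed spom(_, ...) version: one rdf:type triple plus a blank node and a typed value for each dimension.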

midorna commented 1 week ago

Thanks! This looks very reasonable if functional syntax is not to go into YARRRML.

I waited to see how my proposal would be received, because I would like to go even further and allow users to define their own macro shortcuts. For the bounding box example, this would mean a section for defining shortcuts, where each shortcut has a key (BoundingBox), a list of parameters ([Id, Height, Length, Width]), and the graph object it represents (as in the examples above, with the references replaced by parameters). This kind of definition works perfectly in YAML (and is already used for shortcuts for sources and targets, albeit without parameters).

Such a shortcut would then allow us to use (or create) a bounding box graph by calling, e.g., +BoundingBox($(Id), $(Height), $(Length), $(Width)), where + would indicate a user-defined shortcut. This speeds up the development and maintenance of mappings, since we avoid copying and pasting nested YAML and creating IRIs where auto-generated blank nodes would suffice. As a side effect, we avoid some indentation issues that arise because YAML has no explicit block markers.
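
The user-defined shortcut idea boils down to template expansion. The sketch below is purely hypothetical: the SHORTCUTS table stands in for the proposed definition section, parameters are written as {name} placeholders in a body that reuses the nested o/po form from the previous comment, and arguments are split on commas, which assumes no argument itself contains a comma. Expanding a +BoundingBox(...) call then yields an ordinary mapping with the references filled in.

```python
import re

# Hypothetical shortcut definitions: key, parameter list, graph template.
CALL = re.compile(r"^\+(\w+)\((.*)\)$")

SHORTCUTS = {
    "BoundingBox": {
        "params": ["Id", "Height", "Length", "Width"],
        "s": "ex:BoundingBox-{Id}",
        "po": [
            ["a", "my:BoundingBox"],
            {"p": "my:hasHeight", "o": {"po": ["my:hasValue", "{Height}", "xsd:float"]}},
            {"p": "my:hasLength", "o": {"po": ["my:hasValue", "{Length}", "xsd:float"]}},
            {"p": "my:hasWidth", "o": {"po": ["my:hasValue", "{Width}", "xsd:float"]}},
        ],
    }
}


def substitute(node, args):
    """Recursively replace {param} placeholders with call arguments."""
    if isinstance(node, str):
        return node.format(**args)
    if isinstance(node, list):
        return [substitute(item, args) for item in node]
    if isinstance(node, dict):
        return {key: substitute(value, args) for key, value in node.items()}
    return node


def expand_call(call):
    """Expand '+Name(arg1, arg2, ...)' into an ordinary mapping dict."""
    match = CALL.match(call.strip())
    if not match:
        raise ValueError(f"not a shortcut call: {call!r}")
    name, arg_text = match.groups()
    definition = SHORTCUTS[name]
    values = [a.strip() for a in arg_text.split(",")]  # naive: no commas in args
    if len(values) != len(definition["params"]):
        raise ValueError(f"{name} expects {len(definition['params'])} arguments")
    args = dict(zip(definition["params"], values))
    return {k: substitute(v, args) for k, v in definition.items() if k != "params"}
```

With this sketch, expand_call("+BoundingBox($(Id), $(Height), $(Length), $(Width))") returns a mapping whose s is ex:BoundingBox-$(Id) and whose po list matches the nested form suggested above, so the shortcut itself needs no knowledge of blank nodes.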

I could also make a detailed proposal for defining these user-defined shortcuts. But first we need to know whether the door to functional notation in YARRRML will be opened or not.

Thanks again. Your proposal would be a beneficial first step in any case, since it avoids the manual creation of IRIs and a collection of separate mapping rules.