thomas-delva commented 4 years ago

From github issue: https://github.com/RMLio/rmlmapper-java/issues/65

Description

In RML, if an object map has a reference that produces multiple terms and that object map also has a language map, the mapper does not know (and cannot know?) which value of the reference to combine with which value of the language map.

It would be interesting to investigate a way to say in RML that the objectmap should go over the <rdaw:P10086> tags, and then extract from each a value and a language. (Whereas now, RML can only say to create multiple values from the <rdaw:P10086> tags, and also, independently, to create multiple language tags from the <rdaw:P10086> tags.)

Input data (xml)

<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF
   xmlns:rdaw="http://rdaregistry.info/Elements/w/"
   xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
>
  <rdf:Description rdf:about="https://trellis.sinopia.io/repository/washington/d0d9f78e-05f1-4594-bdcb-b396ce68f618">
    <rdaw:P10086 xml:lang="pt">Lórax (Beber)</rdaw:P10086>
    <rdaw:P10086 xml:lang="af">Loraks</rdaw:P10086>
    <rdaw:P10086 xml:lang="ru">Driad</rdaw:P10086>
    <rdaw:P10086 xml:lang="es">Lórax</rdaw:P10086>
  </rdf:Description>
</rdf:RDF>

Desired output

@prefix bf: <http://id.loc.gov/ontologies/bibframe/>.

<https://trellis.sinopia.io/repository/washington/d0d9f78e-05f1-4594-bdcb-b396ce68f618>
  bf:title _:0 .

_:0 a bf:VariantTitle;
  bf:mainTitle "Driad"@pt, "Loraks"@af, "Lórax"@ru, "Lórax (Beber)"@es .

dachafra commented 4 years ago

Hi @thomas-delva, is this issue not the same as #2?

thomas-delva commented 4 years ago

Hi @dachafra, the issues are related of course, but this one is more specific than the other: it assumes #2 is solved and language tags can be generated from data (with languageMap). This issue is about what happens when languageMap is used together with a reference that returns more than one value.

Hope this clears things up!

VladimirAlexiev commented 4 years ago

Seems to me RML is missing some notion of locality. What we need for this example is:

iterate over //rdaw:P10086
- make a literal with value text() and lang @xml:lang

Iterating twice is conceptually wrong:

iterate over //rdaw:P10086 and get text()
iterate over //rdaw:P10086 and get @xml:lang
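
To make the difference concrete, here is a minimal Python sketch. It is not RML; it only mimics the two strategies with the standard library's xml.etree.ElementTree, and the function names literals_single_pass and literals_two_passes are made up for illustration. The input is the XML from the issue description.

```python
import xml.etree.ElementTree as ET

NS = {"rdaw": "http://rdaregistry.info/Elements/w/"}
# ElementTree stores xml:lang under the fixed XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

DOC = """<rdf:RDF
   xmlns:rdaw="http://rdaregistry.info/Elements/w/"
   xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about="https://trellis.sinopia.io/repository/washington/d0d9f78e-05f1-4594-bdcb-b396ce68f618">
    <rdaw:P10086 xml:lang="pt">Lórax (Beber)</rdaw:P10086>
    <rdaw:P10086 xml:lang="af">Loraks</rdaw:P10086>
    <rdaw:P10086 xml:lang="ru">Driad</rdaw:P10086>
    <rdaw:P10086 xml:lang="es">Lórax</rdaw:P10086>
  </rdf:Description>
</rdf:RDF>"""


def literals_single_pass(xml_text):
    """Iterate once: value and language come from the same element."""
    root = ET.fromstring(xml_text)
    return [(el.text, el.get(XML_LANG))
            for el in root.iterfind(".//rdaw:P10086", NS)]


def literals_two_passes(xml_text):
    """Iterate twice: two independent lists, the pairing is lost."""
    root = ET.fromstring(xml_text)
    values = [el.text for el in root.iterfind(".//rdaw:P10086", NS)]
    langs = [el.get(XML_LANG) for el in root.iterfind(".//rdaw:P10086", NS)]
    return values, langs


if __name__ == "__main__":
    print(literals_single_pass(DOC))
    print(literals_two_passes(DOC))
```

The single pass keeps each value next to its own language tag. The two-pass variant only yields two flat lists, so a consumer has to guess how to recombine them, which is the situation the mapper is in today.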

pmaria commented 4 years ago

@thomas-delva I'm trying to understand the example. Particularly the desired result, since the combinations of reference value and language value seem to be randomly combined. That is, it doesn't follow the structure of the document.

Is this your intention?

If so, I don't see a way to achieve this in a reusable way, since the combination logic can't be derived from the source document.

If you want to get this result for this mapping you could do something like (untested):

[]
  rr:predicate bf:mainTitle ;
  rr:objectMap 
    [
      rml:reference "rdaw:P10086[3]" ;
      rml:languageMap [
        rml:reference "rdaw:P10086[1]/@xml:lang"
      ] ;
    ] , 
    [
      rml:reference "rdaw:P10086[2]" ;
      rml:languageMap [
        rml:reference "rdaw:P10086[2]/@xml:lang"
      ] ;
    ] , 
    [
      rml:reference "rdaw:P10086[4]" ;
      rml:languageMap [
        rml:reference "rdaw:P10086[3]/@xml:lang"
      ] ;
    ] , 
    [
      rml:reference "rdaw:P10086[1]" ;
      rml:languageMap [
        rml:reference "rdaw:P10086[4]/@xml:lang"
      ] ;
    ] ;
.

I guess you could also solve this using a function valued LanguageMap and have a function contain the logic to return the language you want, based on the input.

But, for that to work, we first need to be able to solve the more common challenge that is very close to what you describe here, and possibly what you intended to describe.

That is, how to get the following output, which does follow the structure of the source.

@prefix bf: <http://id.loc.gov/ontologies/bibframe/>.

<https://trellis.sinopia.io/repository/washington/d0d9f78e-05f1-4594-bdcb-b396ce68f618>
  bf:title _:0 .

_:0 a bf:VariantTitle;
  bf:mainTitle "Lórax (Beber)"@pt, "Loraks"@af, "Driad"@ru, "Lórax"@es .

The general challenge here is how to combine multiple multi-valued expressions into the result of a single term map.

Language maps are one example, but another example is an rr:template with multiple expressions, which each lead to multiple values.

The RML spec needs to provide clarity on how to handle these situations. This is issue #4.

For the purpose of this discussion, let's assume that a cartesian product approach would be the default way of handling these cases. In that case we need something else to solve the case described in this issue.

I think we can look at xR2RML's nested term map (xrr:nestedTermMap) as a possible solution approach, which allows to basically add nested iterations within a term map. We would need to add language maps into the mix though. I guess a language map should then only occur once per "tree" of (nested) term maps.

frmichel commented 4 years ago

Just a complement about xR2RML.

xR2RML has introduced the idea of an xrr:languageReference property of an object map.

About the multi-value question, xR2RML assumes that the evaluation of a reference (xrr:reference, rr:template, xrr:languageReference) can generate multiple values. Then, the term map generates RDF terms as the product of all the terms generated. So if you have this:

   xrr:reference "$.field";
   xrr:languageReference "$.lang";

and if the reference returns 2 terms and the languageReference returns 2 terms, then the term map will yield 4 RDF terms.

So this is basically "naturally" included in xR2RML, and that does not necessarily concern the xrr:nestedTermMap case.
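
As a rough illustration of this product semantics (plain Python, not xR2RML; the function name term_map_product and the sample values are assumptions made for this sketch):

```python
from itertools import product


def term_map_product(values, languages):
    """Combine every value with every language tag (cartesian product)."""
    return [(value, lang) for value, lang in product(values, languages)]


if __name__ == "__main__":
    # Two values and two language tags give 2 x 2 = 4 candidate literals.
    for value, lang in term_map_product(["Loraks", "Driad"], ["af", "ru"]):
        print(f'"{value}"@{lang}')
```

Applied to the four titles and four language tags from the issue, the same function would return 16 pairs, of which only the 4 that share a source element are wanted.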

Franck.

pmaria commented 4 years ago

@frmichel but in this case, you don't want to combine all terms, but only those that are grouped together in the source.

1: `["Lórax (Beber)"]`, and `["pt"]`,
2: `["Loraks"]`, and `["af"]`,
3: `["Driad"]`, and `["ru"]`,
4: `["Lórax"]`, and `["es"]`.

So, is my understanding correct that you would use a xrr:nestedTermMap in this case?

frmichel commented 4 years ago

Oh ok, sorry I had not looked carefully enough. I'm not so much at ease with XPath. Spontaneously I'd write it this way in xR2RML:

[]
  rr:predicate bf:mainTitle ;
  rr:objectMap [
      xrr:reference "rdaw:P10086/*" ;
      xrr:nestedTermMap [
        xrr:reference "/";
        xrr:languageReference "/@xml:lang";
      ]
  ] .

But I'm afraid that the "rdaw:P10086/*" will return the node values, and thus you will lose the language attribute, so that "/@xml:lang" will not return anything. One solution, somewhat complicated, could be to use the pushDown feature, but I'm not so confident actually.

I may have underestimated the differences between XPath and JSONPath actually...

pmaria commented 4 years ago

rdaw:P10086 will return

<rdaw:P10086 xmlns:rdaw="http://rdaregistry.info/Elements/w/" xml:lang="pt">Lórax (Beber)</rdaw:P10086>

<rdaw:P10086 xmlns:rdaw="http://rdaregistry.info/Elements/w/" xml:lang="af">Loraks</rdaw:P10086>

<rdaw:P10086 xmlns:rdaw="http://rdaregistry.info/Elements/w/" xml:lang="ru">Driad</rdaw:P10086>

<rdaw:P10086 xmlns:rdaw="http://rdaregistry.info/Elements/w/" xml:lang="es">Lórax</rdaw:P10086>

So

[]
  rr:predicate bf:mainTitle ;
  rr:objectMap [
      xrr:reference "rdaw:P10086" ;
      xrr:nestedTermMap [
        rml:reference ".";
        xrr:languageReference "@xml:lang";
      ]
  ] .

should work in this case.
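
The intended evaluation order can be sketched in Python. This is only a simulation under stated assumptions, not an xR2RML or RML implementation: nested_term_map and eval_relative are hypothetical helpers, and eval_relative understands just the two relative expressions used above ("." and "@prefix:name").

```python
import xml.etree.ElementTree as ET

NS = {
    "rdaw": "http://rdaregistry.info/Elements/w/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "xml": "http://www.w3.org/XML/1998/namespace",
}

DOC = """<rdf:RDF
   xmlns:rdaw="http://rdaregistry.info/Elements/w/"
   xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about="https://trellis.sinopia.io/repository/washington/d0d9f78e-05f1-4594-bdcb-b396ce68f618">
    <rdaw:P10086 xml:lang="pt">Lórax (Beber)</rdaw:P10086>
    <rdaw:P10086 xml:lang="af">Loraks</rdaw:P10086>
    <rdaw:P10086 xml:lang="ru">Driad</rdaw:P10086>
    <rdaw:P10086 xml:lang="es">Lórax</rdaw:P10086>
  </rdf:Description>
</rdf:RDF>"""


def eval_relative(node, ref):
    """Hypothetical helper: evaluate '.' or '@prefix:name' against one node."""
    if ref == ".":
        return node.text
    if ref.startswith("@"):
        prefix, _, local = ref[1:].partition(":")
        # Namespaced attributes are stored as '{uri}local' by ElementTree.
        key = f"{{{NS[prefix]}}}{local}" if local else prefix
        return node.get(key)
    raise ValueError(f"unsupported reference in this sketch: {ref}")


def nested_term_map(context, outer_ref, value_ref, lang_ref):
    """Select nodes with outer_ref, then evaluate both nested references
    against each selected node, so value and language stay paired."""
    return [(eval_relative(node, value_ref), eval_relative(node, lang_ref))
            for node in context.findall(outer_ref, NS)]


if __name__ == "__main__":
    description = ET.fromstring(DOC).find("rdf:Description", NS)
    for value, lang in nested_term_map(description, "rdaw:P10086", ".", "@xml:lang"):
        print(f'"{value}"@{lang}')
```

Because the outer reference returns whole elements rather than their text, the nested references still have access to the attribute, which is why the pairing survives.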

frmichel commented 4 years ago

Ok, thx for the hint @pmaria. So after all yes, the concept of nestedTermMap could fill that need (I edited my mistaken example above to have xrr:reference instead of rml:language).

pmaria commented 4 years ago

Great! I think it is an elegant solution, since you can tackle arbitrarily nested structures. I would combine it with the more generic rml:LanguageMap so you can use all expression types like reference, template, function etc.

frmichel commented 4 years ago

> I would combine it with the more generic rml:LanguageMap so you can use all expression types like reference, template, function etc.

Yes I agree, the rml:LanguageMap is more generic. Cool.

dachafra commented 2 years ago

@pmaria is this discussion already included in rml:LanguageMap (or is there a plan to include it)? If so, we can close the issue.

kg-construct / mapping-challenges

Challenge: Languagemap for multi-value reference #18
