kg-construct / rml-lv

Specification repository for logical views in RML.
https://kg-construct.github.io/rml-lv/dev.html

Relation between fields and collections #3

Open thomas-delva opened 2 years ago

thomas-delva commented 2 years ago

Fields and collections both deal with multivalues in the source data, so they should be aligned.

Currently, Franck and Christophe define the gather map as a way to generate collections. We should see how it works if fields are used instead of references: https://docs.google.com/presentation/d/1QYSyuzvN4xO3mC6FTja2RLsZS2JCZLqmt53DXE4KyxM/

The fields paper proposed a group-by approach to generating collections, in which field values are grouped by equal values of other fields. We should probably see how this compares with the gathering approach: [image]

andimou commented 2 years ago

@thomas-delva could you add the same example as the one used for collections, but with fields?

Then it will be easier to compare. In any case, I think we don't need to end up with a single solution, but we do want both to offer the same coverage.

thomas-delva commented 2 years ago

Below is a fields version of the examples in the gathermap slides for easier comparison. rml:gatherBy is used to "un-flatten" the multivalues after fields flatten them. There are five distinct examples in the slides and I'll cover them in the same order below: "simple example", "relational databases", "nested iteration over source", "generating nested collections", "multiple gather maps".

Simple example

Data:

{ "a": "1",
  "b": [ "1", "2", "3" ] }

Logical source + fields:

<LS> a rml:LogicalSource ;
  rml:iterator "$" ;
  rml:field [
    rml:name "a_field" ;
    rml:reference "$.a" ] ;
  rml:field [
    rml:name "b_field" ;
    rml:reference "$.b.*" ] .

Intermediate representation:

a_field  b_field
1        1
1        2
1        3

Object map:

... rr:objectMap [
  rml:gather ( [ rml:reference "b_field" ] ) ;
  rml:gatherAs rdf:List ;
  rml:gatherBy "it"  # can be implicit: one level higher than b_field
                     # "it" refers to the iterator, i.e., the "field" one level above b_field
  ] .

Output:

... ( "1" "2" "3" )
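
To make the flatten-then-gather reading of the simple example concrete, here is a minimal Python sketch. It is only an illustration, not how any RML engine is implemented: `flatten`, `gather`, and the lambda accessors standing in for the JSONPath references are hypothetical names, and `gather_by=None` is my stand-in for "it".

```python
from itertools import product

def flatten(record, fields):
    """Expand a record into rows: one row per combination of field values."""
    names = list(fields)
    value_lists = []
    for name in names:
        value = fields[name](record)
        # A single value behaves like a one-item list of values.
        value_lists.append(value if isinstance(value, list) else [value])
    return [dict(zip(names, combo)) for combo in product(*value_lists)]

def gather(rows, reference, gather_by=None):
    """Group `reference` values by `gather_by`; None stands in for "it"."""
    groups = {}
    for row in rows:
        key = row[gather_by] if gather_by is not None else "it"
        groups.setdefault(key, []).append(row[reference])
    return list(groups.values())

record = {"a": "1", "b": ["1", "2", "3"]}
fields = {
    "a_field": lambda r: r["a"],  # stands in for "$.a"
    "b_field": lambda r: r["b"],  # stands in for "$.b.*"
}

rows = flatten(record, fields)         # the intermediate representation
collections = gather(rows, "b_field")  # one list per iteration
```

With a single iteration there is exactly one group, so `collections` holds one list containing the three `b_field` values in source order.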

Relational databases

Input == intermediate representation:

ID  TITLE           BOOKID  SALUTATION  FNAME    LNAME
1   Frankenstein    1       NULL        Mary     Shelley
2   The Long Earth  2       Sir         Terry    Pratchett
3   The Long Earth  2       NULL        Stephen  Baxter

Logical source + fields:

<LS> a rml:LogicalSource ;
  rml:field [
    rml:name "bookid_field" ;
    rml:reference "BOOKID" ] ;
  rml:field [
    rml:name "id_field" ;
    rml:reference "ID" ] .

Triples map:

<TM> a rr:TriplesMap ;
  rml:logicalSource <LS> ;
  rr:subjectMap [ rr:template "http://ex.com/book{bookid_field}" ] ;
  rr:predicateObjectMap [
    rr:predicate :writtenBy ;
    rr:objectMap [
      rml:gather ( [ rr:template "http://ex.com/author{id_field}" ] ) ;
      rml:gatherAs rdf:List ;
      rml:gatherBy "bookid_field"
    ] ] .

Output:

:book1 :writtenBy ( :author1 ) .
:book2 :writtenBy ( :author2 :author3 ) .
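
The relational case is essentially a plain group-by over rows that are already flat. A rough Python sketch, assuming the rows are dictionaries keyed by field name (`gather_by` and `triples` are hypothetical names chosen for illustration):

```python
rows = [
    {"id_field": "1", "bookid_field": "1"},
    {"id_field": "2", "bookid_field": "2"},
    {"id_field": "3", "bookid_field": "2"},
]

def gather_by(rows, template, key_field):
    """Expand `template` per row and collect the results per `key_field` value."""
    groups = {}
    for row in rows:
        groups.setdefault(row[key_field], []).append(template.format(**row))
    return groups

authors = gather_by(rows, "http://ex.com/author{id_field}", "bookid_field")

# One (subject, predicate, list) triple per distinct bookid_field value.
triples = [
    ("http://ex.com/book" + book_id, ":writtenBy", author_list)
    for book_id, author_list in authors.items()
]
```

Because the subject template and `rml:gatherBy` use the same field, each subject ends up with exactly one list.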

Nested iteration

Here, fields can be declared once and then used to generate collections from different iteration levels (compare the two predicate-object maps).

Data:

{ "id": "id",
  "a": [ [ "1", "2", "3" ],
         [ "4", "5", "6" ] ] }

Logical source + fields:

<LS> a rml:LogicalSource ;
  rml:iterator "$" ;
  rml:field [
    rml:name "id_field" ;
    rml:reference "$.id" ] ;
  rml:field [
    rml:name "a_outer_field" ;
    rml:reference "$.a.*" ;
    rml:field [
      rml:name "a_inner_field" ;
      rml:reference "$.*" ] ] .

Intermediate representation:

id_field  a_outer_field        a_inner_field
id        [ "1", "2", "3" ]    1
id        [ "1", "2", "3" ]    2
id        [ "1", "2", "3" ]    3
id        [ "4", "5", "6" ]    4
id        [ "4", "5", "6" ]    5
id        [ "4", "5", "6" ]    6

Triples map:

<TM> a rr:TriplesMap ;
  rml:logicalSource <LS> ;
  rr:subjectMap [ rr:template "http://ex.com/{id_field}" ] ;
  rr:predicateObjectMap [
    rr:predicate :a_values_grouped ;
    rr:objectMap [
      rml:gather ( [ rml:reference "a_inner_field" ] ) ;
      rml:gatherAs rdf:List ;
      rml:gatherBy "a_outer_field"  # can be implicit; one level higher than a_inner_field
    ] ] ;
  rr:predicateObjectMap [
    rr:predicate :a_values_all ;
    rr:objectMap [
      rml:gather ( [ rml:reference "a_inner_field" ] ) ;
      rml:gatherAs rdf:List ;
      rml:gatherBy "it"  # "it" refers to the iterator
    ] ] .

Output:

:id :a_values_grouped ( "1" "2" "3" ), ( "4" "5" "6" ) ;
    :a_values_all ( "1" "2" "3" "4" "5" "6" ) .
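
The difference between the two predicate-object maps comes down to the grouping key. A small Python sketch of that idea, assuming the outer array value itself is used as the key (the `gather` helper is hypothetical, and `gather_by=None` stands in for "it"):

```python
data = {"id": "id", "a": [["1", "2", "3"], ["4", "5", "6"]]}

# Nested fields: one row per inner value, carrying its parent values along.
rows = [
    {"id_field": data["id"], "a_outer_field": tuple(outer), "a_inner_field": inner}
    for outer in data["a"]
    for inner in outer
]

def gather(rows, reference, gather_by=None):
    """Collect `reference` values per group; None means one group per iteration."""
    groups = {}
    for row in rows:
        key = row[gather_by] if gather_by is not None else "it"
        groups.setdefault(key, []).append(row[reference])
    return list(groups.values())

grouped = gather(rows, "a_inner_field", "a_outer_field")  # :a_values_grouped
all_values = gather(rows, "a_inner_field")                # :a_values_all
```

One caveat with grouping by value: if two outer arrays were identical, this sketch would merge them into one list. Grouping by the position of the outer value instead would avoid that, but whether that is the intended semantics is an open question here.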

Nested gather maps

Data (same as previous):

{ "id": "id",
  "a": [ [ "1", "2", "3" ],
         [ "4", "5", "6" ] ] }

Logical source + fields (same as previous):

<LS> a rml:LogicalSource ;
  rml:iterator "$" ;
  rml:field [
    rml:name "id_field" ;
    rml:reference "$.id" ] ;
  rml:field [
    rml:name "a_outer_field" ;
    rml:reference "$.a.*" ;
    rml:field [
      rml:name "a_inner_field" ;
      rml:reference "$.*" ] ] .

Intermediate representation (same as previous):

id_field  a_outer_field        a_inner_field
id        [ "1", "2", "3" ]    1
id        [ "1", "2", "3" ]    2
id        [ "1", "2", "3" ]    3
id        [ "4", "5", "6" ]    4
id        [ "4", "5", "6" ]    5
id        [ "4", "5", "6" ]    6

Object map:

... rr:objectMap [
  rr:termType rr:BlankNode ;
  rml:gather ([
    rr:termType rr:BlankNode ;
    rml:gather ( [ rml:reference "a_inner_field" ] ) ;
    rml:gatherAs rdf:List ;
    rml:gatherBy "a_outer_field" # can be implicit: one level higher than a_inner_field
  ]) ;
  rml:gatherAs rdf:List ;
  rml:gatherBy "it" # can be implicit: one level higher than a_outer_field
                    # "it" refers to the iterator, i.e., the "field" one level above a_outer_field
  ] ;

Output:

( ( "1" "2" "3" ) ( "4" "5" "6" ) )
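
Nesting the gather maps amounts to grouping twice: the inner map groups by `a_outer_field`, and the outer map collects those groups per iteration. A hedged Python sketch, assuming rows are tuples that record the position of the outer value rather than the value itself (`nested_gather` and `to_collection` are hypothetical helpers):

```python
# (id_field, position of a_outer_field, a_inner_field)
rows = [
    ("id", 0, "1"), ("id", 0, "2"), ("id", 0, "3"),
    ("id", 1, "4"), ("id", 1, "5"), ("id", 1, "6"),
]

def nested_gather(rows):
    """Inner gather groups by outer position; outer gather collects the groups."""
    inner_lists = {}
    for _id, outer_pos, inner in rows:
        inner_lists.setdefault(outer_pos, []).append(inner)
    return [inner_lists[pos] for pos in sorted(inner_lists)]

def to_collection(value):
    """Render nested Python lists as a Turtle collection string."""
    if isinstance(value, list):
        return "( " + " ".join(to_collection(v) for v in value) + " )"
    return '"' + value + '"'

nested = nested_gather(rows)
rendered = to_collection(nested)
```

For the data in this example, `rendered` is the same string as the output shown above.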

Multiple term maps in gather map

Data:

{ "a": "1", 
  "b": [ "1", "2", "3" ],
  "c": [ "4", "5", "6" ] }

Logical source + fields:

<LS> a rml:LogicalSource ;
  rml:iterator "$" ;
  rml:field [
    rml:name "a_field" ;
    rml:reference "$.a" ] ;
  rml:field [
    rml:name "b_field" ;
    rml:reference "$.b.*" ] ;
  rml:field [
    rml:name "c_field" ;
    rml:reference "$.c.*" ] .

Intermediate representation:

a_field  b_field  c_field
1        1        4
1        1        5
1        1        6
1        2        4
1        2        5
1        2        6
1        3        4
1        3        5
1        3        6

Object map:

... rr:objectMap [
  rml:gather ( [ rml:reference "b_field" ] [ rml:reference "c_field" ] ) ;
  rml:gatherAs rdf:List ;
  rml:gatherBy "it" ;  # can be implicit: one level higher than b_field
                       # "it" refers to the iterator, i.e., the "field" one level above b_field
  rml:strategy rml:Append ;  # default strategy
  ] .

Output:

... ( "1" "2" "3" "4" "5" "6" )
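
One detail this example leaves implicit: the intermediate representation is a cross product, so each `b` value appears three times. To get the output above, the gather step has to count each source value once. The Python sketch below does that by assuming every row also records the position of each field value; the `b_index` and `c_index` columns and both helper functions are my own assumptions for illustration, not something defined in this thread.

```python
from itertools import product

record = {"a": "1", "b": ["1", "2", "3"], "c": ["4", "5", "6"]}

# Flatten: one row per (b, c) combination, keeping each value's position.
rows = [
    {"a_field": record["a"],
     "b_field": b, "b_index": i,
     "c_field": c, "c_index": j}
    for (i, b), (j, c) in product(enumerate(record["b"]), enumerate(record["c"]))
]

def gather_field(rows, field, index):
    """Collect each distinct occurrence of `field` once, in source order."""
    seen = {}
    for row in rows:
        seen.setdefault(row[index], row[field])
    return [seen[position] for position in sorted(seen)]

def gather_append(rows, fields):
    """Append strategy: concatenate the values gathered for each term map."""
    result = []
    for field, index in fields:
        result.extend(gather_field(rows, field, index))
    return result

result = gather_append(rows, [("b_field", "b_index"), ("c_field", "c_index")])
```

Without the position columns, a naive gather over the nine rows would repeat values, so some way of telling occurrences apart seems necessary for the fields-based variant.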