thomas-delva opened this issue 2 years ago
@thomas-delva, could you add the same examples as the ones used for collections, but with fields?
That will make them easier to compare. In any case, I don't think the goal is to pick one solution; rather, we want both approaches to offer the same coverage.
Below are fields-based versions of the examples from the gather map slides, for easier comparison. rml:gatherBy is used to "un-flatten" the multivalues after fields have flattened them.
There are five distinct examples in the slides and I'll cover them in the same order below: "simple example", "relational databases", "nested iteration over source", "generating nested collections", "multiple gather maps".
Simple example:
Data:
{ "a": "1",
"b": [ "1", "2", "3" ] }
Logical source + fields:
<LS> a rml:LogicalSource ;
rml:iterator "$" ;
rml:field [
rml:name "a_field" ;
rml:reference "$.a" ] ;
rml:field [
rml:name "b_field" ;
rml:reference "$.b.*" ] .
Intermediate representation:
a_field | b_field |
---|---|
1 | 1 |
1 | 2 |
1 | 3 |
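To make the flattening step concrete, here is a minimal Python sketch that builds this intermediate representation. It assumes that sibling fields combine as a cartesian product of their values, and it uses hypothetical Python lambdas as stand-ins for the JSONPath references "$.a" and "$.b.*"; it does not reflect the API of any RML processor.

```python
from itertools import product

# Source record returned by the iterator "$".
record = {"a": "1", "b": ["1", "2", "3"]}

# Hypothetical stand-ins for the field references: each returns the list of
# values the reference yields for one iteration ("$.a" -> 1 value, "$.b.*" -> 3).
fields = {
    "a_field": lambda r: [r["a"]],
    "b_field": lambda r: r["b"],
}

def flatten(record, fields):
    """Combine the values of sibling fields into rows (cartesian product)."""
    names = list(fields)
    value_lists = [fields[name](record) for name in names]
    return [dict(zip(names, combo)) for combo in product(*value_lists)]

rows = flatten(record, fields)
for row in rows:
    print(row["a_field"], "|", row["b_field"])
```

Because a_field has a single value, the product simply repeats it next to each b_field value, giving the three rows of the table.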
Object map:
... rr:objectMap [
rml:gather ( [ rml:reference "b_field" ] ) ;
rml:gatherAs rdf:List ;
rml:gatherBy "it" # can be implicit: one level higher than b_field
# "it" refers to the iterator, i.e., the "field" one level above b_field
] .
Output:
... ( "1" "2" "3" )
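The un-flattening done by rml:gatherBy "it" can be sketched as a plain group-by over the intermediate rows. This is a minimal illustration, assuming a hypothetical "it" column that records which iteration each row came from; the helper names gather and to_turtle_list are made up for this sketch.

```python
# Intermediate representation of the simple example. The "it" column is a
# hypothetical addition recording which iteration ("$" record) each row came from.
rows = [
    {"it": 0, "a_field": "1", "b_field": "1"},
    {"it": 0, "a_field": "1", "b_field": "2"},
    {"it": 0, "a_field": "1", "b_field": "3"},
]

def gather(rows, value_field, gather_by):
    """Group the values of value_field by equal values of gather_by,
    keeping the row order within each group."""
    groups = {}
    for row in rows:
        groups.setdefault(row[gather_by], []).append(row[value_field])
    return groups

def to_turtle_list(values):
    """Serialize plain literals as a Turtle collection."""
    return "( " + " ".join(f'"{v}"' for v in values) + " )"

collections = [to_turtle_list(vals) for vals in gather(rows, "b_field", "it").values()]
print(collections)
```

All three rows share the same iteration, so they collapse back into a single collection.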
Relational databases:
Input (same as the intermediate representation):
ID | TITLE | BOOKID | SALUTATION | FNAME | LNAME |
---|---|---|---|---|---|
1 | Frankenstein | 1 | NULL | Mary | Shelley |
2 | The Long Earth | 2 | Sir | Terry | Pratchett |
3 | The Long Earth | 2 | NULL | Stephen | Baxter |
Logical source + fields:
<LS> a rml:LogicalSource ;
rml:field [
rml:name "bookid_field" ;
rml:reference "BOOKID" ] ;
rml:field [
rml:name "id_field" ;
rml:reference "ID" ] .
Triples map:
<TM> a rr:TriplesMap ;
rml:logicalSource <LS> ;
rr:subjectMap [ rr:template "http://ex.com/book{bookid_field}" ] ;
rr:predicateObjectMap [
rr:predicate :writtenBy ;
rr:objectMap [
rml:gather ( [ rr:template "http://ex.com/author{id_field}" ] ) ;
rml:gatherAs rdf:List ;
rml:gatherBy "bookid_field"
] ] .
Output:
:book1 :writtenBy ( :author1 ) .
:book2 :writtenBy ( :author2 :author3 ) .
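Since the table already is the intermediate representation, gathering by bookid_field is an ordinary group-by on that column. The Python sketch below reproduces the output under two assumptions: the ":" prefix stands for http://ex.com/ (as the expected output implies), and str.format is a simplified stand-in for rr:template expansion that skips IRI-safe escaping.

```python
# Rows of the intermediate representation, keyed by the declared field names
# (only the columns used by the mapping are kept).
rows = [
    {"id_field": "1", "bookid_field": "1"},
    {"id_field": "2", "bookid_field": "2"},
    {"id_field": "3", "bookid_field": "2"},
]

BASE = "http://ex.com/"  # assumed expansion of the ":" prefix

def short(iri):
    """Abbreviate IRIs with the ":" prefix used in the expected output."""
    return iri.replace(BASE, ":")

# Group by bookid_field; str.format stands in for rr:template expansion.
groups = {}
for row in rows:
    subject = (BASE + "book{bookid_field}").format(**row)
    author = (BASE + "author{id_field}").format(**row)
    groups.setdefault(row["bookid_field"], (subject, []))[1].append(author)

lines = [
    f"{short(subject)} :writtenBy ( {' '.join(short(a) for a in authors)} ) ."
    for subject, authors in groups.values()
]
print("\n".join(lines))
```

Here the grouping key is a value column rather than an iteration level, which is close to the group-by approach from the fields paper.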
Nested iteration over source:
Here, the fields are declared once and then used to generate collections at different iteration levels (compare the two predicate-object maps).
Data:
{ "id": "id",
"a": [ [ "1", "2", "3" ],
[ "4", "5", "6" ] ] }
Logical source + fields:
<LS> a rml:LogicalSource ;
rml:iterator "$" ;
rml:field [
rml:name "id_field" ;
rml:reference "$.id" ] ;
rml:field [
rml:name "a_outer_field" ;
rml:reference "$.a.*" ;
rml:field [
rml:name "a_inner_field" ;
rml:reference "$.*" ] ] .
Intermediate representation:
id_field | a_outer_field | a_inner_field |
---|---|---|
id | [ "1", "2", "3" ] | 1 |
id | [ "1", "2", "3" ] | 2 |
id | [ "1", "2", "3" ] | 3 |
id | [ "4", "5", "6" ] | 4 |
id | [ "4", "5", "6" ] | 5 |
id | [ "4", "5", "6" ] | 6 |
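A nested field is evaluated against each value of its parent field, so "$.*" in a_inner_field is relative to one inner array. The Python sketch below hard-codes that structure for this data; the a_outer_pos column is a hypothetical addition that records which a_outer_field value each row belongs to.

```python
# Source record returned by the iterator "$".
record = {"id": "id", "a": [["1", "2", "3"], ["4", "5", "6"]]}

def flatten_nested(record):
    """Flatten id_field, a_outer_field ("$.a.*") and the nested
    a_inner_field ("$.*", evaluated relative to each a_outer_field value)."""
    rows = []
    for outer_pos, outer in enumerate(record["a"]):   # a_outer_field values
        for inner in outer:                           # a_inner_field values
            rows.append({
                "id_field": record["id"],
                "a_outer_field": outer,
                "a_outer_pos": outer_pos,  # hypothetical: position of the outer value
                "a_inner_field": inner,
            })
    return rows

rows = flatten_nested(record)
for row in rows:
    print(row["id_field"], "|", row["a_outer_field"], "|", row["a_inner_field"])
```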
Triples map:
<TM> a rr:TriplesMap ;
rml:logicalSource <LS> ;
rr:subjectMap [ rr:template "http://ex.com/{id_field}" ] ;
rr:predicateObjectMap [
rr:predicate :a_values_grouped ;
rr:objectMap [
rml:gather ( [ rml:reference "a_inner_field" ] ) ;
rml:gatherAs rdf:List ;
rml:gatherBy "a_outer_field" # can be implicit: one level higher than a_inner_field
] ] ;
rr:predicateObjectMap [
rr:predicate :a_values_all ;
rr:objectMap [
rml:gather ( [ rml:reference "a_inner_field" ] ) ;
rml:gatherAs rdf:List ;
rml:gatherBy "it" # "it" refers to the iterator
] ] .
Output:
:id :a_values_grouped ( "1" "2" "3" ), ( "4" "5" "6" ) ;
:a_values_all ( "1" "2" "3" "4" "5" "6" ) .
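The two predicate-object maps differ only in the gatherBy key, which this Python sketch makes explicit by running the same group-by twice. It assumes hypothetical "it" and a_outer_pos columns identifying the iteration and the outer value of each row.

```python
record = {"id": "id", "a": [["1", "2", "3"], ["4", "5", "6"]]}

# Intermediate representation; "it" and "a_outer_pos" are hypothetical columns
# identifying the iteration and the a_outer_field value each row came from.
rows = [
    {"it": 0, "a_outer_pos": pos, "a_inner_field": value}
    for pos, outer in enumerate(record["a"])
    for value in outer
]

def gather(rows, value_field, gather_by):
    """Return one list of value_field values per distinct gather_by value."""
    groups = {}
    for row in rows:
        groups.setdefault(row[gather_by], []).append(row[value_field])
    return list(groups.values())

grouped = gather(rows, "a_inner_field", "a_outer_pos")  # :a_values_grouped
all_values = gather(rows, "a_inner_field", "it")        # :a_values_all
print(grouped)
print(all_values)
```

The sketch groups by the position of the outer value rather than by the array value itself; grouping by value would merge two identical inner arrays into a single collection.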
Generating nested collections:
Data (same as previous example):
{ "id": "id",
"a": [ [ "1", "2", "3" ],
[ "4", "5", "6" ] ] }
Logical source + fields (same as previous):
<LS> a rml:LogicalSource ;
rml:iterator "$" ;
rml:field [
rml:name "id_field" ;
rml:reference "$.id" ] ;
rml:field [
rml:name "a_outer_field" ;
rml:reference "$.a.*" ;
rml:field [
rml:name "a_inner_field" ;
rml:reference "$.*" ] ] .
Intermediate representation (same as previous):
id_field | a_outer_field | a_inner_field |
---|---|---|
id | [ "1", "2", "3" ] | 1 |
id | [ "1", "2", "3" ] | 2 |
id | [ "1", "2", "3" ] | 3 |
id | [ "4", "5", "6" ] | 4 |
id | [ "4", "5", "6" ] | 5 |
id | [ "4", "5", "6" ] | 6 |
Object map:
... rr:objectMap [
rr:termType rr:BlankNode ;
rml:gather ([
rr:termType rr:BlankNode ;
rml:gather ( [ rml:reference "a_inner_field" ] ) ;
rml:gatherAs rdf:List ;
rml:gatherBy "a_outer_field" # can be implicit: one level higher than a_inner_field
]) ;
rml:gatherAs rdf:List;
rml:gatherBy "it" # can be implicit: one level higher than a_outer_field
# "it" refers to the iterator, i.e., the "field" one level above a_outer_field
] ;
Output:
( ( "1" "2" "3" ) ( "4" "5" "6" ) )
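The nested gather map can be read as two group-bys: the inner one builds one collection per a_outer_field value, and the outer one collects those collections per iteration. In this Python sketch, nested Python lists stand in for the blank-node collections, and "it" and a_outer_pos are hypothetical columns as before.

```python
record = {"id": "id", "a": [["1", "2", "3"], ["4", "5", "6"]]}

rows = [
    {"it": 0, "a_outer_pos": pos, "a_inner_field": value}
    for pos, outer in enumerate(record["a"])
    for value in outer
]

def gather_nested(rows):
    """Inner gather map: collect a_inner_field per a_outer_pos.
    Outer gather map: collect one inner collection per distinct a_outer_pos,
    grouped by the iterator "it"."""
    inner = {}  # a_outer_pos -> list of a_inner_field values
    outer = {}  # it -> a_outer_pos keys in first-seen order
    for row in rows:
        inner.setdefault(row["a_outer_pos"], []).append(row["a_inner_field"])
        keys = outer.setdefault(row["it"], [])
        if row["a_outer_pos"] not in keys:
            keys.append(row["a_outer_pos"])
    return {it: [inner[k] for k in keys] for it, keys in outer.items()}

def to_turtle(term):
    """Serialize nested Python lists as nested Turtle collections."""
    if isinstance(term, list):
        return "( " + " ".join(to_turtle(t) for t in term) + " )"
    return f'"{term}"'

result = gather_nested(rows)
print(to_turtle(result[0]))
```

The outer level keeps each a_outer_pos only once; collecting one inner collection per row instead would repeat each inner collection three times.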
Multiple gather maps:
Data:
{ "a": "1",
"b": [ "1", "2", "3" ],
"c": [ "4", "5", "6" ] }
Logical source + fields:
<LS> a rml:LogicalSource ;
rml:iterator "$" ;
rml:field [
rml:name "a_field" ;
rml:reference "$.a" ] ;
rml:field [
rml:name "b_field" ;
rml:reference "$.b.*" ] ;
rml:field [
rml:name "c_field" ;
rml:reference "$.c.*" ] .
Intermediate representation:
a_field | b_field | c_field |
---|---|---|
1 | 1 | 4 |
1 | 1 | 5 |
1 | 1 | 6 |
1 | 2 | 4 |
1 | 2 | 5 |
1 | 2 | 6 |
1 | 3 | 4 |
1 | 3 | 5 |
1 | 3 | 6 |
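With two multivalued sibling fields, the cartesian product repeats every value of one field once per value of the other. This Python sketch rebuilds the nine rows and, as an assumption not taken from the slides, stores each cell as a (position, value) pair so that later steps can tell repeated copies of the same source value apart.

```python
from itertools import product

record = {"a": "1", "b": ["1", "2", "3"], "c": ["4", "5", "6"]}

# Each cell is a (position, value) pair; the position (index in the source
# array) is a hypothetical addition to the intermediate representation.
def positioned(values):
    return list(enumerate(values))

rows = [
    {"a_field": a, "b_field": b, "c_field": c}
    for a, b, c in product(positioned([record["a"]]),
                           positioned(record["b"]),
                           positioned(record["c"]))
]

# Reading a column naively repeats each value once per combination.
naive_b = [row["b_field"][1] for row in rows]
print(len(rows), naive_b)
```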
Object map:
... rr:objectMap [
rml:gather ( [ rml:reference "b_field" ] [ rml:reference "c_field" ] ) ;
rml:gatherAs rdf:List ;
rml:gatherBy "it" ; # can be implicit: one level higher than b_field and c_field
# "it" refers to the iterator, i.e., the "field" one level above b_field and c_field
rml:strategy rml:Append ; # default strategy
] .
Output:
... ( "1" "2" "3" "4" "5" "6" )
Fields and collections both deal with multivalues in the source data, so they should be aligned.
Currently, Franck and Christophe define the gather map as a way to generate collections; we should check how it works when fields are used instead of references: https://docs.google.com/presentation/d/1QYSyuzvN4xO3mC6FTja2RLsZS2JCZLqmt53DXE4KyxM/
In the fields paper, a group-by approach was proposed for generating collections, in which field values are grouped by equal values of other fields. We should probably see how this compares with the gathering approach: