Closed bjdmeest closed 7 months ago
I think this might be affected in the end by how we handle joins in RML
Came from kg-construct/rml-core#1
It is currently possible to join values across data sources, but without join conditions (see e.g. https://github.com/RMLio/rml-fno-test-cases/blob/master/RMLFNOTC0009-CSV/mapping.ttl; see also https://kg-construct.slack.com/archives/C01QFSW77QF/p1615717859003600).
I think it goes back again to the basic definition of fnml:functionMap; is it correct to be able to define a rml:logicalSource for a rr:termMap different than the rml:logicalSource of the rr:triplesMap to which it belongs?
The current use case for a LogicalSource definition on a FunctionTriplesMap seems to be: the ability to generate values from a different source and use these values as the result of a Function Term Map.
An example of this is included in one of the proposed FnO test cases: RMLFNOTC009
However, since a FunctionTriplesMap doesn't generate values directly, but generates intermediate function execution triples expressed in FnO, the question arises of how to handle joins between a TriplesMap and a FunctionTriplesMap with a different LogicalSource.
As this is not the same type of join as a join on a RefObjectMap, this join would have to be defined. Subsequently, this would require another specific type of join to be implemented by engines.
At the same time we have a very similar mapping challenge for generating literal values by joining different logical sources: the join-on-literal challenge.
I believe it would be advantageous to come up with a solution that covers both generating literals from different LogicalSources using joins and generating function values from different LogicalSources.
As this solution would not be specific to functions, I think we should look for a solution in the definition of LogicalSources. (pinging @thomas-delva)
Does a Function Triples Map need a Logical Source?
That's the reason I insist that we should consider the big picture while defining the fundamental concepts, i.e. function triples map and function term map! Check the alternative definitions with an example in overview.md at the branch "function-alternative".
So your suggestion is to follow a similar approach as in the case of the rml:parentTriplesMap and have a functionTriplesMap which can be optionally combined with a join? Then a FunctionTriplesMap SHOULD have exactly 1 LogicalSource, and we define whether it is the same as the Logical Source or not in the same way we do with the Referencing Object Map?
@andimou yes, exactly!
I believe it would be advantageous to come up with a solution that covers both generating literals from different LogicalSources using joins and generating function values from different LogicalSources. As this solution would not be specific to functions, I think we should look for a solution in the definition of LogicalSources.
When working on RML fields I had in mind you could do something like
:sourceC rml:joinOf :sourceA, :sourceB .
and then source C would be a "virtual" logical source that has all the fields defined in sources A and B, and the data in source C would be a join of the data in sources A and B. Then source C could be used in a triples map to generate RDF from two joined sources in a very general way: generating IRIs or literals in a homogeneous way, mixing fields of both A and B to generate one RDF term, generating function values, etc. Looking back, this rml:joinOf idea seems a bit too general and too far from current RML, so perhaps it can be simplified. Just throwing it out there. :)
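To make the "virtual" source idea concrete, here is a minimal Python sketch, independent of any RML engine. The comment above does not say how the join itself would be specified, so the key-based join, the helper `join_sources`, the `a.`/`b.` field prefixes, and the sample records are all assumptions for illustration.

```python
from itertools import product


def join_sources(source_a, source_b, key_a, key_b):
    """Return a 'virtual' source whose records carry all fields of A and B."""
    joined = []
    for rec_a, rec_b in product(source_a, source_b):
        if rec_a[key_a] == rec_b[key_b]:
            # Prefix field names so fields from A and B cannot clash.
            record = {f"a.{k}": v for k, v in rec_a.items()}
            record.update({f"b.{k}": v for k, v in rec_b.items()})
            joined.append(record)
    return joined


# Hypothetical sample data standing in for sources A and B.
source_a = [{"id": "1", "name": "Alice"}, {"id": "2", "name": "Bob"}]
source_b = [{"person": "1", "city": "Ghent"}, {"person": "2", "city": "Madrid"}]

source_c = join_sources(source_a, source_b, "id", "person")

# One term generated from fields of both original sources.
terms = ["http://example.org/{}/{}".format(r["b.city"], r["a.name"]) for r in source_c]
```

Once the joined records exist, a template, a literal, or a function input can draw on fields from either side without a dedicated join construct per term type.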
In general I tend to agree FNML shouldn't need its own way to define joins. For the example in RMLFNOTC009 I wonder why one cannot just call grel:toUpperCase in the subject map of a new triples map and then join as usual with rr:parentTriplesMap. (This is a slight abuse as that subject map would generate literals, but as long as no invalid RDF triples are generated that should be fine imho.) (Disclaimer: I admit I'm not too up to date with the how and why of all FNML aspects.)
I have the feeling the discussion is revolving around 'functions should or should not specify their own logical source', to solve exactly this issue. I think we first need to solve that before we can solve functions properly. That's why I made the following proposal.
I'm purposely not specifying the relation with existing RML and R2RML constructs, nor specifying exactly how to describe a function. Instead, I'm making a proposal where we can have functions without defining their own logical sources, and still join values across data sources.
TL;DR: functions are a special kind of term map / no logical source for functions / you specify input values for functions using term maps (so you can do nesting) / join conditions specify childterm and parentterm instead of child and parent (so you can put functions there) / referencingObjectMaps have a join result term to specify a new term based on values of the parent logical source instead of relying solely on the subject of the parent triples map
Using these definitions, we can:
A function description (red = FnO stuff, green = FNML stuff, feel free to ignore those colors for now):
graph LR
TM([TermMap])
FM([FunctionTermMap]):::fnml
TM -->|is-a| FM
FM -->|execution| Ex([Execution]):::fnml
FM -->|output| J(IRI):::fnml
Ex -->|function| ExOM([fno:Function TermMap]):::fno
Ex -->|parameterMap| ParamPOM([ParameterMap])
ParamPOM -->|parameter| ParamPM(parameter):::fno
ParamPOM -->|parameter value| ParamOM([parameter value TermMap])
classDef fnml fill:#8F9
classDef fno fill:#F89
classDef rml fill:#89F
classDef ls2 fill:#09F
A join description (dark blue === Parent Logical Source):
graph LR
T3M([TriplesMap])
T3M-->|predicateObjectMap| POM([PredicateObjectMap])
POM -->|predicateMap| PM([PredicateMap])
POM -->|objectMap| ROM([ReferencingObjectMap])
ROM -->|parentTriplesMap| PT3M([TriplesMap]):::ls2
ROM -->|joinCondition| JC([JoinCondition])
ROM -->|joinResultTerm| JTM([TermMap]):::ls2
JC -->|childTerm| ChTM([TermMap])
JC -->|parentTerm| PaTM([TermMap]):::ls2
classDef fnml fill:#8F9
classDef fno fill:#F89
classDef rml fill:#89F
classDef ls2 fill:#09F
A join across sources example (result is "{childsource_value}{parentsource_value}"):
graph LR
T3M([TriplesMap])
T3M-->|predicateObjectMap| POM([rr:PredicateObjectMap])
POM -->|objectMap| FM
FM([FunctionTermMap])
FM -->|execution| Ex([Execution])
FM -->|output| J(grel:stringOut):::fno
Ex -->|function| ExFn(grel:array_join):::fno
Ex -->|parameterMap| ParamPOM([ParameterMap])
ParamPOM -->|parameter| P1(grel:array_value):::fno
ParamPOM -->|parameter value| O1("{childsource_value}"):::fno
ParamPOM -->|parameter| P2(grel:array_value):::fno
ParamPOM -->|parameter value| ROM([ReferencingObjectMap])
ROM -->|parentTriplesMap| PT3M([TriplesMap]):::ls2
ROM -->|joinCondition| JC([JoinCondition])
ROM -->|joinResultTerm| JTM("{parentsource_value}"):::ls2
JC -->|childTerm| ChTM([TermMap]):::ls2
JC -->|parentTerm| PaTM([TermMap]):::ls2
classDef fnml fill:#8F9
classDef fno fill:#F89
classDef rml fill:#89F
classDef ls2 fill:#09F
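The join-across-sources diagram can also be read as plain Python. This is a sketch only: `referencing_map`, the lambdas used for the child term, parent term, and join result term, the upper-casing inside the join condition, and the sample records are hypothetical, and `array_join` is a local stand-in for the concatenating role that grel:array_join plays in the diagram, not the GREL implementation.

```python
def array_join(*values):
    """Local stand-in: concatenate the given values without a separator."""
    return "".join(values)


def referencing_map(child_record, parent_source, child_term, parent_term, join_result_term):
    """Yield one join result term per parent record that satisfies the join condition."""
    child_value = child_term(child_record)
    for parent_record in parent_source:
        if parent_term(parent_record) == child_value:
            yield join_result_term(parent_record)


# Hypothetical child and parent logical sources.
child_source = [
    {"key": "a", "childsource_value": "foo"},
    {"key": "b", "childsource_value": "bar"},
]
parent_source = [
    {"KEY": "A", "parentsource_value": "1"},
    {"KEY": "B", "parentsource_value": "2"},
]

results = []
for child in child_source:
    joined = referencing_map(
        child,
        parent_source,
        child_term=lambda r: r["key"].upper(),  # a function inside the join condition
        parent_term=lambda r: r["KEY"],
        join_result_term=lambda r: r["parentsource_value"],
    )
    for parent_value in joined:
        results.append(array_join(child["childsource_value"], parent_value))
```

The function itself never sees a logical source; it only receives values, one of which happens to arrive through a join.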
@samiscoding and @pmaria could you have a look at my proposal here? I have the feeling we need to fix this first before we can fix FNML :) (@dachafra putting you in the loop since you were gonna check FNML in any case ;) )
TL;DR: functions are a special kind of term map / no logical source for functions / you specify input values for functions using term maps (so you can do nesting) / join conditions specify childterm and parentterm instead of child and parent (so you can put functions there) / referencingObjectMaps have a join result term to specify a new term based on values of the parent logical source instead of relying solely on the subject of the parent triples map
Generally agree, although I don't see child and parent values as terms, rather as just "values".
Definitions
[...]
- A ReferencingMap is something that generates an RDF Term from a different Triples Map, called the Parent Triples Map, i.e., takes values from the Parent Triples Map's Logical Source, called the Parent Logical Source. It can use a JoinCondition and generates the Join Result Term.
Do I understand it correctly that this is a new construct that is the generalization of a referencing object map?
- A Join Condition specifies how to join these logical sources. It consists of a Child Term (generating a Term, taking values from the original Logical Source) and a Parent Term (generating a Term, taking values from the Parent Logical Source). By default (i.e., when no Join Condition is specified), a full join is performed.
A full join is not the current behavior when no join condition is specified for a referencing object map. What would be the use case for a full join?
Results
[...]
- if we nest functions, we can do something like join values across source
Could you give an example of what this would look like?
Generally speaking I would steer clear of joining within a function term map, because:
Pros of this approach:
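Downsides of this approach:
* implementation, and I would say reasoning about the mapping, becomes complex because of conceptually different places to join.
* must use functions to generate terms based on multiple sources
In general my preference would still be to have a more general way to join sources, such that:
* it is possible to generate terms based on multiple sources from templates or any other possible future expression types
* the join logic can be implemented in a single general way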
- if we nest functions, we can do something like join values across source
Could you give an example of what this would look like?
Is my final diagram a clarification? I can cook up some Turtle if you want :
Downsides of this approach:
* implementation, and I would say reasoning about the mapping, becomes complex because of conceptually different places to join.
* must use functions to generate terms based on multiple sources
I agree with your preference that joins can/should probably be solved more generally. My point was more that this structure allows complex joins across functions, sources, and whatever else. If we can solve the joins somewhere else, we can always restrict the spec so that function terms cannot be referencing object maps. But I prefer having a generic structure that is limited later rather than a specific structure that is hard to expand later on.
In general my preference would still be to have a more general way to join sources, such that:
* it is possible to generate terms based on multiple sources from templates or any other possible future expression types
* the join logic can be implemented in a single general way
:+1:
- if we nest functions, we can do something like join values across source
Could you give an example of what this would look like?
Is my final diagram a clarification? I can cook up some Turtle if you want :
yeah I think so. So basically you do something like
someFunction(value_TM1, value_TM2_via_join, ... , value_TMX_via_join)
exactly, way simpler represented than what I was trying 😅
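As a rough illustration of that call shape in Python: the helper `via_join`, the placeholder `some_function`, the field names, and the sample sources below are all hypothetical, and producing one output per combination of matching parent records is an assumption, not something settled in this thread.

```python
def via_join(child_record, parent_source, child_key, parent_key, field):
    """Return the values of `field` from parent records matching the child record."""
    return [p[field] for p in parent_source if p[parent_key] == child_record[child_key]]


def some_function(*values):
    """Placeholder for any function; here it joins its arguments with a space."""
    return " ".join(values)


# Hypothetical sources: the triples map's own source plus two joined ones.
people = [{"id": "1", "name": "Alice"}]
cities = [{"person": "1", "city": "Ghent"}]
jobs = [{"person": "1", "title": "Engineer"}, {"person": "2", "title": "Chef"}]

outputs = []
for person in people:
    for city in via_join(person, cities, "id", "person", "city"):
        for title in via_join(person, jobs, "id", "person", "title"):
            # someFunction(value_TM1, value_TM2_via_join, value_TM3_via_join)
            outputs.append(some_function(person["name"], city, title))
```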
It is an interesting perspective to look at the problem, however,
1. Trying an example, I see that it leads to longer and more complex mapping rules compared to previous proposals. I'm a big fan of precision at the expense of complexity but if we can find a simpler solution that covers the definition of the same concepts we should consider it!
Fully agree that it becomes more (too?) complex, the argument I mostly wanted to make was "We can keep source definition out of the function construct to allow joining values across data sources". It's very complex without additional constructs, but (i) it is currently possible and (ii) we can think of a better construct separate from functions :)
2. If I understand it correctly in this case one doesn't need to use "Fields" as discussed before instead of logicalSources, right?
True
3. Based on this definition, there wouldn't be any concept of FunctionTriplesMap, right?
We can steer away from linking function definitions with the triplesmap definition, but that's not completely cleared up yet, see https://github.com/kg-construct/rml-fno-spec/pull/45#discussion_r816772074
4. I'm a bit confused by the concepts and syntaxes that you use from RML and R2RML. If we still want to reuse them then I see no reason to throw away previous proposals as we did during the Ghent meeting! Correct me if I'm wrong, wasn't the objection against our previous proposals in the meeting about not proposing it from scratch and reusing syntaxes? 😅
Huh, I had it completely the other way around, that it's confusing to reuse syntax and it would be better to make a clear distinction. Maybe we should clear that up with the community first.
I removed the FnO label, as we decided that joins and functions are 2 complementary things that shouldn't be conflated.
As it's a join issue, I'm going to move it to its corresponding repo
Agreed with Ben to make a test case and verify if this issue can be solved using logical views.