pmaria closed this issue 1 year ago.
Currently, it is inside rml:Source; the occurrences in rml:LogicalSource are typos.
this would allow to reuse the source description for multiple queries
True, this is not possible right now.
this would also be more in line with the behavior of rml:iterator, in that the evaluation of an rml:iterator produces a list of Records, and the evaluation of rml:query also produces a list of Records (in the case of relational databases: rows). Having these on the same resource type would simplify implementations.
That was the case in R2RML, but in RML a query does not necessarily produce rows. It can also be a SPARQL query, whose results can be in JSON, XML, CSV, or TSV. In that case, you have an rml:query in rml:Source, with an iterator and a reference formulation in rml:LogicalSource. Moreover, some SQL RDBs also support outputting their results in other formats, such as XML. Because of this, I moved the query to the access description: query results are no longer iterable by themselves, and when they are not tabular records, a reference formulation and an iterator must be provided.
However, I'm open to changing it if we can make it work in all cases, not only for relational databases.
Having these on the same resource type would simplify implementations.
I cannot follow here: shouldn't the location of the query be irrelevant for implementations? In the end, everything is just a language that can be translated into other languages. For example, an implementation could understand RML, YARRRML, and SPARQL-Generate, and map all of them onto its internal representation to execute the instructions.
Currently, it is inside rml:Source; the occurrences in rml:LogicalSource are typos.
Ah ok.
this would allow to reuse the source description for multiple queries
True, this is not possible right now.
this would also be more in line with the behavior of rml:iterator, in that the evaluation of an rml:iterator produces a list of Records, and the evaluation of rml:query also produces a list of Records (in the case of relational databases: rows). Having these on the same resource type would simplify implementations.
That was the case in R2RML, but in RML a query does not necessarily produce rows. It can also be a SPARQL query, whose results can be in JSON, XML, CSV, or TSV. In that case, you have an rml:query in rml:Source, with an iterator and a reference formulation in rml:LogicalSource. Moreover, some SQL RDBs also support outputting their results in other formats, such as XML. Because of this, I moved the query to the access description: query results are no longer iterable by themselves, and when they are not tabular records, a reference formulation and an iterator must be provided.
OK, interesting. But then your reference formulation would have to be one that references one of those result formats, not the SQL formulation, right? So how does that work? What would such a mapping look like?
In any case I think we need to describe these use cases.
However, I'm open to change it, if we can make it work in all cases besides relational databases.
Having these on the same resource type would simplify implementations.
I cannot follow here: shouldn't the location of the query be irrelevant for implementations? In the end, everything is just a language that can be translated into other languages. For example, an implementation could understand RML, YARRRML, and SPARQL-Generate, and map all of them onto its internal representation to execute the instructions.
Well, my point is: there should be a single point in the language that produces records (the items against which references are evaluated). I think this should be the rml:LogicalSource, not the rml:Source.
This would also keep things simple for implementations, in the sense that you can expect the logical source to describe how records are generated from a source, and the source to describe only the static aspects of the source.
So then the question is: what does an rml:query produce? Does it produce records, or is it indeed always part of an rml:Source from which you create new records using a reference formulation and an (implicit) iterator?
IMO an rml:iterator is not essentially different from an rml:query: both are expressions in some reference formulation that evaluate to a list of items. So to me they belong at the same level in the language.
OK, interesting. But then your reference formulation would have to be one that references one of those result formats, not the SQL formulation, right? So how does that work? What would such a mapping look like?
Yes, the reference formulation must be able to iterate over the results. For example, SPARQL JSON results would use a JSONPath reference formulation and a JSONPath iterator. Describing these cases would indeed benefit the spec, at least by adding some examples contrasting SQL and SPARQL.
Mapping:
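@prefix rml: <http://w3id.org/rml/> .
@prefix ql: <http://semweb.mmlab.be/ns/ql#> .
@prefix sd: <http://www.w3.org/ns/sparql-service-description#> .
@prefix formats: <http://www.w3.org/ns/formats/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .

# Access description: endpoint, query language, result format and query.
<#SDSourceAccess> a rml:Source, sd:Service;
  sd:endpoint <http://example.com/sparql/>;
  sd:supportedLanguage sd:SPARQL11Query;
  # JSON results, so that the JSONPath iterator below applies.
  sd:resultFormat formats:SPARQL_Results_JSON;
  rml:query """
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    SELECT ?person ?name ?age WHERE {
      ?person foaf:name ?name .
      ?person foaf:age ?age .
    }
  """.

<#TriplesMap> a rml:TriplesMap;
  # Each entry of results.bindings in the JSON results becomes one record.
  rml:logicalSource [ a rml:LogicalSource;
    rml:source <#SDSourceAccess>;
    rml:referenceFormulation ql:JSONPath;
    rml:iterator "$.results.bindings[*]"
  ];
  # ?person is bound to an IRI, which is used directly as the subject.
  rml:subjectMap [ a rml:SubjectMap;
    rml:reference "person.value";
  ];
  rml:predicateObjectMap [ a rml:PredicateObjectMap;
    rml:predicateMap [ a rml:PredicateMap;
      rml:constant foaf:name;
    ];
    rml:objectMap [ a rml:ObjectMap;
      rml:reference "name.value";
    ];
  ].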
Well, my point is: there should be a single point in the language that produces records (the items against which references are evaluated). I think this should be the rml:LogicalSource, not the rml:Source.
Agreed! That's why I moved it: for me, the iterator is what produces records. The query is only a way to select part of the source; it does not generate records on its own but yields a result set over which an iteration must be applied. R2RML assumed that results are iterated row by row, which we cannot assume for other query languages such as GraphQL, NoSQL query languages, SPARQL, etc.
So then the question is: what does an rml:query produce? Does it produce records, or is it indeed always part of an rml:Source from which you create new records using a reference formulation and an (implicit) iterator?
An rml:query produces a result set, and what that result set looks like depends on the source, hence its place in the access description. The iterator and reference formulation then tell the engine how to iterate over this result set to obtain the records.
OK, thanks for the clarifications.
That leaves me with the following concerns.
Besides this, it is not clear to me how the above example would be expressed for relational databases, since rml:referenceFormulation is a property of rml:LogicalSource. The above example uses sd:supportedLanguage sd:SPARQL11Query. What would you use for other source types?
@pmaria Maybe we should indeed consider a query as an iterator and use a reference formulation such as rml:SQL2008, which indicates (i) that the query used as iterator follows SQL 2008 and (ii) how to refer to the columns of the query results. The same goes for SPARQL: the query is put in the iterator, and the reference formulation, e.g. formats:SPARQL_CSV_Results, also says how to refer to the SPARQL results.
Your argument @pmaria makes sense, and I'm actually getting convinced of this.
I know @andimou also has an opinion on this; I will wait for her before changing things.
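Under this proposal, a SPARQL source could be mapped roughly as follows. This is a sketch, not settled syntax: the query moves from the rml:Source into rml:iterator, and the reference formulation names the result format, so references become SPARQL variable names. The comment above writes formats:SPARQL_CSV_Results; the W3C formats namespace defines formats:SPARQL_Results_CSV, which is used here. The prefix IRIs are assumptions.

```turtle
@prefix rml: <http://w3id.org/rml/> .
@prefix sd: <http://www.w3.org/ns/sparql-service-description#> .
@prefix formats: <http://www.w3.org/ns/formats/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .

# The source only describes static access: the endpoint and the language it speaks.
<#SPARQLEndpoint> a rml:Source, sd:Service ;
  sd:endpoint <http://example.com/sparql/> ;
  sd:supportedLanguage sd:SPARQL11Query .

<#PersonMap> a rml:TriplesMap ;
  rml:logicalSource [ a rml:LogicalSource ;
    rml:source <#SPARQLEndpoint> ;
    # Proposed: the reference formulation names the result format, which also
    # fixes how references address the results (here: variable names).
    rml:referenceFormulation formats:SPARQL_Results_CSV ;
    # Proposed: the query itself is the iterator; each solution is one record.
    rml:iterator """
      PREFIX foaf: <http://xmlns.com/foaf/0.1/>
      SELECT ?person ?name WHERE { ?person foaf:name ?name . }
    """
  ] ;
  # ?person is an IRI in the results, so it is used directly as the subject.
  rml:subjectMap [ rml:reference "person" ] ;
  rml:predicateObjectMap [
    rml:predicate foaf:name ;
    rml:objectMap [ rml:reference "name" ]
  ] .
```

With this layout, the logical source is the single place that turns the source into records, which is the point raised earlier in the thread.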
@pmaria @andimou Do we have an agreement here?
Basically, we would then allow putting the query in rml:iterator and setting rml:referenceFormulation to rml:SQL2008, formats:SPARQL_CSV_Results, etc.? rml:query would then be dropped.
If so, I can adjust the spec, test cases, etc.
+1 from me
Discussed during the W3C CG meeting:
rr:tableName is expressed as rml:referenceFormulation rml:SQL2008Table; rml:iterator "myTable"
rr:sqlQuery is expressed as rml:referenceFormulation rml:SQL2008Query; rml:iterator "SELECT column from myTable"
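The two cases from the meeting could look roughly like this. It is a sketch: rml:SQL2008Table and rml:SQL2008Query are the names from the meeting notes, while the table myTable with columns id and name, the example IRIs, and the prefix declarations are assumptions for illustration. The database connection details are deliberately left out.

```turtle
@prefix rml: <http://w3id.org/rml/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .

# Hypothetical database source; connection details are omitted in this sketch.
<#DB> a rml:Source .

# Replaces R2RML: rr:logicalTable [ rr:tableName "myTable" ]
<#TableMap> a rml:TriplesMap ;
  rml:logicalSource [ a rml:LogicalSource ;
    rml:source <#DB> ;
    rml:referenceFormulation rml:SQL2008Table ;
    rml:iterator "myTable"
  ] ;
  rml:subjectMap [ rml:template "http://example.org/person/{id}" ] ;
  rml:predicateObjectMap [
    rml:predicate foaf:name ;
    rml:objectMap [ rml:reference "name" ]
  ] .

# Replaces R2RML: rr:logicalTable [ rr:sqlQuery "SELECT id, name FROM myTable" ]
<#QueryMap> a rml:TriplesMap ;
  rml:logicalSource [ a rml:LogicalSource ;
    rml:source <#DB> ;
    rml:referenceFormulation rml:SQL2008Query ;
    rml:iterator "SELECT id, name FROM myTable"
  ] ;
  rml:subjectMap [ rml:template "http://example.org/person/{id}" ] ;
  rml:predicateObjectMap [
    rml:predicate foaf:name ;
    rml:objectMap [ rml:reference "name" ]
  ] .
```

In both maps, references are plain column names; only the reference formulation tells the engine whether the iterator string is a table name or a query.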
Unrelated: typo in testcase 6f
I see several examples where rml:query is used as a property of rml:Source, and other examples where rml:query is used as a property of rml:LogicalSource. My preference would be to have rml:query be a property of rml:LogicalSource, because (1) this would allow reusing the source description for multiple queries, and (2) this would also be more in line with the behavior of rml:iterator, in that the evaluation of an rml:iterator produces a list of Records, and the evaluation of rml:query also produces a list of Records (in the case of relational databases: rows). Having these on the same resource type would simplify implementations.
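A sketch of this preference, assuming a SPARQL source: one rml:Source is described once and shared by two logical sources, each carrying its own rml:query next to where rml:iterator would sit. Placing rml:query on rml:LogicalSource is the proposal, not the current spec. Referencing results by plain variable names and the prefix IRIs are assumptions; no reference formulation is given because the issue does not specify one.

```turtle
@prefix rml: <http://w3id.org/rml/> .
@prefix sd: <http://www.w3.org/ns/sparql-service-description#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .

# Static description of the source, written once.
<#Endpoint> a rml:Source, sd:Service ;
  sd:endpoint <http://example.com/sparql/> ;
  sd:supportedLanguage sd:SPARQL11Query .

# First logical source: one query over the shared source.
<#NameMap> a rml:TriplesMap ;
  rml:logicalSource [ a rml:LogicalSource ;
    rml:source <#Endpoint> ;
    # Proposed placement: rml:query on the logical source.
    rml:query "SELECT ?person ?name WHERE { ?person <http://xmlns.com/foaf/0.1/name> ?name }"
  ] ;
  rml:subjectMap [ rml:reference "person" ] ;
  rml:predicateObjectMap [
    rml:predicate foaf:name ;
    rml:objectMap [ rml:reference "name" ]
  ] .

# Second logical source: a different query, reusing the same source description.
<#AgeMap> a rml:TriplesMap ;
  rml:logicalSource [ a rml:LogicalSource ;
    rml:source <#Endpoint> ;
    rml:query "SELECT ?person ?age WHERE { ?person <http://xmlns.com/foaf/0.1/age> ?age }"
  ] ;
  rml:subjectMap [ rml:reference "person" ] ;
  rml:predicateObjectMap [
    rml:predicate foaf:age ;
    rml:objectMap [ rml:reference "age" ]
  ] .
```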