Closed DylanVanAssche closed 2 years ago
@DylanVanAssche is this more a challenge or a "best practice" than a pure problem with the RML spec? Shall we transfer the issue?
@dachafra For me, it is a spec thing because it is related to the rml:iterator. Maybe a Literal is insufficient here?
@DylanVanAssche So... also given the proposal from CARML, it is more related to the Logical Source, right? Do we transfer it to that spec?
True! Fine for transferring it!
@pmaria I like the CARML approach for this issue:
rml:logicalSource [
  rml:source [
    a carml:Stream ;
    # or in case of a file source use:
    # carml:url "path-to-source" ;
    carml:declaresNamespace [
      carml:namespacePrefix "ex" ;
      carml:namespaceName "http://www.example.com/books/1.0/" ;
    ] ;
  ] ;
  rml:referenceFormulation ql:XPath ;
  rml:iterator "/ex:bookstore/*" ;
] ;
What do you think of using this?
rml:logicalSource [
  rml:source [
    # Any kind of source
  ] ;
  rml:iterator [ a ql:XPathIterator, rml:Iterator ;
    rml:namespaceName "http://www.example.com/books/1.0/" ;
    rml:namespacePrefix "ex" ;
    rml:value "/ex:bookstore/*" ;
  ] ;
]
Changes:
rml:referenceFormulation
Hmm I'm not sure the iterator is the most natural place to define the namespaces. Since you also want to be able to use these namespaces in non-iterator expressions.
@pmaria
> Hmm I'm not sure the iterator is the most natural place to define the namespaces. Since you also want to be able to use these namespaces in non-iterator expressions.
When you use rml:reference, rr:column, rr:template, etc. you take the rml:iterator value and append the value of one of these references to retrieve what you need in a Triples Map.
That's why I found it a better fit there: it is specific to the reference formulation and the iterator.
rml:source is only for defining how a source should be accessed, such as its location. Because of that, I would keep the namespace declaration away from it, since those namespaces are only used for executing the iterator and references during data processing, after the data has been retrieved from the source.
> When you use rml:reference, rr:column, rr:template, etc. you take the rml:iterator value, append the value of one of these references to retrieve what you need in a Triples Map.
Ah I don't see it that way necessarily. I see the rml:iterator, rml:reference, rr:template conceptually operating within the same scope/context. Wherein indeed, the iterator creates an iteration of sub documents on which the other expressions are evaluated. But I see the iterator as just another expression.
But I agree that source might not be the best place for the NS definition, because it is essentially a query concern, and the namespaces don't need to match the namespaces used in a source document.
Maybe it makes more sense then to add a new object to the logical source, next to the iterator? Similar to your idea, but keeping the iterator as is, i.e. as just another expression.
Something like rml:ExpressionContext.
rml:logicalSource [
  rml:source [
    # Any kind of source
  ] ;
  rml:iterator "/ex:bookstore/*" ;
  rml:expressionContext [ a rml:XPathExpressionContext ;
    rml:namespace [
      rml:namespaceName "http://www.example.com/books/1.0/" ;
      rml:namespacePrefix "ex" ;
    ] ;
  ] ;
  rml:referenceFormulation ql:XPath ;
]
We could possibly combine it with the reference formulation? The rationale would be that this defines how to interpret the expressions that are based on a logical source.
So combining it with reference formulations could look like
rml:logicalSource [
  rml:source [
    # Any kind of source
  ] ;
  rml:iterator "/ex:bookstore/*" ;
  rml:referenceFormulation [ a ql:XPathReferenceFormulation ;
    ql:namespace [
      ql:namespaceName "http://www.example.com/books/1.0/" ;
      ql:namespacePrefix "ex" ;
    ] ;
  ] ;
]
This would be a custom specified XPath reference formulation, next to the "default" ql:XPath.
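To make the idea concrete, here is a minimal Python sketch of how a processor might keep the namespace declarations on the reference formulation and apply them to the iterator and to every reference alike. The class names `XPathReferenceFormulation` and `LogicalSource`, the sample document, and the relative iterator `ex:book` are illustrative assumptions, not part of RML or of any existing processor. The standard-library `xml.etree.ElementTree` only supports a limited XPath subset, so the iterator is written relative to the root element rather than as `/ex:bookstore/*`.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

# Hypothetical sample document. It uses a default namespace, while the
# mapping side binds the same namespace name to the prefix "ex".
BOOKS_XML = """\
<bookstore xmlns="http://www.example.com/books/1.0/">
  <book><title>First</title></book>
  <book><title>Second</title></book>
</bookstore>
"""


@dataclass(frozen=True)
class XPathReferenceFormulation:
    """Hypothetical holder for prefix -> namespace name declarations."""
    namespaces: dict = field(default_factory=dict)


@dataclass
class LogicalSource:
    """Hypothetical logical source: iterator and references share one context."""
    xml_text: str
    iterator: str
    reference_formulation: XPathReferenceFormulation

    def iterate(self):
        # The iterator is evaluated with the declared namespaces.
        root = ET.fromstring(self.xml_text)
        return root.findall(self.iterator, self.reference_formulation.namespaces)

    def reference(self, item, expression):
        # References are evaluated per iteration, with the same namespaces.
        node = item.find(expression, self.reference_formulation.namespaces)
        return None if node is None else node.text


formulation = XPathReferenceFormulation({"ex": "http://www.example.com/books/1.0/"})
source = LogicalSource(BOOKS_XML, "ex:book", formulation)
titles = [source.reference(book, "ex:title") for book in source.iterate()]
print(titles)
```

Because the namespaces live on the reference formulation object rather than on a global individual, each logical source in this sketch can carry its own prefix bindings.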
> Ah I don't see it that way necessarily. I see the rml:iterator, rml:reference, rr:template conceptually operating within the same scope/context. Wherein indeed, the iterator creates an iteration of sub documents on which the other expressions are evaluated. But I see the iterator as just another expression.
Ah depends on how you implement the spec :) Some implementations do not create subdocuments. However, I agree with you :)
> But I agree that source might not be the best place for the NS definition, because it is essentially a query concern, and the namespaces don't need to match the namespaces used in a source document.
Yes! I try to separate the concerns as much as possible so it is also reusable in the future.
rml:referenceFormulation definition:
The reference formulation (rml:referenceFormulation) defines the reference formulation used to refer to the elements of the data source. The reference formulation must be specified in the case of databases and XML and JSON data sources. By default SQL2008 for databases, as SQL2008 is the default for R2RML, XPath for XML and JSONPath for JSON data sources.
According to the definition, the last suggestion looks better to me. Are we aware of something similar for other reference formulations?
> This would be a custom specified XPath reference formulation, next to the "default" ql:XPath.
Ideally, we don't even need that and have 1 IRI for both (with and without namespaces), but I'm not sure how to achieve that in RDF? Properties can be optional, but if you have none, it becomes something weird like this:
rml:referenceFormulation [ a ql:XPathReferenceFormulation ; ] ;
We could 'solve' this by having shortcuts:
rml:referenceFormulation ql:XPath;
This shortcut points to [ a ql:XPathReferenceFormulation ; ].
I think this is what you meant above with the "default"?
> We could 'solve' this by having shortcuts:
> rml:referenceFormulation ql:XPath;
Yes. I see that rml:ReferenceFormulation is already defined in the RML ontology.
rml:referenceFormulation rdfs:range rml:ReferenceFormulation .
rml:ReferenceFormulation rdf:type owl:Class ;
  rdfs:label "Reference Formulation" ;
  rdfs:comment "Represents a Reference Formulation."@en .
And also defined is
ql:XPath rdf:type owl:NamedIndividual, rml:ReferenceFormulation ;
  rdfs:label "XPath" ;
  rdfs:comment "Denotes the XPath reference formulation, used for referring to extracts of XML sources."@en ;
  ql:specification <http://www.w3.org/TR/xpath20/> ;
  rml:version "2.0" .
So essentially the "shortcut" is just using the named individual.
Now all we would have to do is introduce a subclass of rml:ReferenceFormulation, rml:XPathReferenceFormulation, and define that further, adding namespace properties.
I don't think we should introduce a new named individual for XPath with namespaces. This would limit the namespaces you could define, since the individual's scope would be global. And you might want to define different namespaces per logical source.
@pmaria Alright! I agree, let's set up our battle plan then for this issue:
rml:XPathReferenceFormulation
ql:namespaceName and ql:namespacePrefix in there
Problem solved then?
Yes I think so 🎉
Not forgetting ql:namespace to specify one or more ql:Namespaces
Why put namespace URIs in literals rather than using resources?
@chrdebru
> Why put namespace URIs in literals rather than using resources?
Spec: https://www.w3.org/TR/xml-names/
> URI references identifying namespaces are compared when determining whether a name belongs to a given namespace, and whether two names belong to the same namespace. [Definition: The two URIs are treated as strings, and they are identical if and only if the strings are identical, that is, if they are the same sequence of characters.] The comparison is case-sensitive, and no %-escaping is done or undone.
AFAIK, XML Namespaces are not like Linked Data and are compared through a string-based comparison without any resolving. That's why they are a Literal here, but any insights are welcome!
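A small Python sketch can illustrate the string-based comparison. It relies on the standard-library `xml.etree.ElementTree`, which matches qualified names as plain strings; the sample document and the two variant namespace names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document using the namespace name from the thread's example.
DOC = (
    '<ex:bookstore xmlns:ex="http://www.example.com/books/1.0/">'
    "<ex:book/>"
    "</ex:bookstore>"
)
root = ET.fromstring(DOC)

candidates = {
    # Character-for-character identical to the namespace name in the document.
    "exact": "http://www.example.com/books/1.0/",
    # Same resource from a URI point of view, but a different string (host case).
    "other_case": "http://WWW.EXAMPLE.COM/books/1.0/",
    # %-escaped "b": again a different string, so a different namespace name.
    "escaped": "http://www.example.com/%62ooks/1.0/",
}

# Count how many <book> elements each candidate namespace name selects.
matches = {
    label: len(root.findall("ex:book", {"ex": name}))
    for label, name in candidates.items()
}
print(matches)
```

Only the character-for-character identical namespace name selects the element, which is one argument for modelling the value as a plain string.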
Yes, but they can also be regarded as named resources that can be described (no matter whether they dereference and resolve). Having those as resources would facilitate writing SPARQL queries and inverse property paths, for instance. Just a thought, not questioning the proposal.
I would suggest renaming ql:namespaceName to namespaceIRI. Some namespaces have titles and a namespace contains names. Turtle mentions this: "The '@prefix' or 'PREFIX' directive associates a prefix label with an IRI".
> Yes, but they can also be regarded as named resources that can be described (no matter whether they dereference and resolve). Having those as resources would facilitate writing SPARQL queries and inverse property paths, for instance. Just a thought, not questioning the proposal.
I don't have much experience in that regard, so if it helps, I don't mind :) For me, it doesn't really matter as long as we have a mapping prefix <-> IRI
> I would suggest renaming ql:namespaceName to namespaceIRI. Some namespaces have titles and a namespace contains names. Turtle mentions this: "The '@prefix' or 'PREFIX' directive associates a prefix label with an IRI".
Hmmm true, having 'name' twice might be a bit weird :) @pmaria Do you agree on this?
Namespace name is what the spec calls it https://www.w3.org/TR/xml-names/#dt-NSName, so I would stick to that.
As far as I can tell we can't simply use IRIs, because XML expects URIs.
The main use case is to register the namespaces with an XPath engine for querying. Most implementations I've seen represent the namespace name as a string.
My feeling is that keeping it a string would be the more natural mapping to implementations, but if the arguments for using an IRI are strong I can live with that. We would however have to specify what happens when an IRI that is not a URI is used.
I don't disagree with @chrdebru, but we could just as well keep it as ql:namespace. Whether it is an IRI/URI or a Literal can be determined based on the range, so we don't need to include that in the name of the property.
Then again, if we include the restrictions in SHACL shapes, then we can decide at shape level whether it's a string or an IRI. There we can even provide 2 alternatives with 2 different explanations.
Another thought (I'm debating with myself here): newer libraries might read the namespaces from the file. Would we still want to give the option to define the namespaces?
> Another thought (I'm debating with myself here): newer libraries might read the namespaces from the file. Would we still want to give the option to define the namespaces?
In my experience this is not that trivial, especially in non-DOM based approaches, e.g. a streaming implementation. Namespaces can be defined inline in a document, so in theory a new namespace can be declared and used at the end of a document.
I have a strong preference to be able to declare this in the mapping. Tools can always additionally provide namespace detection by default if it fits their architecture.
I agree with @pmaria: extracting the XML namespaces is non-trivial and may require consuming all of the XML first before any mapping can take place.
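The streaming concern can be illustrated with a short Python sketch using the standard-library `xml.etree.ElementTree.iterparse`. The sample document and the `late` prefix are made up; the point is only that a streaming reader learns about a namespace declaration when it reaches it, which may be after earlier elements have already been processed.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical document: the "late" namespace is declared on the last child only.
DOC = b"""<bookstore xmlns:ex="http://www.example.com/books/1.0/">
  <ex:book/>
  <ex:book/>
  <late:note xmlns:late="urn:example:late"/>
</bookstore>"""

# Record namespace declarations and element starts in the order they are seen.
events = []
for event, payload in ET.iterparse(io.BytesIO(DOC), events=("start-ns", "start")):
    if event == "start-ns":
        prefix, uri = payload  # "start-ns" yields a (prefix, uri) tuple
        events.append(("namespace", prefix, uri))
    else:
        events.append(("element", payload.tag))

first_book = events.index(("element", "{http://www.example.com/books/1.0/}book"))
late_ns = events.index(("namespace", "late", "urn:example:late"))

for entry in events:
    print(entry)
print("late namespace seen after first book:", late_ns > first_book)
```

A processor that wanted to detect all namespaces up front would have to read to the end of such a document first, whereas namespaces declared in the mapping are available before any data is read.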
> Then again, if we include the restrictions in SHACL shapes, then we can decide at shape level whether it's a string or an IRI. There we can even provide 2 alternatives with 2 different explanations.
SHACL can have an OR statement, but maybe to keep things straightforward we should have either a string or IRI, but not both?
@pmaria if they call them namespace names, then OK!
@DylanVanAssche XML namespaces are declared in attributes (strings) in XML. So maybe that definition comes from their technical constraints. The advantage of IRIs is that "sameness" is implied when reused, whereas now you have to explicitly state that two namespace objects (if you can call them that) are the same, or you infer it by comparing strings. So IRIs may help us in cases where we have different prefixes for the same namespace (e.g., combining mappings).
@chrdebru I don't have a specific preference, except that I prefer either strings or IRIs, just not both ;)
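For the "different prefixes for the same namespace" case, a minimal Python sketch (standard-library `xml.etree.ElementTree`, with a made-up document and two hypothetical mappings) shows that on the XPath side the prefix is only a local label: two mappings that bind different prefixes to the same namespace name select the same elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical document with a default namespace.
DOC = (
    '<bookstore xmlns="http://www.example.com/books/1.0/">'
    "<book/><book/>"
    "</bookstore>"
)
root = ET.fromstring(DOC)

# Two hypothetical mappings binding different prefixes to the same namespace name.
mapping_a = {"ex": "http://www.example.com/books/1.0/"}
mapping_b = {"bk": "http://www.example.com/books/1.0/"}

books_a = root.findall("ex:book", mapping_a)
books_b = root.findall("bk:book", mapping_b)

# Both queries resolve to the same expanded names, hence the same elements.
same = books_a == books_b
print(len(books_a), len(books_b), same)
```

Whether the mapping stores that namespace name as a string or as an IRI does not change this query-side behaviour; it mainly affects how easily two mappings can be recognised as referring to the same namespace.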
XPath allows using XML namespaces when selecting parts of an XML document. However, (most) implementations require these namespaces to be registered before running an XPath query. RML currently does not specify how this should happen.
CARML has an extension for this: https://github.com/carml/carml#xml-namespace-extension and it has already come up a few times in the past without a clear solution.