w3c / rdf-star

RDF-star specification
https://w3c.github.io/rdf-star/
Other
119 stars 23 forks source link

hidden predicates can't hide from SPARQL #101

Closed pfps closed 3 years ago

pfps commented 3 years ago

Even if the reification predicates are given IRIs that cannot be written in RDF* or RDF documents, as in https://w3c.github.io/rdf-star/cg-spec/2021-02-18.html#mapping, the following SPARQL query can be used to retrieve such triples:

SELECT ?s ?p ?o WHERE { ?s ?p ?o }

This appears to affect all semantics for RDF* that use hidden predicates.

afs commented 3 years ago

I believe that it would be better to define SPARQL-star on RDF-star datasets.

https://www.w3.org/TR/sparql11-query/#BasicGraphPattern

has the way matching is BGP's is performed.

The extension entailment framework does need attention.

https://www.w3.org/TR/sparql11-query/#sparqlBGPExtend

pfps commented 3 years ago

Isn't SPARQL defined in term of simple entailment? It seems to me that that requires a full semantics for RDF*.

afs commented 3 years ago

(as discussed in the telecon today):

SPARQL matching: https://www.w3.org/TR/sparql11-query/#defn_PatternInstanceMapping

Just simple entailment results in infinite results - SPARQL needs to retain (e.g.) which blank node matched so the algebra (e.g. OPTIONAL) works.

Any entailment regimes matching has some specific requirements: https://www.w3.org/TR/sparql11-query/#sparqlBGPExtend

pfps commented 3 years ago

I spent some time trying to find out how to define querying in terms of entailment. It's not easy.

pfps commented 3 years ago

It turns out that the support for querying as simple entailment easily comes from the RDF 1.1 Semantics document. I was trying to show the result from first principles but this is not necessary.

In Appendix C there is the following proof:

Interpolation Lemma (from Section 5.3): G simply entails a graph E if and only if a subgraph of G is an instance of E.

Proof (from Appendix C): If a subgraph E' of G is an instance of E then G entails E' which entails E, so G entails E. Now suppose G entails E, and consider the Herbrand interpretation I of G defined as follows. IR contains the names and blank nodes which occur in the graph, with I(n)=n for each name n; n is in IP and <a, b> in IEXT(n) just when the triple is in the graph. (For IRIs which do not occur in the graph, assign them values in IR at random.) I satisfies every triple < s p o > in E; that is, for some mapping A from the blank nodes of E to the vocabulary of G, the triple < I+A I(p) I+A > occurs in G. But this is an instance of < s p o > under the instance mapping A; so an instance of E is a subgraph of G. QED.

SPARQL BGPs are in essence RDF graphs with two kinds of blank nodes - blank nodes and query variables. If we consider query variables to be blank nodes then an RDF graph G entails a SPARQL BGP E iff a subgraph of G is an instance of E. As G and E do not share any blank nodes or query variables, then the instance mapping M that makes E a subgraph of G must map all blank nodes (and query variables) in E to nodes in G. (Technically, G and E could share blank nodes. In this case it is not necessary for the shared blank nodes to be in the instance mapping, but such instance mappings can be just extended to map unmapped blank nodes to themselves.) Each such mapping is a demonstration that G simply entails E.

A SPARQL 1.1 BGP solution is just an instance mapping that maps all the blank nodes and variables in the BGP, E, such that the mapping of E is a subgraph of the active graph, G. A solution mapping is just an instance mapping without the mapping of the blank nodes in E. So G simply entails E iff there is a SPARQL 1.1 BGP solution mapping for E against G and each solution mapping is part of an instance mapping demonstrating that G simpy entails E.

TallTed commented 3 years ago

@pfps --

Github interprets the text of your entry as Markdown, so (for example) your comment above shows --

triple < I+A I(p) I+A > occurs

-- but what you meant was --

triple < [I+A](s) I(p) [I+A](o) > occurs

-- which reads very differently... You may or may not know, backtick wrappers are all you need for things like this, a la --

triple `< [I+A](s) I(p) [I+A](o) >` occurs

There's nothing else obviously wrong here, but it seems worth your review ... and keeping in mind for future.

pchampin commented 3 years ago

@pfps The thing is, I am not sure that the Interpolation Lemma still stands for RDF-star (with our current version of the semantic). Given any RDF-star graph G, G entails unstar(G), which is not a subgraph of G (it may contain more triples than G).

If I understand correctly what @afs was saying, it was not a design goal for SPARQL (with no entailment regime) to coincide with simple entailment. I personally made my peace with SPARQL-star (with no entailment regime) not coinciding with RDF-star simple entailment. We could of course, in addition, define an entailment regime for SPARQL-star that would support RDF-star simple entailment.

afs commented 3 years ago

The algebra requirement is that bnode identity carries over: SELECT * { ?s :p :o OPTIONAL { ?s :q ?v } } is a left join on ?s.

https://www.w3.org/TR/sparql11-entailment/#bnodes

There will be an entailment relationship between the pattern after mapping and the data - the issue is the choice of which blank nodes that are associated with variables.

pfps commented 3 years ago

If it wasn't a design goal then why does SPARQL question answering coincide with question answering in simple entailment?

pfps commented 3 years ago

I think what you want for the interpolation lemma is that it is true on RDF* graphs that don't include the hidden predicates.

What I want is the other way around. Don't hide the hidden predicates. The interpolation lemma is true on RDF graphs. Question answering is for RDF graphs is then defined on their RDF versions. Reporting back as RDF graphs is an extra step after basic question answering.

pchampin commented 3 years ago

I think what you want for the interpolation lemma is that it is true on RDF* graphs that don't include the hidden predicates.

By "RDF graphs that don't include the hidden predicates", I assume that you mean "RDF graphs as defined in the RDF-star abstract syntax".

If so, what I meant was indeed that the Interpolation Lemma, which holds for RDF graphs, does not extend to RDF-star graphs in general.

What I want is the other way around. Don't hide the hidden predicates.

I think that hiding the predicate is a separate question. Using public IRIs in the unstar mapping would yield the same problem: the Interpolation Lemma would still not hold for all RDF-star graphs.

The interpolation lemma is true on RDF graphs.

Yes of course, but some RDF-star graphs are not RDF graphs (even though they are all semantically equivalent to some RDF graph).

Question answering is for RDF* graphs is then defined on their RDF versions.

That's not the way SPARQL-star is defined today. Changing this would be a significant rework of the spec and (I believe) of existing implementations.

pchampin commented 3 years ago

Closing this issue, as the latest version of the semantics does not use hidden predicates anymore.