w3c / rdf-star

RDF-star specification
https://w3c.github.io/rdf-star/

annotation syntax for SPARQL* #65

Closed hartig closed 3 years ago

hartig commented 3 years ago

This PR is meant to address the SPARQL* part of #9



hartig commented 3 years ago

I am done with extending the spec to cover the annotation syntax for SPARQL*.

Preview: https://pr-preview.s3.amazonaws.com/w3c/rdf-star/pull/65.html

The parts that I have extended are:

Please take a look.

/cc @pchampin @afs @gkellogg

gkellogg commented 3 years ago

Note that, in the existing grammar, EmbTP is really not strict enough:

[174] EmbTP               ::= '<<' EmbSubjectOrObject Verb EmbSubjectOrObject '>>'
[175] EmbSubjectOrObject  ::= Var | BlankNode | iri | RDFLiteral | NumericLiteral |
                              BooleanLiteral | EmbTP

An EmbSubjectOrObject includes literals, which can't exist as the subject of any triple. I think we previously had VarOrBlankNodeOrIriOrEmbTP and VarOrTermOrEmbTP for subject and object, which have the appropriate restrictions.

[107s] VarOrBlankNodeOrIriOrEmbTP ::= Var | BlankNode | iri | EmbTP
[176]  VarOrTermOrEmbTP           ::= Var | GraphTerm | EmbTP

afs commented 3 years ago

SPARQL allows literals as subjects. They just never match.

They arise naturally - most clearly, with reverse paths.

A triple pattern is "(RDF-T ∪ V) x (I ∪ V) x (RDF-T ∪ V)"

https://www.w3.org/TR/sparql11-query/#sparqlTriplePatterns

VarOrTerm seems the place to add them because GraphTerm is without variables.
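
As a rough illustration of the point above (not taken from the spec or from any implementation mentioned in this thread), the following Python sketch treats a triple pattern as three slots that may each hold a variable or any RDF term. The classes `IRI`, `Literal`, `Var` and the helpers `match`, `evaluate`, and `reverse` are hypothetical names introduced only for this example, and blank nodes are left out. It shows how a reverse path can put a literal into the subject position, and that such a pattern is legal but yields no solutions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class IRI:
    value: str

@dataclass(frozen=True)
class Literal:
    lexical: str

@dataclass(frozen=True)
class Var:
    name: str

def match(pattern, triple):
    """Return variable bindings if `triple` matches `pattern`, else None."""
    bindings = {}
    for slot, term in zip(pattern, triple):
        if isinstance(slot, Var):
            # A repeated variable must bind to the same term each time.
            if bindings.setdefault(slot.name, term) != term:
                return None
        elif slot != term:
            return None
    return bindings

def evaluate(pattern, graph):
    """All solutions of one triple pattern over a list of triples."""
    solutions = []
    for triple in graph:
        b = match(pattern, triple)
        if b is not None:
            solutions.append(b)
    return solutions

def reverse(pattern):
    """Rewrite `x ^p y` into the equivalent `y p x`."""
    s, p, o = pattern
    return (o, p, s)

NAME = IRI("http://xmlns.com/foaf/0.1/name")
BOB = IRI("http://example.org/bob")
graph = [(BOB, NAME, Literal("Bob"))]

# "Bob" ^foaf:name ?x  becomes  ?x foaf:name "Bob"  and matches.
forward = evaluate(reverse((Literal("Bob"), NAME, Var("x"))), graph)

# ?x ^foaf:name "Bob"  becomes  "Bob" foaf:name ?x : legal, but no match,
# because no triple in an RDF graph has a literal as its subject.
literal_subject = evaluate(reverse((Var("x"), NAME, Literal("Bob"))), graph)
```

Under these assumptions `forward` holds one solution binding `x` to `BOB`, while `literal_subject` is empty. Rejecting literal subjects in the grammar would therefore make the second query a syntax error instead of a query with no results.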

gkellogg commented 3 years ago

Thanks, @afs, if I was aware of that, I've since forgotten.

gkellogg commented 3 years ago

I think we need a change to ObjectListPath as well:

[86]  ObjectListPath ::= ObjectPath AnnotationPattern? ( ',' ObjectPath AnnotationPattern? )*

When I try the following example, that branch is hit in my parser, at least:

PREFIX : <http://bigdata.com/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX ex:  <http://example.org/>

SELECT ?age ?c WHERE {
   ?bob foaf:name "Bob" {| ex:certainty ?c |}.
}

hartig commented 3 years ago

@gkellogg what do you mean by "hit in my parser"? What's wrong with the query? (I don't think anything.)

gkellogg commented 3 years ago

What I meant was that, when I parsed that example, the parser took the path through ObjectListPath instead of ObjectList. There are parallel paths through the grammar with and without Path. In this case, the path through the parser is the following:

Query
  SelectQuery
    WhereClause
      GroupGraphPattern
        GroupGraphPatternSub
          TriplesBlock
            TriplesSameSubjectPath
              PropertyListPathNotEmpty
                ObjectListPath

I believe the ObjectList production is used in CONSTRUCT, and ObjectListPath in WHERE.

hartig commented 3 years ago

Thanks Gregg! In fact, the PropertyListPathNotEmpty production in the original SPARQL 1.1 grammar uses both ObjectList and ObjectListPath.

[83]  PropertyListPathNotEmpty  ::=  ( VerbPath | VerbSimple ) ObjectListPath ( ';' ( ( VerbPath | VerbSimple ) ObjectList )? )*

Hence, in addition to extending the ObjectList production (as done in my PR so far), we also need to extend the production ObjectListPath as follows:

[86]  ObjectListPath  ::=  ObjectPath AnnotationPattern? ( ',' ObjectPath AnnotationPattern? )*

In this context, I have also discovered another error in the current SPARQL* grammar: the production ObjectPath has to be extended as well!

[87]  ObjectPath  ::=  GraphNodePath | EmbTP

I will add these two extensions to the grammar to this PR.

gkellogg commented 3 years ago

In this context, I have also discovered another error in the current SPARQL* grammar: the production ObjectPath has to be extended as well!

[87]  ObjectPath  ::=  GraphNodePath | EmbTP

Actually, this isn't required and causes a First/First conflict in my parser generator: GraphNodePath is defined as the following:

[105] GraphNodePath           ::= VarOrTermOrEmbTP | TriplesNodePath |

(Note extra | at the end, which is an error). So, VarOrTermOrEmbTP already covers the EmbTP case.
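
To make the conflict concrete, here is a minimal Python sketch that computes FIRST sets for a heavily simplified fragment of the grammar and reports alternatives of one production whose FIRST sets overlap. The terminal names (`VAR`, `TERM`, `(`, `[`), the trimmed body of `EmbTP`, and the helper names `first` and `first_first_conflicts` are assumptions made for this illustration. The fragment has no empty productions, so the FIRST set of an alternative is taken to be the FIRST set of its first symbol. This is not how any particular parser generator is implemented.

```python
# Simplified fragment; anything not listed as a key is treated as a terminal.
GRAMMAR = {
    "ObjectPath": [["GraphNodePath"], ["EmbTP"]],  # the proposed extension
    "GraphNodePath": [["VarOrTermOrEmbTP"], ["TriplesNodePath"]],
    "VarOrTermOrEmbTP": [["Var"], ["GraphTerm"], ["EmbTP"]],
    "EmbTP": [["<<", "TERM", "TERM", "TERM", ">>"]],  # body trimmed
    "Var": [["VAR"]],
    "GraphTerm": [["TERM"]],
    "TriplesNodePath": [["("], ["["]],
}

def first(symbol, grammar, seen=None):
    """FIRST set of a symbol, assuming no production derives the empty string."""
    if symbol not in grammar:  # terminal
        return {symbol}
    seen = seen or set()
    if symbol in seen:  # guard against left recursion
        return set()
    seen = seen | {symbol}
    result = set()
    for alternative in grammar[symbol]:
        result |= first(alternative[0], grammar, seen)
    return result

def first_first_conflicts(nonterminal, grammar):
    """Pairs of alternatives of `nonterminal` that can start with the same token."""
    alternatives = grammar[nonterminal]
    conflicts = []
    for i in range(len(alternatives)):
        for j in range(i + 1, len(alternatives)):
            overlap = first(alternatives[i][0], grammar) & first(alternatives[j][0], grammar)
            if overlap:
                conflicts.append((alternatives[i], alternatives[j], overlap))
    return conflicts

with_extension = first_first_conflicts("ObjectPath", GRAMMAR)

# Without the extra alternative, EmbTP is still reachable via GraphNodePath.
FIXED = dict(GRAMMAR, ObjectPath=[["GraphNodePath"]])
without_extension = first_first_conflicts("ObjectPath", FIXED)
```

In this fragment both alternatives of the extended `ObjectPath` can begin with `<<`, so an LL(1) parser cannot choose between them on one token of lookahead. Dropping the `| EmbTP` alternative removes the overlap while `<<` stays in the FIRST set of `ObjectPath`.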

hartig commented 3 years ago

Sorry, Gregg. My bad.

I have fixed the issue in the grammar now (see commit https://github.com/w3c/rdf-star/pull/65/commits/ecac9c4c6869ac0a7ab3c8ede65d64de2a29351d).

afs commented 3 years ago

Annotations and paths:

ObjectList (used in template for CONSTRUCT and in SPARQL Update) is fine.

(aside: Unlike Turtle, it is possible to add to Object and ObjectPath because Collection uses GraphNode, not Object, but for the moment, let's stick to ObjectList*)

For ObjectListPath, some forms cannot be a syntax rewrite to << >> and would need a change to evaluation: the {| |} can only be applied once the triple in the path is known.

  :s :p* :o {| :pp :oo |}
  :o ^:p :s {| :pp :oo |}
  :o !:p :s {| :pp :oo |}
  :s :p/:q :o {| :pp :oo |}
  :s (:p|:q) :o {| :pp :oo |}

The grammar is quite dependent on Path being recursive and including a single term as a path element.

One option is a text note saying "If annotation, must be simple path" or slightly more ambitiously, include trailing / case.
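
The "must be simple path" option could be sketched as follows. This is a hedged Python illustration, not the behaviour of Jena or of any other implementation: it works on plain strings, the regular expression `SIMPLE` only approximates "a variable, an IRI, a prefixed name, or `a`", and `desugar` is a hypothetical helper name. For a simple predicate the annotation is rewritten into an extra pattern with `<< >>` as subject; every path form listed above is rejected.

```python
import re

# Approximation of a "simple path": one variable, <IRI>, prefixed name, or `a`.
SIMPLE = re.compile(r"(?:[?$]\w+|<[^<>\s]*>|\w*:[\w\-]*|a)")

def desugar(s, p, o, annotations):
    """Rewrite `s p o {| ap ao ; ... |}` into patterns that use << >>.

    Raises ValueError when `p` is not a simple path, because the annotated
    triple is then not known from the syntax alone.
    """
    if not SIMPLE.fullmatch(p):
        raise ValueError(f"annotation requires a simple path, got: {p}")
    embedded = f"<<{s} {p} {o}>>"
    patterns = [f"{s} {p} {o} ."]
    for ap, ao in annotations:
        patterns.append(f"{embedded} {ap} {ao} .")
    return patterns

rewritten = desugar(":s", ":p", ":o", [(":pp", ":oo")])

# The path forms from the list above, all of which are rejected.
rejected = []
for path in [":p*", "^:p", "!:p", ":p/:q", "(:p|:q)"]:
    try:
        desugar(":s", path, ":o", [(":pp", ":oo")])
    except ValueError:
        rejected.append(path)
```

Here `rewritten` holds the two patterns `:s :p :o .` and `<<:s :p :o>> :pp :oo .`. A check of this kind would run after parsing, which leaves the grammar productions themselves untouched.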

hartig commented 3 years ago

Andy, you are right. I did not consider property path patterns. That's a problem.

Now that you point out this problem, I would even say that it is a bad idea in general to mix property path patterns and the annotation syntax. The idea of property path patterns is to match paths (including their respective endpoints). RDF* is not about annotations of such paths but about annotations of single triples. In this sense, combining the annotation syntax with property path patterns does not seem to make much sense at all.

So, the question is whether there is an easy way to modify and extend the grammar such that the resulting grammar forbids combining property path patterns with the annotation syntax? If not, we may have to add an explicit note in the text.

afs commented 3 years ago

A lookahead on paths to distinguish property and path cases may be possible. Investigation required. SPARQL is designed to be parser-simple - it's plain LL(1) (and LALR(1)) so that the widest range of compiler tools can be easily used.

I'm keen to make the changes localised to keep the barrier to adoption low.

There is another implication with

:s :p :o {| :pp :oo |}

The embedded triple term is not available in a variable. Probably have to live with that; some things will require << >> usage.

hartig commented 3 years ago

SPARQL is designed to be parser-simple [...] I'm keen to make the changes localised to keep the barrier to adoption low.

Yes, that's what I actually meant by "an easy way."

There is another implication with :s :p :o {| :pp :oo |} The embedded triple term is not available in a variable. Probably have to live with that; some things will require << >> usage.

Right. In fact, for this purpose, just using << ... >> instead of the annotation syntax is not sufficient either. You would have to use the SPARQL* version of BIND instead. For instance, by assuming the original PG-mode-based evaluation semantics of BIND (as defined in my original paper), the corresponding query would be:

SELECT ?t WHERE {
   :s :p :o .
   BIND( <<:s :p :o>> AS ?t )
   ?t :pp :oo .
}

...and by assuming the evaluation semantics of BIND as defined in our spec now, the query would be:

SELECT ?t WHERE {
   BIND( <<:s :p :o>> AS ?t )
   ?t :pp :oo .
}

pchampin commented 3 years ago

This was discussed during today's call: https://w3c.github.io/rdf-star/Minutes/2020-12-18.html#item02

hartig commented 3 years ago

I have tried to find a simple solution to extend the grammar in a way such that it permits the annotation syntax only in triple patterns but not in property path patterns, where "simple" means something that does not require either changing major parts of the existing grammar or parsers that can look ahead more steps than what is needed with the existing grammar. After looking again at the existing grammar in detail, I don't think that such a solution exists :-(

Therefore, my proposal is to keep the grammar extension as specified in this PR and add a note that specifies the restriction in text form (similar to the notes in Section 19.8 of the SPARQL 1.1 spec).

afs commented 3 years ago

https://github.com/apache/jena/blob/main/jena-arq/Grammar/main.jj has the annotation extension added for object and objectpath and, yes, it uses a grammar note to limit the use for paths to simple links.

The alternatives look complicated: either additional lookahead on the path production (I haven't checked that this works because I assume it impacts which parser generators can be used) or splitting path into compound and simple cases, which becomes a widespread change in the grammar.

(I may even be able to produce a complete grammar if the toolchain for producing HTML still works after all this time).

gkellogg commented 3 years ago

@hartig if you rebase the PR branch on main (might not be pretty), the preview stuff should work again. You'll need to rebase in any case to resolve the conflicts.

hartig commented 3 years ago

Gregg, is this rebasing something that can be done automatically or do I have to do it manually?

gkellogg commented 3 years ago

I’m afraid rebasing is manual. However, you should be able to just merge main into your branch, which may be less clean, but will get the job done.

Rebasing is one of the most difficult and unintuitive parts of Git, IMO.

hartig commented 3 years ago

Thanks. I have never done such a rebasing before. Hence, my question.

Perhaps, in this case, it will be easier and less time consuming if I simply create a new branch from main, copy the changes over, and generate a new PR (as I had done with the SPARQL-star Update PR).

hartig commented 3 years ago

I have copied these changes into a new PR that I have created from the current main branch. See #106

I am closing this PR here.