w3c / rdf-turtle

https://w3c.github.io/rdf-turtle/

Grammar updates for triple terms and occurrences. #51

Closed gkellogg closed 2 months ago

gkellogg commented 9 months ago

My interpretation (BNF only) of @afs proposed changes for triple terms and triple occurrences. No change to parser rules, thus far. Raw BNF in Files view, rendered via GitHack.

gkellogg commented 7 months ago

The nomenclature and wording in the Quoted Triples section will still require quite a bit of revision. Conceptually, we need to know how to talk about triple descriptors in relation to other triples in a graph, and how quoted triples/triple tokens/triple occurrences relate to triple descriptors and what they mean. Most of this needs to go in Concepts, but needs to be echoed in Turtle and other concrete syntaxes. Also, we may discourage the direct use of triple descriptors, favoring annotations and quoted triples/whatever.

The main point of this draft, so far, is to get the grammar and basic usage consistent with discussions.

TallTed commented 2 months ago

I think Allow zero or many annotations should be Allow zero or more annotations, as I don't think there's any forbiddance of one annotation, which would seem to be excluded by "many". Possibly Allow any number of annotations though that might allow negative numbers, so maybe Allow any non-negative number of annotations....

domel commented 2 months ago

@gkellogg Could you elaborate on that. Maybe I missed something but why => (two characters) is better than | (one character)?

gkellogg commented 2 months ago

I think Allow zero or many annotations should be Allow zero or more annotations, as I don't think there's any forbiddance of one annotation, which would seem to be excluded by "many". Possibly Allow any number of annotations though that might allow negative numbers, so maybe Allow any non-negative number of annotations....

Can you point out where that text is in the document? I don't see it anywhere. That said, other places in this and other documents use "zero or more", so I'd be fine with that.

gkellogg commented 2 months ago

@gkellogg Could you elaborate on that. Maybe I missed something but why => (two characters) is better than | (one character)?

@niklasl pointed out that using | creates a problem in SPARQL, where | is a property path component. In SPARQL, something like {| a | p o |} could be either a reifier a on the predicate/object p o, or a property path a | p, so changing that separating | is something to consider. I've seen => in some more recent notes passed around the group. It's hard to think of another single character that would seem to fit.

Whatever we do for annotation will also be necessary for reifier to be consistent.

niklasl commented 2 months ago

See also https://github.com/w3c/rdf-star-wg/issues/116. One important aspect (IMO) is that a prefix or wrapping notation is valuable for reading these, to avoid reading the name (which may be a long hash, as in Wikidata qualifiers) as a predicate (in annotations) or the subject (in the << ... >> form (triple ... descriptors?)).

We also need to be careful with whatever is added so it doesn't block any other future designs that we can foresee, or steps on other syntaxes unnecessarily. In Notation 3 => is a shorthand for log:implies.

I've done some more evaluation (all the examples from the UCR plus a gamut of Wikidata data), and actually found that wrapping the name in |...| which @afs suggested (among some other alternatives) seems, with proper spacing, to be a fairly OK alternative even in annotations. I'm not sure how many other reasonable wrapping delimiters we have available.

(That's the Ruby/Rust lambda-style; also figuring in some musical and mathematical notations (|abs|), etc. I put examples in a gist.)

I've also suggested some more radical changes; but admittedly some of those cater less for what is likely the more common case (one- or many-to-one; names added for reference; annotation data still better to keep with the value). The naming-only form (with any SPARQL-compatible syntax) would still work well if you need to "tag" a bunch of triples with the name of a many-to-many reifier.

domel commented 2 months ago

@niklasl totally agree, => is used in N3. IMHO it's a bad choice.

domel commented 2 months ago

How about ~?

gkellogg commented 2 months ago

I've done some more evaluation (all the examples from the UCR plus a gamut of Wikidata data), and actually found that wrapping the name in |...| which @afs suggested (among some other alternatives) seems, with proper spacing, to be a fairly OK alternative even in annotations. I'm not sure how many other reasonable wrapping delimiters we have available.

That's pretty much what this version of the grammar does if we allow whitespace in the {| and |} tokens:

annotation            ::= '{' WS* '|' ((iri | BlankNode) '|')? predicateObjectList '|' WS* '}'

To make it a token, we'll need to define some terminals:

reifid               ::= ((iri | BlankNode) '|')
annotation           ::= ANNO_START reifid? predicateObjectList ANNO_END
ANNO_START           ::= '{' WS* '|'
ANNO_END             ::= '|' WS* '}'

I also have a version which defines a reifid rule ((iri | BlankNode) '|'), although I think we'll need to tweak the production names.

niklasl commented 2 months ago

That's pretty much what this version of the grammar does if we allow whitespace in the {| and |} tokens:

annotation            ::= '{' WS* '|' ((iri | BlankNode) '|')? predicateObjectList '|' WS* '}'

I think that's still ambiguous though, unless using a preceding whitespace is to be significant? Otherwise, this: { |:x| :y :z |} still has the problem of matching as an AlternativePath in SPARQL.

I've tried the suggestion to wrap the name, like:

annotation  ::= "{|" embeddedName? predicateObjectList? "|}"
embeddedName    ::= '|' (iri | BlankNode) '|'

which parses the examples I linked above. A simplified sample:

<Q34851> :nominatedFor <Q103618> {| |<Q103618#6698506f>| a :Nomination ;
        :forWork <Q582281> ;
        :date "1958-02-18"^^xsd:date |} ,
    <Q103618> {| |<Q103618#6698cb58>| a :Nomination ;
        :forWork <Q713979> ;
        :date "1959-02-23"^^xsd:date |} .

This also helps spotting the identifier in named "quoted" triples (using more real wikidata to illustrate the problem):

<< |s:Q34851-05722875-6765-4486-9197-729D8AB780ED| <Q34851> wd:P4342 "Elizabeth_Taylor_-_filmskuespiller" >>
    wd:P2241 <Q45403344> ;
    :rank :Deprecated .

Since otherwise you'd have to scan (as a human reader) beyond the id to know if it's the subject of a triple, or a name followed by a triple. (I've mixed up name and subject in these even when editing my own "toy" examples.)

afs commented 2 months ago

Otherwise, this: { |:x| :y :z |} still has the problem of matching as an AlternativePath in SPARQL.

@niklasl - could you expand on that point please? In SPARQL 1.1 :s | :p :o ... is illegal. | is contextual - only in paths.

Isn't it, going left-to-right, { |:x| :y :z |} is {|, URI/prefixname/bnode, | and then the choice is made?

The SPARQL grammar target is LL(1) and LALR(1) which covers the mostly available choices for many programming languages.

afs commented 2 months ago

Regardless of the technical issue, | might be considered visually confusing in real queries (with longer prefix names) and addressed with a two character token. However, we are some way down the "agreed syntax" path - we can't keep chopping and changing.

Let's gather the possibilities and then choose from that list.

Please do not allow whitespace inside tokens, especially delimiting tokens {| and |}, without a very strong need. It is much more restrictive to future enhancements like graph terms.

niklasl commented 2 months ago

Otherwise, this: { |:x| :y :z |} still has the problem of matching as an AlternativePath in SPARQL.

@niklasl - could you expand on that point please? In SPARQL 1.1 :s | :p :o ... is illegal. | is contextual - only in paths.

This is the problem in https://github.com/w3c/rdf-star-wg/issues/116 but with different spacing. So:

SELECT * { ?s ?p ?o {| dct:issued | dct:modified "2023" |} . }

just becomes:

SELECT * { ?s ?p ?o { |dct:issued| dct:modified "2023" |} . }

and the problem remains (unless the spacing itself is significant, which would be very hard to spot; and as you note it also conflicts with possible graph literals).

With this:

SELECT * { ?s ?p ?o {| |ex:annotation1| dct:issued | dct:modified "2023" |} . }

at least it works.

afs commented 2 months ago

This is the problem in w3c/rdf-star-wg#116 but with different spacing.

An argument for not allowing whitespace in tokens.

However, it is not ambiguous even with whitespace, because tokenizing happens as a greedy process and before grammar rules, so { |dct:issued| dct:modified "2023" |} is still ANNO_START and won't be VBAR.
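The greedy-tokenization point can be illustrated with a small tokenizer. This is only a sketch: the token names ANNO_START, ANNO_END and VBAR come from the discussion, while the regular expressions, the `tokenize` helper, and the heavily simplified PNAME and STRING patterns are assumptions made for illustration and are not the SPARQL grammar.

```python
import re

# Hypothetical token table. Python's alternation takes the first alternative
# that matches, so the two-part delimiters are listed before the
# single-character tokens to get the "longest delimiter wins" behaviour.
TOKEN_SPEC = [
    ("ANNO_START", r"\{[ \t]*\|"),  # '{' WS* '|' (whitespace allowed, as in the draft rule)
    ("ANNO_END",   r"\|[ \t]*\}"),  # '|' WS* '}'
    ("LBRACE",     r"\{"),
    ("RBRACE",     r"\}"),
    ("VBAR",       r"\|"),
    ("STRING",     r'"[^"]*"'),
    ("PNAME",      r"[A-Za-z_]*:[A-Za-z_][A-Za-z0-9_]*"),  # simplified prefixed name
    ("WS",         r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(text):
    """Return the token names found in text, skipping whitespace."""
    names = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character at {pos}: {text[pos]!r}")
        if match.lastgroup != "WS":
            names.append(match.lastgroup)
        pos = match.end()
    return names

print(tokenize('{ |dct:issued| dct:modified "2023" |}'))
```

With whitespace permitted inside the delimiter, the leading `{ |` is consumed as a single ANNO_START before any grammar rule sees it, and only the middle bar is left as VBAR. That is the behaviour described above, and also why the spacing difference is so hard to spot by eye.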

As soon as the parse is inside the {| there are two cases - implicit and explicit reifier. That's a rule of one token lookahead. The |...| bracketing isn't necessary and also it is not clear bracketing.

For me, the visual case is important and multiple bracketing meanings of | isn't ideal.

IF we can agree a different reference specifier, which might go in front rather than a separator, THEN good.

niklasl commented 2 months ago

An argument for not allowing whitespace in tokens.

Definitely; that was the point.

The |...| bracketing isn't necessary and also it is not clear bracketing.

As soon as the parse is inside the {| there are two cases - implicit and explicit reifier. That's a rule of one token lookahead.

For me, the visual case is important and multiple bracketing meanings of | isn't ideal.

I didn't think so either, but since it was among your suggestions (second bullet, the "naming unit") I thought it worth trying.

IF we can agree a different reference specifier, which might go in front rather than a separator, THEN good.

Agreed. But since names can be long, "bracketing" them or using a "pseudo-predicate" could be very helpful when reading. (See above examples.)

You also advised to use a consistent form of naming both in annotations and the "quoted" triples. (A pseudo-predicate wouldn't work in the latter.)

(Aside: Not sure what to call << ... >> now that <<( ... )>> is used for the triple terms. You've suggested "triple descriptor" before, which I think works.)

niklasl commented 2 months ago

@afs and @domel, you both suggested ~.

I'm not aware of any use of it in related syntaxes, so I hope it wouldn't co-opt or conflict with anything.

A. Tilde as a suffix

Annotation form:

<Q34851> :nominatedFor <Q103618> {| <Q103618#6698506f> ~ a :Nomination ;
        :forWork <Q582281> ;
        :date "1958-02-18"^^xsd:date |} ,
    <Q103618> {| <Q103618#6698cb58> ~ a :Nomination ;
        :forWork <Q713979> ;
        :date "1959-02-23"^^xsd:date |} .

Unasserted form:

<< <Q103618#6698506f> ~ <Q34851> :nominatedFor <Q103618> >> a :Nomination ;
  :forWork <Q582281> ;
  :date "1958-02-18"^^xsd:date .

<< <Q103618#6698cb58> ~ <Q34851> :nominatedFor <Q103618> >> a :Nomination ;
  :forWork <Q713979> ;
  :date "1959-02-23"^^xsd:date .

B. Tilde as a prefix

Annotation form:

<Q34851> :nominatedFor <Q103618> {| ~ <Q103618#6698506f> a :Nomination ;
        :forWork <Q582281> ;
        :date "1958-02-18"^^xsd:date |} ,
    <Q103618> {| ~ <Q103618#6698cb58> a :Nomination ;
        :forWork <Q713979> ;
        :date "1959-02-23"^^xsd:date |} .

Unasserted form:

<< ~ <Q103618#6698506f> <Q34851> :nominatedFor <Q103618> >> a :Nomination ;
  :forWork <Q582281> ;
  :date "1958-02-18"^^xsd:date .

<< ~ <Q103618#6698cb58> <Q34851> :nominatedFor <Q103618> >> a :Nomination ;
  :forWork <Q713979> ;
  :date "1959-02-23"^^xsd:date .

C. Prefix in annotations; suffix in unasserted triples

Not as consistent; then again these are different contexts.

TallTed commented 2 months ago

[@TallTed] I think Allow zero or many annotations should be Allow zero or more annotations, as I don't think there's any forbiddance of one annotation, which would seem to be excluded by "many". Possibly Allow any number of annotations though that might allow negative numbers, so maybe Allow any non-negative number of annotations....

[@gkellogg] Can you point out where that text is in the document? I don't see it anywhere. That said, other places in this and other documents use "zero or more", so I'd be fine with that.

I've edited my comment, in the quote here and in the original, to have the link. That text doesn't appear in the document, so far as I know; only in the commit comment on https://github.com/w3c/rdf-turtle/pull/51/commits/e2715131251bdd817098b5307ccd70ccb8062d4a (so it poses a potential for confusion only for a few likely readers).

gkellogg commented 2 months ago

Another prefix syntax which could be considered is to use =. In N3, = is a stand-in for owl:sameAs and there were some discussions that this could have special meaning as the first part of an annotation (similar to Kurt's proposal for BlankNodePropertyList):

<Q34851> :nominatedFor <Q103618> {| = <Q103618#6698506f>; a :Nomination ;
        :forWork <Q582281> ;
        :date "1958-02-18"^^xsd:date |} ,
    <Q103618> {| = <Q103618#6698cb58>; a :Nomination ;
        :forWork <Q713979> ;
        :date "1959-02-23"^^xsd:date |} .

Note that this also makes = <Q103618#6698506f> its own predicate-object, except that the same-as nature is folded into the identifier for the annotation (or reified triple term).

niklasl commented 2 months ago

Another prefix syntax which could be considered is to use =. In N3, = is a stand-in for owl:sameAs and there were some discussions that this could have special meaning as the first part of an annotation (similar to Kurt's proposal for BlankNodePropertyList):

[...]

Note that this also makes = <Q103618#6698506f> its own predicate-object, except that the same-as nature is folded into the identifier for the annotation (or reified triple term).

Yes; this is what I call the "pseudo-predicate option". If chosen, I think it should only be allowed immediately after the opening terminal (i.e. not as a special predicate anywhere within).

I'm worried about stepping on N3 by declaring = built-in in turtle, but being sugar for owl:sameAs in N3. (Unless it's moved away from that? Still, tooling does still use it (e.g. RDFLib).)

N3 used to have the :- "iso" operator as such. But as mentioned in the issue, it has some problems.

A variant could work, including ~. (Or * :thinking: ...)

(I thought @ worked too, but the old use of it in N3 for prefixing keywords makes me less comfortable with that.)

Does it read well enough for reified triple terms (<< ... >>)? Some variants:

<< = <Q103618#6698506f> ; <Q34851> :nominatedFor <Q103618> >> a :Nomination .

<< ~ <Q103618#6698506f> ; <Q34851> :nominatedFor <Q103618> >> a :Nomination .

<< * <Q103618#6698506f> ; <Q34851> :nominatedFor <Q103618> >> a :Nomination .

(It may be odd to have that semicolon there; but it is also visually distinct, as opposed to just a prefix or suffix operator.)

afs commented 2 months ago

I have mocked up the choices discussed below in a SPARQL grammar. I'm not yet sure it is perfect because there are quite a few changes in SPARQL but it is parsing positive test cases. It actually has all the choices - they don't conflict!

SPARQL is more complicated than Turtle because of the way elements fit together, e.g. BGPs can end without a DOT and be followed by non-triple forms. SPARQL is a superset of Turtle as regards the subject and also treats predicate-object lists slightly differently.

Reifier Declarations

There is now a requirement for syntax to be able to declare reifiers without also including triples that use the reifier as subject. Without syntax, the data will have to include mention of rdf:reifies.

This removes the need to choose one triple which has <<>> as subject while the others just use the id.

There are two forms:

    :s :p :o | :r .

triple, then optional reifier id, DOT.

Putting the reifier id after the triple makes it work in subject predicate-object-list blocks - see below. It is shorthand for:

    :s :p :o .
    :r rdf:reifies <<(:s :p :o )>> .

and the quoted triple form:

    << :r | :s :p :o >> .

being

    :r rdf:reifies <<( :s :p :o )>> .

This latter form makes << :r | :s :p :o >> behave in syntax somewhat like [ :q :x ] . which is already a special case that can appear on its own.
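The two declaration forms above can be read as simple rewrites. The following is a minimal sketch of that expansion, assuming terms are passed as already-serialized strings; the `expand_declaration` helper and its `asserted` flag are hypothetical names introduced here, not part of any proposal in this thread.

```python
RDF_REIFIES = "rdf:reifies"

def expand_declaration(s, p, o, reifier, asserted=True):
    """Expand a reifier declaration into plain statements (as strings).

    asserted=True  models  `:s :p :o | :r .`       (triple plus reifier)
    asserted=False models  `<< :r | :s :p :o >> .` (reifier only)
    """
    lines = []
    if asserted:
        # The postfix form also asserts the triple itself.
        lines.append(f"{s} {p} {o} .")
    lines.append(f"{reifier} {RDF_REIFIES} <<( {s} {p} {o} )>> .")
    return lines

for line in expand_declaration(":s", ":p", ":o", ":r"):
    print(line)
```

Calling it with `asserted=False` yields only the `rdf:reifies` statement, which matches the expansion given for the quoted triple form.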

Prefix and postfix declaration

This suggests one way to simplify the parser lookahead needs for quoted triples which is to have the reifier id at the end, not the beginning, of quoted triples. The quoted and annotation cases then both are postfix.

    <<:s :p :o | :r >>

This fits with

    :s :p :o | :r .

Reifier id for Annotation Syntax

Annotation syntax would be

    :s :p :o {| :q :z |} .

and with identifier

    :s :p :o | :r {| :q :z |} .

The {|..|} block uses the |:r as subject but otherwise is a predicate-object list, giving the possibility of use in an asserted triple predicate-object list.

   :s :p1 :o1 {| :q1 :z1 |}  ;
      :p2 :o2 | :r {| :q2 :z2 |}  ;
      :p3 :o3 | :r ;
      :p4 :o4 .

not a two-style form caused by being inside the {| ... |}.

   :s :p1 :o1 {| :q1 :z1 |}  ;
      :p2 :o2 {| :r | :q2 :z2 |}  ;
      :p3 :o3 | :r ;
      :p4 :o4 .

or having :s :p :o {| :r |} for declaration.

Annotation Syntax Block

Currently {|...|}.

We do need some kind of delimited matched pair for annotation.

In SPARQL, only simple paths (single property) can be used for annotation blocks.

SPARQL is more complicated than Turtle because BGPs can end without a DOT.

Choice of reifier id syntax

As current: postfix for annotation, prefix for quoted triple. A lookahead of 2 is needed - which is not impossible (aside: TriG already has a similar situation and there it is complicated).

Postfix for quoted triple << :s :p :o | :r >> removes this need for lookahead.

An alternative is an introductory character (explored above). Using ; as a separator though seems odd to me because ; is widely used in Turtle already.

    << ~ :r | :s :p :o >>
    <<  :s :p :o ~ :r >>
    :s :p :o ~ :r .

Various single characters work, including | (even @ if a space between that and a prefix name is acceptable).

The separating | in << ~ :r | :s :p :o >> isn't necessary but avoids a quad-like appearance. Postfix form does have this consideration.

    :s :p :o ~ :r  .
    :s :p :o ~ :r {| :q :z |} .
    << :s :p :o  ~ :r >> .

~ is quite light in some fonts. Alternative:

    << ~(:r) :s :p :o >>
    :s :p :o ~(:r) .

The token is ~( - no white space between the characters.

I'm tending towards ~ :r at the moment.

gkellogg commented 2 months ago

My only concern is that a trailing reifier identifier may be counter-intuitive, but that may just be my own bias from having worked with the << :r | :s :p :o >> syntax for a while. Using ~ in a postfix notation does seem cleaner.

:s :p :o ~ :r  .
:s :p :o ~ :r {| :q :z |} .
<< :s :p :o  ~ :r >> .

gkellogg commented 2 months ago

The annotation syntax gets a bit tricky, if any number of reifiers/annotations can be added to a triple. This allows:

If the grammar has the following rules:

objectList            ::= object annotation* ( ',' object annotation* )*
reifier               ::= '~' (iri | BlankNode)
reifiedTriple         ::= '<<' subject predicate object reifier? '>>'
tripleTerm            ::= '<<(' subject predicate ttObject ')>>'
ttObject              ::=   iri | BlankNode | literal | tripleTerm
annotation            ::= reifier? ('{|' predicateObjectList '|}')?

There's a conflict because of using both annotation* and reifier? ('{|' predicateObjectList '|}')? which creates an LL(1) parser conflict. I'm sure there's a cleverer set of rules to avoid this.
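One way to see the conflict is to count how many ways a sequence of reifiers and annotation blocks can be read as annotation*. The sketch below abstracts a reifier (`~ :r`) to the item "R" and a whole `{| ... |}` block to the item "B"; this abstraction and the `count_parses` helper are assumptions for illustration. Empty annotations, which the rule also permits, are ignored here (they would make the count unbounded).

```python
from functools import lru_cache

def count_parses(items):
    """Count the ways to read items as annotation*, where
    annotation ::= reifier? block?  and empty annotations are ignored."""
    items = tuple(items)

    @lru_cache(maxsize=None)
    def ways(i):
        if i == len(items):
            return 1
        total = 0
        if items[i] == "R":
            # annotation = reifier alone
            total += ways(i + 1)
            # annotation = reifier immediately followed by a block
            if i + 1 < len(items) and items[i + 1] == "B":
                total += ways(i + 2)
        if items[i] == "B":
            # annotation = block alone
            total += ways(i + 1)
        return total

    return ways(0)

print(count_parses(["R"]))       # 1
print(count_parses(["R", "B"]))  # 2
```

A reifier followed by a block has two readings (one annotation, or two), so a one-token-lookahead parser cannot decide from the grammar alone whether the block belongs to the preceding reifier.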

niklasl commented 2 months ago

I like where this is going. (The SPARQL grammar indeed requires much more care!)

Triple reference << :s :p :o ~ :r >> . with postfix form looks in line with named annotation form (triple comes first).

The ~( name ) form could work with the many-to-one cases: ~( name1, name2 ) (with , to avoid confusion with lists, and also not looking like predicate object).

:s :p :o {| :q :z |} .

:s :p :o ~(:r) .

:s :p :o ~(:r, _:q) .

<< :s :p :o ~ :r >> .

afs commented 2 months ago

My only concern is that a trailing reifier identifier may be counter-intuitive, but that may just be my own bias from having worked with the << :r | :s :p :o >> syntax for a while. Using ~ in a postfix notation does seem cleaner.

Agreed. At one level, it is a shame to change what's been written about, but at the same time, it hasn't been universally adopted. If we go postfix, then ~ vs | is pure choice and ~ disconnects from earlier writings and, for me, that is a reasonable decision to make.

afs commented 2 months ago

The annotation syntax get's a bit tricky, if any number of reifiers/annotations can be added to a triple. This allows:

  • :s :p :o ~ :r
  • :s :p :o ~ :r ~:r1
  • :s :p :o ~ :r {| :p :q |}
  • :s :p :o ~ :r {| :p :q |} ~:r1
  • :s :p :o ~ :r {| :p :q |} ~:r1 {| :p1 :q1 |}
  • :s :p :o ~ :r {| :p :q |} {| :p1 :q1 |}
  • :s :p :o {| :p :q |} ~:r1 {| :p1 :q1 |}

I think that's good to allow. That should be possible although not the way that grammar has it.

I can do that for SPARQL once the style is agreed and I can trim down the universal grammar. The different choices start to interact: writing the multi-occurrences does start to mix up with some of the alternative styles.

Tentative direction:

afs commented 2 months ago

Triple reference << :s :p :o ~ :r >> . with postfix form looks in line with named annotation form (triple comes first).

Yes - while it's not been the style up to now, the uniformity is appealing.

The ~( name ) form could work with the many-to-one cases: ~( name1, name2 ) (with , to avoid confusion with lists, and also not looking like predicate object).

Firstly, if we have this, we can have ~ :r and ~(:r1 :r2). The comma isn't necessary and, to me, it's odd to have in some places and not others. , is object lists (which aren't lists! ... let's not go there). YMMV.

I don't think that ~() and an annotation block is very helpful especially in SPARQL.

    :s :p :o ~(:r1 :r2) {| :q :z |} .
    :s :p :o ~(:r1 :r2) {| :q ?z |} .

These more complex cases may be better as a declaration pattern:

    :s :p :o ~ :r .
    :d :e :f ~( :r1 :r2 ) .
    :r1 :q :z .
    :r2 :q :z .

that is, ~(:r1 :r2) is only allowed in a declaration form.

gkellogg commented 2 months ago

The ~( name ) form could work with the many-to-one cases: ~( name1, name2 ) (with , to avoid confusion with lists, and also not looking like predicate object).

I'm not sure that this will be a common enough pattern to create special syntax for it, as it's fairly easy (and arguably clearer) to create separate statements for each reifier. Also, serializing a graph containing duplicate reifiers with some overlapping annotations would be pretty challenging. I'd say we start with the single reifier grammar and re-consider adding ~(name1 name2) if it becomes important.

niklasl commented 2 months ago

Triple reference << :s :p :o ~ :r >> . with postfix form looks in line with named annotation form (triple comes first).

Yes - while it's not been the style up to now, the uniformity is appealing.

Agreed.

The ~( name ) form could work with the many-to-one cases: ~( name1, name2 ) (with , to avoid confusion with lists, and also not looking like predicate object).

Firstly, if we have this, we can have ~ :r and ~(:r1 :r2). The comma isn't necessary and, to me, it's odd to have in some places and not others. , is object lists (which aren't lists! ... let's not go there). YMMV.

Makes sense.

I don't think that ~() and an annotation block is very helpful especially in SPARQL.

    :s :p :o ~(:r1 :r2) {| :q :z |} .
    :s :p :o ~(:r1 :r2) {| :q ?z |} .

I agree (and find that combination harder to read too).

These more complex cases may be better as a declaration pattern:

    :s :p :o ~ :r .
    :d :e :f ~( :r1 :r2 ) .
    :r1 :q :z .
    :r2 :q :z .

that is, ~(:r1 :r2) is only allowed in a declaration form.

Yes, I think I'd readily accept that.

It's akin to blank nodes, where the embedded [ ... ] form works for simple cases. In complex cases (e.g. many-to-many), declaration patterns probably work better. (Both for serializing and, IMHO, for reading: I'm used to looking for a "top-level description" for names that are used in many places.)

niklasl commented 2 months ago

The ~( name ) form could work with the many-to-one cases: ~( name1, name2 ) (with , to avoid confusion with lists, and also not looking like predicate object).

I'm not sure that this will be a common enough pattern to create special syntax for it, as it's fairly easy (and arguably clearer) to create separate statements for each reifier. Also, serializing a graph containing duplicate reifiers with some overlapping annotations would be pretty challenging. I'd say we start with the single reifier grammar and re-consider adding ~(name1 name2) if it becomes important.

Sure; there are pros and cons here (repeating only the object probably isn't too bad).

It might be somewhat important though, so let's keep it open for more feedback. For example, cases derived from Wikidata may map cleanly to multiple reifiers per triple — here's a sketch using the ~(name1 ...) form (and two non-asserted triple descriptions at lines 624 and 626).

Some serialization considerations. Given a "random" triple stream (here as pseudo-ntriples-with-pnames):

:s :p :o .
:r1 rdf:reifies <<( :s :p :o )>> .
:r2 rdf:type :Note .
:r1 rdf:type :Note .
:r2 rdf:reifies <<( :s :p :o )>> .
:r2 rdf:reifies <<( :s :p :q )>> .
:s :p :q .

A process with some memory of seen triples but no buffering nor indexing can still stream out "best effort" Turtle line-wise, making it more "well-formed":

:s :p :o ~ :r1 .
:r2 a :Note .
:r1 a :Note .
:s :p :o ~ :r2 .
<< :s :p :q ~ :r2 >> .
:s :p :q .

Whereas a pretty-printer with access to the entire graph could do:

:s :p :o ~( :r1 :r2 ) ,
    :q ~ :r2 .

:r1 a :Note .

:r2 a :Note .

It would, for each triple, serialize all rdf:reifies triples as annotation name "markers".

AFAICS, only if such markers are neither reifiers of any other triple, nor the subject of any other triple, can they be serialized using the blank {| ... |} annotation syntax. That may very well be in the absolute majority in practice, which is fine; and the syntax caters well for that. I hope these other declaration patterns will also work well for the other, more complex scenarios.
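The pretty-printer step described above (collecting all rdf:reifies triples for each triple and emitting them as name markers) can be sketched as a simple grouping pass. This assumes triples are plain tuples of strings with a triple term represented as a nested 3-tuple; `group_reifiers` and `marker` are hypothetical helpers, and the `~( ... )` output form is only the proposal under discussion, not agreed syntax.

```python
from collections import defaultdict

def group_reifiers(triples):
    """Map each reified triple to the list of its reifiers, in input order.

    A triple term in the object position is represented as a 3-tuple.
    """
    reifiers = defaultdict(list)
    for s, p, o in triples:
        if p == "rdf:reifies" and isinstance(o, tuple):
            reifiers[o].append(s)
    return dict(reifiers)

def marker(names):
    """Render reifier names as `~ :r` or, for several, `~( :r1 :r2 )`."""
    if len(names) == 1:
        return f"~ {names[0]}"
    return "~( " + " ".join(names) + " )"

# The "random" triple stream from the comment above.
stream = [
    (":s", ":p", ":o"),
    (":r1", "rdf:reifies", (":s", ":p", ":o")),
    (":r2", "rdf:type", ":Note"),
    (":r1", "rdf:type", ":Note"),
    (":r2", "rdf:reifies", (":s", ":p", ":o")),
    (":r2", "rdf:reifies", (":s", ":p", ":q")),
    (":s", ":p", ":q"),
]
groups = group_reifiers(stream)
for triple in [t for t in stream if t in groups]:
    print(" ".join(triple), marker(groups[triple]), ".")
```

The sketch needs the whole graph in memory before it can print anything, which is the difference from the line-wise streaming process: a streamer cannot know that :r2 will later also reify :s :p :o.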

afs commented 2 months ago

The ~( name ) form could work with the many-to-one cases: ~( name1, name2 )

Am I reading Gregg's multiple annotations proposal correctly here and this can be done with:

  • :s :p :o ~:r ~:r1 .

?

afs commented 2 months ago

A question for clarification:

:s :p :o ~:r {| :p :q |} {| :p1 :q1 |} .

Is this case 1, which I prefer, and was my initial reading (... then I wrote the parser rule trying to make it clear what was happening rather than merely passing the language ...)

The second annotation block has a generated reification id and would be the same as writing:

:s :p :o {| :p1 :q1 |} ~:r {| :p :q |} .

and making

:s :p :o {| :p1 :q1 |} {| :p2 :q2 |} .

two separate blank nodes?

:s :p :o .
_:b1 rdf:reifies <<(:s :p :o)>> .
_:b1 :p1 :q1 .
_:b2 rdf:reifies <<(:s :p :o)>> .
_:b2 :p2 :q2 .

or is it case 2, where the ~:r applies to all following blocks until the next reifierId, and if so what about :s :p :o {| :p1 :q1 |} {| :p2 :q2 |} . - does the second block "inherit" the blank node from the first? The "same bnode" here in case 2 feels odd.

At some point, we have to say "don't rely on gnarly expressions to do what you want - write them clearly" and provide a justifiable reading.

Case 1 style would be explaining

:s :p :o {| :p1 :q1 |} {| :p2 :q2 |} .

as shorthand for

:s :p :o {| :p1 :q1 |} .
:s :p :o {| :p2 :q2 |} .

gkellogg commented 2 months ago

:s :p :o ~:r {| :p :q |} {| :p1 :q1 |} .

Is this case 1, which I prefer, and was my initial reading (... then I wrote the parser rule trying to make it clear what was happening rather than merely passing the language ...)

The second annotation block has a generated reification id and would be the same as writing:

:s :p :o {| :p1 :q1 |} ~:r {| :p :q |} .

That's my interpretation, and what I think makes sense.

and making

:s :p :o {| :p1 :q1 |} {| :p2 :q2 |} .

two separate blank nodes?

+1

:s :p :o .
_:b1 rdf:reifies <<(:s :p :o)>> .
_:b1 :p1 :q1 .
_:b2 rdf:reifies <<(:s :p :o)>> .
_:b2 :p2 :q2 .

or is it case 2, where the ~:r applies to all following blocks until the next reifierId, and if so what about :s :p :o {| :p1 :q1 |} {| :p2 :q2 |} . - does the second block "inherit" the blank node from the first? The "same bnode" here in case 2 feels odd.

To me, that doesn't make sense.

At some point, we have to say "don't rely on gnarly expressions to do what you want - write them clearly" and provide a justifiable reading.

Case 1 style would be explaining

:s :p :o {| :p1 :q1 |} {| :p2 :q2 |} .

as shorthand for

:s :p :o {| :p1 :q1 |} .
:s :p :o {| :p2 :q2 |} .

+1

gkellogg commented 2 months ago

There is a bit of ambiguity still in the proposed grammar. ~ :r {| :p :o |} could also be parsed as ~ :r ~ _:bn {| :p :o |}, as it's ambiguous whether the annotation block stands alone or is intended to use :r as its reifier. One way to solve this would be to define the grammar as follows:

annotation            ::= '{|' predicateObjectList '|}'
                        | reifier ('{|' predicateObjectList? '|}')

This way ~ :r would always need to be followed by a (potentially empty) annotation block. The example above would become ~ :r {||} {| :p :o |} to generate the following triples:

:s :p :o ~ :r {||} {| :p :o |} .

# expands to

:s :p :o .
:r rdf:reifies <<(:s :p :o)>> .
_:b1 rdf:reifies <<(:s :p :o)>> .
_:b1 :p :o .

Alternatively, the ambiguity can be resolved in parser logic and use an alternative grammar:

annotation            ::= reifier | '{|' predicateObjectList '|}'

If a parser parses a reifier and subsequently parses the annotation block it would assign the previously parsed reifier to that annotation block, but the BNF itself is ambiguous which is concerning.

niklasl commented 2 months ago

Is there a need to both name and describe the reifier in place? With bnodes it's either id or description block, so it would follow the general Turtle design to either id or describe an anonymous reifier here too.

gkellogg commented 2 months ago

Is there a need to both name and describe the reifier in place? With bnodes it's either id or description block, so it would follow the general Turtle design to either id or describe an anonymous reifier here too.

We need the ability to name a description block with either an IRI or a blank node. If not provided, the name (reifier) is automatically generated. Because the grammar allows both the description block and the reifier to be optional we have a conflict. Based on discussion, it seems that there is a need to both name and describe or just describe a description block.

afs commented 2 months ago

the BNF itself is ambiguous

The BNF is fine - what has to be defined is the translation from the syntax tree to triples output (section 7).

This is LL(1) for the multiple annotation case via the *

    Object           := GraphNode Annotation
    Annotation       := ( Reifier | AnnotationBlock )*
    AnnotationBlock  := <L_ANN> PropertyList <R_ANN>

(In SPARQL, PropertyList can be empty. GraphNode is anything that can go in the object position (it's not a very good name))

These parse rules do not try to associate the reifier with the annotation block. It is not showing as ambiguous because the sequence ~:e {| :x :y |} is just fine as concrete language - a reifier id followed by a {| |} block.

The meaning, the translation to triples, would have a state variable for the reifier id which is initially unset, then set by ~:e then cleared by |}. Similar to :s :p1 :o1 ; :p2 :o2 . passing the subject on until DOT.
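As a rough illustration of the state-variable idea described above, here is a minimal Python sketch. It is not taken from the spec or from any parser: the function name `expand_annotations`, the tagged-tuple input format, the string form of the triple term, and the generated `_:bN` labels are all assumptions made for this example. It also assumes that a reifier emits its `rdf:reifies` triple as soon as it is seen, and that a block with no pending reifier gets a fresh blank node.

```python
from itertools import count


def expand_annotations(triple, annotations):
    """Translate the annotations attached to `triple` into extra triples.

    `annotations` is a list, in source order, of items shaped like
    ("reifier", id_or_None) or ("block", [(predicate, object), ...]).
    This input shape is hypothetical; a real parser would work on its
    own syntax tree.
    """
    s, p, o = triple
    triple_term = f"<<( {s} {p} {o} )>>"
    fresh = count(1)   # source of labels for generated blank nodes
    out = []
    current = None     # the reifier state variable, initially unset

    def new_reifier(rid=None):
        # Use the given id, or generate a blank node if none was given.
        rid = rid or f"_:b{next(fresh)}"
        out.append((rid, "rdf:reifies", triple_term))
        return rid

    for kind, value in annotations:
        if kind == "reifier":
            # '~ id' sets the state variable.
            current = new_reifier(value)
        elif kind == "block":
            # A block with no pending reifier gets an anonymous one.
            if current is None:
                current = new_reifier()
            for pred, obj in value:
                out.append((current, pred, obj))
            # '|}' clears the state variable.
            current = None
        else:
            raise ValueError(f"unknown annotation kind: {kind}")
    return out
```

Under these assumptions, `:s :p :o ~:r1 ~:r2 {| :a :b |} {| :c :d |}` would attach `:a :b` to `:r2` only, and `:c :d` to a newly generated blank node, because the first `|}` clears the pending reifier.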

Writing

annotation            ::= '{|' predicateObjectList '|}'
                        | reifier ('{|' predicateObjectList? '|}')?

(I think there was a missing ? on the second line which I've included to allow :s :p :o ~:e .)

is a problem for multiple reifiers/annotation blocks. annotation* is concrete-language ambiguous.

~:e {| |} can be first clause, with ? empty then a second clause, or it can be first clause with non-empty ?.
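The two readings can be counted mechanically. The sketch below is only an illustration, with assumed names throughout (`ALTERNATIVES`, `count_parses`): it abstracts a reifier to the token "R" and a whole `{| ... |}` block to the token "B", lists the token shapes a single `annotation` can match under the two-clause rule above, and counts how many ways a token sequence can be split into `annotation*`.

```python
from functools import lru_cache

# Token shapes one `annotation` can match under the two-clause rule:
#   '{|' ... '|}'                 -> ("B",)
#   reifier                       -> ("R",)
#   reifier '{|' ... '|}'         -> ("R", "B")
ALTERNATIVES = [("B",), ("R",), ("R", "B")]


def count_parses(tokens):
    """Count the distinct ways to split `tokens` into annotation*."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def ways(i):
        # Reaching the end of the input is exactly one complete parse.
        if i == len(tokens):
            return 1
        # Try every alternative that matches at position i.
        return sum(
            ways(i + len(alt))
            for alt in ALTERNATIVES
            if tokens[i:i + len(alt)] == alt
        )

    return ways(0)
```

With this model, `count_parses(["R", "B"])` is 2, which corresponds to the two derivations of `~:e {| |}`, whereas `["R"]` or `["B"]` alone each have a single derivation.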

afs commented 2 months ago

To move forward I suggest moving this PR out of draft so as to merge it to get everything else into the doc even if the grammar isn't final.

Create a follow-up issue, or issues, for specific points in the grammar.

gkellogg commented 2 months ago

Squashed and force-pushed to rebase to main.

gkellogg commented 2 months ago

GitHub suddenly told me that this was merged... an hour ago. sigh

I'm thinking it will be faster/easier for you to put these into a new PR than for me, but I can do it if it's a burden.

Sure, I can incorporate this into a followup PR.

pchampin commented 1 week ago

This was discussed during the rdf-star meeting on 26 September 2024.

syntax for reifiers

<doerthe> I have to leave, sorry

ora: I think the main point of contention is whether this is prefix or postfix

gkellogg_: that, and tilde versus pipe or other characters.

ora: AndyS, you make a point about ease of parsing

AndyS: not just that. The pipe is already used in SPARQL, although there are ways around that.
… Enrico made the point that in N-Triples, the reifier comes first (in the subject position).
… I think that internal consistency in Turtle is more important.

tl: I made a few proposals, including the use of pipe everywhere, and replacing the curly brackets in the annotation syntax.
… I think we should have looked at the problem that way.
… I find Enrico's argument about the position in N-Triples irrelevant.

ora: you are saying this is a usability issue.

tl: yes, it is the interface, it is important to get this right.

niklasl: I agree, affordances are important, that's why the pipe is tricky because of its use in SPARQL.

pchampin: I agree that this is turning into a broad discussion we can't have in a short amount of time.
… We can consider suffix vs. prefix and separately the tokens used.

niklasl: agreed, long prefix makes things hard to read

<ora> STRAWPOLL: Postfix?

<ora> +1

<gkellogg_> +1

<pchampin> +1

<tl> +1

<niklasl> +1

<pfps> 0

<Dominik_T> 0

<gtw> +1

<TallTed> +1

<AndyS> +1

<eBremer> +1

<ktk> +1

<ktk> Tpt: are you around?

<Tpt> I am back

ora: there is still the question of which character we choose

ora: There are arguments against |

ora: There will be reifiers without annotation blocks and annotation blocks without reifiers

ora: if you see an annotation block after a reifier, it is related to this reifier so there is some memory needed

<tl> my 5cents on syntax: https://lists.w3.org/Archives/Public/public-rdf-star-wg/2024Sep/0073.html

AndyS: it's easier than doing RDF list

gtw: Do we have a concise summary of the various syntaxes?

<AndyS> https://github.com/w3c/rdf-turtle/blob/main/spec/turtle.bnf

<pchampin> << :s :p :o ~ :r >>.

<tl> Souri asked for that

<niklasl> I tried to have a bunch of variants appear "naturally" in https://niklasl.github.io/rdf-docs/presentations/RDF-reifiers-1/ Slide 19 uses that form.

tl: I would like to point to this syntax proposal, but I thought we would do syntax later: https://lists.w3.org/Archives/Public/public-rdf-star-wg/2024Sep/0073.html

<pchampin> :s :p :o ~ :r1 ~ :r2 {| :a :b |}.

gkellogg: you can insert more than one annotation or reifier

<pchampin> :s :p :o ~ :r1 ~ :r2 {| :a :b |} ~ :r3.

gkellogg: in any order

pchampin: if there is no reifier before annotations, the reifier is a blank node

AndyS: what I find odd is that the annotation block has to have at least one predicate-object pair inside

AndyS: it makes generating this kind of syntax from a program more complicated

<niklasl> That empty annotation blocks weren't allowed did trip me up in my introductory slide (8) for annotation sugar.

ora: Both Turtle and SPARQL use predicateObjectList+

<niklasl> So +1 from me for allowing it. Makes it easier to save hand-edited, unfinished turtle...

<tl> from my proposal: { :s1 :p :o . :s2 :p :o | :r1 } [| :a :b |] .

<AndyS> :s :p :o ~ :r1 ~ :r2 {| :a :b |} {| :c :d |}

<niklasl> <s> :p <o> ~ <r1> {| a :Named |} . <s> :p <o> ~ <r1> ~ {| a :NotNamed |} .

AndyS: Are you suggesting we have an empty annotation block to "cancel" the preceding reifier?

<niklasl> See above line. :)

gkellogg: you can do "~ {|" to get a blank node

<tl> from my proposal: <| :s :p :o | :r |> :a :b .

tl: We should keep {} for group of statements, not annotations

tl: If we change the abstract reified triple to <<| we use pipes everywhere

tl: That way the pipe would be everywhere we use RDF-*

gkellogg: I am afraid it collides with N3, where they use | for object paths

gkellogg: the triple object can be a path, and I believe it can include "|"

gkellogg: This would be against a bare pipe

<Dominik_T> gkellogg can you provide a link or an example where in N3 pipe can be used?

pchampin: I would like to come back to the previous topic. My personal opinion is that ~ without an identifier is a bit strange. I would argue it's not strictly required; we can still write ~ []

gkellogg: A [] now means bnode property list

gkellogg: If we allow empty annotation blocks, it's also a way to avoid the empty ~

<gtw> I believe per the current Turtle draft spec, [] would be valid per the reifier rule: `reifier::='~' (iri | BlankNode)?` (via BlankNode)

AndyS: I think it's a bit confusing because it would be the only place where you can have [] but not [ propertyObjectList ]

ora: If we confuse users it's not going to lead to anything good

ora: We have this thing with multiple reifiers and annotations. Is it really relevant?

ora: I don't want people to start writing things and getting them wrong

<niklasl> Pro/con: <s> :p <o> ~ [ :date "2024" ] . # Pro: Regularity, same syntax for bnodes. Con: may be odd in combination with the naming-and-describing pairing mechanism.

ora: Syntax discussions are often more difficult than semantic discussions

<niklasl> +1 for syntax being more difficult (also: "there is only syntax")

ora: It would be nice if we can break this up into a series of decisions

ora: it would be nice if somebody took the trouble to figure out which decisions we have to make, so that we would have examples of the variants

pchampin: if we keep "<<" we need to keep it consistent with what people expect from the CG

tl: << has been used also for asserted things

tl: what part of history do we refer to when we talk about user assumptions?

<pchampin> q.

pchampin: To be clear, I said "if we keep" the <<; getting rid of it altogether is a way to solve the problem

gkellogg: It would be nice to make a decision, everything depends on it

ora: it's unfortunate that the syntax PR has been opened for such a long time with not enough attention

ora: People often take little interest in syntax, far less than is warranted

ora: I am open to suggestions how to do this

AndyS: we should take this offline

ora: agreed, we do this offline. In a way we ended up in a place I did not want to end up in, fighting over these things

ora: I suggest chairs will pick this up and will go from there

<pfps> which PR?

<pchampin> w3c/rdf-turtle#51

<gb> MERGED Pull Request 51 Grammar updates for triple terms and occurrences. (by gkellogg) [spec:substantive]

pchampin: In the interest of splitting into multiple decisions, I think we can bundle the brackets for triple term, unasserted triples and annotations


rat10 commented 1 week ago

@afs

If we go postfix, then ~ vs | is pure choice

I hope this is still valid, and it is good to know.

and ~ disconnects from earlier writings and, for me, that is a reasonable decision to make.

I beg to differ: we may have been working for years on this, but we're still not in the situation where we have to cater for an installed base. We can still do what we want, and we should strive for a design that is coherent and compelling. Updating our examples or getting confused in discussions by examples from different periods is a minor problem compared to users of the finished spec having to deal with the side effects of some tactical decisions forever.