w3c / rdf-star-wg

RDF-star Working Group

Determine naming syntax for reifiers #116

Closed niklasl closed 3 months ago

niklasl commented 6 months ago

The currently proposed naming notation doesn't work for annotations in SPARQL. It collides with the use of | as the AlternativePath operator:

SELECT * { ?s ?p ?o {| dct:issued | dct:modified "2023" |} . }

The above already means "select any triples with annotations described as dct:issued or dct:modified in 2023", and thus cannot also mean "select any triples with the annotation denoted by dct:issued, which was dct:modified in 2023". (Try it out in the Jena-based SPARQL validator.)

Turtle is not SPARQL, but SPARQL was designed in part with the intent of making it easy to copy-paste Turtle into it. Besides, having this notation mean completely different things in such similar syntaxes will hurt comprehension a lot.

(See also original mail about this.)

gkellogg commented 6 months ago

IMO, this can be resolved by a parser greedily choosing dct:issued | as the identifier of that annotation.

If an AnnotationPattern (or AnnotationPatternPath) is defined as follows:

AnnotationPattern       ::= '{|' ( ( VarOrIri | BlankNode ) '|' )? PropertyListNotEmpty '|}'

A parser can take the IRI (Var or Blank Node) as the identifier, rather than a path component, leaving the rest for PropertyListNotEmpty. The PropertyListNotEmpty (or PropertyListPathNotEmpty) branch can be distinguished by surrounding the path with (). While not context-free, it is unambiguous.

Note that backtracking is necessary if there is no | following the presumed identifier, but this is unrelated to this particular issue. Similar backtracking would be needed for a Reifier (ReifierData and ExprReifier) rule, replacing the QuotedTriple, QuotedTripleData, and ExprQuotedTriple productions, but these don't have the same confusion with leading path components.
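A minimal sketch of the greedy choice described above, in Python, assuming a toy tokenizer rather than the real SPARQL grammar. The names TOKEN, tokenize, is_name_candidate and split_annotation are hypothetical and exist only for this illustration. It takes a leading IRI, variable or blank node followed by | as the identifier, and backtracks to a plain property list when no | follows.

```python
import re

# Hypothetical token pattern for this sketch only: variables, blank node
# labels, prefixed names, quoted strings, and the characters ( ) |.
TOKEN = re.compile(r'\s*(\?[A-Za-z_]\w*|_:\w+|(?:[A-Za-z_]\w*)?:\w*|"[^"]*"|[()|])')


def tokenize(body):
    """Split the text found between '{|' and '|}' into tokens."""
    body = body.strip()
    tokens, pos = [], 0
    while pos < len(body):
        m = TOKEN.match(body, pos)
        if m is None:
            raise ValueError(f"unexpected input at offset {pos}: {body[pos:]!r}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


def is_name_candidate(tok):
    """True for an IRI, variable or blank node; False for '|', '(', ')' or a string."""
    return tok[0] not in '|()"'


def split_annotation(body):
    """Return (name, property_list_tokens) using the greedy rule.

    If the first token could be an identifier and is directly followed by '|',
    it is consumed as the name. Otherwise the parser backtracks and treats
    everything as the property list (name is None).
    """
    tokens = tokenize(body)
    if len(tokens) >= 2 and is_name_candidate(tokens[0]) and tokens[1] == "|":
        return tokens[0], tokens[2:]
    return None, tokens
```

Under these assumptions, the body dct:issued | dct:modified "2023" from the query at the top of this issue is read as a name (dct:issued) plus a property list, while (dct:issued | dct:modified) "2023" starts with ( and so stays a path.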

There is certainly room for confusion, and we would need to highlight the potential ambiguity on annotations. Alternatively, a syntax that uses something other than | to separate the optional identifier from the PropertyList would make it less ambiguous, but may be less aesthetically pleasing.

niklasl commented 6 months ago

Of course, but we would change the current meaning of the long-established alternate path operator by doing so. That would overload it as a naming operator, so relying on disambiguation here should be a last resort. We haven't even considered any other option yet. If all other options are less aesthetically pleasing, it would be more convincing.

Since I both read and write SPARQL most every week, using this operator fairly often, I immediately reacted to this choice. I can't say how long it would take to get used to for me or other users of SPARQL, especially occasional ones (a fairly common case). But I can say that it would not be immediately obvious; and more crucially could lead to very confusing, wrong results. The example query would yield no results, since I'd accidentally queried for a reifier named dct:issued instead of the intended either-or date.

afs commented 6 months ago

We should take a broad view as to the best syntax and thoroughly investigate the choices, to try to make changes once for both Turtle and SPARQL.

We also need to reaffirm that annotation syntax still is wanted given multi-triple reifiers, as well as other choices that have been made on the journey.

Alternatively, a syntax that uses something other than | to separate the optional identifier from the PropertyList would make it less ambiguous, but may be less aesthetically pleasing.

It would be more robust to getting it right. For example, a naming form with a leading marker, e.g. ~ :edge |, (~ is used as an example here, nothing more) in a position that can not be a property path may work better. There are more character choices at that point.

afs commented 6 months ago

We need consistent appearance.

<< :e | :s :p :o >> also uses |, although in this case there is no conflict with AlternativePath.

There are two categories of syntax approaches:

niklasl commented 6 months ago

I think @ or :: can work; perhaps as a "pseudo-predicate".

N3 used to have the :- "iso" operator as such. The :- is problematic, for the reasons you give, but a variation might work (if we want to open up that possibility, which could also be used to "name" blankNodePropertyLists [ @ :e ; :p :o ]). Such a "naming predicate" should be syntactically restricted to the first pair.

If we weren't long down the {| ... |} road, I'd certainly consider a bigger redesign. Well, I think we could, but can understand if that is not a shared sentiment. The rest of this comment is an attempt anyway.

Given that triples are "tagged" with a reifier, denoted by a regular IRI or bnode, an annotation form like {[ dct:issued "2023" ]} and, when named, as {_:e}, {:e} or {<e>} could make sense (then describing the last two explicitly named reifiers is done separately, as in regular Turtle). The EBNF would be: '{' (iri | blank | blankNodePropertyList)+ '}' (+ if allowing repeated names within the brackets).

That might step on the toes of graph literals though. (Since the annotation comes after the object position, it's not really a collision; but could be hard to read if graph literals are eventually allowed?)

Some years ago an actual star * was suggested. It might work as a bare prefix too (*[ ... ]), unless it's too easy to miss. Double **:e might be too harsh? Surrounding "earmuffs" *[ ... ]* too odd?

Since the single triple reifiers follow a different shape (one single triple), consistency of appearance is tricky; but certainly a consistency of design is important (thus taking a step back is wise). Not sure if << {:e} :s :p :o >> or << *:e* :s :p :o >> would fly.

Again, I could see a reconsideration of design here too; e.g. "quoting" the object to form an unasserted single triple, and using the annotation syntax to name and/or describe that: :s :p << :o >> *[ :date "2023" ]*. But we're even longer down that particular road (albeit with shifts in meaning along the way).

TallTed commented 6 months ago

@afs — I think there are a few typos remaining in https://github.com/w3c/rdf-star-wg/issues/116#issuecomment-2072020602, after your edits to date. These make this already-niggling thread even more challenging to follow.

In the third bullet, for instance, you now have —

or :- (minus can not appear at the start of a local name; :s :-123. is currently legal as subject : , predicate : and object -123

I think this bullet should read —

or :- (minus can not appear at the start of a local name); :s :-123. is currently legal as subject :s , predicate :, and object -123.

(I've inserted a close-paren, an s, a comma, and a full-stop.)

The last sentence of the second bullet now reads —

IMO we're a long down the {| |}` road.

Again, I think that should be —

IMO we're a long way down the {| |} road.

(That just got an inserted way and a back-tick.)

I do not feel certain that the remainder of that comment currently appears as intended. Hopefully a close review of the rendered comment will reveal any remaining errors to you.

niklasl commented 4 months ago

The wrapped "naming" unit appears to have some particular benefits.

Take the following example, based on a UCR case:

ex:Ioannes_68 a crm:E21_Person ,
        ex:Gender_Eunuch {| |ex:Gender_Assignment_Eunuch| a crm:E17_Type_Assignment ;
                crm:P14_carried_out_by ex:Paphlagonian_family ;
                rdfs:label "Castration gender assignment" |} ;
    rdfs:label "John the Orphanotrophos" .

<< |ex:Gender_Assignment_Male_By_Decree| ex:Ioannes_68 a ex:Gender_Male >> a crm:E17_Type_Assignment ;
    crm:P14_carried_out_by ex:emperor ;
    crm:P182_inverse_starts_after_or_with_the_end_of ex:Gender_Assignment_Eunuch ;
    rdfs:label "Gender assignment by decree".

<< ex:Ioannes_68 a ex:Gender_Male >> a crm:E17_Type_Assignment ;
    crm:P14_carried_out_by ex:Paphlagonian_family ;
    crm:P183_ends_before_the_start_of ex:Gender_Assignment_Eunuch ;
    rdfs:label "Birth gender assignment" .

(This is considered as expanding to rdf:reifies with transparent triple terms (minimal baseline).)

The benefits compared to just a leading or trailing delimiter (or a pseudo-predicate) appear to be:

  1. It is immediately obvious in the "unasserted triple reifier" that the name is not the subject. Otherwise readers would have to scan possibly long IRIs to check whether the IRI is the name or the subject.
  2. Likewise it is clear where the name ends.
  3. In annotations, the name can stay on the same line with the first predicate-object in the annotation. If it had been a pseudo-predicate ending with ;, pretty-printers would probably append a newline after that. With this naming unit it stands out more.
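As a rough illustration of points 1 and 2, here is a Python sketch that recognises a wrapped |name| unit at the start of an annotation or reifier body. WRAPPED_NAME and take_wrapped_name are hypothetical names for this example, and the pattern is a simplification (it assumes the name contains no whitespace or |). Because a SPARQL property path cannot begin with |, the leading pipe alone decides whether a name is present, so this sketch needs no backtracking.

```python
import re

# Hypothetical pattern: a leading '|', a name without spaces or pipes, a closing '|'.
WRAPPED_NAME = re.compile(r'\s*\|\s*([^|\s]+)\s*\|')


def take_wrapped_name(body):
    """Return (name, rest) if body starts with |name|, else (None, body)."""
    m = WRAPPED_NAME.match(body)
    if m is None:
        # No leading pipe: nothing here is a name, the whole body is content.
        return None, body.strip()
    return m.group(1), body[m.end():].strip()
```

For instance, the body |ex:Gender_Assignment_Eunuch| a crm:E17_Type_Assignment splits into the name and the remaining property list, whereas ex:Ioannes_68 a ex:Gender_Male is returned unchanged with no name.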

Adding this kind of "embedded naming" to allow [..] brackets for named nodes with nesting is reasonably not within our charter. But it could be done in the future, so it is prudent to check how it would work. For example, it could be useful in SPARQL (for convenience, not necessarily for readability). This is a "gnarly" example which I think works (but perhaps some would find too cryptic):

SELECT * WHERE {
  [ :author ?x ; :relatedTo | :partOf [ |?y| :author ?x ] ] :translationOf [ :relatedTo | :partOf ?y ]
}

(While wrapping with pipes |...| may need more readability testing than the above, it is also well-known in other contexts (e.g. Ruby and Rust), which also have pipe as a binary operator.)

afs commented 4 months ago

The title of this issue should be changed. The discussion seems to have moved away from annotation syntax and given generalized reification, annotation syntax may, or may not, be included in RDF 1.2.

A low risk extension to "agree syntax" is reification declaration so that there is no need to use the reification identifier. This would reduce the need to write the text rdf:reifies in Turtle.

<< :r | :s :p :o >> .

which is

:r rdf:reifies <<( :s :p :o )>> .
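The rewrite from the declaration form to the rdf:reifies triple can be sketched in Python as below. This is only an illustration under simplifying assumptions: DECLARATION and expand_declaration are hypothetical names, and the pattern only handles terms without internal whitespace (so no literals containing spaces and no nested triple terms).

```python
import re

# Hypothetical pattern for the declaration form: << name | s p o >> .
DECLARATION = re.compile(r'<<\s*(\S+)\s*\|\s*(\S+)\s+(\S+)\s+(\S+)\s*>>\s*\.')


def expand_declaration(line):
    """Rewrite '<< :r | :s :p :o >> .' as an explicit rdf:reifies triple."""
    m = DECLARATION.fullmatch(line.strip())
    if m is None:
        raise ValueError(f"not a simple reification declaration: {line!r}")
    name, s, p, o = m.groups()
    return f"{name} rdf:reifies <<( {s} {p} {o} )>> ."
```

Calling expand_declaration on the declaration above gives the rdf:reifies line shown, which is the text a writer would otherwise have to spell out by hand.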

When generating data, separating declaration from use makes for consistency. It is also useful because otherwise at least one place must declare as well as use. But there is no natural single place and so no (loosely) canonical forms. Writers might even end up moving the declaration around on each serialization of the same data fragment (the hash map ordering effect).

<< :r | :s :p :o >> .
:r :date "2024-07-17" .
:r :source <http://somewhere/> .

niklasl commented 4 months ago

@afs Good points. I've renamed the title; please adjust it further if needed.

afs commented 4 months ago

Comment above corrected: removed "reification blocks".

Reification blocks don't work and suggest something that isn't the case.

The triples are not in the graph; a number of triple terms are.

<< :r | :s :p1 :z . :z :p2 :o  >> .

is

<< :r | :s :p1 :z >> .
<< :r | :z :p2 :o >> .

There isn't a unit of :s :p1 :z . :z :p2 :o. being reified by :r. It's not indivisible or closed.
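To make the point concrete, a small Python sketch can rewrite the block form as the independent single-triple declarations it is equivalent to. BLOCK and split_declaration are hypothetical names, and the sketch assumes whitespace-separated terms with a standalone "." between triples; it is not a Turtle parser.

```python
import re

# Hypothetical pattern for a multi-triple block: << name | triple . triple ... >> .
BLOCK = re.compile(r'<<\s*(\S+)\s*\|(.*)>>\s*\.', re.DOTALL)


def split_declaration(decl):
    """Split a multi-triple block into one declaration per triple."""
    m = BLOCK.fullmatch(decl.strip())
    if m is None:
        raise ValueError(f"not a reification block: {decl!r}")
    name, body = m.groups()
    triples, current = [], []
    for tok in body.split():
        if tok == ".":
            # A standalone '.' ends the current triple.
            if current:
                triples.append(current)
                current = []
        else:
            current.append(tok)
    if current:
        triples.append(current)
    return [f"<< {name} | {' '.join(t)} >> ." for t in triples]
```

The result for the block above is the two separate declarations for :r, with nothing in the output that ties the two triples together as a unit.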

That makes a case for reification declarations which (separate declaration and usage) then use :r in graph triples for whatever the app wishes to say.

afs commented 3 months ago

The syntax discussion : https://github.com/w3c/rdf-turtle/pull/51#issuecomment-2238898582

TallTed commented 3 months ago

@afs — In https://github.com/w3c/rdf-star-wg/issues/116#issuecomment-2234357279

The discussion seems to have moved away from annotation syntax and given generalized reification, annotation syntax may be included in RDF 1.2.

I think (if I understand the rest of this thread) you meant to say (emphasis mine), "annotation syntax may not be included"...

Am I correct in my thinking?

afs commented 3 months ago

Am I correct in my thinking?

Corrected.

With things more recently discussed, there has been more interest in annotation use cases. My long note on https://github.com/w3c/rdf-turtle/pull/51 comes from investigating syntax in ways that align annotation and description cases (names not decided).

afs commented 3 months ago

Close - work continues on RDF Turtle document.