w3c / rdf-concepts

https://w3c.github.io/rdf-concepts/

where are triple terms allowed #80

Closed pfps closed 6 months ago

pfps commented 8 months ago

Some recent documents only show triple terms in object positions. But several use cases have what would be triple terms in subject positions.

https://github.com/w3c/rdf-ucr/wiki/Capturing-triple-origin-in-SPARQL-star
https://github.com/w3c/rdf-ucr/wiki/Describing-a-Union-of-Changes-to-a-Named-Graph
https://github.com/w3c/rdf-ucr/wiki/RDF-star-for-CIDOC-CRM-events
https://github.com/w3c/rdf-ucr/wiki/RDF-star-for-contextualizing-historical-assertions
https://github.com/w3c/rdf-ucr/wiki/RDF-star-for-labelled-property-graphs
https://github.com/w3c/rdf-ucr/wiki/RDF%E2%80%90star-for-Wikidata

afs commented 8 months ago

I am confused: looking at the use cases, they seem to be about (named) occurrences, often using annotation syntax.

Please could we have an example of the use of a triple term in the subject position?

To pick one of the UC:

<< <7d5d0d651caa> :subject <semantics> >> :suggestedBy <classifyer> .

I would have thought that was an occurrence.

(this comment in no way is an opinion about whether they should be / should not be allowed)

pfps commented 8 months ago

Yes, these should probably now be occurrences, which could result in them not requiring triple terms as subjects.

TallTed commented 8 months ago

I see now that the seeking-consensus table only includes TRIPLE_TERM in the object position. (I also see that it doesn't include OCCURRENCE at all.)

I feel strongly that this is an error, and that TRIPLE_TERM should also be allowed in the subject position. It certainly breaks with my intuition of how TRIPLE_TERM should be handled.

Perhaps we now need an option 3b (treating the existing option 3 as 3a)?

Why is the same markup, << <7d5d0d651caa> :subject <semantics> >>, in —

<< <7d5d0d651caa> :subject <semantics> >> :suggestedBy <classifyer> .

— an OCCURRENCE, while in —

<classifyer> :suggested << <7d5d0d651caa> :subject <semantics> >>.

— it's a TRIPLE_TERM?

I would like to know why people feel that TRIPLE_TERM should not be allowed in the subject position — i.e., what would this (be likely to) break? Is this restriction an optimization to avoid an anticipated performance hit, before that hit has been realized through implementation experimentation?

gkellogg commented 8 months ago

Use cases are satisfied using Occurrences, not Triple Terms. A Triple Term exists as the object of a triple relating the occurrence to the triple term via rdf:nameOf (or suitable replacement). There is no use case for using a Triple Term in the subject position. Ideally, we could have an option where a Triple Term isn't necessary at all, but it is necessary to describe what triple an occurrence is for.

The fact that this point keeps getting either confused, or misunderstood, may not bode well for the general design, as without thorough rationale, these issues will keep coming up in the wider community.

hartig commented 8 months ago

@TallTed

Why is the same markup, << <7d5d0d651caa> :subject <semantics> >>, in —

<< <7d5d0d651caa> :subject <semantics> >> :suggestedBy <classifyer> .

— an OCCURRENCE, while in —

<classifyer> :suggested << <7d5d0d651caa> :subject <semantics> >>.

— it's a TRIPLE_TERM?

It is not. Instead, in both cases it is an occurrence.

... and in both cases it is syntactic sugar that expands to a triple with a corresponding triple term in the object position and rdf:nameOf as predicate (or some other suitable IRI—it is still under discussion what this predicate IRI should be).

That is, the first of your examples is a shorthand notation for the following two triples (written in the abstract syntax as defined in PR #78, assuming b is a fresh blank node and all the other components are properly expanded IRIs).

( b,  rdf:nameOf,  (<7d5d0d651caa>, :subject, <semantics>) )
( b,  :suggestedBy,  <classifyer> )

Note that the object of rdf:nameOf triple is a triple term.

Your second example expands to the following two triples.

( b,  rdf:nameOf,  (<7d5d0d651caa>, :subject, <semantics>) )
( <classifyer>,  :suggested,  b )

Note that in both cases the rdf:nameOf triple is the same.

Now you may also ask what these pairs of triples would look like in N-Triples format or in Turtle. The currently considered markup for triple terms when written in these formats is <<( ... )>>. Hence, for your first example, this would look as follows (where _:xyz is just a randomly picked blank node identifier for the blank node b).

_:xyz   rdf:nameOf    <<( <7d5d0d651caa> :subject <semantics> )>> .
_:xyz   :suggestedBy  <classifyer> .

And for your second example:

_:xyz          rdf:nameOf     <<( <7d5d0d651caa> :subject <semantics> )>> .
<classifyer>   :suggested     _:xyz .
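
The expansion described in this comment can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the working group's algorithm: the `Occurrence` and `TripleTerm` classes, the `expand` function, and the `_:b0` blank node labels are hypothetical names, and `rdf:nameOf` is the placeholder predicate the comment itself says is still under discussion. The input triples are the two examples quoted at the top of the comment.

```python
from dataclasses import dataclass
from itertools import count

# Placeholder predicate; the thread notes the final IRI is still under discussion.
RDF_NAME_OF = "rdf:nameOf"


@dataclass(frozen=True)
class TripleTerm:
    """Abstract-syntax triple term, written <<( s p o )>> in the thread."""
    subject: str
    predicate: str
    object: str


@dataclass(frozen=True)
class Occurrence:
    """Surface markup << s p o >>, before it is expanded."""
    subject: str
    predicate: str
    object: str


def expand(triples):
    """Replace each Occurrence by a fresh blank node plus an rdf:nameOf triple."""
    fresh = (f"_:b{i}" for i in count())  # hypothetical blank node labels
    out = []
    for s, p, o in triples:
        resolved = []
        for term in (s, o):
            if isinstance(term, Occurrence):
                node = next(fresh)
                # The triple term only ever lands in the object position here.
                out.append(
                    (node, RDF_NAME_OF,
                     TripleTerm(term.subject, term.predicate, term.object))
                )
                resolved.append(node)
            else:
                resolved.append(term)
        out.append((resolved[0], p, resolved[1]))
    return out


occ = Occurrence("<7d5d0d651caa>", ":subject", "<semantics>")
first = expand([(occ, ":suggestedBy", "<classifyer>")])   # occurrence as subject
second = expand([("<classifyer>", ":suggested", occ)])    # occurrence as object
```

In both results the first triple is the same `rdf:nameOf` triple, and only the second triple differs in where the blank node appears.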

afs commented 7 months ago

It seems from the discussion above that all use cases work using occurrences.

As this came up in the telecon of 2024-03-07 for follow-up, to move this forward, can we establish an example?

@TallTed, do we have a counter-example to the idea that "all use cases work using occurrences", where a triple term, not a triple occurrence, is used in the subject position? Even better, one where an occurrence cannot be used.

@pfps, are you still happy with your comment

Yes, these should probably now be occurrences, which could result in them not requiring triple terms as subjects.

TallTed commented 7 months ago

@afs

@TallTed, do we have a counter-example to the idea that "all use cases work using occurrences", where a triple term, not a triple occurrence, is used in the subject position? Even better, one where an occurrence cannot be used.

I don't have such a counter-example.

Partly, because I still do not understand what the definitions of triple term and triple occurrence are; hence, do not understand what the difference(s) between these definitions might be; hence, cannot say what use cases might be (un)satisfied by either in any position.

To date, the only difference I'm aware of between the definitions is simply that a triple term (recently a/k/a rdf:triple) is defined to be forbidden to be an rdf:subject. I do not know the reason for this forbiddance.

gkellogg commented 7 months ago

The RDF-star use cases are satisfied using the reification triples, of which triple term is a component. A well formed RDF-star graph will have a triple term only being the object in a triple which also uses the rdf:reifies predicate and an IRI or Blank Node subject. Triple Term is necessary to create an atomic term which contains everything about that triple. The reification is an identified reference to a triple term. Using a triple term outside of this "macro" (or equivalent expansion) is not well formed and has no use case.
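
As a rough illustration of the condition as worded in this comment, the following Python sketch flags triples that use a triple term anywhere other than as the object of an `rdf:reifies` triple. It is not the formal definition from RDF Semantics, which may differ; the `TripleTerm` class, the `violations` function, and the string encoding of terms are assumptions made for the example.

```python
from dataclasses import dataclass

RDF_REIFIES = "rdf:reifies"


@dataclass(frozen=True)
class TripleTerm:
    """Stands in for <<( s p o )>>; terms are plain strings in this sketch."""
    subject: str
    predicate: str
    object: str


def violations(graph):
    """Return (triple, reason) pairs for triples that break the condition."""
    problems = []
    for triple in graph:
        s, p, o = triple
        if isinstance(s, TripleTerm):
            problems.append((triple, "triple term used as subject"))
        if isinstance(o, TripleTerm) and p != RDF_REIFIES:
            problems.append((triple, "triple term object without rdf:reifies"))
    return problems


tt = TripleTerm("<7d5d0d651caa>", ":subject", "<semantics>")

# A reifier (:e, a hypothetical IRI) pointing at the triple term.
via_reifier = [(":e", RDF_REIFIES, tt), (":e", ":suggestedBy", "<classifyer>")]

# The triple term used directly, outside the "macro".
direct = [("<classifyer>", ":suggested", tt)]
```

Calling `violations(via_reifier)` returns an empty list, while `violations(direct)` reports the one triple that uses the triple term with a predicate other than `rdf:reifies`.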

afs commented 7 months ago

@gregg

Using a triple term outside of this "macro" (or equivalent expansion) is not well formed and has no use case.

Agreed. We are focusing on the "reification well-formed" (= RWF.) case.

@TallTed

I do not know the reason for this forbiddance.

(a non-advocacy summary of how we got here follows ...)

RWF gives a formal account of the meaning of occurrences (possible usage of a triple in a graph).

The use cases the working group is addressing are covered by "reification well-formed".

The RDF data model (RDF abstract syntax) does not enforce the well-formed condition, e.g. a single triple :e rdf:reifies <<( :s :p :o )>> is a legal RDF graph. It is legal as N-Triples.

Currently, the state is that the proposed RDF data model does not allow triple terms in the subject position. That was the outcome of discussions prior to the introduction of RWF. It wasn't necessary to allow triple terms in the subject position and the group wished to encourage/emphasise the use of "occurrences" (new form reification) as tokens.

There is some opinion within the working group that triple terms in the subject position should be allowed. See, for example, the RDF Abstract syntax (the RDF data model) in RWF. It does not affect the reification well-formed condition.

The RDF collections (lists) situation is similar but with the difference that lists are given by vocabulary. They are not mentioned in the RDF data model, or "RDF Concepts". Non-well-formed lists are very rare in my experience - they mainly come up in making sure Turtle/TriG writers can cope with data with non-well-formed lists, not in data in the wild.

pchampin commented 6 months ago

This was discussed during today's meeting: https://www.w3.org/2024/04/04-rdf-star-minutes.html#t07

IS4Code commented 2 months ago

@Gregg

Using a triple term outside of this "macro" (or equivalent expansion) is not well formed and has no use case.

I am somewhat confused by this assertion. If using rdf:reifies is a requirement every time a triple term is to be described, what is the point of having triple terms in the first place? Where is all that compactness and efficiency of triple terms when they always have to be linked through another node? You could just use the old reification vocabulary and be done with it then, since the only difference is using 3 triples instead of a single one with rdf:reifies.

Now, I get why rdf:reifies is useful and why it is beneficial to direct people to use it, but I don't see how making it mandatory helps anything. There is not even a precedent for this ‒ compare literals: a literal value is identified by a value in the lexical space of a datatype, serialized as a literal term. There is nothing wrong with using a literal value directly: :s :p "l"; there is nothing wrong with using a literal value indirectly: :s :p [ rdf:value "l" ] (e.g. with rdf:direction and anything else that is relevant).

No use case? What about this?

_:graph :asserts <<(:s :p :o)>> , <<(:s2 :p2 :o2)>> , <<(:s3 :p3 :o3)>> .

You don't need to pinpoint a particular occurrence of a triple just to link it to a graph. Sure, :asserts is really just :contains/rdf:reifies, but what is wrong with using the triple directly there? You don't interfere with anyone else's notion of such a triple by linking it that way. You certainly wouldn't think _:graph :containsLiteral "l". is wrong. You don't mandate identifying the occurrence of such a literal every time you use it.

Likewise, this feels completely natural to me:

[
  owl:sameAs <<(:s :p :o)>> ;
  rdf:subject :s ;
  rdf:predicate :p ;
  rdf:object :o
] .

You don't describe a single occurrence of that triple, you don't want to describe a single occurrence of that triple there. Same with literals:

[
  owl:sameAs "literal" ;
  :length 7
] .

I like rdf:reifies. I see it as a sub-property of skos:broader, linking a narrow concept to an essentially same broader concept. I am okay with restricting triple terms to the object position of a triple ‒ literals get the same treatment (even though one could certainly imagine them in the other positions as well). But I don't agree with the note in this section:

Every triple with a triple term as its object SHOULD use http://www.w3.org/1999/02/22-rdf-syntax-ns#reifies (rdf:reifies) as its predicate. Every triple whose object is not a triple term SHOULD NOT use http://www.w3.org/1999/02/22-rdf-syntax-ns#reifies (rdf:reifies) as its predicate.

RDF is out of its place to mandate semantics there ‒ leave it to RDFS and OWL. No other term is restricted in that way.

IS4Code commented 2 months ago

In more practical terms, following one of the examples in Turtle:

PREFIX : <http://example.com/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
:a :name "Alice" .
<< :a :name "Alice" ~ :t >> :statedBy :bob ;
                            :recorded "2021-07-07"^^xsd:date .

Yes, this refers to a particular statement Bob has made, and it must be reified here. However, when speaking in general, why would one need to reify it?

:bob :said <<( :a :name "Alice" )>> .

You don't really need to know that Bob once said something that was a reification of a triple, you are interested in whether he has said anything like that at all. You could narrow it down later if you wish.

Continuing with the analogy with literals, you could run into the same "mistake" as what rdf:reifies solves if literals were allowed in subjects:

:bob :name "Bob" .
"Bob" :shortFormOf "Robert" .
"Bob" :givenBy :bob\'sMother .

This is the sort of situation where you'd want to reify a literal:

:bob :name [
  rdf:value "Bob" ;
  :shortFormOf "Robert" ;
  :givenBy :bob\'sMother
] .

But you don't need to link every literal using rdf:value. Why would you?

gkellogg commented 2 months ago

No use case? What about this?

_:graph :asserts <<(:s :p :o)>> , <<(:s2 :p2 :o2)>> , <<(:s3 :p3 :o3)>> .

I don't want to belabor this, as the indirection of a Triple Term through a reifier is the result of months of discussions in the working group. There is discussion of creating an rdf:asserts sub-property of rdf:reifies to distinguish reifications of a triple that is a component of the graph from those that are not (necessarily) components of the graph. Generally, statements are made about the reifier rather than the related triple term, because there could be many such statements made, which need to be kept separate. Triple Terms should be an artifact of the abstract syntax and N-Triples/Quads and not generally appear in concrete syntaxes, which should use either the "reified triple" syntax or annotation syntax.
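
The point about keeping statements separate can be illustrated with a small Python sketch. It assumes a graph held as a list of tuples, with two reifiers for the same triple term; the second speaker `:carol`, the date `2023-01-15`, the blank node labels, and both function names are hypothetical and only serve the illustration.

```python
from collections import defaultdict

RDF_REIFIES = "rdf:reifies"

# Stands in for the triple term <<( :a :name "Alice" )>>.
triple_term = (":a", ":name", '"Alice"')

# Two reifiers for the same triple term, each carrying its own statements.
graph = [
    ("_:r1", RDF_REIFIES, triple_term),
    ("_:r1", ":statedBy", ":bob"),
    ("_:r1", ":recorded", '"2021-07-07"'),
    ("_:r2", RDF_REIFIES, triple_term),
    ("_:r2", ":statedBy", ":carol"),         # hypothetical second speaker
    ("_:r2", ":recorded", '"2023-01-15"'),   # hypothetical date
]


def annotations_by_reifier(graph, term):
    """Group the statements made about each reifier of the given triple term."""
    reifiers = [s for s, p, o in graph if p == RDF_REIFIES and o == term]
    return {
        r: {p: o for s, p, o in graph if s == r and p != RDF_REIFIES}
        for r in reifiers
    }


def annotations_flattened(graph, term):
    """What is left if the same statements were attached to the triple term itself."""
    merged = defaultdict(set)
    for props in annotations_by_reifier(graph, term).values():
        for p, o in props.items():
            merged[p].add(o)
    return dict(merged)


per_reifier = annotations_by_reifier(graph, triple_term)
flattened = annotations_flattened(graph, triple_term)
```

With reifiers, `per_reifier` still pairs each speaker with the matching date. In `flattened`, the two speakers and the two dates are just sets, and the pairing between them is lost.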

I like rdf:reifies. I see it as a sub-property of skos:broader, linking a narrow concept to an essentially same broader concept. I am okay with restricting triple terms to the object position of a triple ‒ literals get the same treatment (even though one could certainly imagine them in the other positions as well). But I don't agree with the note in this section:

Every triple with a triple term as its object SHOULD use http://www.w3.org/1999/02/22-rdf-syntax-ns#reifies (rdf:reifies) as its predicate. Every triple whose object is not a triple term SHOULD NOT use http://www.w3.org/1999/02/22-rdf-syntax-ns#reifies (rdf:reifies) as its predicate.

RDF is out of its place to mandate semantics there ‒ leave it to RDFS and OWL. No other term is restricted in that way.

That's why this is informative language (SHOULD); well-formedness constraints belong in RDF Semantics. But this language shows the general intention of the predicate and triple terms with respect to the data model. Not saying anything about this in RDF Concepts leaves a hole for people who might not delve into RDF Semantics.

IS4Code commented 2 months ago

That's why this is informative language (SHOULD); well-formedness constraints belong in RDF Semantics. But this language shows the general intention of the predicate and triple terms with respect to the data model. Not saying anything about this in RDF Concepts leaves a hole for people who might not delve into RDF Semantics.

I see, thank you for the clarification and the reassurance that my interpretation of the purpose is accurate. Just to be clear, I understand why in (I guess most) practical cases one needs a reifier (so as not to "pollute" the universally existing triple term), and I definitely appreciate that approach. To be fair, I was even confused about the reason to have triples in the original RDF-star when there is rdf:Statement, and now I must say I really appreciate this new distinction, since it makes it trivial to switch between rdf:Statement and rdf:reifies with no concerns of semantic wrongdoing.

I also apologize for misinterpreting SHOULD in that paragraph as I had remembered it being stricter than RECOMMENDED, which I now understand from the linked RFC to be synonymous with SHOULD. To provide constructive feedback then, I would suggest rewriting that paragraph to something like this, to make it clearer:

It is RECOMMENDED to use a triple term as the object of a triple when and only when it has http://www.w3.org/1999/02/22-rdf-syntax-ns#reifies (rdf:reifies) as its predicate. Not doing so has implications on the interpretation of such a triple and risks producing data conflicting with other graphs referring to that triple term.

Maybe followed by mentioning that properties with rdfs:range rdf:TripleTerm are fine too. On that note, I am definitely looking forward to seeing triple terms described in RDF 1.2 Schema!