w3c / rdf-star

RDF-star specification
https://w3c.github.io/rdf-star/

Occurrences vocabulary #169

Open rat10 opened 3 years ago

rat10 commented 3 years ago

After long debate it was decided that embedded triples refer to an ("abstract") triple, not to any specific occurrence. Use cases like the seminal provenance use case [0] however need to refer to a specific occurrence to document e.g. when a certain triple was inserted into a certain graph. The draft report so far contains an example [1] that shows how an occurrence can be derived from a triple, using a made up property in the notorious example.org namespace.

The question now is: do we properly define this property as part of the RDF-star vocabulary?

Pro: this is an important use case and we should provide more than just a non-committal example. Con: the property alone is not enough. A complete solution would also have to provide a means to describe the graph in which the triple occurs; otherwise the provided solution is just as underspecified as RDF standard reification. This, however, is relatively uncharted territory and touches e.g. the minefield of named graphs.

[0] https://w3c.github.io/rdf-star/cg-spec/2021-04-13.html#the-seminal-example
[1] https://w3c.github.io/rdf-star/cg-spec/2021-04-13.html#occurrences-example

rat10 commented 3 years ago

Maybe "uncharted territory" is too pessimistic. It is well defined that an RDF graph is a set of RDF triples. We could provide two properties:

occurrenceOf
inGraph

Even in the absence of a proper term in the RDF vocabulary that denotes a graph we could informally advise that the range of inGraph is an IRI pointing to a set of triples e.g. in a document or a named graph. We are just addressing a set of triples via a containment relation, so no need to get into discussions about what that set means or entails or about what a dataset that eventually contains it means or entails etc etc. What could possibly go wrong?!

hartig commented 3 years ago

I am in favor of adding such a vocabulary.

However, @rat10, you end your description of this issue with the question: "What could possibly go wrong?!" Now, this makes me wonder what the purpose of adding this question is. Is this meant to be a rhetorical question? Do you foresee anything that might "go wrong" if we define such a vocabulary?

rat10 commented 3 years ago

@hartig The sentence "What could possibly go wrong?!" is rhetorical and meant to express: "I'm not sure if I have thought this through sufficiently. Right now I don't see any problems, but this area is notorious for non-obvious problems."

For example, as I come to think of it: how do we address a triple in the default graph of a dataset? Maybe:

_:x rdfx:occurrenceOf << :a :b :c >>;
    rdfx:inGraph :SomeDatasetIRI .

Would that be correct?

afs commented 3 years ago

As this is likely a common need: << :a :b :c >> rdfx:occursIn :SomeGraphIRI .

rat10 commented 3 years ago

@afs Renaming the property rdfx:inGraph to rdfx:occursIn we get

_:x rdfx:occurrenceOf << :a :b :c >>;
    rdfx:occursIn :SomeGraphIRI .

and

<< :a :b :c >> rdfx:occursIn :SomeGraphIRI .

So the rdfs:domain of rdfx:occursIn can be a triple term as well as an IRI or blank node. Wouldn't that cover both use cases?

[EDIT:] However this could lead to misunderstandings as

_:x rdfx:occurrenceOf << :a :b :c >> .
<< :a :b :c >> rdfx:occursIn :SomeGraphIRI .

might give the impression that _:x occurs in :SomeGraphIRI, although it neither confirms nor refutes such an assumption.

afs commented 3 years ago

Multiple rdfs:domain are combined as "and", not "or".
A nuisance (and schema.org differs by design.)

We could leave it unstated; otherwise, what is the domain of rdfx:occurrenceOf? Does it include the type of triple terms, or is it "usage of"?

My suggestion is not to replace the pair - it is to have a way of directly stating a common case without the blank node being needed.

rat10 commented 3 years ago

@afs

Multiple rdfs:domain are combined as "and", not "or". A nuisance (and schema.org differs by design.)

Hm, didn't know that... So we couldn't properly define the domain anyway, as it is either a blank node or an IRI.

We could leave it unstated; otherwise, what is the domain of rdfx:occurrenceOf? Does it include the type of triple terms, or is it "usage of"?

I don't understand. In my understanding the domain of rdfx:occurrenceOf can only be a triple term. What do you mean by "or is it 'usage of'"?

My suggestion is not to replace the pair - it is to have a way of directly stating a common case without the blank node being needed.

This argument I don't get. Your use case is still covered by my modification and still doesn't need a blank node. In fact I didn't touch your use case at all but changed only the other use case of defining an identifier for the occurrence. That doesn't necessarily need a blank node but some sort of identifier (obviously, as defining such identifier to be able to say things about such an occurrence is the whole purpose).

afs commented 3 years ago

So we couldn't properly define the domain anyway, as it is either a blank node or an IRI.

It does not matter about blank nodes or IRIs.

:p rdfs:domain :A .
:p rdfs:domain :B .

then the subject of :p must be both an A and a B not because of rdfs:domain but because it's two asserted statements.
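A minimal Python sketch of why the two declarations conjoin, assuming triples are plain tuples of prefixed-name strings: the RDFS domain rule (rdfs2) fires once per rdfs:domain triple, so both types get inferred for the same subject. Only a single application of the rule is shown (enough for this tiny graph, not a full RDFS closure), and the helper name is illustrative.

```python
RDFS_DOMAIN = "rdfs:domain"
RDF_TYPE = "rdf:type"


def apply_domain_rule(graph):
    """One application of RDFS rule rdfs2:
    (p rdfs:domain C) and (x p y) entail (x rdf:type C)."""
    domains = {}
    for s, p, o in graph:
        if p == RDFS_DOMAIN:
            domains.setdefault(s, set()).add(o)
    inferred = {(s, RDF_TYPE, c) for s, p, _ in graph for c in domains.get(p, ())}
    return set(graph) | inferred


graph = {
    (":p", RDFS_DOMAIN, ":A"),
    (":p", RDFS_DOMAIN, ":B"),
    (":x", ":p", ":y"),
}
entailed = apply_domain_rule(graph)
types_of_x = {o for s, p, o in entailed if s == ":x" and p == RDF_TYPE}
print(sorted(types_of_x))  # [':A', ':B']: :x must be both an :A and a :B
```

Each rdfs:domain triple is simply another asserted statement, so nothing in the rule can turn the two declarations into an "or".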

gkellogg commented 3 years ago

Multiple rdfs:domain are combined as "and", not "or". A nuisance (and schema.org differs by design.)

This can be addressed with OWL, although it seems increasingly left to the pedantic to do so. Still, I think it's appropriate to have predicates that define their domain/range as being some kind of embedded triple. (RDF should have created a type to allow a resource used as a graph name to have a range of Graph, too, IMHO).

We could leave it unstated; otherwise, what is the domain of rdfx:occurrenceOf? Does it include the type of triple terms, or is it "usage of"?

My suggestion is not to replace the pair - it is to have a way of directly stating a common case without the blank node being needed.

No opinion.

gkellogg commented 3 years ago

So we couldn't properly define the domain anyway, as it is either a blank node or an IRI.

It does not matter about blank nodes or IRIs.

:p rdfs:domain :A .
:p rdfs:domain :B .

then the subject of :p must be both an A and a B not because of rdfs:domain but because it's two asserted statements.

:p rdfs:domain [ a owl:Class ; owl:unionOf ( :A :B ) ] .

Could be used for some private class, but in this case, if you had :p rdfs:domain :A, you could extend the type of a given value to be a union of :A and some other class. But this modeling could be missed by people creating graphs using the property. Using schema:domainIncludes avoids these problems, at the loss of some inference. Maybe the RDFS vocabulary needs such properties.

pchampin commented 3 years ago

First of all, I am in favour of introducing such a vocabulary.

However, the example above is flawed. A graph does not contain occurrences. It is itself a mathematical abstraction, and contains (abstract) triples. Two graphs containing a triple in common contain the same triple; two graphs containing exactly the same triples are actually one and the same graph.

The RDF 1.1 Concepts spec has a dedicated section where it defines the notion of RDF source, which is, in my view, a better candidate for containing triple occurrences.
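A minimal Python sketch of the distinction, assuming a graph is modelled as a frozenset of (subject, predicate, object) string tuples and an occurrence as a hypothetical (triple, source) pair; the source names are illustrative and none of this is proposed vocabulary.

```python
from typing import FrozenSet, NamedTuple, Tuple

Triple = Tuple[str, str, str]
Graph = FrozenSet[Triple]

t: Triple = (":a", ":b", ":c")

# A graph is a set of triples: value semantics, no identity of its own.
g1: Graph = frozenset({t, (":d", ":e", ":f")})
g2: Graph = frozenset({(":d", ":e", ":f"), t})
print(g1 == g2)  # True: same triples, hence one and the same graph


class Occurrence(NamedTuple):
    """Hypothetical model: a triple as it occurs in a named source."""
    triple: Triple
    source: str


# Hypothetical sources (e.g. files) whose current states happen to be equal graphs.
sources = {"<file1.ttl>": g1, "<file2.ttl>": g2}

occurrences = {Occurrence(t, name) for name, state in sources.items() if t in state}
print(len(occurrences))  # 2: one occurrence per source, although only one graph
```

The graphs collapse into one value, while the sources (and therefore the occurrences) stay distinct, which is why a source rather than a graph is the natural container for occurrences.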

Actually, maybe we could take this opportunity to mint IRIs for these concepts as well. Something like:

x:Graph            a rdfs:Class.
x:Source           a rdfs:Class.
x:Triple           a rdfs:Class.
x:TripleOccurrence a rdfs:Class.

x:inGraph       a           rdf:Property;
                rdfs:domain x:Triple;           
                rdfs:range  x:Graph.
x:inSource      a           rdf:Property;
                rdfs:domain x:TripleOccurrence;
                rdfs:range  x:Source.
x:hasState      a           rdf:Property;
                rdfs:domain x:Source;
                rdfs:range  x:Graph.
x:hasOccurrence a           rdf:Property;
                rdfs:domain x:Triple;
                rdfs:range  x:TripleOccurrence.

rat10 commented 3 years ago

@pchampin I had thought that this is exactly the kind of semantic rathole that we don't want to go into. In this context I couldn't care less whether a graph is understood as a mathematical abstraction or as a snippet of RDF in some Turtle file. If I can refer to it by an IRI or a blank node, it is up for grabs and I can describe that it contains a given triple, as an occurrence.

I wonder if I can close the Pandora's box that I opened with my careless talk of domain and range. Fact is, I made a basic mistake anyway: there are no terms in the RDF vocabulary for blank nodes or IRIs. So I think we should just leave the domain and range formally undefined and be done with it. The informal description is: every set of triples that is addressable by a blank node or IRI is fair game.

Regarding the vocabulary that you propose: we wouldn't want to define so many terms related to the RDF core in the x namespace, and we also wouldn't want to define them in the RDF namespace as that would seem rather encroaching. I propose to leave this extension very low key: two properties, an informal description, and be done with it for this round.

rat10 commented 2 years ago

As suggested by @hartig I'm moving the following discussion here: in #209 @hartig introduces a concrete proposal for how users can indicate that a property is a so-called transparency-enforcing property (i.e., quoted triples are meant to be referentially transparent when used in nested triples with such a property; see the example in the new text). That proposal works on properties. However, as evidenced e.g. by the use cases, most of the time we need referentially transparent occurrences. It seems like mixing two orthogonal approaches if, to define a reference to a referentially transparent occurrence, one has to work on both occurrences and properties. It's not impossible, but it seems twisted. It also introduces the possibility of undesired effects if one wants to use said property on both referentially opaque and transparent occurrences. IMO it would, at least in some (probably most) use cases, be better if referential transparency could be defined per occurrence. A property (please ignore for now the clumsy wording)

:referentiallyTransparentOccurrenceOf

could define an occurrence as being referentially transparent if, per Olaf's suggestion, the property was declared as transparency enabling:

:referentiallyTransparentOccurrenceOf rdf:type rdf-star:TransparencyEnablingProperty .

Extending the occurrence vocabulary in this sense and adding the type declaration to the axiomatic triples of RDF-star seems like a good idea to me.

While the example

_:a :occurrenceOf << :s :p :o >> ;
    :in <file1.ttl> ;
    dct:creator :alice.

refers to an occurrence of the quoted triple the following example would refer to the interpreted representation of the quoted triple:

_:b :referentiallyTransparentOccurrenceOf << :s :p :o >> ;
    :in <file1.ttl> ;
    dct:creator :alice.

Under OWL entailment and in presence of another statement

:s owl:sameAs :s2 .

we would be able to entail that

_:b  :referentiallyTransparentOccurrenceOf << :s :p :o >> ;
    :referentiallyTransparentOccurrenceOf << :s2 :p :o >> ;
    :in <file1.ttl> ;
    dct:creator :alice.
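A toy Python sketch of the intuition behind this entailment, not of the RDF-star semantics itself: owl:sameAs is treated as an equivalence relation (computed with union-find), a referentially transparent occurrence refers to every triple obtained by swapping terms for equivalent ones, and an opaque one refers only to the triple as written. The helper names and the `transparent` flag are hypothetical.

```python
from itertools import product


def same_as_classes(pairs):
    """Map each term to its owl:sameAs equivalence class (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        parent[find(a)] = find(b)
    classes = {}
    for term in list(parent):
        classes.setdefault(find(term), set()).add(term)
    return {term: classes[find(term)] for term in parent}


def referents(triple, classes, transparent):
    """Triples an occurrence refers to in this toy model.

    Opaque: only the triple exactly as written.
    Transparent: every owl:sameAs-equivalent variant of it."""
    if not transparent:
        return {triple}
    options = [classes.get(term, {term}) for term in triple]
    return set(product(*options))


classes = same_as_classes([(":s", ":s2")])
quoted = (":s", ":p", ":o")
print(sorted(referents(quoted, classes, transparent=True)))
# [(':s', ':p', ':o'), (':s2', ':p', ':o')]
print(referents(quoted, classes, transparent=False))
# {(':s', ':p', ':o')}
```

The transparent case corresponds to _:b picking up << :s2 :p :o >> above; the opaque case is what a plain quoted triple keeps.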
hartig commented 2 years ago

@rat10 when carrying over your comment from PR #209 (i.e., https://github.com/w3c/rdf-star/pull/209#issuecomment-926141392) to here, you forgot to include the proposal to extend the vocabulary with the following statement, which I agree would be a natural thing to state.

:referentiallyTransparentOccurrenceOf rdfs:subPropertyOf :occurrenceOf . 

However, one thing that is still missing in your proposal is a definition of the semantics of the :occurrenceOf property. Do you have a proposal for that one or do you suggest we leave it undefined?

rat10 commented 2 years ago

I didn't forget but found it premature. My first goal was to establish if and how individual occurrences can be declared as referentially transparent. We seem to agree on a mechanism to achieve that. Now the fine tuning begins. The term occurrence is used in RDF to refer to a referentially transparent statement as described by the standard reification vocabulary. As this IMO is also a plausible semantics I'd like to leave it that way, introducing as little disruption as possible. But a reference to quoted occurrences seems desirable too. A possible solution would be to define :occurrenceOf as TEP and introduce a further property :quoteOf to refer to referentially opaque occurrences, both as per EXAMPLE 8 to be used together with a second statement using the :in (or maybe :inSource, but that's a further discussion) property to describe their location. Another possibility would be to leave the referential semantics of :occurrenceOf unspecified and define two subproperties :quoteOf and :interpretationOf, but that seems a bit much...

hartig commented 2 years ago

The term occurrence is used in RDF to refer to a referentially transparent statement as described by the standard reification vocabulary. As this IMO is also a plausible semantics I'd like to leave it that way, introducing as little disruption as possible. But a reference to quoted occurrences seems desirable too. A possible solution would be to define :occurrenceOf as TEP and introduce a further property :quoteOf to refer to referentially opaque occurrences, both as per EXAMPLE 8 to be used together with a second statement using the :in (or maybe :inSource, but that's a further discussion) property to describe their location.

Can you make a concrete proposal for these definitions?

pchampin commented 2 years ago

FWIW, I would be in favor of leaving the semantics of :occurrenceOf unspecified (I can't think of any semantic constraint that should be imposed on them).

I find the naming :quoteOf a little odd, since _:x :quoteOf <<:s :p :o>> would mean "_:x is the quote of a quoted triple"...

I don't mind defining a transparency-enabling version of :occurrenceOf, which I would propose to call :statingOf [1]. I am not sure this should be a subproperty of :occurrenceOf, though, as statings and triple occurrences are different beasts IMO.

[1] https://lists.w3.org/Archives/Public/www-rdf-interest/1999Dec/0068.html

rat10 commented 2 years ago

FWIW, I would be in favor of leaving the semantics of :occurrenceOf unspecified (I can't think of any semantic constraint that should be imposed on them).

I find the property relatively useless if its semantics had such a gaping hole and it would be a shame to waste the term on it.

I find the naming :quoteOf a little odd, since _:x :quoteOf <<:s :p :o>> would mean "_:x is the quote of a quoted triple"...

Yes, that's right. The naming is not yet perfect. <<:s :p :o>> is a quoted triple and the property name :quoteOf doesn't capture the transformation from triple to occurrence.

I don't mind defining a transparency-enabling version of :occurrenceOf, which I would propose to call :statingOf [1]. I am not sure this should be a subproperty of :occurrenceOf, though, as statings and triple occurrences are different beasts IMO.

The term stating does in my intuition have the connotation of asserting (as in asserted vs unasserted statements) and thus could easily be confused with that orthogonal aspect.

[1] https://lists.w3.org/Archives/Public/www-rdf-interest/1999Dec/0068.html

That is quite some hair-splitting going on there ;-) IIUC this discusses yet another aspect - the speech act of asserting an assertion as an event in its own right - and I don't think we should go there.

Let's see what we have so far: we have triples (as types) and occurrences, and we have referential transparency and opacity. The embedded triple per the proposed semantics is a referentially opaque triple (as type). The occurrences we talk about are either referentially transparent or opaque. Instead of the technical terms referential opacity and referential transparency we can also use the more figurative terms quoted and interpreted. As a result we could define two semantically well specified subproperties of :occurrenceOf:

:quotedOccurrenceOf
:interpretedOccurrenceOf

These are not yet nicely succinct names, but at least they seem to capture with enough precision what we are talking about.

As I said above, I don't see the need for a semantically underspecified :occurrenceOf property. Instead we could define the semantics of :occurrenceOf as interpreted, referentially transparent. Thus it would:

Maybe we should leave it at that as actually I'm not sure I see the need for quoted occurrences. OTOH I'm not sure about their uselessness either ;-) So I'm still trying to come up with a better term. What about

:citationOf

as a reference to the quoted, referentially opaque occurrence? Citing something captures both that it actually happened (otherwise it would not be citation but a newly created assertion) and that it is represented verbatim. Looks good to me...

pchampin commented 2 years ago

About "stating"... I agree that it might seem to imply some form of assertion, which of course is not intended. Note however that the same could be argued about rdf:type rdf:Statement in standard reification -- and that's, I guess, the reason for Dan using that verb in the first place.

I could live with "occurrence" (ref-transparent) and "citation" (ref-opaque).

I would prefer, however, to have the properties in the opposite direction, i.e. from the quoted triple to the occurrence ("hasOccurrence", "hasCitation"...), because that makes it easier to use with the annotation syntax:

:lizTaylor :marriedTo :richardBurton {| rdf-star:hasOccurrence
    [ :in "1964"^^xsd:gYear ],
    [ :in "1975"^^xsd:gYear ]
|}.

rat10 commented 2 years ago

About "stating"... I agree that it might seem to imply some form of assertion, which of course is not intended. Note however that the same could be argued about rdf:type rdf:Statement in standard reification -- and that's, I guess, the reason for Dan using that verb in the first place.

I could leave with "occurrence" (ref-transparent) and "citation" (ref-opaque).

I assume you meant "live" (and then call it a day ;-) But: great!

I would prefer, however, to have the properties in the opposite direction, i.e. from the quoted triple to the occurrence ("hasOccurrence", "hasCitation"...), because that makes it easier to use with the annotation syntax:

:lizTaylor :marriedTo :richardBurton {| rdf-star:hasOccurrence
    [ :in "1964"^^xsd:gYear ],
    [ :in "1975"^^xsd:gYear ]
|}.

Good point. But that doesn't make the standard (non-annotation) syntax obsolete. What about having them both?

EDIT: your :hasOccurrence property lacks the :inGraph aspect. One could however let that default to the local graph, like I proposed in my latest comment on #170 w. r. t. an identifier syntax.

pchampin commented 2 years ago

I assume you meant "live" (and then call it a day ;-) But: great!

yes, I meant "live". I'm not leaving anywhere :-)

I would prefer, however, to have the properties in the opposite direction, i.e. from the quoted triple to the occurrence ("hasOccurrence", "hasCitation"...), because that makes it easier to use with the annotation syntax: (..) Good point. But that doesn't make the standard (non-annotation) syntax obsolete.

of course not

What about having them both?

Of course, defining an owl:inverseOf of rdf-star:hasOccurrence and friends is always possible, if we don't mind having a larger vocabulary with some redundancy. My point was: if we keep only one direction, there is a practical argument for keeping that one.

EDIT: your :hasOccurrence property lacks the :in aspect. One could however let it default to the local graph, like I proposed in my comment #170 w. r. t. an identifier syntax.

cf. my proposal above where x:inGraph and x:inSource play exactly that role.

rat10 commented 2 years ago

Maybe I'm missing something, but IIUC your proposal for a property optimized for the annotation syntax only works when the referent of the occurrence is defined via one sole property. Therefore, if that was to be an occurrence, the definition would either be incomplete or would have to refer to the local graph.

Thinking of it: why not define the annotation syntax as referring to the referentially transparent occurrence in the local graph?

hartig commented 2 years ago

Regarding the following example:

:lizTaylor :marriedTo :richardBurton {| rdf-star:hasOccurrence
   [ :in "1964"^^xsd:gYear ],
   [ :in "1975"^^xsd:gYear ]
|}.

While I agree that the property should be defined in the direction such that it can be used with the annotation syntax, I am highly confused about the example per se. My interpretation of this snippet of Turtle-star is that the triple (:lizTaylor, :marriedTo, :richardBurton) has two occurrences where one of them is in the year 1964 and the other one is in the year 1975. Now, what does it mean for a triple to occur in a year??

Probably my confusion has to do with the fact that it is not entirely clear what the notion of an "occurrence" of a triple actually is; at least, it is not totally clear to me.

@rat10 is this example how you were envisioning how the property rdf-star:hasOccurrence would be used?

hartig commented 2 years ago

EDIT: your :hasOccurrence property lacks the :in aspect. One could however let it default to the local graph, like I proposed in my comment #170 w. r. t. an identifier syntax.

cf. my proposal above where x:inGraph and x:inSource play exactly that role.

I am getting more and more confused by the minute. Can someone define exactly what you mean by "occurrence"; i.e., by the types of things that are meant to be used in the object position of a triple with the predicate rdf-star:hasOccurrence. @rat10 since you are the main proponent of doing something about such "occurrences", can you give me such a definition?

rat10 commented 2 years ago

Regarding the following example:

:lizTaylor :marriedTo :richardBurton {| rdf-star:hasOccurrence
   [ :in "1964"^^xsd:gYear ],
   [ :in "1975"^^xsd:gYear ]
|}.

While I agree that the property should be defined in the direction such that it can be used with the annotation syntax, I am highly confused about the example per se. My interpretation of this snippet of Turtle-star is that the triple (:lizTaylor, :marriedTo, :richardBurton) has two occurrences where one of them is in the year 1964 and the other one is in the year 1975. Now, what does it mean for a triple to occur in a year??

Probably my confusion has to do with the fact that it is not entirely clear what the notion of an "occurrence" of a triple actually is; at least, it is not totally clear to me.

@rat10 is this example how you were envisioning how the property rdf-star:hasOccurrence would be used?

I'm AFK right now, so just trying to clear up confusion but taking the risk that I may well create more....

I think you are correct in your observation w.r.t. PA's use of the :in property; I glossed over it as I thought it was an obvious glitch. If this use of :in was indeed meant to refer to the :in defined alongside :occurrenceOf (i.e. not ex:in but rdf-star:in, so to say), then it is indeed used wrongly, and maybe PA can replace it with something like ex:during. Only under that assumption can my comments to PA be understood: the occurrence is not completely specified and cannot be specified by simple inverseOfs of :occurrenceOf and :in, but only by a combination of the two in which :in defaults to a predefined value, preferably the local graph.

hartig commented 2 years ago

But shouldn't such an ex:during property be a property of the quoted triple rather than of the occurrence of the triple? (and I am assuming here that ex:during can be a TEP)

pchampin commented 2 years ago

"In" was a poor choice of term, because it seems to be part of the locution "occurrence in ..." , which was not my intention. It was not at all related to rdf-star:inGraph or anything like that.

Another source of confusion is that I was using hasOccurrence as a TEP here, as suggested by @rat10 above, and contrarily to my original use of it.

Consider this new example, which hopefully is clearer:

:lizTaylor :marriedTo :richardBurton {| rdf-star:hasOccurrence
    [ :since "1964"^^xsd:gYear; :until "1974"^^xsd:gYear  ],
    [ :since "1975"^^xsd:gYear ]
|}.

Now to try and answer @hartig's question above: what kind of thing is denoted by the two blank nodes in this example? My answer would be: "(the fact of) Liz Taylor being married to Richard Burton".

I am aware that one might interpret them subtly differently, e.g. as "(the claim of) Liz Taylor being married to Richard Burton", which would make the graph above nonsensical (or at least mean something totally different).

The way I see it, we may 1) define several variants of the "transparent occurrence" property to account for those subtle differences; 2) define only one such property, and document the fact that its semantics is purposefully broad; or 3) refuse altogether the rathole of option 1 and the fuzziness of option 2, and leave it to the community to define their own properties for their own use cases.

hartig commented 2 years ago

Now to try and answer @hartig's question above: what kind of thing is denoted by the two blank nodes in this example? My answer would be: "(the fact of) Liz Taylor being married to Richard Burton".

But isn't the (stated/claimed) fact of "Liz Taylor being married to Richard Burton" actually captured by the triple (:lizTaylor, :marriedTo, :richardBurton)? I have the suspicion that you, @pchampin, and @rat10 have a different understanding of what an "occurrence" is (or maybe it's just me who doesn't get it ;) It seems to me that your understanding is about occurrences of relationships or facts that are stated/claimed by a triple, whereas @rat10 wants to be able to talk about occurrences of the triples themselves (e.g., the triple (:lizTaylor, :marriedTo, :richardBurton) that is in a particular Turtle file). I see these as different things, but I may be wrong. @rat10 can you clarify what you mean by "occurrence"; i.e., by the types of things that are meant to be used in the object position of a triple with the predicate rdf-star:hasOccurrence.

rat10 commented 2 years ago

I thought we had a common understanding of what an occurrence is, and @hartig, you are describing my understanding correctly. That is also the way in which the RDF specs use the term and the way it was used when we discussed this terminology with @pfps last year. I'm also puzzled by @pchampin's concerns. However, it was a hectic exchange and maybe we got a little ahead of ourselves today. I'll have to think through the options laid out above again and then maybe make a rounded proposal. My working hypothesis is that it would be great and make perfect sense in more than one way if the annotation syntax would refer to the local transparent occurrence, but I fear I'll be alone with this again (which of course somehow limits my willingness to put work into this - we'll see...).

pchampin commented 2 years ago

EDIT: NB: I wrote this before reading @rat10's answer above.

My initial understanding of "occurrence" was indeed: "the occurrence of a triple in an RDF source (Turtle file, triple store...)". As such, "hasOccurrence" and "occurrenceOf" would be non-TEPs, because an occurrence of ":superman :can :fly" must be distinguished from an occurrence of ":clark :can :fly".

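In the comment above (near the end), @rat10 proposed to define

* "`occurrenceOf` as interpreted, referentially transparent" → I read this as "an occurrence of the fact described by the triple"

* "`citationOf` as a reference to the quoted, referentially opaque occurrence" → I read this as "an occurrence of the triple" as described above.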

I replied that I could live with it. That's the agreement that I thought we had found, but I must say that @rat10's comment above makes me doubt: I don't understand how "occurrences of facts" should be specified as being in a graph... The thing denoted by the blank node in my example, which involves Liz Taylor and Richard Burton, and started in 1964, did not occur in a graph, it occurred in Montreal.

TallTed commented 2 years ago

RDF allows for "Schema Last," but problems arise when considering example data without considering what its schema might be. At a minimum, some care must be taken about entity types, relationship types, and relationship values.

A wedding is an occurrence, an event, with a datestamp, if not a timestamp. Of course, it also has a duration, but this is typically measured in hours if not minutes. Liz and Richard had two of these with each other, in 1975 and 1964.

A marriage is less of an event, and more a state of being, with a start, a duration, and an end, either by divorce or death (both of which are events, as was the wedding and each party's birth). Liz and Richard had two of these with each other, running from 1964-1974, and 1975-1976.

Today, it would not be sensible to say "Liz is married to Richard," but it would be to say "Liz was married to Richard", though this state did not pertain at the time of either of their deaths.

All of which is to say -- sometimes, what seems a simple example is not -- and likewise what seems a complex example may not be so! Keeping these sorts of things straight will help a great deal in discussing examples which are meant to bring clarity to complex discussions.

rat10 commented 2 years ago

EDIT: NB: I wrote this before reading @rat10's answer above.

My initial understanding of "occurrence" was indeed: "the occurrence of a triple in an RDF source (Turtle file, triple store...)". As such, "hasOccurrence" and "occurrenceOf" would be non-TEPs, because an occurrence of ":superman :can :fly" must be distinguished from an occurrence of ":clark :can :fly".

In the comment above (near the end), @rat10 proposed to define

* "`occurrenceOf` as interpreted, referentially transparent" → I read this as "an occurrence of the fact described by the triple"

* "`citationOf` as a reference to the quoted, referentially opaque occurrence" → I read this as "an occurrence of the triple" as described above.

I replied that I could live with it. That's the agreement that I thought we had found, but I must say that @rat10's last comment makes me doubt: I don't understand how "occurrences of facts" should be specified as being in a graph... The thing denoted by the blank node in my example, which involves Liz Taylor and Richard Burton, and started in 1964, did not occur in a graph, it occurred in Montreal.

It seems to me that you introduce an aspect that we have so far not been discussing at all: the long-known problem of identification semantics in RDF, i.e. disambiguating whether a URI is used to indicate a (web) resource or to denote what that (web) resource refers to, a.k.a. "social meaning", "the identity crisis of the semantic web", httpRange-14, Cool URIs etc. I would think that that problem is out of scope. The distinction I make between :occurrenceOf and :citationOf is that the latter is referentially opaque while the former is referentially transparent, and nothing else. Vulgo, the latter doesn't support any entailments, the former does. In both cases the referent is a statement not as a type but as it occurs in a graph. Practically, the source of the confusion might be that your example about Liz Taylor and Richard Burton uses :in in a totally different way than Example 8. I'd suggest that we change the name of the :in property to :inGraph or :inSource to avoid such confusion in the future, as :in is just too broad and thereby invites misunderstandings. I'd prefer :inGraph, but :inSource seems to be the more prudent approach and more capable of securing a majority.

In my understanding we already have a clear path from referentially opaque types to referentially transparent occurrences: << :s :p :o >> is a referentially opaque type. Now paraphrasing Example 8:

_:b :occurrenceOf << :s :p :o >> ;
    :inSource :someSnippetOfRDF .

_:b is a reference to the referentially transparent occurrence of << :s :p :o >> in :someSnippetOfRDF. In other words, _:b refers to the meaning of << :s :p :o >>: if there is e.g. an owl:sameAs statement relating :s and :s2, then _:b refers to << :s2 :p :o >> as well. In contrast, the proposed :citationOf would always refer only to << :s :p :o >>, not to any entailed co-denotations like << :s2 :p :o >>. That should all be rather boring and all too clear by now, I hope.

The question at hand now is how the annotation syntax fits into this picture. IMO it would be wise to define the annotation syntax as referring to the referentially transparent occurrence. I suggest that, for example,

:s :p :o {| :v :w ;  
            :y :z |}

would be expanded to

:s :p :o .
_:b :occurrenceOf << :s :p :o >> ;
    :inSource <> ;
    :v :w ;
    :y :z .

assuming that

The relation between the two syntaxes and their different semantics is clearly defined by the syntax sketched in Example 8 and the TEP semantics of :occurrenceOf. Defaulting to the local graph is IMO the only sensible design for the shortcut syntax.
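A minimal sketch of the proposed expansion rule, assuming terms are plain strings and that each {| ... |} block gets one fresh blank node. The `AnnotationExpander` class and the `_:occN` labels are hypothetical and not part of any RDF-star library.

```python
from itertools import count

Triple = tuple[str, str, str]


class AnnotationExpander:
    """Hypothetical expander for `s p o {| ... |}` following the proposal."""

    def __init__(self, source: str = "<>"):
        self.source = source  # default :inSource value: the current document
        self._ids = count(1)  # counter for fresh blank node labels

    def expand(self, triple: Triple, annotations: list[tuple[str, str]]) -> list[Triple]:
        s, p, o = triple
        node = f"_:occ{next(self._ids)}"  # one fresh occurrence node per block
        out = [
            triple,  # the annotated triple itself stays asserted
            (node, ":occurrenceOf", f"<< {s} {p} {o} >>"),
            (node, ":inSource", self.source),
        ]
        # Every annotation pair is attached to the occurrence node.
        out.extend((node, pred, obj) for pred, obj in annotations)
        return out


expander = AnnotationExpander()
for t in expander.expand((":s", ":p", ":o"), [(":v", ":w"), (":y", ":z")]):
    print(" ".join(t), ".")
# :s :p :o .
# _:occ1 :occurrenceOf << :s :p :o >> .
# _:occ1 :inSource <> .
# _:occ1 :v :w .
# _:occ1 :y :z .
```

Because every block gets its own node, annotating the same triple in two separate blocks yields two distinct occurrences, which is what the split Liz Taylor / Richard Burton example relies on.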

Applying this to the example from above, we can get rid of some blank nodes, as those are contained in the expansion. Rather unsurprisingly,

:lizTaylor :marriedTo :richardBurton {| 
    :since "1964"^^xsd:gYear ; 
    :until "1974"^^xsd:gYear ;
    :since "1975"^^xsd:gYear
|}.

wouldn't meet the intended meaning as it would expand to

:lizTaylor :marriedTo :richardBurton .
_:ltrb :occurrenceOf << :lizTaylor :marriedTo :richardBurton >> ;
       :inSource <> ;
       :since "1964"^^xsd:gYear ; 
       :until "1974"^^xsd:gYear ;
       :since "1975"^^xsd:gYear .

However the following snippet of annotation syntax

:lizTaylor :marriedTo :richardBurton {| 
    :since "1964"^^xsd:gYear ; 
    :until "1974"^^xsd:gYear
|}.
:lizTaylor :marriedTo :richardBurton {| 
    :since "1975"^^xsd:gYear
|}.

would expand to the intended result

:lizTaylor :marriedTo :richardBurton .
_:ltrb1 :occurrenceOf << :lizTaylor :marriedTo :richardBurton >> ;
       :inSource <> ;
       :since "1964"^^xsd:gYear ;
       :until "1974"^^xsd:gYear .
_:ltrb2 :occurrenceOf << :lizTaylor :marriedTo :richardBurton >> ;
       :inSource <> ;
       :since "1975"^^xsd:gYear .

and IMO this is exactly how it should be and the best way to implement multisets in RDF.

Please note that this specific problem of the same statement with different annotations is the hardest problem of all that we have to solve, as it has to dance around the set-based semantics of RDF (and does so the same way as RDF standard reification).
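The set-semantics point can be checked with a small sketch. A Python set stands in for an RDF graph, years are plain strings instead of xsd:gYear literals, and `annotations_by_node` is a hypothetical helper. The duplicate assertion collapses, but two occurrence nodes keep their annotations apart, whereas a single node cannot say which :until goes with which :since.

```python
from collections import defaultdict

Triple = tuple[str, str, str]
MARRIAGE = (":lizTaylor", ":marriedTo", ":richardBurton")


def annotations_by_node(graph: set[Triple]) -> dict[str, dict[str, list[str]]]:
    """Group :since/:until values by the node they are attached to."""
    grouped = defaultdict(lambda: defaultdict(set))
    for s, p, o in graph:
        if p in (":since", ":until"):
            grouped[s][p].add(o)
    return {node: {p: sorted(vals) for p, vals in props.items()}
            for node, props in grouped.items()}


# One annotation block: a single occurrence node carries all values.
one_block = {MARRIAGE,
             ("_:ltrb", ":since", "1964"), ("_:ltrb", ":until", "1974"),
             ("_:ltrb", ":since", "1975")}

# Two blocks: the asserted triple collapses (set semantics), the nodes do not.
two_blocks = {MARRIAGE, MARRIAGE,
              ("_:ltrb1", ":since", "1964"), ("_:ltrb1", ":until", "1974"),
              ("_:ltrb2", ":since", "1975")}

print(annotations_by_node(one_block))   # since: 1964 and 1975, until: 1974 only
print(annotations_by_node(two_blocks))  # each node keeps its own interval
print(len(two_blocks))                  # 4: MARRIAGE is stored once
```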

This would realign RDF-star with the semantics inherent in the seminal example and a majority of the use cases. In other words:

This would greatly mitigate a risk that I see w.r.t. the current state of the proposed semantics: namely that the proposed semantics gets ignored by a lot of published RDF-star data because it makes it so hard to express the mainstream use cases. I know we have different opinions on the severity of that problem, but I hope we can agree that the above proposal would make it much easier to avoid. IMO this might be good enough to prevent users from ignoring the proposed semantics (still: fingers crossed...).

TallTed commented 2 years ago

Addressing nothing else here --

[@rat10] I'd suggest that we change the name of the :in property to :inGraph or :inSource to avoid such confusion in the future as :in is just too broad and thereby invites misunderstandings. I'd prefer :inGraph but :inSource seems to be the more prudent approach and more capable of securing a majority.

I agree with your conclusion, :inSource. The reason is that :inGraph feels restrictive and implies a Named Graph, which may or may not be in play, while :inSource is clearly more flexible and allows for any RDF Source. Whether or not this predicate propagates beyond this discussion, :inSource will be applicable to more usage scenarios.

pchampin commented 2 years ago

The distinction I make between :occurrenceOf and :citationOf is that the latter is referentially opaque while the former is referentially transparent, and nothing else.

Ok, fine by me. That corresponds to option 2 at the end of my comment above. So many different things can be considered a "transparent occurrence" of a triple (events, states of being, statings...).

But that contradicts your proposal to automatically include an :inSource property when expanding the annotation syntax, because not many different things can occur in an RDF source. Take your example above:

_:ltrb1 :occurrenceOf << :lizTaylor :marriedTo :richardBurton >> ;
       :inSource <> ;
       :since "1964"^^xsd:gYear ; 
       :until "1974"^^xsd:gYear .

What does _:ltrb1 denote? Something that started in 1964, ended in 1974, and occurred in an RDF source?? That cannot be Liz and Richard's first marriage (it did not occur in any RDF source). That cannot be the stating of this triple either (no RDF source existed in 1964). From where I stand this graph is inconsistent.

More formally:

pchampin commented 2 years ago

@TallTed

I agree with your conclusion, :inSource. The reason is that :inGraph feels restrictive and implies a Named Graph

I agree that :inSource is better, but for a very different reason ;-) :inGraph does not refer, for me, to Named Graph, but to RDF graph, which is a very abstract thing that we do not interact with directly. We interact with, for example, Turtle files or RDF/XML HTTP resources, which are RDF sources.

TallTed commented 2 years ago

@rat10 and others --

Note that the second marriage of Liz & Richard ended the year after it began, so the examples above should have an :until "1976"^^xsd:gYear, e.g.:

_:ltrb2 :occurrenceOf << :lizTaylor :marriedTo :richardBurton >> ;
       :inSource <> ;
       :since "1975"^^xsd:gYear ;
       :until "1976"^^xsd:gYear .

rat10 commented 2 years ago

The distinction I make between :occurrenceOf and :citationOf is that the latter is referentially opaque while the former is referentially transparent, and nothing else.

Ok, fine by me. That corresponds to option 2 at the end of my comment above. So many different things can be considered a "transparent occurrence" of a triple (events, states of being, statings...).

But that contradicts your proposal to automatically include an :inSource property when expanding the annotation syntax, because not many different things can occur in an RDF source. Take your example above:

_:ltrb1 :occurrenceOf << :lizTaylor :marriedTo :richardBurton >> ;
       :inSource <> ;
       :since "1964"^^xsd:gYear ; 
       :until "1974"^^xsd:gYear .

What does _:ltrb1 denote? Something that started in 1964, ended in 1974, and occurred in an RDF source?? That cannot be Liz and Richard's first marriage (it did not occur in any RDF source). That cannot be the stating of this triple either (no RDF source existed in 1964). From where I stand this graph is inconsistent.

More formally:

* I would expect that `:since` and `:until` have a common domain (call it `:StateOfBeing`)

* I would expect that `:inSource` has another domain (call it `:Stating`, or `:Statement`, or `:Assertion`)

* I would expect that both domains are disjoint, which leads to a contradiction.

You are just repeating your argument. Would you please comment on the first paragraph of my answer, i.e. that the problem you point out is well known under various monikers like "identity crisis", "httpRange-14" etc., and that I consider it out of scope. Solutions have been proposed, e.g. in Cool URIs, and such solutions would be applicable to references to occurrences like _:ltrb1 in the example above. To my knowledge they are seldom used in practice, as other means like vocabularies provide good enough disambiguation. If all such means are insufficient, more explicit modelling involving some specific vocabulary can be employed.

An example of how my proposal above can be extended with more precise identification:

_:ltrb1 :occurrenceOf << :lizTaylor :marriedTo :richardBurton >> ;
       :inSource <> ;
       :denotes [ :since "1964"^^xsd:gYear ; 
                  :until "1974"^^xsd:gYear ] ;
       :indicates [ :source :wikipedia ] .
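To make the domain argument from the bullet list above concrete, here is a toy check. It applies only the three assumptions listed there (the domains of :since, :until and :inSource, plus disjointness of the hypothetical classes :StateOfBeing and :Stating) and reports nodes that end up typed into both. It is a sketch, not an OWL reasoner, and it compares the flat form of _:ltrb1 with the :denotes/:indicates form just given (with the bracketed nodes written as _:d and _:i).

```python
Triple = tuple[str, str, str]

# Assumed domains and disjointness, taken from the bullet list (hypothetical classes).
DOMAINS = {":since": ":StateOfBeing", ":until": ":StateOfBeing", ":inSource": ":Stating"}
DISJOINT = {frozenset({":StateOfBeing", ":Stating"})}


def clashes(graph: list[Triple]) -> set[str]:
    """Nodes that rdfs:domain inference places in two disjoint classes."""
    types: dict[str, set[str]] = {}
    for s, p, _ in graph:
        if p in DOMAINS:
            types.setdefault(s, set()).add(DOMAINS[p])
    return {node for node, ts in types.items()
            if any(pair <= ts for pair in DISJOINT)}


Q = "<< :lizTaylor :marriedTo :richardBurton >>"

flat = [("_:ltrb1", ":occurrenceOf", Q), ("_:ltrb1", ":inSource", "<>"),
        ("_:ltrb1", ":since", "1964"), ("_:ltrb1", ":until", "1974")]

nested = [("_:ltrb1", ":occurrenceOf", Q), ("_:ltrb1", ":inSource", "<>"),
          ("_:ltrb1", ":denotes", "_:d"),
          ("_:d", ":since", "1964"), ("_:d", ":until", "1974"),
          ("_:ltrb1", ":indicates", "_:i"), ("_:i", ":source", ":wikipedia")]

print(clashes(flat))    # {'_:ltrb1'}
print(clashes(nested))  # set()
```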

But this goes beyond the current task, which is to define a way to refer to an occurrence by means of the annotation syntax, and indeed builds on it. We may even decide to define subproperties of :occurrenceOf like :indicatedByOccurrenceOf and :denotedByOccurrenceOf, but nonetheless we will have to define the basic mechanism of how to refer to an occurrence in the annotation syntax first. My proposal for that is based on the standard syntax, to which the problem you point out applies just as well. I would like you to comment on my proposal above for what it is and tries to be. If you want to discuss the problem of identification semantics in RDF, please do so in a separate issue, as it concerns several other areas of RDF-star.

hartig commented 2 years ago

@rat10 The aim to define a vocabulary for describing occurrences of triples/statements (?) based on RDF-star (i.e., the topic of this issue) and your idea to re-purpose the Turtle-star annotation syntax for capturing such descriptions more succinctly are separate things. In other words, the discussion of the vocabulary should not be intertwined with matters related to the annotation syntax.

So, to continue talking about the vocabulary, I am with @pchampin. There is still no clearly articulated understanding of what exactly the type of thing is that is meant to be used in the subject position of a triple with the predicate rdf-star:occurrence. @pchampin's recent comment makes clear that there is ambiguity in the examples. I don't think that this ambiguity has anything to do with the httpRange-14 issue or Cool URIs (after all, the discussion here is completely orthogonal to the fact that HTTP URIs can be used as Web addresses; in fact, the examples here do not even use URIs in the place that our discussion is concerned with). Even if it did, this does not mean we should throw our hands up and simply introduce the property rdf-star:occurrence without saying what its intended domain is. Unless we have a definition of what this domain is, I don't see any value in introducing this property. @rat10 I don't think you have provided such a definition in this thread, but rather referred to mailing list discussions, etc. For the purpose of making progress here, can you please provide a concrete proposal of what the definition of the domain of rdf-star:occurrence should look like.

rat10 commented 2 years ago

@hartig

@rat10 The aim to define a vocabulary for describing occurrences of triples/statements (?) based on RDF-star (i.e., the topic of this issue) and your idea to re-purpose the Turtle-star annotation syntax for capturing such descriptions more succinctly are separate things. In other words, the discussion of the vocabulary

The vocabulary terms :occurrenceOf and :in have been part of the draft report for a few months now. Please explain what you think is still missing.

should not be intertwined with matters related to the annotation syntax.

However, the question of how occurrences are referred to in the annotation syntax came up recently, and that is what my recent proposal addresses.

So, to continue talking about the vocabulary, I am with @pchampin. There is still no clearly articulated understanding of what exactly the type of thing is that is meant to be used in the subject position of a triple with the predicate rdf-star:occurrence.

I assume that you mean rdf-star:occurrenceOf. If not I wouldn't know what you refer to.

@pchampin's recent comment makes clear that there is ambiguity in the examples. I don't think that this ambiguity has anything to do with the httpRange-14 issue or Cool URIs (after all, the discussion here is completely orthogonal to the fact that HTTP URIs can be used as Web addresses; in fact, the examples here do not even use URIs in the place that our discussion is concerned with). Even if it did,

Trust me, it does. And your concern w.r.t. blank nodes is unfounded.

this does not mean we should throw our hands up and simply introduce the property rdf-star:occurrence without saying what its intended domain is. Unless we have a definition of what this domain is, I don't see any value in introducing this property. @rat10 I don't think you have provided such a definition in this thread, but rather referred to mailing list discussions,

Did I? I rather thought I had presented a reasonably succinct and self-contained proposal w.r.t. the annotation syntax and the whole topic of annotating occurrences. OTOH I'm not aware of any other attempt to reconcile the original purpose of RDF* with what the proposed semantics transformed it into in an equally balanced fashion. So maybe you should take a second look at it.

etc. For the purpose of making progress here, can you please provide a concrete proposal of what the definition of the domain of rdf-star:occurrence should look like.

The domain of rdf-star:occurrenceOf is rdf:Statement. I hope that answers your question and allows us to make progress.
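With that domain, RDFS entailment rule rdfs2 would type every subject of :occurrenceOf as an rdf:Statement, the same class the standard reification vocabulary uses. A small sketch of that inference next to a standard reification of the same triple, assuming terms are plain strings (the helper names `rdfs2` and `reify` are hypothetical):

```python
Triple = tuple[str, str, str]

RDF_TYPE = "rdf:type"
DOMAINS = {":occurrenceOf": "rdf:Statement"}  # the domain stated above


def rdfs2(graph: set[Triple], domains: dict[str, str]) -> set[Triple]:
    """RDFS rule rdfs2: (x p y) and (p rdfs:domain C) entail (x rdf:type C)."""
    return graph | {(s, RDF_TYPE, domains[p]) for s, p, _ in graph if p in domains}


def reify(node: str, triple: Triple) -> set[Triple]:
    """Standard RDF reification of `triple`, with `node` as the statement resource."""
    s, p, o = triple
    return {(node, RDF_TYPE, "rdf:Statement"),
            (node, "rdf:subject", s),
            (node, "rdf:predicate", p),
            (node, "rdf:object", o)}


occ = {("_:b", ":occurrenceOf", "<< :s :p :o >>"),
       ("_:b", ":inSource", ":someSnippetOfRDF")}
print(("_:b", RDF_TYPE, "rdf:Statement") in rdfs2(occ, DOMAINS))  # True
print(sorted(reify("_:r", (":s", ":p", ":o"))))
```

The occurrence node records its source via :inSource; the four standard reification triples have no built-in slot for that.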

pchampin commented 2 years ago

The domain of rdf-star:occurrenceOf is rdf:Statement. I hope that answers your question

I think it does. Now, referring to RDF11-MT (https://www.w3.org/TR/rdf11-mt/#reification):

The subject of a reification [i.e. the subject of rdf:type rdf:Statement] is intended to refer to a concrete realization of an RDF triple, such as a document in a surface syntax.

Can we agree that the marriage of two persons is not "a concrete realization of an RDF triple, such as a document in a surface syntax"? In which case my example above would be wrong. I agree that this could be fixed by adding yet another intermediary node (as you proposed above), but I thought you found it already too cumbersome with one such intermediary node... And what would be the benefit of inserting automatically the first intermediary node with the annotation syntax, if the user still needed to add one?

In other words, if one has to write


:lizTaylor :marriedTo :richardBurton {|
       :denotes [ :since "1964"^^xsd:gYear ; 
                  :until "1974"^^xsd:gYear ] ;
       :indicates [ :source :wikipedia ] 
|} .

why not let `:denotes` and `:indicates` apply directly on quoted triples, each with their own semantics and opacity/transparency?
rat10 commented 2 years ago

The domain of rdf-star:occurrenceOf is rdf:Statement. I hope that answers your question

I think it does. Now, referring to RDF11-MT (https://www.w3.org/TR/rdf11-mt/#reification):

The subject of a reification [i.e. the subject of rdf:type rdf:Statement] is intended to refer to a concrete realization of an RDF triple, such as a document in a surface syntax.

Your quote is not correct. Where you let it end with a full stop it does actually continue. The correct quote is:

The subject of a reification is intended to refer to a concrete realization of an RDF triple, such as a document in a surface syntax, rather than a triple considered as an abstract object.

and from the last sub-sentence it is clear that the distinction the text wants to make is not between denotation and indication but between triple (as type) and occurrence. Therefore according to the spec rdf:Statement describes rather precisely an occurrence in the sense that I see useful: as occurring in some source but referring to the interpretation, not the literal representation (the latter is how the proposed semantics defines embedded triples). The spec says nothing about the orthogonal distinction between denotation and indication, certainly not in the reification section, but also not in any other place IIRC. If @pfps has another take on this, I hope he speaks up. Otherwise I consider your concern unfounded.

We could define a class :Occurrence as

to make sure that occurrences derived from embedded triples and equipped with a notion of source are disambiguated from occurrences that are declared via the standard reification vocabulary and unfortunately lack the syntax to declare their source (and are therefore 'underspecified' as @pfps called it IIRC).

By symmetry we could define a class :Citation as

to declare referentially opaque occurrences. I'm a bit undecided about this one right now - maybe it makes things too complicated. OTOH I see no better way to cite what somebody said in a specific moment, which is indeed a not uncommon use case. Sometimes it's good to follow the principle of symmetry even if one can't envision any application. It might however not be trivial to define the formal model-theoretic semantics and I can't be of help there.

Can we agree that the marriage of two persons is not "a concrete realization of an RDF triple, such as a document in a surface syntax"? In which case my example above would be wrong. I agree that this could be fixed by adding yet another intermediary node (as you proposed above), but I thought you found it already too cumbersome with one such intermediary node... And what would be the benefit of inserting automatically the first intermediary node with the annotation syntax, if the user still needed to add one?

In other words, if one has to write

:lizTaylor :marriedTo :richardBurton {|
       :denotes [ :since "1964"^^xsd:gYear ; 
                  :until "1974"^^xsd:gYear ] ;
       :indicates [ :source :wikipedia ] 
|} .

why not let :denotes and :indicates apply directly on quoted triples, each with their own semantics and opacity/transparency?

[I changed the quote marks to what I assume was your intention]

@pchampin, you introduced this issue, not me. I was just trying to point at a more appropriate approach to tackling it. If you think the RDF-star vocabulary should be extended to accommodate differentiations between indication and denotation, go ahead, feel free to take my stub or anything else, and make a proposal.

However, I would advise against it, as the problem is much bigger than references to occurrences alone. Please note that the above code snippet doesn't make any declarations as to whether the IRIs :lizTaylor and :richardBurton (let's assume they belong to the well-known ex namespace) are denoting or indicating their subject. So we might have stated that two webpages were married before the web existed. In that (non)sense all of RDF, including all our examples, is full of contradictions, and it is a miracle that the semantic web achieved anything at all. Or rather, it is the advantage of engineers over logicians that they are trained to distinguish theoretical problems from practical ones.

To anyone still not convinced that identification on the semantic web, which does not disambiguate between indication and denotation, is a very thorny problem at the very heart of the semantic web (and indeed one that you could throw at any RDF-related proposal to accuse it of creating contradictions, including the RDF specs themselves), may I suggest reading up on it in a lively treatment by Harry Halpin in his dissertation, "Social Semantics - The Search for Meaning on the Web" from 2012, Section 4.1.

@pchampin, you and I discussed this issue a few years ago on semantic-web@w3.org, so I know for sure that you are very well informed about its nature and the depth of the problem. Perhaps not surprisingly, I was as unconvinced about the solution you proposed then (create two different identifiers for everything, one indicating, the other denoting) as I am about the idea you support now, of creating a second property for every property as the easy way from referentially opaque types to referentially transparent occurrences. I am surprised, though, that you bring this problem up at this very moment, when I'm proposing a solution that reconciles the proposed semantics with the "expected" semantics (the latter being the one that is inherent in all examples predating this CG) by letting each have its own syntax, complemented by a clear path from one to the other via the occurrence vocabulary. It still needs some fleshing out, but wouldn't you agree that this approach promises to benefit everybody?

pchampin commented 2 years ago

Your quote is not correct. Where you let it end with a full stop it does actually continue. The correct quote is:

The subject of a reification is intended to refer to a concrete realization of an RDF triple, such as a document in a surface syntax, rather than a triple considered as an abstract object.

and from the last sub-sentence it is clear that the distinction the text wants to make is not between denotation and indication but between triple (as type) and occurrence.

Indeed. (btw, I consider that the type-occurrence distinction is not a topic of disagreement, that's why I omitted that part).

Therefore according to the spec rdf:Statement describes rather precisely an occurrence

As opposed to a type, yes.

in the sense that I see useful:

... and this is where I am not following you. See below.

as occurring in some source but referring to the interpretation, not the literal representation (the latter is how the proposed semantics defines embedded triples).

this might be your reading of "concrete realization of an RDF triple", but it is not mine. That part of the text does not refer to any interpretation (which would be required to consider the denotation of a triple, because the same triple may have different denotations in different interpretations...). OTOH the term "syntax" is explicitly used (in the following), as well as the reference to "RDF triple", which is an element of the abstract syntax.

Let's look at the following sentence from the spec: "This supports use cases where properties such as dates of composition or provenance information are applied to the reified triple." Yes, those use cases!...

The spec says nothing about the orthogonal distinction between denotation and indication

Neither did I, by the way. As far as I can tell, you came up with the term "indication" without providing a definition.

(...) Otherwise I consider your concern unfounded.

Let's agree to disagree, then.

rat10 commented 2 years ago

Your quote is not correct. Where you let it end with a full stop it does actually continue. The correct quote is: The subject of a reification is intended to refer to a concrete realization of an RDF triple, such as a document in a surface syntax, rather than a triple considered as an abstract object. and from the last sub-sentence it is clear that the distinction the text wants to make is not between denotation and indication but between triple (as type) and occurrence.

Indeed. (btw, I consider that the type-occurrence distinction is not a topic of disagreement, that's why I omitted that part).

Yeah, one has to be careful with quotes. Pulling them out of context or shortening them without proper indication, even changing punctuation, can lead to very misleading mis-representations.

Therefore according to the spec rdf:Statement describes rather precisely an occurrence

As opposed to a type, yes.

in the sense that I see useful:

... and this is where I am not following you. See below.

as occurring in some source but referring to the interpretation, not the literal representation (the latter is how the proposed semantics defines embedded triples).

this might be your reading of "concrete realization of an RDF triple", but it is not mine. That part of the text does not refer to any interpretation (which would be required to consider the denotation of a triple, because the same triple may have different denotations in different interpretations...). OTOH the term "syntax" is explicitly used (in the following), as well as the reference to "RDF triple", which is an element of the abstract syntax.

RDF 1.1 Semantics, D.1 Reification confirms my reading, any other references to syntax that the spec makes notwithstanding:

Reification is not a form of quotation. Rather, the reification describes the relationship between a token of a triple and the resources that the triple refers to.

And it continues:

The value of the rdf:subject property is not the subject IRI itself but the thing it denotes, and similarly for rdf:predicate and rdf:object. For example, if the referent of ex:a is Mount Everest, then the subject of the reified triple is also the mountain, not the IRI which refers to it.

It couldn't be any clearer.

Let's look at the following sentence from the spec: "This supports use cases where properties such as dates of composition or provenance information are applied to the reified triple." Yes, those use cases!...

I have no idea what you want to express with this quote and comment. What I note is that it isn't specific about the referent of such provenance information: the triple itself or what it refers to; again the same ambiguity inherent to identification on the semantic web. What I would advise not to read into it is that reification can only be used for provenance. We had a discussion with Pat Hayes last year on either the CG mailing list or semantic-web@w3.org (I don't remember precisely, but I can look that up if necessary) where he cautioned that what we can't do is annotate statements to the effect of revoking them, as that would run against the monotonicity of RDF. Everything else is fair game. [EDIT: I found the mail, and I remembered it wrongly. Pat does describe the issue in more prudent terms, saying that provenance annotations are unproblematic but everything that might change the truth value of the statement being annotated has to be treated with great care.]

The spec says nothing about the orthogonal distinction between denotation and indication

Neither did I, by the way. As far as I can tell, you came up with the term "indication" without providing a definition.

You introduced the topic, I introduced the terms "indicate" and "denote" only to discuss it, and I made it very clear that I find the concern you voiced unfounded and the discussion out of place.

(...) Otherwise I consider your concern unfounded.

Let's agree to disagree, then.

That would presuppose some real discussion. So far you've not provided any useful arguments w.r.t. the topic at hand: my proposal for a semantics for the annotation syntax.

pchampin commented 2 years ago

This was discussed during today's call: https://w3c.github.io/rdf-star/Minutes/2021-10-29.html#r01