w3c / rdf-star

RDF-star specification
https://w3c.github.io/rdf-star/

Embedded Quads: Should RDF* allow terms to be included in the graph position? #49

Closed blake-regalia closed 3 years ago

blake-regalia commented 3 years ago

The purpose of this issue is to consider what are the merits and what are the drawbacks of allowing a term to appear in the graph position of an embedded triple. In other words, should RDF* deal explicitly in triples, or should embedded triples be generalized to embedded quads?

For example:

<< :a :b :c :graph >> :p1 :o1 .

Now I can certainly imagine legitimate use cases for embedded quads, but have not dedicated the time to really think this over. So perhaps instead, I will just open the discussion with some basic implications and others can chime in whether they see utility or hindrance:

klinovp commented 3 years ago

I support this and in fact the implementation in Stardog uses annotations on quads, not triples. I mentioned the motivation in https://github.com/w3c/rdf-star/issues/33 but that issue was primarily about PG representation, not named graphs. The main motivation for quads is to allow RDF* graphs like

:g { << :a :p :b >> :label "hello" }

This is a valid TriG* in Stardog which means that the annotation on :a :p :b is a quad in :g. For us it's important to be able to store RDF* triples in NGs, just as regular RDF triples.

But then since Stardog implements the PG semantics the question, of course, becomes: what's the graph for :a :p :b (since it's asserted)? We made the simplifying assumption that it's also :g so the TriG* snippet above is a sugar for

:g { :a :p :b . << :a :p :b >> :label "hello" }

That works pretty well for us so far. Every named graph in RDF is treated like a container for RDF triples, some of which can be over embedded triples. We can move embedded triples with annotations freely across graphs and that's pretty great. Not allowing people to move some triples to an NG only because they use annotations would be a severe usability limitation.
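
The desugaring described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Stardog's implementation: the `Triple` and `Quad` types and the `expand_pg` function are hypothetical names, terms are plain strings, and an embedded triple is modelled as a `Triple` used in subject or object position.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """An embedded triple (hypothetical model: terms are plain strings)."""
    s: object
    p: str
    o: object


class Quad(NamedTuple):
    """A triple placed in a named graph g."""
    s: object
    p: str
    o: object
    g: str


def expand_pg(quads):
    """Assumed PG-mode rule: every embedded triple is also asserted
    in the graph of the quad that mentions it."""
    result = set(quads)
    pending = list(quads)
    while pending:
        quad = pending.pop()
        for term in (quad.s, quad.o):
            if isinstance(term, Triple):
                asserted = Quad(term.s, term.p, term.o, quad.g)
                if asserted not in result:
                    result.add(asserted)
                    # The asserted triple may itself embed further triples.
                    pending.append(asserted)
    return result


# :g { << :a :p :b >> :label "hello" }
annotation = Quad(Triple(":a", ":p", ":b"), ":label", '"hello"', ":g")
expanded = expand_pg({annotation})
```

Under this assumption `expanded` holds the annotation quad plus the quad for `:a :p :b` in `:g`, which mirrors the sugar-free TriG* form shown above.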

I can see reasonable arguments that it might be useful to store :a :p :b in :g1 but << :a :p :b >> :label "hello" in :g2. We just didn't have enough demand for that use case to extend the syntax even further (e.g. << :a :p :b :g1 >> :label "hello" like you suggested). That'd require similar changes to SPARQL* syntax too.

I can imagine that named graphs could create problems for the reification-based semantics. That has not been a concern for us at all.


UPD: This comment isn't really about embedded quads but more about storing RDF* triples in named graphs. See the next comment by @pchampin for the correction.

pchampin commented 3 years ago

Just for the record: I don't think that the requirements expressed by @klinovp require extending the notion of embedded triples into embedded quads. In fact, I believe that embedded quads would create problems with those requirements.

More precisely:

:g { << :a :p :b >> :label "hello" }

This is a valid TriG* in Stardog which means that the annotation on :a :p :b is a quad in :g.

Correct; the subject of this quad, though, is an embedded triple.

For us it's important to be able to store RDF* triples in NGs, just as regular RDF triples.

Which the current abstract syntax allows (https://w3c.github.io/rdf-star/rdf-star-cg-spec.html#dfn-dataset).

But then since Stardog implements the PG semantics the question, of course, becomes: what's the graph for :a :p :b (since it's asserted)?

I think the question is incorrectly phrased. By definition, a triple has a subject, a predicate and an object, but no graph. Asking what its graph is makes no sense. The question should be "in which graph is :a :p :b asserted?" where :a :p :b is still a triple.

Allowing RDF* to have embedded quads would be problematic in PG mode.

:g1 { << :a :p :b :g2 >> :label "hello" }

In that example, in which graph would :a :p :b be asserted? :g2 only? :g1 only? both?
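
To make the ambiguity concrete, here is a small Python sketch that enumerates the three readings raised by the question. The function name `candidate_assertions` and the labels are hypothetical; nothing in the report picks one of these options, and the sketch does not either.

```python
def candidate_assertions(embedded_quad, enclosing_graph):
    """List the candidate sets of asserted quads for a PG-mode embedded quad.

    embedded_quad is a hypothetical (s, p, o, g) tuple, enclosing_graph the
    name of the graph in which the annotation itself is stored.
    """
    s, p, o, g = embedded_quad
    in_embedded = (s, p, o, g)
    in_enclosing = (s, p, o, enclosing_graph)
    return {
        "embedded graph only": {in_embedded},
        "enclosing graph only": {in_enclosing},
        "both": {in_embedded, in_enclosing},
    }


# :g1 { << :a :p :b :g2 >> :label "hello" }
options = candidate_assertions((":a", ":p", ":b", ":g2"), ":g1")
```

Each value in `options` is a different dataset, so a PG-mode definition for embedded quads would have to choose among them explicitly.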

I can see reasonable arguments that it might be useful to store :a :p :b in :g1 but << :a :p :b >> :label "hello" in :g2.

This is trivial in SA mode, of course, still with embedded triples.

:g1 { :a :p :b  }
:g2 { << :a :p :b >> :label "hello" }
pchampin commented 3 years ago

I understand how this might look like a natural generalization of embedded triples, but I think this is a much more disruptive change.

In RDF in general, I don't see quads as a generalization of triples, they have a very different nature.

So while I can envision the use-cases that << :a :b :c >> :d :e allows to solve, I really have no idea what use I could make of << :a :b :c :d >> :e :f.

I would go even further: with embedded triples, we have an opportunity to make explicit the relationship between a graph name and the triples in this graph, something like:

:g1 :rel1 << :a :b :c >>, << :d :e :f >>.
:g2 :rel2 << :a :b :c >>, << :h :i :j >>.
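
As a rough illustration of that idea, the following Python sketch derives such statements from a set of quads. The function `explicit_membership` and the per-graph relation mapping are assumptions made for the example; tuples stand in for quads and embedded triples.

```python
def explicit_membership(quads, relation_for):
    """Turn each quad (s, p, o, g) into a triple whose subject is the graph
    name and whose object is the embedded triple (s, p, o).

    relation_for maps a graph name to the relation that is assumed to hold
    between that name and its triples.
    """
    return {(g, relation_for[g], (s, p, o)) for (s, p, o, g) in quads}


dataset = {
    (":a", ":b", ":c", ":g1"),
    (":d", ":e", ":f", ":g1"),
    (":a", ":b", ":c", ":g2"),
    (":h", ":i", ":j", ":g2"),
}
statements = explicit_membership(dataset, {":g1": ":rel1", ":g2": ":rel2"})
```

The point of the mapping is that `:rel1` and `:rel2` may differ, so each graph name states for itself how it relates to its triples.
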
klinovp commented 3 years ago

Yes, thanks, I meant exactly "in which graph is :a :p :b asserted?". So perhaps this issue is not the right place for my comment, sorry about that! I care a lot more about asserting RDF* triples in named graphs than embedded quads.

rat10 commented 3 years ago

Pierre-Antoine, I'll use a comment on your comment to chime in ;-)

I was very vocal about the need for a graph identifier or at least the notion of a default graph that an embedded triple refers to (which would of course be the local graph that the embedded triple occurs in). The reason being that one of the main usecases of RDF is provenance. Recording provenance of a statement, however, only makes sense w.r.t. the occurrence of a statement in a graph. It doesn't make sense for all statements of a certain type or for the type itself (the latter would be a special case with some merit, but not what generally is associated with statement provenance). In this respect I totally stand by what I said before: RDF embedded triples should have an optional fourth element, the graph identifier. However, I don't stand by the second part anymore that it should be defined that in absence of such an identifier the embedded statement is referring to a statement occurrence in the same, local, graph, because there are other usecases as well.

Pierre-Antoine, you made a comment during the last call that got me thinking. You rightly pointed out that an embedded triple might be understood as an IRI that means the same wherever it occurs. And indeed, while this view doesn't support the provenance usecase, it does support a very important other usecase: the prioritization of relations in complex n-ary relations. Many relations can't be adequately expressed by a simple relation between some A and some B. They need more detail and more elaborate structures. Modelling n-ary relations in RDF quickly gets unwieldy: lots of blank nodes and indirections, no clear start and end of an information construct, no boundaries except the coarse-grained graph formalism. This is where Property Graphs really shine: the primary relation between A and B as the main topic of discourse clearly stands out, immediately recognizable, whereas all secondary detail is subdued. However, in contrast to a star-shaped structure in RDF, the secondary information is very close by, and very distinctly related to the primary relation.

This support of a common modelling need is a big plus of Property Graphs, and it has very little to do with reification. It is a syntactic means to disambiguate a primary feature from secondary detail. Figuratively speaking, it is very much acting on the inside. In this case a graph identifier wouldn't help much, it might even be harmful. Reification is, more abstractly seen, usually introducing a meta level where one speaks about something else from a new standpoint, from a very orthogonal perspective, figuratively speaking: from the outside. Here the graph identifier is important.

So we might need embedded triples AND embedded quads. An embedded quad could refer to the local graph by the usual '<>' syntax so that it doesn't have to spell out the full graph name every time, like so: << :a :b :c <> >> but an embedded triple would refer to the triple type, just like any IRI refers to something in the interpretation domain and is supposed to mean the same everywhere it is used.
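
A minimal Python sketch of how such a '<>' shorthand might be resolved, purely as an assumption for illustration: embedded quads are 4-tuples, embedded triples are 3-tuples, and the names `LOCAL` and `resolve_local_graph` are hypothetical. Neither the report nor any parser is known to define this behaviour.

```python
LOCAL = "<>"  # hypothetical marker for "the graph this term occurs in"


def resolve_local_graph(term, enclosing_graph):
    """Replace the LOCAL marker in the graph slot of embedded quads.

    4-tuples are treated as embedded quads and get their graph slot resolved;
    3-tuples are embedded triples and carry no graph, so they stay as they are.
    """
    if not isinstance(term, tuple):
        return term
    parts = [resolve_local_graph(part, enclosing_graph) for part in term]
    if len(parts) == 4 and parts[3] == LOCAL:
        parts[3] = enclosing_graph
    return tuple(parts)


# << :a :b :c <> >> occurring inside graph :g
resolved_quad = resolve_local_graph((":a", ":b", ":c", LOCAL), ":g")
# << :a :b :c >> refers to the triple itself, wherever it occurs
unchanged_triple = resolve_local_graph((":a", ":b", ":c"), ":g")
```

The same embedded quad text therefore resolves to different terms in different graphs, while the embedded triple resolves to the same term everywhere.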

I understand how this might look like a natural generalization of embedded triples, but I think this is a much more disruptive change.

In RDF in general, I don't see quads as a generalization of triples, they have a very different nature.

* As the building block of graphs, triples are covered by RDF's semantics. Each triple (_s,p,o_) has a meaning, it makes a _statement_ ("the thing named _s_ is in the relation named _p_ with the thing named _o_").

* Quads, on the other hand, make no statement. Depending on the [chosen semantics for datasets](https://www.w3.org/TR/rdf11-datasets/), a quad (_s,p,o,g_) may or may not entail the triple (_s,p,o_), it may mean that that statement (_s,p,o_) was made at the address _g_, or made about the thing named _g_...

Right, named graphs have no semantics in RDF but we could without much effort define a sensible one, couldn't we? You give 3 examples of which the first is covered by the semantics for unasserted embedded triples in RDF*, the third is quite an outlier and of little use given that we can do the same with ordinary triples, and the second would make perfect sense for us. There are more, I know, but most of them are rather arcane.

So while I can envision the use-cases that << :a :b :c >> :d :e allows to solve, I really have no idea what use I could make of << :a :b :c :d >> :e :f.

Oh, come on: this is the original provenance usecase.

I would go even further: with embedded triples, we have an opportunity to make explicit the relationship between a graph name and the triples in this graph, something like:

:g1 :rel1 << :a :b :c >>, << :d :e :f >>.
:g2 :rel2 << :a :b :c >>, << :h :i :j >>.

One might do that, of course, though I don't really see the purpose.

One last note: all this doesn't help much with the WikiData usecase which is quite orthogonal as long as we don't use nested graphs - which is and should be out of the picture here.

pchampin commented 3 years ago

@rat10

I really have no idea what use I could make of << :a :b :c :d >> :e :f. Oh, come on: this is the original provenance usecase.

No, genuinely, I don't. You seem to assume that the 4th element of a quad unambiguously means provenance, but it doesn't. There are many scenarios where it means something else. What exactly, the quad itself does not say.

TallTed commented 3 years ago

Potentially helpful background --

pchampin commented 3 years ago

I know N-quads, but it is only a syntax for putting triples into "boxes", it does not say anything about the meaning of these boxes.

But, granted, I could use embedded quads for saying at least something like that:

    <file.nq> :contains << :alice :likes :bob :g >>.
TallTed commented 3 years ago

@pchampin - N-Quads 1.1 does at least say those boxes are graphs within datasets within the sphere of RDF. That's something (rather a lot, from some perspectives, and certainly much more than the contentless "context" of N-Quads "original draft") about the meaning of the boxes.

<file.nq> a ex:filesystem_document, <http://www.w3.org/ns/ldp#RDFSource> ; ex:format ex:nquads would be some meaningful additions to your sample assertion -- not dictated by but fully available within N-Quads.

(Which is not intended to support @rat-10 assuming those or any other statements beyond what is automatic with N-Quads or any other RDF serialization being contemplated ... which is itself different than unserialized RDF.)

rat10 commented 3 years ago

@rat10

I really have no idea what use I could make of << :a :b :c :d >> :e :f. Oh, come on: this is the original provenance usecase.

No, genuinely, I don't. You seem to assume that the 4th element of a quad unambiguously means provenance, but it doesn't. There are many scenarios where it means something else. What exactly, the quad itself does not say.

No, that's not what I'm assuming. The fourth element in a quad identifies the graph in which the statement occurs and is a prerequisite to make any assertions about a specific triple occurrence. Provenance is just one, albeit quite popular, example for annotations that typically refer to a specific occurrence. I would not have thought that I have to spell all this out, again, but well.

pchampin commented 3 years ago

The fourth element in a quad identifies the graph in which the statement occurs

It does not identify it the same way that an IRI or a literal identifies something in RDF. This is a much looser notion of "identifying", as explained in the introduction of https://www.w3.org/TR/rdf11-datasets/.

Provenance is just one, albeit quite popular, example for annotations that typically refer to a specific occurrence.

A quad is not a triple occurrence. A quad is an element of a dataset, and since several datasets can contain the same quad, a quad itself has several occurrences. As such, quads don't solve your problem.

In other words, in one dataset :s :p :o :g means ":s :p :o occurs in graph :g", in another one it means ":s :p :o was asserted by author :g", in yet another one, it means ":s :p :o describes the thing :g"...
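
A short Python sketch of this point, with hypothetical dataset names and a hypothetical `occurrences` helper: the same quad value can be an element of several datasets, and the reading of its fourth element is a convention that lives outside the quad.

```python
def occurrences(quad, datasets):
    """Return the names of all datasets that contain the given quad."""
    return sorted(name for name, quads in datasets.items() if quad in quads)


quad = (":s", ":p", ":o", ":g")

datasets = {
    "dataset_1": {quad},
    "dataset_2": {quad, (":x", ":y", ":z", ":g")},
}

# Assumed per-dataset conventions; none of this is recorded in the quad itself.
fourth_element_means = {
    "dataset_1": "graph in which the triple occurs",
    "dataset_2": "author who asserted the triple",
}

where = occurrences(quad, datasets)
```

Here `where` names both datasets, so annotating the quad alone does not single out one occurrence or one reading of `:g`.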

I would not have thought that I have to spell all this out, again, but well.

ditto

rat10 commented 3 years ago

@pchampin You keep arguing about technicalities.

The fourth element in a quad identifies the graph in which the statement occurs

It does not identify it the same way that an IRI or a literal identifies something in RDF. This is a much looser notion of "identifying", as explained in the introduction of https://www.w3.org/TR/rdf11-datasets/.

Okay, it addresses. SPARQL gets by very well without graphs having a formal model-theoretic semantics as it uses the graph name to address the graph. Its FROM syntax guarantees that a graph name works as intended even if the same IRI is used also for some other, very different "thing". The same is true for a :g in fourth position. The syntax defines that it points to the graph in which the statement in question occurs.

Provenance is just one, albeit quite popular, example for annotations that typically refer to a specific occurrence.

A quad is not a triple occurrence. A quad is an element of a dataset, and since several datasets can contain the same quad, a quad itself has several occurrences. As such, quads don't solve your problem.

You are nitpicking on minor issues that are solvable through e.g. some conventions that a "cool graph name" should be an IRI starting with the address of the DataSet, appended by the name of the graph etc. And even if all that didn't work out and the :g could address only Dataset-local graphs it would still be a major improvement. The difference is that it at least tries to address occurrences - and arguably not totally without success - whereas your proposed semantics completely ignores this important aspect.

In other words, in one dataset :s :p :o :g means ":s :p :o occurs in graph :g", in another one it means ":s :p :o was asserted by author :g", in yet another one, it means ":s :p :o describes the thing :g"...

We don't need to have this discussion about meaning if all we need is to refer to a triple occurrence. As there is no standardized meaning of the fourth element we will have to convey any such meaning through further triples. Addressing works fine nonetheless (and enables those further statements).

pchampin commented 3 years ago

@blake-regalia would you object to closing this issue? (or alternatively tagging it as a 'discussion', which does not mandate an immediate change in the report)