w3c / rdf-ucr

https://w3c.github.io/rdf-ucr/

Talking About Multiple Triples at Once #26

Open niklasl opened 9 months ago

niklasl commented 9 months ago

Talking About Multiple Triples at Once

Contact information

Name: Niklas Lindström (@niklasl) Organization: National Library of Sweden

Brief Description

I want to describe a bunch of triples together (often describing one resource or a chain thereof) succinctly, for instance to state when and where their occurrences were discovered.

What you want to be able to do

Assert provenance (and possibly other marginalia) about multiple triples from a common source. Often, as in the case of RDF lists or blank nodes, these triples share a subject or are chained together, comprising an "integral subgraph", if you will (or a rooted tree in graph-theoretical terms).

What is the role of RDF-star quoted triples in your use case

It is at odds with the current practice of using named graphs for this. In theory it will provide the missing semantics, which is promising. But in its current design (in the CG report) it becomes unwieldy, both syntactically and in its reliance on types, not tokens, for what is expressed.

Since a triple term denotes itself, any connection to an occurrence must be through an explicit relation, and not be a fact about the abstract triple itself, which is mathematically platonic in nature.

Also, using blank nodes is not uncommon in these cases, which raises other questions (e.g. how to quote an RDF list, or a "person named Alice born in 1852"). Representing that as disjoint quoted triples quickly becomes as untenable for humans as reading N-Triples.

Why it is hard or impossible to do what you want to do without quoted triples

It is not impossible, using named graphs. But the semantics thereof are undefined, and storing this as multiple named graphs today is cumbersome, implementation-dependent, and relies on assumptions about how the graphs are interpreted.

How you want quoted triples to behave in your use case

I have not seen any practical cases where opacity is required for a combination of asserted and quoted (i.e. annotated) data. For unasserted "suggestions" in our real use cases we would require transparent semantics (to be able to navigate to and understand the suggestions).

I would ideally be able to quote all constituent parts of the blank node expressions below. Otherwise, only the arc with the blank node would be quoted, and lots of "dangling triples" would be in the asserted graph.

The problem of quoted bnodes with lots of "dangling, asserted facts" might be handled by user convention, along the lines of "all bnodes only linked to from a quoted triple are to be practically taken as belonging to the quote". But that is cumbersome and brittle.
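To make that convention concrete, here is a minimal Python sketch of one possible reading of it: starting from the blank node in the object position of a quoted triple, collect every asserted triple reachable through blank-node subjects. Triples are plain tuples of strings, blank nodes are recognised by a `_:` prefix, and the `:address` and `:city` predicates are made up for illustration. It deliberately skips the check that the blank node is linked to *only* from quoted triples, which is part of what makes the convention brittle.

```python
def is_bnode(term):
    # Assumption for this sketch: blank nodes are strings starting with "_:".
    return term.startswith("_:")

def dangling_triples(quoted, asserted):
    """Asserted triples that the convention would count as part of the quote."""
    owned, seen = set(), set()
    # Only the object position is followed ("linked to from a quoted triple").
    pending = [quoted[2]] if is_bnode(quoted[2]) else []
    while pending:
        node = pending.pop()
        if node in seen:
            continue
        seen.add(node)
        for triple in asserted:
            subject, _, obj = triple
            if subject == node:
                owned.add(triple)
                if is_bnode(obj):
                    pending.append(obj)  # follow chains of blank nodes
    return owned

quoted = ("<charlesdodgson>", ":says", "_:b1")
asserted = {
    quoted,
    ("_:b1", ":name", '"Alice"'),
    ("_:b1", ":birthDate", '"1852"'),
    ("_:b1", ":address", "_:b2"),      # hypothetical nested blank node
    ("_:b2", ":city", '"Oxford"'),     # hypothetical
    ("<other>", ":name", '"Bob"'),     # unrelated, must not be collected
}
owned = dangling_triples(quoted, asserted)
```

Every consumer of the data would have to run something like this, and agree on the details, before the annotation could be read as covering the whole description.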

It is conceivable that other use cases would prefer to "quarantine" chunks from external sources or automatically computed suggestions (e.g. using machine learning). We would use actual literals for that, probably in combination with blank nodes (thus increasing the number of triples in the chunk). But if named graphs were to have conditional "opacity" (if they are "accepted" or treated separately from the active interpretation), this would be a useful alternative. (Literals of course allow for quoting only certain subjects or objects, for instance.)

Example 1: annotating a description of something unknown

To quote something described but unknown, you can do this in Notation 3:

<charlesdodgson> :says { [] :name "Alice" ; :birthDate "1852" } .

This in TriG:

<charlesdodgson> :says _:g1 .
_:g1 { [] :name "Alice" ; :birthDate "1852" }

But in Turtle-star, you have to do this:

<charlesdodgson> :says << _:b1 :name "Alice" >> , << _:b1 :birthDate "1852" >> .

Example 2: Annotating Chunks of Triples

This is bad practice (since an abstract triple is not an occurrence in itself):

  << _:b1 :givenName "Alice" >> dc:source <https://en.wikipedia.org/wiki/Alice_Liddell> ;
    dc:date "2023-10-23" .
  << _:b1 :familyName "Liddell" >> dc:source <https://en.wikipedia.org/wiki/Alice_Liddell> ;
    dc:date "2023-10-23" .
  << _:b1 :birthDate "1852-05-04" >> dc:source <https://en.wikipedia.org/wiki/Alice_Liddell> ;
    dc:date "2023-10-23" .

This is more correct:

<< _:b1 :givenName "Alice" >> rdfg:subGraphOf _:d1 .
<< _:b1 :familyName "Liddell" >> rdfg:subGraphOf _:d1 .
<< _:b1 :birthDate "1852-05-04" >> rdfg:subGraphOf _:d1 .
_:d1 dc:source <https://en.wikipedia.org/wiki/Alice_Liddell> ;
  dc:date "2023-10-23" .

Given RDF 1.1 Semantics, which defines:

A subgraph of an RDF graph is a subset of the triples in the graph. A triple is identified with the singleton set containing it, so that each triple in a graph is considered to be a subgraph.

The above is OKish but not 1:1, since a triple identified does not (necessarily) mean denoted. Cf. (from the same section):

For example, an IRI used as a graph name identifying a named graph in an RDF dataset may refer to something different from the graph it identifies.

This is already possible, but means something else(?):

_:d1 {
  [] :givenName "Alice" ;
    :familyName "Liddell" ;
    :birthDate "1852-05-04" .
}
_:d1 dc:source <https://en.wikipedia.org/wiki/Alice_Liddell> ;
  dc:date "2023-10-23" .
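The `rdfg:subGraphOf` grouping used earlier in this example is mechanical enough to generate. A rough Python sketch, assuming terms are already serialised as Turtle strings (the function name `annotate_chunk` is made up here, not part of any library):

```python
def annotate_chunk(triples, group, annotations):
    """Emit Turtle-star lines attaching each quoted triple to one group node.

    triples: iterable of (subject, predicate, object) strings
    group: the node standing for the chunk, e.g. "_:d1"
    annotations: list of (predicate, object) pairs stated once, on the group
    """
    lines = [f"<< {s} {p} {o} >> rdfg:subGraphOf {group} ." for s, p, o in triples]
    pairs = " ;\n  ".join(f"{p} {o}" for p, o in annotations)
    lines.append(f"{group} {pairs} .")
    return lines

chunk = [
    ("_:b1", ":givenName", '"Alice"'),
    ("_:b1", ":familyName", '"Liddell"'),
    ("_:b1", ":birthDate", '"1852-05-04"'),
]
lines = annotate_chunk(
    chunk,
    "_:d1",
    [("dc:source", "<https://en.wikipedia.org/wiki/Alice_Liddell>"),
     ("dc:date", '"2023-10-23"')],
)
print("\n".join(lines))
```

The annotations are stated once rather than once per triple, but the output still grows by one quoted triple per member of the chunk, which is the verbosity this use case is about.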

Example 3: RDF Lists

Unsurprisingly, the cons-cell nature of ordered lists as triples comes apart at the seams here.

You cannot easily quote the entire list, only the triple that links to it. So this:

<report> bibo:authorList (<a> <b> <c>) {| dc:source <a> |} .

Means this:

<report> bibo:authorList _:l0 .
<< <report> bibo:authorList _:l0 >> dc:source <a> .
_:l0 rdf:first <a>; rdf:rest (<b> <c>) .

Instead of the preferred:

<report> bibo:authorList (<a> <b> <c>) .
_:g1 { <report> bibo:authorList (<a> <b> <c>) }
_:g1 dc:source <a> .
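To see how much of the list the annotation misses, here is a small Python sketch that expands a collection into its `rdf:first`/`rdf:rest` triples and checks which of them the quoted triple covers. The `_:l` blank node labels and the `expand_list` helper are assumptions made for illustration.

```python
RDF_FIRST, RDF_REST, RDF_NIL = "rdf:first", "rdf:rest", "rdf:nil"

def expand_list(items, prefix="_:l"):
    """Return (head, triples) for an RDF collection built from cons cells."""
    if not items:
        return RDF_NIL, []
    nodes = [f"{prefix}{i}" for i in range(len(items))]
    triples = []
    for i, item in enumerate(items):
        # The last cell points at rdf:nil, every other cell at the next one.
        rest = nodes[i + 1] if i + 1 < len(items) else RDF_NIL
        triples.append((nodes[i], RDF_FIRST, item))
        triples.append((nodes[i], RDF_REST, rest))
    return nodes[0], triples

head, cells = expand_list(["<a>", "<b>", "<c>"])
link = ("<report>", "bibo:authorList", head)

# With the annotation syntax, only the linking triple is quoted.
annotated = {link}
uncovered = [t for t in cells if t not in annotated]
```

For a three-item list, all six cons-cell triples end up in `uncovered`: the `dc:source` statement is about the arc to `_:l0`, not about the members or their order.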

Here is a combo of one "chosen" list and a "suggested" list, using suggested new syntax for unasserted, annotated triples:

<report> bibo:authorList (<a> <b> <c>) {| dc:source <a> ; ex:disputedBy <c> |} ,
  -- (<c> <b> <a>) {| dc:source <c> |} .

Preferably meaning:

<report> bibo:authorList (<a> <b> <c>) .
_:g1 { <report> bibo:authorList (<a> <b> <c>) }
_:g1 dc:source <a> ; ex:disputedBy <c> .
_:g2 { <report> bibo:authorList (<c> <b> <a>) }
_:g2 dc:source <c> .

marcelotto commented 4 months ago

I want to second this use case, which I have faced multiple times. It led me to create the RDF Triple Compounds (RTC) vocabulary (https://w3id.org/rtc) for exactly this purpose, along with an implementation in Elixir that allows treating such compounds like ordinary graphs, abstracting away the RDF-star annotations (https://github.com/rtc-org/rtc-ex).