w3c / rdf-ucr

https://w3c.github.io/rdf-ucr/

RDF-star for Wikibase/Wikidata #24

Open Tpt opened 10 months ago

Tpt commented 10 months ago

See https://github.com/w3c/rdf-ucr/wiki/RDF%E2%80%90star-for-Wikidata for a single document on this use case.

Contact information:

Note that I am not representing Wikibase/Wikidata/Wikimedia in any way; I just wanted to describe this use case.

Brief Description of your use case:

Wikibase is the software that powers Wikidata. Wikibase uses its own data model but provides an RDF mapping. Wikibase contains a native reification system. Each main "snak" (aka triple) like "USA president JoeBiden" can be annotated with "qualifiers" like "start date January 20th 2021" or "predecessor DonaldTrump", "references" (i.e. blank nodes describing a source) and a "rank" (a processing annotation that takes one of three values: "preferred", "normal" or "deprecated"). Wikibase calls this full construction a "statement".

The current RDF encoding uses a specific RDF node to encode each statement. For example (Wikibase uses opaque identifiers, I have tweaked the RDF to make it more readable):

wd:USA a wikibase:Item ;
    p:president wds:JoeBidenPresidencyStatement , wds:DonaldTrumpPresidencyStatement . # p:X are relations between a subject and a statement. The statement subject is the triple subject (here "USA") and the statement predicate is the relation predicate (here "president")

wds:JoeBidenPresidencyStatement a wikibase:Statement  ;
     ps:president wd:JoeBiden ; # ps:X are relations between a statement and an object. The statement object is the triple object (here "JoeBiden") and the statement predicate is the relation predicate (here "president")
     wikibase:rank wikibase:PreferredRank ;
     pq:start_date "2021-01-20"^^xsd:dateTime ; # A qualifier
     pq:predecessor wd:DonaldTrump ; # A qualifier
     prov:wasDerivedFrom wdref:a_reference , wdref:an_other_reference .

wds:DonaldTrumpPresidencyStatement a wikibase:Statement  ;
     ps:president wd:DonaldTrump ;
     wikibase:rank wikibase:NormalRank ;
     pq:start_date "2017-01-20"^^xsd:dateTime ;
     pq:end_date "2021-01-20"^^xsd:dateTime .

wd:USA wdt:president wd:JoeBiden . # For statements with the "best" rank a direct edge is inserted in the RDF with the "wdt:" prefix.

Note that in the previous example the direct triple wd:USA wdt:president wd:JoeBiden has been generated because the statement rank is "preferred". Statements about the older presidencies also exist but have only the "normal" rank, so no direct triples are generated for them.
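To make the rank rule concrete, here is a minimal Python sketch of how the direct "wdt:" triples could be derived from ranked statements. It assumes the usual Wikibase behaviour: deprecated statements are skipped, preferred statements win over normal ones, and if no statement is preferred, all normal ones count as "best". The `STATEMENTS` list and the `direct_triples` function are hypothetical names used only for illustration and are not part of Wikibase.

```python
from collections import defaultdict

# Hypothetical, simplified statements: (subject, property, value, rank).
STATEMENTS = [
    ("USA", "president", "JoeBiden", "preferred"),
    ("USA", "president", "DonaldTrump", "normal"),
]


def direct_triples(statements):
    """Return wdt:-style triples for the best-ranked statements of each (subject, property)."""
    groups = defaultdict(list)
    for subject, prop, value, rank in statements:
        if rank != "deprecated":  # deprecated statements never yield a direct triple
            groups[(subject, prop)].append((value, rank))

    triples = set()
    for (subject, prop), candidates in groups.items():
        # Assumption: "preferred" beats "normal"; without any preferred
        # statement, every normal statement counts as best.
        has_preferred = any(rank == "preferred" for _, rank in candidates)
        best = "preferred" if has_preferred else "normal"
        for value, rank in candidates:
            if rank == best:
                triples.add((f"wd:{subject}", f"wdt:{prop}", f"wd:{value}"))
    return triples


print(direct_triples(STATEMENTS))
# {('wd:USA', 'wdt:president', 'wd:JoeBiden')}
```

Only the Biden statement produces a direct triple here, which matches the example above.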

Paper about the design of the Wikibase RDF encoding: "Reifying RDF: What Works Well With Wikidata?"

What you want to be able to do:

It would be great to have a nice RDF syntax to encode this use case.

What is the role of RDF-star quoted triples in your use case:

They might be used to simplify the RDF encoding. For example one might hope to write:

<< wd:USA wd:president wd:JoeBiden >>  a wikibase:Statement  ;
     wikibase:rank wikibase:PreferredRank ;
     pq:start_date "2021-01-20"^^xsd:dateTime ;
     pq:predecessor wd:DonaldTrump ;
     prov:wasDerivedFrom wdref:a_reference , wdref:an_other_reference .

<< wd:USA wd:president wd:DonaldTrump >>  a wikibase:Statement  ;
     wikibase:rank wikibase:NormalRank ;
     pq:start_date "2017-01-20"^^xsd:dateTime ;
     pq:end_date "2021-01-20"^^xsd:dateTime .

wd:USA wd:president wd:JoeBiden .

Why it is hard or impossible to do what you want to do without quoted triples:

Wikidata needs reification to encode statements.

How you want quoted triples to behave in your use case:

(For example, do you want the precise syntax of subjects, predicates, and objects in quoted triples to be important?)

The RDF-star encoding written above is only valid if the existence of a quoted triple does not imply the assertion of the triple itself. Indeed, we would like this to be in the Wikidata RDF graph:

<< wd:USA wd:president wd:DonaldTrump >>  a wikibase:Statement  ;
     wikibase:rank wikibase:NormalRank ;
     pq:start_date "2017-01-20"^^xsd:dateTime ;
     pq:end_date "2021-01-20"^^xsd:dateTime .

But the triple wd:USA wd:president wd:DonaldTrump should not be in the graph.
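The distinction between quoting and asserting can be sketched in a few lines of Python. This is a toy model, not an RDF-star implementation: a graph is a plain set of (subject, predicate, object) tuples, a quoted triple is simply a tuple used as a subject, and literals are reduced to plain strings. The helper names `is_asserted` and `is_quoted` are hypothetical.

```python
# A quoted triple is modelled as a tuple that appears in subject position.
quoted = ("wd:USA", "wd:president", "wd:DonaldTrump")

graph = {
    (quoted, "rdf:type", "wikibase:Statement"),
    (quoted, "wikibase:rank", "wikibase:NormalRank"),
    (quoted, "pq:start_date", "2017-01-20"),  # literal datatypes omitted
    (quoted, "pq:end_date", "2021-01-20"),
    ("wd:USA", "wd:president", "wd:JoeBiden"),  # an asserted triple
}


def is_asserted(graph, triple):
    """True if the triple itself is a member of the graph."""
    return triple in graph


def is_quoted(graph, triple):
    """True if the triple occurs as the subject or object of some triple."""
    return any(s == triple or o == triple for s, _, o in graph)


print(is_quoted(graph, quoted), is_asserted(graph, quoted))
# True False
```

Under the behaviour requested here, the Trump triple is described by the graph without being stated by it.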

We also need to be able to distinguish two statements on the same base triple. We can't merge the following two statements because it would make the start date, end date pairs meaningless:

<< wd:Russia wd:president wd:VladimirPutin >>  a wikibase:Statement  ;
     wikibase:rank wikibase:NormalRank ;
     pq:start_date "1999-12-31"^^xsd:dateTime ;
     pq:end_date "2008-05-07"^^xsd:dateTime .

<< wd:Russia wd:president wd:VladimirPutin >>  a wikibase:Statement  ;
     wikibase:rank wikibase:PreferredRank ;
     pq:start_date "2012-05-07"^^xsd:dateTime .
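The merging problem can be shown with the same kind of toy model (a graph as a set of tuples, a quoted triple as a tuple used as a subject, literals as plain strings). The sketch assumes that two occurrences of the same quoted triple denote the same term, so their annotations end up attached to one subject; the helper `values` is a hypothetical name.

```python
base = ("wd:Russia", "wd:president", "wd:VladimirPutin")

first_statement = {
    (base, "wikibase:rank", "wikibase:NormalRank"),
    (base, "pq:start_date", "1999-12-31"),
    (base, "pq:end_date", "2008-05-07"),
}
second_statement = {
    (base, "wikibase:rank", "wikibase:PreferredRank"),
    (base, "pq:start_date", "2012-05-07"),
}

# Both statements share the same subject term, so the union mixes them.
graph = first_statement | second_statement


def values(graph, subject, predicate):
    """Return the sorted objects of all triples with this subject and predicate."""
    return sorted(o for s, p, o in graph if s == subject and p == predicate)


start_dates = values(graph, base, "pq:start_date")
end_dates = values(graph, base, "pq:end_date")
ranks = values(graph, base, "wikibase:rank")

print(start_dates)  # ['1999-12-31', '2012-05-07']
print(end_dates)    # ['2008-05-07']
print(ranks)        # ['wikibase:NormalRank', 'wikibase:PreferredRank']
```

After the merge there are two start dates, one end date and two ranks on a single subject, and nothing in the graph says which start date the end date or each rank belongs to.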

An example RDF graph that shows part of your use case:

The Wikidata graph exposed by the Wikidata Query Service.

pfps commented 10 months ago

One question is how much of Wikidata to include. As you say, there are ranks but there are also no-values and some-values.

niklasl commented 10 months ago

This is a good case, similarly considered in the Wikidata example of Detailed Provenance in Cooperative Union Cataloguing.

I don't think opacity is necessary though? That the triple is unasserted in the graph doesn't necessarily preclude that its suggested sense is invisible to entailment? But I presume the question here is what would the common intersection of semantic expectations by Wikidata editors and consumers be?

(Note: I've experimented a bit with making Wikidata RDF "more readable" for these kinds of illustrative examples. See: https://github.com/Kungbib/wikidatalab/ )

Tpt commented 10 months ago

@pfps Thank you!

> One question is how much of Wikidata to include. As you say, there are ranks but there are also no-values and some-values.

I believe some-values and no-values do not affect the expected semantics with respect to RDF-star, unlike ranks. Indeed, some-value can be written << wd:s wd:p _:my_blank_node >> and no-value << wd:s wd:p wdno:p >>, with wdno:p defined elsewhere as:

 wdno:p a owl:Class ;
    owl:complementOf [ a owl:Restriction ; owl:onProperty wdt:p ; owl:someValuesFrom owl:Thing ] .
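As a rough illustration of that mapping, the following Python sketch picks the object term of the quoted triple from the snak type. It assumes the three snak type names "value", "somevalue" and "novalue"; the functions `object_term` and `quoted_triple` and the blank node labelling scheme are hypothetical.

```python
from itertools import count

_blank_ids = count(1)  # hypothetical source of fresh blank node labels


def object_term(prop, snaktype, value=None):
    """Choose the object of the quoted triple for a given snak type."""
    if snaktype == "value":
        return f"wd:{value}"
    if snaktype == "somevalue":
        return f"_:b{next(_blank_ids)}"  # a fresh blank node for an unknown value
    if snaktype == "novalue":
        return f"wdno:{prop}"  # the class defined via owl:complementOf
    raise ValueError(f"unknown snak type: {snaktype}")


def quoted_triple(subject, prop, snaktype, value=None):
    """Render the quoted triple in RDF-star syntax."""
    return f"<< wd:{subject} wd:{prop} {object_term(prop, snaktype, value)} >>"


print(quoted_triple("s", "p", "novalue"))
# << wd:s wd:p wdno:p >>
```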

My initial use case was missing the multiple statements on the same base triple problem. I have updated the "How you want quoted triples to behave in your use case" section to reflect it.

Tpt commented 10 months ago

@niklasl

> This is a good case, similarly considered in the Wikidata example of Detailed Provenance in Cooperative Union Cataloguing.

Thank you! I wanted to cover Wikidata/Wikibase as a standalone use case to be able to fully describe its needs.

> I don't think opacity is necessary though? That the triple is unasserted in the graph doesn't necessarily preclude that its suggested sense is invisible to entailment? But I presume the question here is what would the common intersection of semantic expectations by Wikidata editors and consumers be?

Sorry, I got confused by the naming and thought that "referential transparency" was about the stand-alone assertion of the quoted triple. Indeed, referential transparency seems to work well with Wikidata (and probably better than opacity). I have updated the use case description.

> (Note: I've experimented a bit with making Wikidata RDF "more readable" for these kinds of illustrative examples. See: https://github.com/Kungbib/wikidatalab/ )

Nice!

pfps commented 10 months ago

I created a wiki page to hold a clean version of this use case. See https://github.com/w3c/rdf-ucr/wiki/RDF%E2%80%90star-for-Wikidata