w3c / rdf-star

RDF-star specification
https://w3c.github.io/rdf-star/

UniProt: attributed/evidenced triples #47

Closed: JervenBolleman closed this issue 3 years ago.

JervenBolleman commented 3 years ago

As a UniProt developer, I want an easier way to talk about triples we have asserted, so that querying and parsing data models from evidenced triples becomes simpler. Currently, stating that one triple has an attribution looks like this:

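@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix up: <http://purl.uniprot.org/core/> .
<P26948> up:annotation <P26948#SIPDB6A831D8E2E2D2A> .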
<#_kb.P26948_up.annotation_A144DC8D56EA0928> rdf:type rdf:Statement ;
  rdf:subject <P26948> ;
  rdf:predicate up:annotation ;
  rdf:object <P26948#SIPDB6A831D8E2E2D2A> ;
  up:attribution <P26948#attribution-89AC1B682EEB440D50C4AEBB24FCA860> .

This is a lot of bytes to type. Even worse, the five joins needed to query it are very expensive to perform.
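To make the cost concrete, here is a minimal Python sketch (not UniProt's code) of answering "which attributions belong to this asserted triple?" over standard reification. It assumes an in-memory set of triples with prefixed names kept as plain strings. The helper name `reified_attributions` is hypothetical. Each step mirrors one of the five triple patterns that a SPARQL query would join on the statement node.

```python
# Hypothetical sketch: terms are plain strings and prefixed names (rdf:, up:)
# are kept unexpanded for brevity.
Triple = tuple[str, str, str]

ST = "#_kb.P26948_up.annotation_A144DC8D56EA0928"  # the statement node above

# The example data above, as an in-memory set of triples.
EXAMPLE: set[Triple] = {
    ("P26948", "up:annotation", "P26948#SIPDB6A831D8E2E2D2A"),
    (ST, "rdf:type", "rdf:Statement"),
    (ST, "rdf:subject", "P26948"),
    (ST, "rdf:predicate", "up:annotation"),
    (ST, "rdf:object", "P26948#SIPDB6A831D8E2E2D2A"),
    (ST, "up:attribution", "P26948#attribution-89AC1B682EEB440D50C4AEBB24FCA860"),
}


def reified_attributions(store: set[Triple], s: str, p: str, o: str) -> list[str]:
    """Return the up:attribution values of every statement node reifying (s, p, o)."""
    # Pattern 1: ?st rdf:type rdf:Statement
    nodes = {t[0] for t in store if t[1:] == ("rdf:type", "rdf:Statement")}
    # Patterns 2-4: each component check is one more join on ?st.
    for prop, value in (("rdf:subject", s), ("rdf:predicate", p), ("rdf:object", o)):
        nodes = {st for st in nodes if (st, prop, value) in store}
    # Pattern 5: ?st up:attribution ?attr
    return sorted(t[2] for t in store if t[0] in nodes and t[1] == "up:attribution")


print(reified_attributions(EXAMPLE, "P26948", "up:annotation", "P26948#SIPDB6A831D8E2E2D2A"))
# → ['P26948#attribution-89AC1B682EEB440D50C4AEBB24FCA860']
```

A real store would answer each pattern from an index rather than by scanning. The shape of the work stays the same: four filters plus one lookup, all keyed on the statement node.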

Our use case for RDF-star and similar proposals is to let us talk about triples as we do now, but with higher performance and a lower barrier to entry.

We currently depend on the RDF/XML rdf:ID attribute to parse this easily in our in-house custom RDF parsers, and would like to keep this option open.

At the same time, we deal with a lot of renaming: the same thing is often identified by several IRIs. For example, a related database might use http://identifiers.org/uniprot/P05067 instead of http://purl.uniprot.org/uniprot/P05067, and an owl:sameAs link is used to merge these datasets. Our attributions/evidences should be found no matter which IRI is used.
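One way to meet the "found no matter which IRI is used" requirement is to compare triples after mapping every IRI to a canonical representative of its owl:sameAs group. The Python sketch below is a hypothetical illustration of that technique using union-find; it does not describe how UniProt or any particular triple store does it. The class and function names, and the placeholder terms `ann-1` and `attribution-1`, are made up for the example.

```python
# Hypothetical sketch: class, function and placeholder names are illustrative only.
Triple = tuple[str, str, str]


class SameAsIndex:
    """Groups IRIs linked by owl:sameAs using union-find."""

    def __init__(self) -> None:
        self._parent: dict[str, str] = {}

    def canonical(self, iri: str) -> str:
        """Return the representative IRI of the group that `iri` belongs to."""
        self._parent.setdefault(iri, iri)
        root = iri
        while self._parent[root] != root:
            root = self._parent[root]
        # Path compression: point every IRI on the path directly at the root.
        while iri != root:
            nxt = self._parent[iri]
            self._parent[iri] = root
            iri = nxt
        return root

    def add_same_as(self, a: str, b: str) -> None:
        """Record `a owl:sameAs b` by merging their groups."""
        ra, rb = self.canonical(a), self.canonical(b)
        if ra != rb:
            # Keep the lexicographically smaller root so the choice is deterministic.
            self._parent[max(ra, rb)] = min(ra, rb)

    def canonical_triple(self, t: Triple) -> Triple:
        return (self.canonical(t[0]), self.canonical(t[1]), self.canonical(t[2]))


def find_attributions(index: SameAsIndex, attributions: dict[Triple, list[str]],
                      s: str, p: str, o: str) -> list[str]:
    """Attributions of (s, p, o), matching triples stated with any sameAs-equivalent IRI."""
    target = index.canonical_triple((s, p, o))
    return sorted(a for t, attrs in attributions.items()
                  if index.canonical_triple(t) == target for a in attrs)


idx = SameAsIndex()
idx.add_same_as("http://purl.uniprot.org/uniprot/P05067", "http://identifiers.org/uniprot/P05067")
attrs = {("http://purl.uniprot.org/uniprot/P05067", "up:annotation", "ann-1"): ["attribution-1"]}
print(find_attributions(idx, attrs, "http://identifiers.org/uniprot/P05067", "up:annotation", "ann-1"))
# → ['attribution-1']
```

Canonicalizing at lookup time, rather than rewriting stored keys, keeps the sketch correct even when sameAs links arrive after the attributions do. The cost is a linear scan that a real store would replace with an index on canonical terms.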

ericprud commented 3 years ago
> <P26948> up:annotation <P26948#SIPDB6A831D8E2E2D2A> .
> <#_kb.P26948_up.annotation_A144DC8D56EA0928> rdf:type rdf:Statement ;
>   rdf:subject <P26948> ;
>   rdf:predicate up:annotation ;
>   rdf:object <P26948#SIPDB6A831D8E2E2D2A> ;
>   up:attribution <P26948#attribution-89AC1B682EEB440D50C4AEBB24FCA860> .
>
> This is a lot of bytes to type. Even worse, the five joins needed to query it are very expensive to perform.

I assume those joins happen at query time, and would probably be resolved in the pre-processor if they have constant predicates and objects. How can you avoid those joins if you have variables in place of, e.g., <P26948> and up:annotation above? Would the idea be that an embTriple index would allow the SPARQL parser to recognize (A) reified statements or (B) a << >> encoding thereof, and turn the three self-joins into a crawl through an SPO index over your embedded triples?
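Option (A) could be sketched as a rewrite over the basic graph pattern. It folds each group of rdf:subject / rdf:predicate / rdf:object patterns that share a statement variable into a single quoted-triple term. The Python below is a hypothetical illustration, not part of any SPARQL engine. It represents patterns as tuples, variables as strings starting with `?`, and a quoted triple as a nested 3-tuple. It also assumes the store treats reified statements and embedded triples as interchangeable, which RDF-star itself does not promise.

```python
# Hypothetical rewrite sketch: patterns are (s, p, o) tuples of strings, variables
# start with "?", and a quoted triple is represented as a nested 3-tuple.
Pattern = tuple

COMPONENTS = ("rdf:subject", "rdf:predicate", "rdf:object")


def is_reification(t: Pattern) -> bool:
    """True for the patterns that standard reification adds around a statement node."""
    return t[1] in COMPONENTS or t[1:] == ("rdf:type", "rdf:Statement")


def fold_reification(bgp: list[Pattern]) -> list[Pattern]:
    """Replace complete reification pattern groups with quoted-triple terms."""
    parts: dict = {}
    for s, p, o in bgp:
        if p in COMPONENTS:
            parts.setdefault(s, {})[p] = o
    # Only fold statement nodes that some non-reification pattern still uses;
    # otherwise the rewrite would silently drop the node from the query.
    used = {term for t in bgp if not is_reification(t) for term in (t[0], t[2])}
    quoted = {node: tuple(comp[c] for c in COMPONENTS)
              for node, comp in parts.items() if len(comp) == 3 and node in used}
    folded = []
    for t in bgp:
        if t[0] in quoted and is_reification(t):
            continue  # absorbed into the quoted triple
        folded.append((quoted.get(t[0], t[0]), t[1], quoted.get(t[2], t[2])))
    return folded


query = [
    ("?st", "rdf:type", "rdf:Statement"),
    ("?st", "rdf:subject", "?protein"),
    ("?st", "rdf:predicate", "up:annotation"),
    ("?st", "rdf:object", "?annotation"),
    ("?st", "up:attribution", "?attribution"),
]
print(fold_reification(query))
# → [(('?protein', 'up:annotation', '?annotation'), 'up:attribution', '?attribution')]
```

Five patterns joined on ?st become one pattern whose subject is a quoted triple. An engine with an index over embedded triples could then answer that pattern without self-joins.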

JervenBolleman commented 3 years ago

@ericprud the assumption is that, instead of the current quad tables, we would see a quad + (virtual) triple-id column, making the fivefold join of the reification quads just a single join (given a syntax such as << >>). This is possible with the PG constraints, but harder for the SA mode to implement. Converting a reification quad pattern into a << >>-style query is possible by query rewriting, but not needed for our use case.
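A rough picture of the "quad + (virtual) triple-id column" layout: every asserted quad gets an id, and statements about a triple hang off that id. The Python sketch below is a hypothetical in-memory model of the idea; the class and method names are illustrative and not taken from any triple store. It reuses the example IRIs from the issue. Looking up `<< s p o >> up:attribution ?x` becomes one probe into the quad index plus one join on the triple id.

```python
# Hypothetical sketch of a quad table extended with a (virtual) triple-id column.
# Class and method names are illustrative, not taken from any triple store.
Quad = tuple[str, str, str, str]  # (graph, subject, predicate, object)


class TripleIdStore:
    def __init__(self) -> None:
        self._id_of: dict = {}        # quad -> triple id (acts as a GSPO index)
        self._quads: list = []        # triple id -> quad
        self._annotations: dict = {}  # triple id -> list of (predicate, object)

    def add(self, s: str, p: str, o: str, g: str = "default") -> int:
        """Insert a quad if needed and return its triple id."""
        quad: Quad = (g, s, p, o)
        if quad not in self._id_of:
            self._id_of[quad] = len(self._quads)
            self._quads.append(quad)
        return self._id_of[quad]

    def annotate(self, triple_id: int, p: str, o: str) -> None:
        """Attach a statement about an asserted triple, e.g. up:attribution."""
        self._annotations.setdefault(triple_id, []).append((p, o))

    def annotations_of(self, s: str, p: str, o: str, ann_p: str,
                       g: str = "default") -> list:
        """<< s p o >> ann_p ?x: one index probe plus one join on the triple id."""
        triple_id = self._id_of.get((g, s, p, o))
        if triple_id is None:
            return []
        return [v for (q, v) in self._annotations.get(triple_id, []) if q == ann_p]


store = TripleIdStore()
tid = store.add("P26948", "up:annotation", "P26948#SIPDB6A831D8E2E2D2A")
store.annotate(tid, "up:attribution", "P26948#attribution-89AC1B682EEB440D50C4AEBB24FCA860")
print(store.annotations_of("P26948", "up:annotation", "P26948#SIPDB6A831D8E2E2D2A",
                           "up:attribution"))
# → ['P26948#attribution-89AC1B682EEB440D50C4AEBB24FCA860']
```

In this sketch, annotations can only attach to quads that were actually added. That is presumably why such a layout sits more naturally with PG-style constraints than with SA mode, where a quoted triple need not be asserted.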