rdfjs / dataset-spec

RDF/JS: Dataset specification 1.0 – This specification provides a definition how to store multiple quads in a so-called dataset.
https://rdf.js.org/dataset-spec/

How to count size with RDF* statements? #59

Closed RubenVerborgh closed 4 years ago

RubenVerborgh commented 4 years ago

Via @simonvb:

The store has a notion of size. I'm not sure how the size should be affected when adding RDFstar statements. For example, suppose someone adds a quad to the store like this: Quad(Quad, Iri, Literal). Should we increment the size by one, or by two? It could be by one, because only one Quad object gets added. It could be by two, because the added Quad object represents both the triple a b c and the triple <<a b c>> d e, so two triples are added.

I would be particularly interested in @hartig's take on this.

hartig commented 4 years ago

(Sorry for the late reply. I am on vacation.)

My short answer is: increment the size by one.

For the longer, more detailed answer, recall that an RDF* graph is a set of RDF* triples. Therefore, as with any other kind of set, the cardinality of such a set is simply the number of elements it contains. In other words, the cardinality of an RDF* graph is the number of RDF* triples that the graph contains, regardless of whether any of these triples is a nested triple (i.e., has another RDF* triple in its subject position or its object position). For instance, consider an RDF* graph G = {t} with t = (t', p, o), where t' is another RDF* triple and p and o are URIs. The cardinality of G is 1 because G contains only one (nested) triple, namely t.
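
The set-cardinality argument can be illustrated with a minimal sketch. The `Term`, `Triple`, `key`, and `Graph` names below are hypothetical and are not the RDF/JS interfaces; terms are plain strings for brevity, and the string key is only a rough stand-in for proper term equality.

```typescript
// Hypothetical minimal types for illustration; NOT the RDF/JS data model.
type Term = string | Triple;
interface Triple {
  subject: Term;
  predicate: string;
  object: Term;
}

// Build a string key so that structurally equal triples map to the same set element.
// Assumption: plain-string terms never contain "<<" or spaces.
function key(t: Term): string {
  return typeof t === "string"
    ? t
    : `<<${key(t.subject)} ${t.predicate} ${key(t.object)}>>`;
}

// A graph is a set of triples; its size is the cardinality of that set.
class Graph {
  private triples = new Map<string, Triple>();

  add(t: Triple): this {
    this.triples.set(key(t), t);
    return this;
  }

  get size(): number {
    return this.triples.size;
  }
}

const inner: Triple = { subject: "a", predicate: "b", object: "c" };
const nested: Triple = { subject: inner, predicate: "d", object: "e" };

// Only the nested triple t = (t', d, e) is an element of the set.
const g = new Graph().add(nested);
console.log(g.size); // 1
```

In this sketch the size only grows to 2 if the inner triple is also added to the graph as an element in its own right.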

Of course, you may say that there are two RDF* triples that are mentioned in G, namely t and t'. So, you may want to define the "size" of G to be 2. However, in the context of sets, the terms "size" and "cardinality" are typically used synonymously; so, it would be strange for the "size" of an RDF* graph to be something other than its cardinality.

On a related note, there are currently two different modes of using RDF* under discussion: the SA mode and the PG mode. It is an open question which of them will be defined in the future RDF* spec. I mention these modes here because they differ in the number of statements that are considered to be asserted by an RDF* graph. For instance, under SA mode, the aforementioned example graph asserts only a single statement, whereas under PG mode, it asserts two statements.
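
The difference between the two modes can be sketched in code as well. All names here (`StarTerm`, `StarTriple`, `termKey`, `assertedStatements`) are hypothetical. The recursive treatment of deeper nesting under PG mode is an assumption that goes beyond the single example above, which only states that the example graph asserts one statement under SA mode and two under PG mode.

```typescript
// Hypothetical minimal types for illustration; NOT the RDF/JS data model.
type StarTerm = string | StarTriple;
interface StarTriple {
  subject: StarTerm;
  predicate: string;
  object: StarTerm;
}
type Mode = "SA" | "PG";

// String key used to deduplicate structurally equal triples.
function termKey(t: StarTerm): string {
  return typeof t === "string"
    ? t
    : `<<${termKey(t.subject)} ${t.predicate} ${termKey(t.object)}>>`;
}

// Collect the statements considered asserted by a graph under the given mode.
// SA: only the triples that are elements of the graph.
// PG: additionally every triple nested inside them (assumed to apply recursively).
function assertedStatements(graph: StarTriple[], mode: Mode): Set<string> {
  const asserted = new Set<string>();
  const visit = (t: StarTriple): void => {
    asserted.add(termKey(t));
    if (mode === "PG") {
      for (const term of [t.subject, t.object]) {
        if (typeof term !== "string") visit(term);
      }
    }
  };
  graph.forEach(visit);
  return asserted;
}

const quoted: StarTriple = { subject: "a", predicate: "b", object: "c" };
const outer: StarTriple = { subject: quoted, predicate: "d", object: "e" };

const saCount = assertedStatements([outer], "SA").size;
const pgCount = assertedStatements([outer], "PG").size;
console.log(saCount, pgCount); // 1 2
```

Either way, the cardinality of the graph itself stays at 1; only the number of statements regarded as asserted changes with the mode.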

RubenVerborgh commented 4 years ago

Thanks so much, @hartig!