tetherless-world / twks

Tetherless World Knowledge Store (TWKS), a provenance-aware RDF store
Apache License 2.0

Keep separate assertions union graph instead of querying all assertion graph parts by name #112

Closed gordom6 closed 4 years ago

gordom6 commented 4 years ago

Querying all assertion graph parts by name doesn't scale well. I knew this when I implemented it, but Kris recently ran into a pathological case where an assertions query spanned a large number of named graphs.

gordom6 commented 4 years ago

The redundant assertions union graph can be maintained transactionally. The main problem is deletes, since most (all?) triple stores won't store duplicate triples within a graph. If multiple nanopublication assertion parts contain the same triple, only one copy will be in the assertions union graph, so deleting one of those nanopublications can't simply remove its triples from the union.
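To make the delete problem concrete, here is a minimal in-memory sketch. It is not TWKS code: the `assertion_parts` dictionary, the `naive_delete` helper, and the example triples are all hypothetical, and Python sets stand in for named graphs.

```python
# Hypothetical model: each nanopublication's assertion part is a set of triples.
Triple = tuple[str, str, str]

shared: Triple = ("ex:alice", "ex:knows", "ex:bob")
assertion_parts: dict[str, set[Triple]] = {
    "np1": {shared, ("ex:alice", "ex:age", "42")},
    "np2": {shared},
}

# The union graph is a set, so the shared triple is stored only once.
union: set[Triple] = set().union(*assertion_parts.values())


def naive_delete(name: str) -> None:
    """Drop a nanopublication and subtract its triples from the union."""
    union.difference_update(assertion_parts.pop(name))


naive_delete("np2")

# np1 still asserts the shared triple, but the union no longer contains it.
still_asserted = shared in assertion_parts["np1"]
in_union = shared in union
```

After the delete, `still_asserted` is true while `in_union` is false, which is exactly the inconsistency described above.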

The short-term solution would be to rebuild the assertions union on every nanopublication delete, which should be a relatively rare operation. A mid-term solution would be to keep a flag indicating whether there are any duplicate assertions, and rebuild on delete only when it is set. A long-term solution would be to track which statements in the union are duplicated and how many times.
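The long-term option amounts to reference counting. The sketch below is one possible shape for it, again as an in-memory model rather than anything in TWKS: the `CountedAssertionsUnion` class and its method names are invented for illustration, and a real implementation would need to persist the counts alongside the store and update them in the same transaction.

```python
from collections import Counter

Triple = tuple[str, str, str]


class CountedAssertionsUnion:
    """Hypothetical union graph that counts how many assertion parts hold each triple."""

    def __init__(self) -> None:
        self._parts: dict[str, set[Triple]] = {}
        self._counts: Counter[Triple] = Counter()

    def put(self, name: str, triples: set[Triple]) -> None:
        self.delete(name)  # replace any existing nanopublication with this name
        self._parts[name] = set(triples)
        self._counts.update(self._parts[name])

    def delete(self, name: str) -> None:
        # Decrement each triple; it leaves the union only when no part holds it.
        for triple in self._parts.pop(name, set()):
            self._counts[triple] -= 1
            if self._counts[triple] == 0:
                del self._counts[triple]

    def has_duplicates(self) -> bool:
        # Equivalent of the mid-term "rebuild required on delete" flag.
        return any(count > 1 for count in self._counts.values())

    def triples(self) -> set[Triple]:
        return set(self._counts)


shared = ("ex:alice", "ex:knows", "ex:bob")
counted = CountedAssertionsUnion()
counted.put("np1", {shared, ("ex:alice", "ex:age", "42")})
counted.put("np2", {shared})
had_duplicates = counted.has_duplicates()
counted.delete("np2")
```

Here deleting `np2` only decrements the count for the shared triple, so it stays in the union for `np1` and no rebuild is needed. The trade-off is extra bookkeeping on every write.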