pchampin / sophia_rs

Sophia: a Rust toolkit for RDF and Linked Data

Add persistent implementation of `dataset::MutableDataset` #22

Open pchampin opened 4 years ago

pchampin commented 4 years ago

In addition to the `dataset::inmem` module, it would be nice to have a disk-based persistent implementation of `dataset::MutableDataset`.

pchampin commented 4 years ago

One way to do it could be to use RocksDB. We could even try to use the same layout as Oxigraph, making it possible to share the same storage across both crates. @Tpt, what do you think? Is that layout documented somewhere?

Tpt commented 4 years ago

It would be great to make Sophia work with Oxigraph storage! The Oxigraph RocksDB layout is not stable yet: I am currently tweaking it to make it a bit more compact and to allow efficient range queries. I hope to have time to finish a 0.1 Oxigraph release with a stable RocksDB layout in late December or (more realistically) January.

The basic storage approach should not change: I store rotations of quads of EncodedTerm as RocksDB keys, alongside a string store. RocksDB prefix searches are then used to solve triple patterns. The string store is used as an inverse hash lookup, the strings being hashed inside of EncodedTerm. Hashing strings is very useful for heavy SPARQL query evaluations with a lot of joins, but it might not be the best approach for just storing quads and doing simple triple pattern evaluation.
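To make the idea concrete, here is a minimal sketch of that kind of layout. It is not Oxigraph's actual code or on-disk format: `ToyStore`, `encode`, `prefix_scan` and `matching` are hypothetical names, in-memory `BTreeSet`s stand in for RocksDB's sorted key space, and a plain 64-bit hash stands in for EncodedTerm. It only shows how key rotations plus prefix scans can answer triple patterns, and how a string store can map hashes back to strings.

```rust
use std::collections::btree_set::Range;
use std::collections::hash_map::DefaultHasher;
use std::collections::{BTreeSet, HashMap};
use std::hash::{Hash, Hasher};

/// Assumption: a term is reduced to a 64-bit hash (toy stand-in for an encoded term).
type TermId = u64;

fn encode(term: &str) -> TermId {
    let mut h = DefaultHasher::new();
    term.hash(&mut h);
    h.finish()
}

/// Toy quad store. Each BTreeSet plays the role of one sorted key rotation.
/// Hash collisions are ignored here; a real store would have to handle them.
#[derive(Default)]
pub struct ToyStore {
    strings: HashMap<TermId, String>, // inverse lookup: hash -> original string
    spog: BTreeSet<[TermId; 4]>,
    posg: BTreeSet<[TermId; 4]>,
    ospg: BTreeSet<[TermId; 4]>,
}

/// Returns every key that starts with `prefix` (at most 4 terms), using one range scan.
fn prefix_scan<'a>(index: &'a BTreeSet<[TermId; 4]>, prefix: &[TermId]) -> Range<'a, [TermId; 4]> {
    let mut lo = [TermId::MIN; 4];
    let mut hi = [TermId::MAX; 4];
    lo[..prefix.len()].copy_from_slice(prefix);
    hi[..prefix.len()].copy_from_slice(prefix);
    index.range(lo..=hi)
}

impl ToyStore {
    pub fn insert(&mut self, s: &str, p: &str, o: &str, g: &str) {
        // Hash each term and remember its string for later decoding.
        let [s, p, o, g] = [s, p, o, g].map(|t| {
            let id = encode(t);
            self.strings.entry(id).or_insert_with(|| t.to_string());
            id
        });
        // Store the same quad under three rotations of its terms.
        self.spog.insert([s, p, o, g]);
        self.posg.insert([p, o, s, g]);
        self.ospg.insert([o, s, p, g]);
    }

    /// Solves a triple pattern (`None` = wildcard) and returns decoded [s, p, o, g] quads.
    pub fn matching(&self, s: Option<&str>, p: Option<&str>, o: Option<&str>) -> Vec<[String; 4]> {
        let (s, p, o) = (s.map(encode), p.map(encode), o.map(encode));
        // Pick the rotation in which the bound terms form a prefix of the key.
        // `order` gives the positions of s, p, o, g inside that rotation's keys.
        let (index, terms, order) = match (s, p, o) {
            (_, None, Some(_)) => (&self.ospg, [o, s, p], [1, 2, 0, 3]),
            (None, Some(_), _) => (&self.posg, [p, o, s], [2, 0, 1, 3]),
            _ => (&self.spog, [s, p, o], [0, 1, 2, 3]),
        };
        // Keep the leading bound terms only.
        let prefix: Vec<TermId> = terms.iter().map_while(|t| *t).collect();
        prefix_scan(index, &prefix)
            .map(|key| order.map(|i| self.strings[&key[i]].clone()))
            .collect()
    }
}
```

With these three rotations, every combination of bound subject, predicate and object maps to a single prefix scan; for example `store.matching(None, Some("knows"), None)` scans the `posg` set with a one-term prefix. The graph name is never used as a prefix in this sketch, so restricting by graph would need extra rotations.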

But I'm not sure that reimplementing Oxigraph storage in Sophia is the best way to do it. I fear that you would very quickly also want to run SPARQL queries on top of it in Sophia, completely duplicating the work already done in Oxigraph. A better way to go would probably be to make Oxigraph usable with Sophia.

pchampin commented 4 years ago

> A better way to go would probably be to make Oxigraph usable with Sophia.

Yes, implementing Sophia's traits on top of Oxigraph is also a way to go, and probably the fastest one. I would be concerned, though, that converting from Oxigraph's model to Sophia's would induce some overhead, hence my initial proposal... But it is definitely worth a try anyway.

pchampin commented 4 years ago

Not published on crates.io yet, but an adapter for Oxigraph is now available at https://github.com/pchampin/sophia_oxigraph.

pchampin commented 2 months ago

Just pushed a PR on Oxigraph to make it implement the relevant Sophia traits (behind a feature gate). Oxigraph could therefore serve as a reference implementation of a persistent dataset.