openworm / owmeta-core

Core library for owmeta
MIT License

Investigate HDT for bundle RDF storage #33

Open mwatts15 opened 2 years ago

mwatts15 commented 2 years ago

owmeta bundle archives store graph data in N-Triples files. This format is good for line-oriented diffs, but it is highly redundant, since every line spells out its IRIs in full, and that redundancy ends up in the archive, making archive retrieval and storage more expensive. As an alternative, we could use HDT (Header, Dictionary, Triples), which provides one way to reduce the redundancy. Although it would be possible to do something similar without HDT, many of the questions of what does and doesn't work have probably already been worked out by HDT. Any deficiency that would require making our own format (e.g., removing or altering features that are only useful for querying) should become clear from this investigation.

The query functionality should also be investigated as an alternative to pow_store_zodb for the bundle indexed store (see HDT-FoQ and https://rdflib.dev/rdflib-hdt/hdtstore.html).
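To make the redundancy concrete, here is a minimal Python sketch of the dictionary-encoding idea that HDT builds on: each distinct term is stored once and triples become tuples of integer IDs. This is not HDT's actual on-disk layout (HDT splits the dictionary into sections and stores triples in a compact bitmap structure), and it does not use owmeta or rdflib-hdt. The function names, the `example.org` IRIs, and the naive line splitting are all assumptions made for illustration; a real implementation would use a proper N-Triples parser.

```python
def dictionary_encode(ntriples):
    """Map each distinct term to an integer ID; return (term_ids, id_triples)."""
    term_ids = {}
    triples = []
    for line in ntriples.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # Naive split (assumption): subject and predicate contain no spaces,
        # and the object is everything left once the trailing " ." is removed.
        subj, pred, rest = line.split(" ", 2)
        obj = rest.rstrip()
        if obj.endswith("."):
            obj = obj[:-1].rstrip()
        # IDs start at 1 and are assigned in order of first appearance.
        ids = tuple(term_ids.setdefault(t, len(term_ids) + 1)
                    for t in (subj, pred, obj))
        triples.append(ids)
    return term_ids, triples


def decode(term_ids, triples):
    """Rebuild the N-Triples lines from the dictionary and the ID tuples."""
    by_id = {i: term for term, i in term_ids.items()}
    return [" ".join(by_id[i] for i in ids) + " ." for ids in triples]


# Hypothetical sample data, not taken from any owmeta bundle.
SAMPLE = """\
<http://example.org/n1> <http://example.org/connectsTo> <http://example.org/n2> .
<http://example.org/n1> <http://example.org/name> "AVAL" .
<http://example.org/n2> <http://example.org/connectsTo> <http://example.org/n1> .
"""

terms, triples = dictionary_encode(SAMPLE)
print(len(terms), "distinct terms for", len(triples), "triples")
print(triples)
```

In the sample, nine term occurrences collapse to five dictionary entries, and `decode(terms, triples)` returns the original lines, so nothing is lost. The trade-off is that the encoded form is no longer diffable line by line, which is the property N-Triples was chosen for in the first place.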