threatgrid / asami

A graph store for Clojure and ClojureScript
Eclipse Public License 1.0
634 stars 29 forks source link

Alternative tree implementation #197

Open quoll opened 3 years ago

quoll commented 3 years ago

Durable storage currently uses AVL trees. I believe that these are adequate for triple storage, due to the linear scaling factor of external blocks, but they are a major performance bottleneck for the data pool.

We need fast read operations, with good write capabilities. This is based on insertions, where triples will require all of the elements to be converted to numbers, via tree-lookups. The larger an insert, the more likely data will be found in the tree, and the less likely they will need to be inserted.

The primary structure to consider is one of the B-tree variants, though Hitchhiker trees have been popular recently.

Unfortunately, the implementations that currently use trees assume an item of data at each node, which is not the case for these trees (they have multiple items, and many B-tree implementations only have data at the leaves). This will need to be considered to update the tree protocol.