visallo / vertexium

High-security graph database
http://vertexium.org/
Apache License 2.0

Edge label index #463

Closed joeferner closed 4 years ago

joeferner commented 4 years ago

The gist of this PR is to stop storing the same label string over and over and to reduce the number of round trips between string and byte[]. I started with a simple string intern here https://github.com/visallo/vertexium/blob/master/accumulo/iterators/src/main/java/org/vertexium/accumulo/iterator/model/EdgeInfo.java#L34 but still had to keep the byte[] (https://github.com/visallo/vertexium/blob/master/accumulo/iterators/src/main/java/org/vertexium/accumulo/iterator/model/EdgeInfo.java#L12) around, and that array was actually causing most of the memory usage. This PR removes the need to keep the byte array around and reduces the number of label string copies to one per unique label.
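
For readers who want a concrete picture of the idea, here is a minimal sketch of a label index keyed by the raw label bytes. It is not the code in this PR: the class name `EdgeLabelIndexSketch` and its methods are hypothetical, and it assumes labels are UTF-8 encoded. The point is only that each unique label is decoded once and every later lookup returns the same String instance, so callers do not need to hold on to their own byte[].

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not the class added by this PR.
class EdgeLabelIndexSketch {
    // One shared String per unique label, keyed by the label's raw bytes.
    // ByteBuffer compares by content, so equal byte sequences hit the same entry.
    private final Map<ByteBuffer, String> labelsByBytes = new HashMap<>();

    // Returns the shared String for these bytes, decoding only on first sight.
    String get(byte[] labelBytes) {
        String label = labelsByBytes.get(ByteBuffer.wrap(labelBytes));
        if (label == null) {
            // Assumption: labels are UTF-8 encoded.
            label = new String(labelBytes, StandardCharsets.UTF_8);
            // Copy the key bytes so the caller's array can be reused or discarded.
            labelsByBytes.put(ByteBuffer.wrap(labelBytes.clone()), label);
        }
        return label;
    }

    // Number of unique labels seen so far.
    int size() {
        return labelsByBytes.size();
    }

    public static void main(String[] args) {
        EdgeLabelIndexSketch index = new EdgeLabelIndexSketch();
        String first = index.get("knows".getBytes(StandardCharsets.UTF_8));
        String second = index.get("knows".getBytes(StandardCharsets.UTF_8));
        index.get("worksFor".getBytes(StandardCharsets.UTF_8));
        System.out.println("same instance: " + (first == second));
        System.out.println("unique labels: " + index.size());
    }
}
```

In this sketch the only byte[] that stays alive is the single copy used as the map key, one per unique label, rather than one per edge.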