mikelieberman / blueprints-accumulo-graph

Implementation of the Tinkerpop Blueprints API backed by Accumulo

NOTE: This code is no longer maintained. See https://github.com/JHUAPL/AccumuloGraph for a newer implementation.

Blueprints for Accumulo

This is an implementation of the Tinkerpop Blueprints API backed by Accumulo. The graph is stored in a single table in Accumulo. This implementation has support for key/value indexing and some performance tweaks. If indexing is enabled, the index is stored in a separate table.

How to use it

AccumuloGraphOptions opts = new AccumuloGraphOptions();

opts.setConnectorInfo(instance, zookeepers, username, password);
// OR
opts.setConnector(connector);

opts.setGraphTable(graphTable);

// Optional
opts.setIndexTable(indexTable);
opts.setAutoflush(...);
opts.setReturnRemovedPropertyValues(...);
opts.setMock(...);

AccumuloGraph graph = new AccumuloGraph(opts);

Options are as follows:

Caveats

There are definitely bugs.

Timing issues: There may be a lag between when you add a vertex or edge (or set a property on one) and when the change is reflected in the backing Accumulo table. This lag is deliberate, for performance reasons, but it means that if you set a value and immediately read it back, the result may be inconsistent. The same holds for key/value indexes. This isn't a problem for bulk loads or for read-only use of the graph, but it may be for other workloads. The autoflush option mitigates the issue somewhat: changes are flushed to Accumulo immediately, at the cost of write performance. I have tried to reduce these timing issues as much as possible, but some may remain, and this area needs more testing.
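
To make the trade-off concrete, here is a toy model in plain Java with no Accumulo dependency. It is only a sketch of the behavior described above, not this library's actual buffering code: the class `AutoflushSketch` and its methods are hypothetical names, and the in-memory map merely stands in for the backing table.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of buffered writes vs. autoflush. Hypothetical code, not the
// library's implementation; the map stands in for the Accumulo table.
class AutoflushSketch {
    private final Map<String, String> table = new HashMap<>();
    private final List<String[]> pending = new ArrayList<>();
    private final boolean autoflush;

    AutoflushSketch(boolean autoflush) {
        this.autoflush = autoflush;
    }

    // Queue a write; with autoflush on, push it to the table right away.
    void put(String key, String value) {
        pending.add(new String[] {key, value});
        if (autoflush) {
            flush();
        }
    }

    // Apply all queued writes to the table, in order.
    void flush() {
        for (String[] kv : pending) {
            table.put(kv[0], kv[1]);
        }
        pending.clear();
    }

    // Reads only see what has already reached the table.
    String get(String key) {
        return table.get(key);
    }

    public static void main(String[] args) {
        AutoflushSketch lazy = new AutoflushSketch(false);
        lazy.put("v1/name", "alice");
        System.out.println("buffered, before flush: " + lazy.get("v1/name"));
        lazy.flush();
        System.out.println("buffered, after flush:  " + lazy.get("v1/name"));

        AutoflushSketch eager = new AutoflushSketch(true);
        eager.put("v1/name", "alice");
        System.out.println("autoflush:              " + eager.get("v1/name"));
    }
}
```

In this model a read-after-write misses the value until `flush()` runs, while the autoflush variant pays for a flush on every write in exchange for immediate visibility.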

TODO

Implementation details

The graph is stored in a single table with the following schema.

| Row | CF | CQ | Val | Purpose |
|---|---|---|---|---|
| [v id] | MVERTEX | - | - | Vertex id |
| [v id] | EOUT | [e id] | [e label] | Vertex out-edge |
| [v id] | EIN | [e id] | [e label] | Vertex in-edge |
| [e id] | MEDGE | [e label] | - | Edge id |
| [e id] | VOUT | [v id] | - | Edge out-vertex |
| [e id] | VIN | [v id] | - | Edge in-vertex |
| [v/e id] | PROP | [p name] | [p val] | Element property |
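
As an illustration of the schema above, the following sketch lists the entries that a vertex, an edge, and a property would produce. It is plain Java with no Accumulo dependency. `GraphSchemaSketch`, `Entry`, and the helper methods are hypothetical names, and the literal "-" simply mirrors the table's placeholder for an unused field; how the real implementation encodes ids and empty fields is not specified here.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the graph-table layout. Hypothetical helper code that mirrors
// the schema table; it does not talk to Accumulo.
class GraphSchemaSketch {

    // One table entry: row, column family, column qualifier, value.
    static final class Entry {
        final String row, cf, cq, val;

        Entry(String row, String cf, String cq, String val) {
            this.row = row;
            this.cf = cf;
            this.cq = cq;
            this.val = val;
        }

        @Override
        public String toString() {
            return row + " " + cf + " " + cq + " " + val;
        }
    }

    // A vertex is marked by a single MVERTEX entry in its own row.
    static List<Entry> vertexEntries(String vertexId) {
        List<Entry> out = new ArrayList<>();
        out.add(new Entry(vertexId, "MVERTEX", "-", "-"));
        return out;
    }

    // An edge gets three entries in its own row, plus one entry in the row
    // of each endpoint so the edge can be found from either vertex.
    static List<Entry> edgeEntries(String edgeId, String label,
                                   String outVertexId, String inVertexId) {
        List<Entry> out = new ArrayList<>();
        out.add(new Entry(edgeId, "MEDGE", label, "-"));
        out.add(new Entry(edgeId, "VOUT", outVertexId, "-"));
        out.add(new Entry(edgeId, "VIN", inVertexId, "-"));
        out.add(new Entry(outVertexId, "EOUT", edgeId, label));
        out.add(new Entry(inVertexId, "EIN", edgeId, label));
        return out;
    }

    // A property lives in the row of the vertex or edge that owns it.
    static Entry propertyEntry(String elementId, String name, String value) {
        return new Entry(elementId, "PROP", name, value);
    }

    public static void main(String[] args) {
        List<Entry> all = new ArrayList<>();
        all.addAll(vertexEntries("v1"));
        all.addAll(vertexEntries("v2"));
        all.addAll(edgeEntries("e1", "knows", "v1", "v2"));
        all.add(propertyEntry("v1", "name", "alice"));
        for (Entry e : all) {
            System.out.println(e);
        }
    }
}
```

One consequence of this layout is that everything about an element (its marker, its incident edges or endpoints, and its properties) shares a single row.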

If the index table is enabled, it has the following schema.

| Row | CF | CQ | Val | Purpose |
|---|---|---|---|---|
| PVLIST | [p name] | - | - | Vertex property list |
| PELIST | [p name] | - | - | Edge property list |
| [p name] | [p val] | [v/e id] | - | Property index |
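
The index layout above puts the property name and value at the front of the key, so finding elements by property amounts to a prefix scan over sorted keys. The sketch below models that with a `TreeSet` in plain Java. It is an assumption-laden illustration, not the library's lookup code: `IndexSchemaSketch`, its methods, and the NUL separator used to join row, CF, and CQ into one string are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Toy model of the index-table layout. A sorted set of "row|CF|CQ" keys
// stands in for the sorted table; hypothetical code with no Accumulo calls.
class IndexSchemaSketch {
    // Separator between key parts; assumed never to occur in names or values.
    private static final char SEP = '\u0000';

    private final TreeSet<String> keys = new TreeSet<>();

    // Record the property name under PVLIST and add a property index entry.
    void indexVertexProperty(String name, String value, String vertexId) {
        keys.add("PVLIST" + SEP + name + SEP);
        keys.add(name + SEP + value + SEP + vertexId);
    }

    // Prefix scan: keys sharing the (name, value) prefix are contiguous in
    // sorted order, so walk forward from the prefix until it stops matching.
    List<String> lookup(String name, String value) {
        String prefix = name + SEP + value + SEP;
        List<String> ids = new ArrayList<>();
        for (String key : keys.tailSet(prefix, true)) {
            if (!key.startsWith(prefix)) {
                break;
            }
            ids.add(key.substring(prefix.length()));
        }
        return ids;
    }

    public static void main(String[] args) {
        IndexSchemaSketch index = new IndexSchemaSketch();
        index.indexVertexProperty("name", "alice", "v1");
        index.indexVertexProperty("name", "bob", "v2");
        index.indexVertexProperty("name", "alice", "v3");
        System.out.println(index.lookup("name", "alice"));
    }
}
```

Edge properties would follow the same pattern with PELIST in place of PVLIST.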

Please contact me if you find any bugs! Thanks!