NOTE: This code is no longer maintained. See https://github.com/JHUAPL/AccumuloGraph for a newer implementation.
This is an implementation of the TinkerPop Blueprints API backed by Accumulo. The graph is stored in a single Accumulo table. The implementation supports key/value indexing and includes some performance tweaks. If indexing is enabled, the index is stored in a separate table.

    AccumuloGraphOptions opts = new AccumuloGraphOptions();

    // Connect using connection details...
    opts.setConnectorInfo(instance, zookeepers, username, password);
    // ...OR pass in an existing Accumulo Connector
    opts.setConnector(connector);

    opts.setGraphTable(graphTable);

    // Optional
    opts.setIndexTable(indexTable);
    opts.setAutoflush(...);
    opts.setReturnRemovedPropertyValues(...);
    opts.setMock(...);

    AccumuloGraph graph = new AccumuloGraph(opts);

Options are as follows:
Connector info: The information needed to connect to Accumulo. Alternatively, pass in an Accumulo Connector object, which represents the connection. If neither is supplied, a mock instance is required (see below).
Graph table: Where to store the graph.
Index table: Where to store the key/value index.
Autoflush (default: true): Flush changes to Accumulo immediately, rather than holding them back for performance reasons. Disabling this may cause timing issues (see the caveats below).
Return removed property values (default: true): The removeProperty method is specified to return the value of the removed property, which potentially requires another read from Accumulo. If you don't care what is returned, disable this to speed things up.
Use mock instance (default: false): If you don't have an Accumulo cluster lying around, but still want to use this, you can use a "mock" instance of Accumulo which runs in memory and simulates a real cluster.
There are definitely bugs.
Timing issues: There may be a lag between when you add a vertex or edge, set its properties, etc. and when the change is reflected in the backing Accumulo table. This is done for performance reasons, but as a result, if you set values and then immediately read them back, the results may be inconsistent. The same holds for key/value indexes. This isn't a problem for bulk loads or for read-only use of the graph, but otherwise it may be problematic. The autoflush option mitigates this somewhat by flushing changes to Accumulo immediately, at the cost of write performance. I have tried to reduce these timing issues as much as possible, but some may remain, and this needs more testing.
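
To illustrate the read-after-write behavior described above, here is a minimal in-memory sketch. It is not the AccumuloGraph implementation: the class `BufferedStoreSketch` and its methods are hypothetical, a `TreeMap` stands in for the Accumulo table, and a plain list stands in for whatever write buffering the real code does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of buffered writes; names are illustrative only.
class BufferedStoreSketch {
    private final Map<String, String> table = new TreeMap<>(); // stands in for the Accumulo table
    private final List<String[]> pending = new ArrayList<>();  // writes not yet flushed
    private final boolean autoflush;

    BufferedStoreSketch(boolean autoflush) {
        this.autoflush = autoflush;
    }

    // Queue a write; with autoflush on, push it to the table right away.
    void put(String key, String value) {
        pending.add(new String[] {key, value});
        if (autoflush) {
            flush();
        }
    }

    // Apply all queued writes to the table, in the order they were made.
    void flush() {
        for (String[] kv : pending) {
            table.put(kv[0], kv[1]);
        }
        pending.clear();
    }

    // Reads only see what has already reached the table.
    String get(String key) {
        return table.get(key);
    }

    public static void main(String[] args) {
        BufferedStoreSketch buffered = new BufferedStoreSketch(false);
        buffered.put("v1/name", "alice");
        System.out.println(buffered.get("v1/name")); // null: the write is still buffered
        buffered.flush();
        System.out.println(buffered.get("v1/name")); // alice

        BufferedStoreSketch flushed = new BufferedStoreSketch(true);
        flushed.put("v1/name", "alice");
        System.out.println(flushed.get("v1/name")); // alice: visible immediately
    }
}
```

In this model, turning autoflush on trades one table write per `put` for immediate visibility, which mirrors the write-performance cost mentioned above.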
The graph is stored in a single table with the following schema.
Row | CF | CQ | Val | Purpose |
---|---|---|---|---|
[v id] | MVERTEX | - | - | Vertex id |
[v id] | EOUT | [e id] | [e label] | Vertex out-edge |
[v id] | EIN | [e id] | [e label] | Vertex in-edge |
[e id] | MEDGE | [e label] | - | Edge id |
[e id] | VOUT | [v id] | - | Edge out-vertex |
[e id] | VIN | [v id] | - | Edge in-vertex |
[v/e id] | PROP | [pname] | [pval] | Element property |
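
As a rough illustration of the schema above, the following sketch writes the entries for two vertices, one edge, and one property into a sorted in-memory map. It is an assumption-laden model rather than the actual code: `GraphTableSketch`, its methods, and the NUL-separated string key are hypothetical, and it assumes vertex and edge ids do not collide.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical model of the graph table; a TreeMap stands in for Accumulo's sorted keys.
class GraphTableSketch {
    private static final char SEP = '\u0000'; // separates row, CF, and CQ in the key
    final SortedMap<String, String> table = new TreeMap<>();

    private static String key(String row, String cf, String cq) {
        return row + SEP + cf + SEP + cq;
    }

    // [v id] | MVERTEX | - | -
    void addVertex(String vId) {
        table.put(key(vId, "MVERTEX", ""), "");
    }

    // One edge produces five entries: two on the vertices, three on the edge row.
    void addEdge(String eId, String outV, String inV, String label) {
        table.put(key(outV, "EOUT", eId), label); // vertex out-edge
        table.put(key(inV, "EIN", eId), label);   // vertex in-edge
        table.put(key(eId, "MEDGE", label), "");  // edge id
        table.put(key(eId, "VOUT", outV), "");    // edge out-vertex
        table.put(key(eId, "VIN", inV), "");      // edge in-vertex
    }

    // [v/e id] | PROP | [pname] | [pval]
    void setProperty(String elementId, String name, String value) {
        table.put(key(elementId, "PROP", name), value);
    }

    // Out-edge ids of a vertex: a prefix scan over the (row, EOUT) keys.
    List<String> outEdgeIds(String vId) {
        String prefix = key(vId, "EOUT", "");
        List<String> ids = new ArrayList<>();
        for (String k : table.subMap(prefix, prefix + Character.MAX_VALUE).keySet()) {
            ids.add(k.substring(prefix.length()));
        }
        return ids;
    }

    public static void main(String[] args) {
        GraphTableSketch g = new GraphTableSketch();
        g.addVertex("v1");
        g.addVertex("v2");
        g.addEdge("e1", "v1", "v2", "knows");
        g.setProperty("v1", "name", "alice");
        System.out.println(g.outEdgeIds("v1")); // [e1]
        System.out.println(g.table.size());     // 8
    }
}
```

Because each vertex row carries its own EOUT and EIN entries, finding the edges of a vertex in this model is a single range scan over that row.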
If the index table is enabled, it has the following schema.
Row | CF | CQ | Val | Purpose |
---|---|---|---|---|
PVLIST | [p name] | - | - | Vertex property list |
PELIST | [p name] | - | - | Edge property list |
[p name] | [p val] | [v/e id] | - | Property index |
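
The index schema can be modeled in the same way. The sketch below is hypothetical (`IndexTableSketch` and its methods are not part of the library) and assumes that the PVLIST row records which vertex property names are indexed, which the table above does not spell out. A lookup by key and value becomes a prefix scan over the property index rows.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical model of the index table; a TreeMap stands in for Accumulo's sorted keys.
class IndexTableSketch {
    private static final char SEP = '\u0000'; // separates row, CF, and CQ in the key
    final SortedMap<String, String> table = new TreeMap<>();

    private static String key(String row, String cf, String cq) {
        return row + SEP + cf + SEP + cq;
    }

    // PVLIST | [p name] | - | -   (assumed meaning: this vertex property is indexed)
    void registerVertexKey(String name) {
        table.put(key("PVLIST", name, ""), "");
    }

    // [p name] | [p val] | [v/e id] | -
    void index(String name, String value, String elementId) {
        table.put(key(name, value, elementId), "");
    }

    // All element ids whose property `name` equals `value`, in sorted order.
    List<String> lookup(String name, String value) {
        String prefix = key(name, value, "");
        List<String> ids = new ArrayList<>();
        for (String k : table.subMap(prefix, prefix + Character.MAX_VALUE).keySet()) {
            ids.add(k.substring(prefix.length()));
        }
        return ids;
    }

    public static void main(String[] args) {
        IndexTableSketch idx = new IndexTableSketch();
        idx.registerVertexKey("name");
        idx.index("name", "alice", "v1");
        idx.index("name", "bob", "v2");
        idx.index("name", "alice", "v3");
        System.out.println(idx.lookup("name", "alice")); // [v1, v3]
        System.out.println(idx.lookup("name", "al"));    // []
    }
}
```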
Please contact me if you find any bugs! Thanks!