Closed haebin closed 10 years ago
+1
:+1:
Amazing. We spoke about this at length in NYC. This is needed!
+1 indeed. TTL on edges would support an awesome array of distributed graph streaming applications.
This would set Titan apart from the other graph databases on the market :+1:
:+1:
So, assuming we/I do this, it seems we will have to deal with different notions of TTL in Cassandra and HBase. HBase supports TTL on column families, which would make the TTL setting global to a graph, e.g. we could have TTL on all edges. Cassandra, however, permits setting a TTL upon the insertion of individual columns, which I think will be more useful in Titan. Often, you will want only certain edges, or certain types of edges, to expire. Since Graph#addEdge in Blueprints does not provide any means of passing in a TTL at edge / column creation time, it probably makes the most sense to have TTL at the edge label level, i.e. LabelMaker#ttl. For example, to set a TTL of 5 seconds for all "locatedAt" edges, you would declare:
g.makeLabel("locatedAt").ttl(5).make()
In order to accommodate HBase, we could also have a configuration property such as storage.ttl which would give a TTL to edges of any label (unless, perhaps, a TTL is specifically overridden via makeLabel).
I would say that vertex and property TTL are of secondary importance, but also within the realm of possibility (at least, to someone who has not yet tried to implement them).
That's a solid analysis, @joshsh. In addition to graph-level TTL and label/key-level TTL, we could also consider setting the TTL via a dedicated property:
e = v.addEdge('locatedAt',u)
e.setProperty('ttl',5)
However, that puts a lot of burden on the developer and I like the option of keeping it at the schema level better.
TTL at the level of individual edges certainly would give the developer the most control. There are scenarios in which this would be advantageous, including any scenario in which you want TTL bound to data sources as opposed to data types. For example, you might want topic edges for blog posts to survive longer than the equivalent edges for tweets, or for a low-volume source of posts as opposed to a high-volume one, without necessarily creating distinct types of edges.
The problem is that TTL needs to be declared upon insertion, so unless we can buffer the edge until setProperty is called (assuming that no read operations occur in the meantime), and then insert, it's too late.
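The buffering idea can be sketched in plain Java. This is only an illustration of the approach and not Titan code: BufferedEdgeTx, PendingEdge, Write and the "ttl" property key are all hypothetical names, and the flushed list stands in for the storage backend. The point it shows is that if nothing is written until commit(), a TTL set after addEdge() is still known at insertion time.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: none of these names come from Titan or Blueprints.
class BufferedEdgeTx {
    /** An edge held in memory until commit; no backend write has happened yet. */
    static final class PendingEdge {
        final String label;
        final Map<String, Object> properties = new HashMap<>();
        PendingEdge(String label) { this.label = label; }
        void setProperty(String key, Object value) { properties.put(key, value); }
    }

    /** What is finally sent to the store: a label plus a TTL in seconds (0 = no TTL, by assumption). */
    static final class Write {
        final String label;
        final int ttlSeconds;
        Write(String label, int ttlSeconds) { this.label = label; this.ttlSeconds = ttlSeconds; }
    }

    private final List<PendingEdge> pending = new ArrayList<>();
    final List<Write> flushed = new ArrayList<>(); // stand-in for the storage backend

    PendingEdge addEdge(String label) {
        PendingEdge e = new PendingEdge(label);
        pending.add(e); // buffered only
        return e;
    }

    void commit() {
        // The TTL is read here, at insertion time, so a setProperty call
        // made any time before commit is still honoured.
        for (PendingEdge e : pending) {
            Object ttl = e.properties.get("ttl");
            int seconds = (ttl instanceof Integer) ? (Integer) ttl : 0;
            flushed.add(new Write(e.label, seconds));
        }
        pending.clear();
    }

    public static void main(String[] args) {
        BufferedEdgeTx tx = new BufferedEdgeTx();
        PendingEdge e = tx.addEdge("locatedAt");
        e.setProperty("ttl", 5);
        tx.commit();
        Write w = tx.flushed.get(0);
        System.out.println(w.label + " ttl=" + w.ttlSeconds);
    }
}
```

The sketch ignores the caveat raised above: a read inside the same transaction would have to be served from the buffer, which is not modelled here.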
Note: TTL for Titan / Cassandra has been implemented in this Titan 0.5 branch:
https://github.com/thinkaurelius/titan/tree/ttl
Both per-label TTL and per-edge TTL are supported. An example of per-label TTL:
graph.makeLabel("likes").ttl(60).make();
graph.commit();
graph.addEdge(null, v1, v2, "likes");
graph.commit();
This will give all "likes" edges a time to live of 60 seconds. That is, if you commit() or rollback() a transaction more than 60 seconds after the commit() that creates the edge, the edge will no longer be returned by iterators created in that transaction.
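As a rough model of the timing described here, the following Java sketch treats an edge as visible until its TTL has elapsed since the commit that created it. TtlVisibility and isVisible are hypothetical names, and two details are assumptions rather than documented behavior: that a TTL of 0 means "never expires", and that the edge disappears exactly at the boundary.

```java
// Hypothetical sketch of the visibility rule; not taken from Titan or Cassandra.
class TtlVisibility {
    /**
     * @param committedAtMillis time of the commit() that created the edge
     * @param ttlSeconds        time to live; 0 or less is assumed to mean "no TTL"
     * @param nowMillis         time at which a later transaction reads the edge
     */
    static boolean isVisible(long committedAtMillis, int ttlSeconds, long nowMillis) {
        if (ttlSeconds <= 0) {
            return true; // assumption: no TTL set, so the edge never expires
        }
        long expiresAtMillis = committedAtMillis + ttlSeconds * 1000L;
        return nowMillis < expiresAtMillis; // assumption: expired exactly at the boundary
    }

    public static void main(String[] args) {
        long created = 0L;
        System.out.println(isVisible(created, 60, 59_000L)); // read 59 s after the creating commit
        System.out.println(isVisible(created, 60, 61_000L)); // read 61 s after the creating commit
    }
}
```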
Per-edge TTL is possible via setProperty() before commit(), e.g.
e = graph.addEdge(null, v1, v2, "likes");
e.setProperty(Titan.TTL, 10);
graph.commit(); // we don't mutate Cassandra until this point
If both per-label and per-edge TTL are defined, per-edge TTL takes precedence, so the edge above will time out in 10 seconds rather than 60.
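The precedence rule can be written down as a small resolution function. This is a sketch of the rule as stated above, not the branch's actual code: EffectiveTtl and resolve are hypothetical names, null is used here to mean "not set", and returning 0 for "no TTL" is an assumption.

```java
// Hypothetical sketch of the per-edge over per-label precedence; not Titan's implementation.
class EffectiveTtl {
    /**
     * @param perEdgeTtl  TTL set on the individual edge, or null if none was set
     * @param perLabelTtl TTL declared on the edge label, or null if none was declared
     * @return the TTL in seconds to use at insertion; 0 is assumed to mean "no TTL"
     */
    static int resolve(Integer perEdgeTtl, Integer perLabelTtl) {
        if (perEdgeTtl != null) {
            return perEdgeTtl;  // the most specific setting wins
        }
        if (perLabelTtl != null) {
            return perLabelTtl; // fall back to the schema-level setting
        }
        return 0;               // neither set: the edge does not expire
    }

    public static void main(String[] args) {
        System.out.println(resolve(10, 60));     // per-edge 10 over per-label 60
        System.out.println(resolve(null, 60));   // only the label TTL is set
        System.out.println(resolve(null, null)); // nothing set
    }
}
```

A graph-wide default such as the storage.ttl property proposed earlier in the thread could be added as a third, lowest-priority fallback in the same way.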
Any feedback on / experiences with this feature are welcome. I will look into HBase support next.
Cool :+1:
Thanks to @joshsh @xedin and others, we moved this feature forward into the 0.5 release which supports edge label / property key and vertex label TTL.
Is TTL available on vertices also? The above examples (and all references I could find) are all on edges only. If vertex TTL is supported, is there any documentation on this?
Thanks Praveen
Since C* and HBase both support TTL on column families, it would be possible to expose their TTL setting via an edge label creation property. With this, you could expire old edges for storage efficiency and query performance.