thinkaurelius / titan

Distributed Graph Database
http://titandb.io
Apache License 2.0

Feature Request: Time To Live (TTL) setting for edges. #523

Closed haebin closed 10 years ago

haebin commented 10 years ago

Since both C* and HBase support TTL on column families, it should be possible to expose their TTL settings via a property set at edge label creation. With this, you could expire old edges to improve storage efficiency and query performance.

peterklipfel commented 10 years ago

+1

anvie commented 10 years ago

:+1:

rahst12 commented 10 years ago

Amazing. We spoke about this at length in NYC. This is needed!

joshsh commented 10 years ago

+1 indeed. TTL on edges would support an awesome array of distributed graph streaming applications.

anvie commented 10 years ago

This would set Titan apart from the other graph DBs on the market :+1:

cloud-on-prem commented 10 years ago

:+1:

joshsh commented 10 years ago

So, assuming we/I do this, it seems we will have to deal with different notions of TTL in Cassandra and HBase. HBase does support TTL on column families, which would make the TTL setting global to a graph, e.g. we could have a TTL on all edges. Cassandra, however, permits setting a TTL upon the insertion of individual columns, which I think will be more useful in Titan. Often, you will want only certain edges, or certain types of edges, to expire. Since Graph#addEdge in Blueprints does not provide any means of passing in a TTL at edge / column creation time, it probably makes the most sense to have TTL at the edge label level, i.e. LabelMaker#ttl. For example, to set a TTL of 5 seconds for all "locatedAt" edges, you would declare:

g.makeLabel("locatedAt").ttl(5).make()

In order to accommodate HBase, we could also have a configuration property such as storage.ttl which would give a default TTL to edges of any label (unless, perhaps, that TTL is specifically overridden via makeLabel).
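
The fallback from a label-level TTL to a graph-wide default could be sketched roughly as follows. This is a plain-Java illustration only, not Titan code: `TtlConfig`, `setLabelTtl`, and `ttlFor` are hypothetical names, and the sketch assumes a TTL of 0 means "no expiry".

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: resolves the TTL for an edge label, falling back to a
// graph-wide default (the proposed storage.ttl) when the label declares none.
class TtlConfig {
    private final int globalTtlSeconds; // assumed: 0 means "no expiry"
    private final Map<String, Integer> labelTtlSeconds = new HashMap<>();

    TtlConfig(int globalTtlSeconds) {
        this.globalTtlSeconds = globalTtlSeconds;
    }

    // Analogous to declaring a TTL when the label is created.
    void setLabelTtl(String label, int seconds) {
        labelTtlSeconds.put(label, seconds);
    }

    // A label-level TTL, if present, overrides the global default.
    int ttlFor(String label) {
        return labelTtlSeconds.getOrDefault(label, globalTtlSeconds);
    }
}
```

With a global default of 3600 and a "locatedAt" TTL of 5, `ttlFor("locatedAt")` yields the label value while any other label falls back to the default.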

I would say that vertex and property TTL are of secondary importance, but also within the realm of possibility (at least, to someone who has not yet tried to implement them).

mbroecheler commented 10 years ago

That's a solid analysis @joshsh. In addition to graph-level TTL and label/key-level TTL, we could also consider setting the TTL via a dedicated property:

e = v.addEdge('locatedAt',u)
e.setProperty('ttl',5)

However, that puts a lot of burden on the developer, and I prefer keeping it at the schema level.

joshsh commented 10 years ago

TTL at the level of individual edges certainly would give the developer the most control. There are scenarios in which this would be advantageous, including any scenario in which you want TTL bound to data sources as opposed to data types. For example, you might want topic edges for blog posts to survive longer than the equivalent edges for tweets, or for a low-volume source of posts as opposed to a high-volume one, without necessarily creating distinct types of edges.

The problem is that TTL needs to be declared upon insertion. Unless we can buffer the edge until setProperty is called (assuming that no read operations occur in the meantime) and only then insert it, setProperty comes too late.
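
The buffering idea could look something like the sketch below. It is a simplified plain-Java model, not how Titan or Cassandra is actually wired: `BufferedTx`, `PendingEdge`, and the `written` list (standing in for the storage backend) are all hypothetical, and a TTL of 0 is assumed to mean "no expiry".

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of buffering: an edge is held in memory until commit(),
// so a TTL set after addEdge() can still be attached to the actual write.
class BufferedTx {
    static class PendingEdge {
        final String label;
        int ttlSeconds = 0; // assumed: 0 means "no expiry"

        PendingEdge(String label) {
            this.label = label;
        }

        void setTtl(int seconds) {
            this.ttlSeconds = seconds;
        }
    }

    private final List<PendingEdge> pending = new ArrayList<>();

    // Stands in for the storage backend; each entry records "label:ttl".
    final List<String> written = new ArrayList<>();

    PendingEdge addEdge(String label) {
        PendingEdge e = new PendingEdge(label);
        pending.add(e); // nothing is written to storage yet
        return e;
    }

    void commit() {
        for (PendingEdge e : pending) {
            // The TTL is known by now, so it can accompany the insert.
            written.add(e.label + ":" + e.ttlSeconds);
        }
        pending.clear();
    }
}
```

The sketch ignores the read-before-commit case mentioned above; a real implementation would have to decide what a read sees while the edge is still buffered.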

joshsh commented 10 years ago

Note: TTL for Titan / Cassandra has been implemented in this Titan 0.5 branch:

https://github.com/thinkaurelius/titan/tree/ttl

Both per-label TTL and per-edge TTL are supported. An example of per-label TTL:

graph.makeLabel("likes").ttl(60).make();
graph.commit();

graph.addEdge(null, v1, v2, "likes");
graph.commit();

This will give all "likes" edges a time to live of 60 seconds. That is, if you commit() or rollback() a transaction more than 60 seconds after the commit() which created the edge, the edge will no longer be returned by iterators created in that transaction.

Per-edge TTL can be set via setProperty() before commit(), e.g.

e = graph.addEdge(null, v1, v2, "likes");
e.setProperty(Titan.TTL, 10);
graph.commit();  // we don't mutate Cassandra until this point

If both per-label and per-edge TTL are defined, per-edge TTL takes precedence, so the edge above will time out in 10 seconds rather than 60.
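
As a rough model of these semantics (not the code in the branch), the precedence rule and the expiry check can be written as two small functions. `EdgeExpiry`, `effectiveTtl`, and `isExpired` are hypothetical names; times are in seconds, and a TTL of 0 is assumed to mean "no expiry".

```java
// Hypothetical sketch: a per-edge TTL, when present, overrides the label TTL,
// and an edge counts as expired once more than its TTL has elapsed since commit.
class EdgeExpiry {
    // perEdgeTtl is null when no per-edge TTL was set on the edge.
    static int effectiveTtl(Integer perEdgeTtl, int labelTtl) {
        return perEdgeTtl != null ? perEdgeTtl : labelTtl;
    }

    // committedAt and now are timestamps in seconds.
    static boolean isExpired(long committedAt, long now, int ttlSeconds) {
        return ttlSeconds > 0 && now - committedAt > ttlSeconds;
    }
}
```

For the edge above, `effectiveTtl(10, 60)` gives 10, so a check made 11 seconds after the commit reports the edge as expired, whereas a "likes" edge without a per-edge TTL would still be live.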

Any feedback on, or experience with, this feature is welcome. I will look into HBase support next.

anvie commented 10 years ago

Cool :+1:

mbroecheler commented 10 years ago

Thanks to @joshsh, @xedin, and others, we moved this feature forward into the 0.5 release, which supports TTL on edge labels, property keys, and vertex labels.

ppeddi commented 10 years ago

Is TTL also available on vertices? The examples above (and all the references I could find) cover edges only. If vertex TTL is supported, is there any documentation on it?

Thanks, Praveen

anvie commented 10 years ago

@ppeddi http://s3.thinkaurelius.com/docs/titan/0.5.0/advanced-schema.html