thinkaurelius / titan

Distributed Graph Database
http://titandb.io
Apache License 2.0

Should we disable ID flushing #1030

Open mbroecheler opened 9 years ago

mbroecheler commented 9 years ago

Currently, GraphDatabaseConfiguration.IDS_FLUSH defaults to true, meaning that ids are assigned to elements upon creation. This has the advantage that ids are directly available to the user (and not just after commit), but it has the distinct disadvantage that graph partitioning doesn't work well, since we cannot reason about the optimal placement of a just-created element.

Hence, we should consider defaulting this option to false and rewriting the test cases that assume ids are immediately available (we shouldn't be relying on that, but I am pretty sure we have). This would teach users to access ids only after commit, which is required when graph partitioning is enabled (and expected to produce results).
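
To make the difference concrete, here is a toy sketch of the two behaviours. It is not Titan code: `ToyTx` and `ToyVertex` are hypothetical names, and the `flushIds` flag only mirrors the idea behind IDS_FLUSH under the assumption that, when it is off, ids are assigned at commit.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: ToyVertex and ToyTx are hypothetical and are not Titan classes.
class ToyVertex {
    long id = -1; // -1 means "no id assigned yet"

    boolean hasId() {
        return id >= 0;
    }
}

class ToyTx {
    private final boolean flushIds; // mirrors the idea behind IDS_FLUSH
    private final List<ToyVertex> created = new ArrayList<>();
    private long nextId = 0;

    ToyTx(boolean flushIds) {
        this.flushIds = flushIds;
    }

    ToyVertex addVertex() {
        ToyVertex v = new ToyVertex();
        if (flushIds) {
            // Id is fixed immediately, before anything else about the element is known.
            v.id = nextId++;
        }
        created.add(v);
        return v;
    }

    void commit() {
        // With flushing disabled, ids are assigned here, once the whole
        // transaction is known; a real system could choose a placement at this point.
        for (ToyVertex v : created) {
            if (!v.hasId()) {
                v.id = nextId++;
            }
        }
        created.clear();
    }
}
```

Code written against the second mode would call `commit()` first and only then read `id`, which is the habit the proposed default is meant to encourage.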

dmill-bz commented 9 years ago

Just wanted to point out that this is a headache for non-Java languages. It means keeping track of every object generated by a gremlin-server write and then refreshing their IDs on a transaction commit event (which creates a whole other set of database read queries). It's a bit of a nightmare when you're trying to cut down your query time and serve information quickly to your end users.

I totally understand the necessity behind it, but maybe a cache that links temp IDs to real IDs would be worth considering (if at all possible). It wouldn't have to last long: most of the time, threads on our end last only a few seconds, and elements (and IDs) are then refetched in subsequent threads.

Food for thought.
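
A rough sketch of what such a short-lived temp-ID cache could look like, purely as an illustration. `TempIdCache`, `record` and `resolve` are hypothetical names, not anything that exists in Titan or gremlin-server, and the sketch assumes the server learns the real id at commit time and that callers pass in the current time in milliseconds.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: TempIdCache is not part of Titan or gremlin-server.
class TempIdCache {
    private static final class Entry {
        final long realId;
        final long expiresAtMillis;

        Entry(long realId, long expiresAtMillis) {
            this.realId = realId;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<Long, Entry> entries = new ConcurrentHashMap<>();
    private final long ttlMillis;

    TempIdCache(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    // Called at commit time, once the real id for a temp id is known.
    void record(long tempId, long realId, long nowMillis) {
        entries.put(tempId, new Entry(realId, nowMillis + ttlMillis));
    }

    // Returns the real id, or null if the mapping is unknown or has expired.
    Long resolve(long tempId, long nowMillis) {
        Entry e = entries.get(tempId);
        if (e == null) {
            return null;
        }
        if (nowMillis >= e.expiresAtMillis) {
            entries.remove(tempId, e); // drop the stale mapping
            return null;
        }
        return e.realId;
    }
}
```

A client holding temp id `-1` could then call `resolve(-1L, System.currentTimeMillis())` shortly after commit instead of issuing a fresh read query. The time-to-live keeps the map small, which fits the "threads last only a few seconds" pattern described above.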

mbroecheler commented 9 years ago

Yeah, I can see how that is a pain. Also, graph partitioning is an advanced enough use case that we can expect those users needing it to disable this configuration option.