inthefabric / Fabric

The collective mind awaits your input.
www.inthefabric.com

Separate Indexing/Lookup Database #11

Closed zachkinstner closed 11 years ago

zachkinstner commented 11 years ago

Based on this discussion and general experience with graph databases over the past several months, I think Fabric will need to implement an external database/cache. This database will be responsible for indexing data for global and sub-global (e.g. all "Factor" nodes) searches.

zachkinstner commented 11 years ago

This would be used for several purposes. At the broadest level, this database/cache would provide Fabric with a means for performing standard RDBMS operations with "tables", counts, sorting options, etc. For example:

zachkinstner commented 11 years ago

Traversal queries could have a hook where this database/cache would take over for certain lookups. The lookup response would provide a list of node IDs, and the traversal query could use a retain step to filter based on those IDs.
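To make the hook idea concrete, here is a minimal sketch in Python rather than in Fabric's actual query layer. Everything in it is an assumption for illustration: the in-memory `GRAPH`, the `EXTERNAL_INDEX` dictionary standing in for the database/cache, the "Artifact" type, and the function names `lookup_ids`, `retain`, and `factor_neighbors`. It only shows the shape of the flow: the lookup returns node IDs, and the traversal keeps only nodes whose IDs are in that result.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical in-memory graph: node ID -> IDs of adjacent nodes.
GRAPH: Dict[int, List[int]] = {
    1: [2, 3, 4],
    2: [5],
    3: [],
    4: [5],
    5: [],
}

# Hypothetical stand-in for the external database/cache:
# node type -> IDs of all nodes of that type.
EXTERNAL_INDEX: Dict[str, Set[int]] = {
    "Factor": {2, 4, 5},
    "Artifact": {1, 3},
}


def lookup_ids(node_type: str) -> Set[int]:
    """Simulate the external lookup; returns a copy of the matching node IDs."""
    return set(EXTERNAL_INDEX.get(node_type, set()))


def neighbors(node_id: int) -> Iterable[int]:
    """One traversal step: the nodes adjacent to node_id."""
    return GRAPH.get(node_id, [])


def retain(candidates: Iterable[int], allowed: Set[int]) -> List[int]:
    """Keep only the candidates whose IDs appear in the lookup result."""
    return [node_id for node_id in candidates if node_id in allowed]


def factor_neighbors(start: int) -> List[int]:
    """Traverse from start, then filter by the externally indexed 'Factor' IDs."""
    allowed = lookup_ids("Factor")
    return retain(neighbors(start), allowed)


if __name__ == "__main__":
    print(factor_neighbors(1))
```

In this sketch the lookup runs once per query and the traversal itself stays unchanged, which is the main appeal of the hook: the graph engine never needs to scan all nodes of a type.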

Next steps are to investigate options for this database/cache. Initial ideas are ElasticSearch, Redis, or a separate Cassandra cluster/keyspace.

zachkinstner commented 11 years ago

This idea may be premature. I should investigate Titan's indexing more closely first. It seems (especially with the ElasticSearch integration) that there are better ways to handle Fabric's various indexing needs. Whichever way I go about it, there will be a lot of denormalization. I like the idea of using Titan's transaction capabilities to ensure that all related pieces (including all related direct/indirect indexes) are created correctly during an insert.
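The transactional point can be illustrated with a small sketch. It does not use Titan; it uses Python's built-in `sqlite3` module purely as a stand-in for any store with transactions, and the `node` and `type_index` tables and the helper names are assumptions made up for this example. The idea shown is only that a node and its denormalized index entry are written in one transaction, so a failure on either write leaves neither behind.

```python
import sqlite3


def open_store() -> sqlite3.Connection:
    """Create an in-memory store with a node table and a denormalized index table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE node (id INTEGER PRIMARY KEY, type TEXT NOT NULL)")
    conn.execute(
        "CREATE TABLE type_index (type TEXT NOT NULL, node_id INTEGER NOT NULL UNIQUE)"
    )
    conn.commit()
    return conn


def insert_node(conn: sqlite3.Connection, node_id: int, node_type: str) -> None:
    """Write the node and its index entry atomically.

    Using the connection as a context manager commits if the block succeeds
    and rolls back if any statement raises.
    """
    with conn:
        conn.execute(
            "INSERT INTO node (id, type) VALUES (?, ?)", (node_id, node_type)
        )
        conn.execute(
            "INSERT INTO type_index (type, node_id) VALUES (?, ?)",
            (node_type, node_id),
        )


def count(conn: sqlite3.Connection, table: str) -> int:
    """Row count for one of the two known tables (name is not user input here)."""
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


if __name__ == "__main__":
    store = open_store()
    insert_node(store, 1, "Factor")
    print(count(store, "node"), count(store, "type_index"))
```

If the index write fails (for example, because a conflicting index row already exists), the node write is rolled back as well, so the node and its index entries cannot drift apart.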

zachkinstner commented 11 years ago

Some additional thoughts on this:

zachkinstner commented 11 years ago

For now, Fabric will proceed without this additional complexity.