Revise/Restrict Traversal Service

zachkinstner commented 11 years ago

There are infinitely many ways to traverse the Fabric database. As the collection of data grows, supernodes become more common. These will break many user traversals, because users will find it very difficult to know when and where to filter in order to avoid them.

The probable solution? Restrict the current Traversal service. This would be done by removing several existing traversal links, then replacing them with more well-defined "search" style functions. With the possible traversal pathways restricted, Fabric can optimize (via indexing, VCI, caching, etc.), and users will not fall into the supernode traps.

The primary scenario which highlighted this issue was a simple question:

What Factors did Member M create for Artifact A?

The traversal for this could start at either M or A. Starting at M seems to be the better option, since A could be a supernode (like the current "noun" FabClass). But this traversal is also a problem if M has created hundreds of Factors or more.
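To make the trade-off concrete, here is a rough sketch in Python. It is not Fabric's storage or API: the dictionaries, function names, and sample data are all made up for illustration. It answers the same question from both starting points and counts how many edges each route has to look at before filtering.

```python
from collections import defaultdict

# Hypothetical in-memory stand-in for the graph (not Fabric's real model).
created_by_member = defaultdict(list)  # member id -> ids of Factors it created
uses_artifact = defaultdict(list)      # artifact id -> ids of Factors that use it
factor_info = {}                       # factor id -> (member id, artifact id)

def add_factor(factor_id, member_id, artifact_id):
    factor_info[factor_id] = (member_id, artifact_id)
    created_by_member[member_id].append(factor_id)
    uses_artifact[artifact_id].append(factor_id)

def factors_from_member(member_id, artifact_id):
    """Start at M, walk every 'creates' edge, then filter by artifact."""
    candidates = created_by_member[member_id]
    matches = [f for f in candidates if factor_info[f][1] == artifact_id]
    return matches, len(candidates)

def factors_from_artifact(member_id, artifact_id):
    """Start at A, walk every 'used in' edge, then filter by member."""
    candidates = uses_artifact[artifact_id]
    matches = [f for f in candidates if factor_info[f][0] == member_id]
    return matches, len(candidates)

# Made-up data: "noun" is a supernode shared by 50 members,
# and member "m0" has also created 300 unrelated Factors.
for i in range(1000):
    add_factor(f"f{i}", f"m{i % 50}", "noun")
for i in range(1000, 1300):
    add_factor(f"f{i}", "m0", f"a{i}")

from_m, visited_m = factors_from_member("m0", "noun")
from_a, visited_a = factors_from_artifact("m0", "noun")
print(len(from_m), visited_m, visited_a)  # 20 320 1000
```

Both routes return the same 20 Factors, but starting at M visits 320 edges while starting at A visits 1000. Neither cost is bounded by the size of the answer, which is the core of the problem.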

By replacing this traversal route at M with a function that forces certain criteria, like CreatesFactors(ArtifactId, IsPrimaryArtifact), Fabric can optimize for this scenario. This will probably involve adding some VCI properties and/or denormalized properties/edges so that this traversal always performs efficiently.
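One way such a function could be backed is with a denormalized lookup keyed on exactly the criteria it forces. The Python sketch below is only an assumption about the shape of the idea: the `FactorIndex` class, its method names, and the split into "primary" and "related" artifacts per Factor are hypothetical, and a real implementation would live in the database's indexes rather than in application memory.

```python
from collections import defaultdict

class FactorIndex:
    """Hypothetical denormalized index; names and layout are illustrative only."""

    def __init__(self):
        # (member id, artifact id, is_primary_artifact) -> list of factor ids
        self._by_key = defaultdict(list)

    def add_factor(self, factor_id, member_id, primary_artifact_id, related_artifact_id):
        # Extra write-time work: record the Factor under both lookup keys
        # so that reads never have to walk the member's or artifact's edges.
        self._by_key[(member_id, primary_artifact_id, True)].append(factor_id)
        self._by_key[(member_id, related_artifact_id, False)].append(factor_id)

    def creates_factors(self, member_id, artifact_id, is_primary_artifact):
        # A single keyed lookup; cost depends on the result size only.
        key = (member_id, artifact_id, is_primary_artifact)
        return list(self._by_key.get(key, []))

index = FactorIndex()
index.add_factor("f1", "m1", "a1", "a2")
index.add_factor("f2", "m1", "a1", "a3")
index.add_factor("f3", "m2", "a1", "a2")

print(index.creates_factors("m1", "a1", True))   # ['f1', 'f2']
print(index.creates_factors("m1", "a2", False))  # ['f1']
```

The cost moves to write time and storage, which seems acceptable here because the set of allowed queries is fixed in advance.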

zachkinstner commented 11 years ago

My initial draft for the restricted Traversal service:

- Removes almost all the list-returning traversal links
- Includes a few specialized "search" functions
- Includes some "complex" data-fetching functions
- Includes some new filtering functions

I'm not sure yet, but it seems the existing Traversal functionality could remain in place and simply be disabled for now. It is a very powerful tool, so I don't want to lose it. Perhaps it would become available to apps that choose some kind of premium option, probably one related to purchasing dedicated servers (or server time) within the Fabric cluster. That way, apps can traverse as desired, but they have a big incentive to make those traversals fast and efficient.

zachkinstner commented 11 years ago

It's worth noting that the measures described here are all in support of real-time usage.

At this time, I'm thinking that the broader (database-wide) analytical usage would happen as an offline, scheduled, premium service. Analytical queries/traversals would be able to do much more, including things like in-query calculations, counts, and other statistics/computations.

inthefabric / Fabric

Revise/Restrict Traversal Service #33