inthefabric / Fabric

The collective mind awaits your input.
www.inthefabric.com

Improve BatchCreateFactor #25

Closed zachkinstner closed 11 years ago

zachkinstner commented 11 years ago

The BatchCreateFactor function needs to be improved. Accomplishing this will not be trivial, and may require RexConnect and/or Weaver changes, so I'm creating this issue to keep things organized.

zachkinstner commented 11 years ago

For each Factor, the following tasks need to occur (in roughly this order):

Upon each Artifact "verify" step, if the Artifact does not exist, then all the remaining tasks in the sequence should be skipped, and the entire sequence should return an error/value that indicates the issue.

zachkinstner commented 11 years ago

The original implementation did this:

Notes:

zachkinstner commented 11 years ago

I'm going to invent a theoretical RexConnect feature, which might help solve this problem: conditional command execution. It would work by checking the response of a specified command for a value of false, zero, or null.
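A minimal sketch of how such conditional execution might behave, written in Python purely for illustration. The `Command` class, `cond_id` field, and `execute` function are hypothetical names, not part of RexConnect. The sketch assumes that a skipped command reports null, so anything conditioned on it is skipped as well.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional


@dataclass
class Command:
    cmd_id: str
    run: Callable[[], Any]          # stands in for executing a query script
    cond_id: Optional[str] = None   # ID of the command whose response gates this one


def is_failed_response(value: Any) -> bool:
    """True for the three gating values: false, zero, or null."""
    if value is None or value is False:
        return True
    if isinstance(value, bool):
        return False
    return isinstance(value, (int, float)) and value == 0


def execute(commands: List[Command]) -> Dict[str, Any]:
    """Run commands in order, skipping any whose condition command failed."""
    results: Dict[str, Any] = {}
    for cmd in commands:
        if cmd.cond_id is not None and is_failed_response(results.get(cmd.cond_id)):
            results[cmd.cmd_id] = None  # assumption: a skipped command reports null
            continue
        results[cmd.cmd_id] = cmd.run()
    return results


calls: List[str] = []


def fake_query(name: str, response: Any) -> Callable[[], Any]:
    """Build a fake query that records its execution and returns a canned response."""
    def run() -> Any:
        calls.append(name)
        return response
    return run


commands = [
    Command("F0.0", fake_query("F0.0", None)),                         # Artifact not found
    Command("F0.1", fake_query("F0.1", "ra-vertex"), cond_id="F0.0"),  # skipped
    Command("F0.2", fake_query("F0.2", 42), cond_id="F0.1"),           # skipped in turn
]
results = execute(commands)
print(calls)
print(results)
```

Because a skipped command yields null, a single failed lookup cascades through every later command in the chain without any extra bookkeeping.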

zachkinstner commented 11 years ago

New implementation idea (skipping optional edges for now):

| Cmd ID | Cond? | Name | Pseudo-Query |
|---|---|---|---|
| Mem0 | | Get Mem Once | m=g.V('MId',_P0).next(); |
| F0.0 | | Get Prim Art | pa=g.V('AId',_P0); if(pa){ pa=pa.next(); }; pa; |
| F0.1 | F0.0 | Get Rel Art | ra=g.V('AId',_P0); if(ra){ ra=ra.next(); }; ra; |
| F0.2 | F0.1 | Add Factor | f=g.addVertex([...]); f.id; |
| F0.3 | F0.1 | Add Mem Edge | ...VCI...; g.addEdge(m,f,'Creates',[...VCI...]); |
| F0.4 | F0.1 | Add Prim Edge | ...VCI...; g.addEdge(f,pa,'UsesPrimary',[...VCI...]); |
| F0.5 | F0.1 | Add Rel Edge | ...VCI...; g.addEdge(f,ra,'UsesRelated',[...VCI...]); |

Notes:

zachkinstner commented 11 years ago

This could be done without a new RexConnect feature:

  1. Initialize a boolean variable: pass=true;
  2. If any Artifact is missing: pass=false;
  3. Before each query, do: if(!pass){ return null; };

This solution is not very elegant, and it adds an extra command for each Factor. If an Artifact is missing, it still executes several extra commands, each of which returns null (a minor point).
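A minimal sketch of the flag-based approach, in Python for illustration only. The helper names (`guarded`, `verify_artifact`, `run_sequence`) and the dictionary standing in for the graph are hypothetical. Each step checks the shared flag first and returns null without doing any work once the flag has been cleared.

```python
from typing import Any, Callable, Dict, List, Optional

Context = Dict[str, Any]
Step = Callable[[Context], Any]


def guarded(step: Step) -> Step:
    """Wrap a step so it returns None without running once ctx['pass'] is False."""
    def run(ctx: Context) -> Any:
        if not ctx["pass"]:
            return None
        return step(ctx)
    return run


def verify_artifact(artifact_id: str, store: Dict[str, str]) -> Step:
    """Look up an Artifact in a dict standing in for the graph; clear the flag if missing."""
    def step(ctx: Context) -> Optional[str]:
        vertex = store.get(artifact_id)
        if vertex is None:
            ctx["pass"] = False
        return vertex
    return step


def run_sequence(steps: List[Step]) -> List[Any]:
    """Initialize the flag, then run every step behind the guard."""
    ctx: Context = {"pass": True}
    return [guarded(step)(ctx) for step in steps]
```

Note that every step is still invoked after a failure; the guard only makes it cheap, which matches the "several extra commands that return null" cost described above.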

zachkinstner commented 11 years ago

The query for adding a Factor varies in length because it has many optional parameters. Each variation is a distinct script, so many similar queries must be compiled separately.

To avoid this, create a property map over several commands:

  1. Initialize props with all mandatory Factor properties
  2. For each applicable element E, add mandatory properties of E to props
  3. For each applicable optional Factor and element property, add property to props
  4. Create new Factor: g.addVertex(props);

This approach minimizes the number of possible (unique, parameterized) query scripts, and makes those queries much shorter. Thus, those queries are faster to compile, and take less memory to cache. Note that only the optional properties are added individually. The mandatory properties are added in bunches (since the order/quantity doesn't change).
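The steps above can be sketched as a function that emits (script, parameters) pairs, shown here in Python for illustration. The property names in the usage note, the `build_commands` helper, and the exact script text are assumptions; the scripts are illustrative strings and have not been run against a Gremlin server. The point is that the set of distinct script texts stays fixed regardless of which optional properties a given Factor supplies.

```python
from typing import Any, Dict, List, Tuple

Command = Tuple[str, Dict[str, Any]]


def build_commands(mandatory: Dict[str, Any], optional: Dict[str, Any]) -> List[Command]:
    """Build the command list for one Factor using a shared property map."""
    commands: List[Command] = []

    # Mandatory properties go in one command; a fixed key order keeps the
    # script text identical for every Factor.
    keys = sorted(mandatory)
    init_script = "props=[" + ",".join(f"{k}:_P{i}" for i, k in enumerate(keys)) + "];"
    init_params = {f"_P{i}": mandatory[k] for i, k in enumerate(keys)}
    commands.append((init_script, init_params))

    # Each optional property that is present reuses the same short script,
    # with the property name and value passed as parameters.
    for key, value in optional.items():
        if value is not None:
            commands.append(("props.put(_P0,_P1);", {"_P0": key, "_P1": value}))

    # The final command is the same for every Factor.
    commands.append(("f=g.addVertex(props); f.id;", {}))
    return commands
```

For example, two Factors with different optional properties (say a hypothetical `Note` on one and `IsDefining` on the other) produce different parameter values but share the same three script texts, so only three scripts need to be compiled and cached.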

zachkinstner commented 11 years ago

This is also a great test case for RexConnect vs. RexProClient performance.

These BatchCreateFactor improvements would be simpler to implement with RexProClient, since each query is executed individually. The logic can occur entirely in the application code (instead of needing conditional RexConnect commands, etc.), so if an Artifact isn't found, the application code can simply set the error response and move on.

The assumed downside to that approach is the quantity of round-trips to the database, and the de/serialization that occurs each time. The performance tests would determine its actual impact.
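A minimal sketch of the per-query style, in Python for illustration. `FakeGraphClient` and its methods are hypothetical stand-ins, not the RexProClient API; the class only counts round-trips so the trade-off described above is visible. The application code checks each lookup itself and returns an error as soon as an Artifact is missing.

```python
from typing import Dict, Optional


class FakeGraphClient:
    """Hypothetical per-query client; every method call counts as one round-trip."""

    def __init__(self, artifacts: Dict[int, str]):
        self.artifacts = artifacts
        self.round_trips = 0
        self.next_factor_id = 1

    def find_artifact(self, artifact_id: int) -> Optional[str]:
        self.round_trips += 1
        return self.artifacts.get(artifact_id)

    def add_factor(self, primary: str, related: str) -> int:
        self.round_trips += 1
        factor_id = self.next_factor_id
        self.next_factor_id += 1
        return factor_id


def create_factor(client: FakeGraphClient, primary_id: int, related_id: int) -> Dict[str, object]:
    """Create one Factor, stopping at the first missing Artifact."""
    primary = client.find_artifact(primary_id)
    if primary is None:
        return {"error": f"Artifact {primary_id} not found"}
    related = client.find_artifact(related_id)
    if related is None:
        return {"error": f"Artifact {related_id} not found"}
    return {"factor_id": client.add_factor(primary, related)}
```

In this sketch a successful Factor costs three round-trips (edge creation is omitted), while a Factor whose primary Artifact is missing costs only one.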

zachkinstner commented 11 years ago

The integration test has ~300 factors, with 20 factors in each request, and 3-degree parallelism. It executes in 2.5 to 3.0 seconds. I'm curious to see the performance impact on the production environment.
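For reference, the request shape described above (batches of 20, three requests in flight) might be sketched as follows in Python. The `send_batch` callable is a hypothetical placeholder for whatever issues one BatchCreateFactor request, and the sketch assumes exactly 300 factors for simplicity.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def chunk(items: Sequence[T], size: int) -> List[Sequence[T]]:
    """Split items into consecutive batches of at most `size` elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def run_batches(
    factors: Sequence[T],
    send_batch: Callable[[Sequence[T]], R],
    batch_size: int = 20,
    parallelism: int = 3,
) -> List[R]:
    """Send batches with at most `parallelism` requests in flight; results keep batch order."""
    batches = chunk(factors, batch_size)
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        return list(pool.map(send_batch, batches))
```

With 300 factors this yields 15 requests, processed three at a time.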