project-codeflare / codeflare

Simplifying the definition and execution, scaling and deployment of pipelines on the cloud.
https://codeflare.dev
Apache License 2.0

Lineage And #25

Open raghukiran1224 opened 3 years ago

raghukiran1224 commented 3 years ago

Overview

AND node semantics compute a full cross product of the node's inputs. In grid search cross-validation, however, an AND node such as Feature Union requires the features it joins to come from the same input object. For example, consider two-fold cross-validation on the following pipeline: (PCA (n_components = 5, 10) || Nystrom || Select k-best) && Feature Union. With two folds, we get four objects from the PCA node (2 parameter settings x 2 folds) and two objects each from Nystrom and Select k-best. A regular AND node will compute the full 4x2x2 = 16-element cross product. A lineage AND will compute only 4 combinations: (pca_5, Nystrom, Select k-best) on each of the two input objects and (pca_10, Nystrom, Select k-best) on the same two input objects.

Solution: Lineage And selects only those items in the AND node cross product that share the same input object lineage.
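
To make the counting above concrete, here is a minimal Python sketch of the two semantics. It is not CodeFlare code: the `Obj` class, its `fold` lineage tag, and the `regular_and` / `lineage_and` helpers are hypothetical names chosen for illustration, and lineage is assumed to be fully identified by the cross-validation fold an object came from.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Obj:
    """An object flowing into the AND node (hypothetical, for illustration)."""
    node: str  # producing branch, e.g. "pca_5"
    fold: int  # assumed lineage tag: the CV fold (input object) it derives from


def regular_and(*branches):
    """Regular AND semantics: full cross product over all branches."""
    return list(product(*branches))


def lineage_and(*branches):
    """Lineage AND semantics: keep only tuples whose members share one lineage."""
    return [
        combo
        for combo in product(*branches)
        if len({o.fold for o in combo}) == 1
    ]


folds = [0, 1]  # two-fold CV yields two input objects
pca = [Obj(f"pca_{n}", f) for n in (5, 10) for f in folds]  # 4 objects
nystrom = [Obj("nystrom", f) for f in folds]                # 2 objects
k_best = [Obj("select_k_best", f) for f in folds]           # 2 objects

full = regular_and(pca, nystrom, k_best)
joined = lineage_and(pca, nystrom, k_best)
print(len(full), len(joined))  # 16 4
```

Filtering the full product keeps the sketch short. A real implementation would more likely group each branch's objects by lineage first and take the product within each group, which avoids materializing the combinations that are discarded.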

Acceptance Criteria

Questions

Assumptions

Reference