salesforce / TransmogrifAI

TransmogrifAI (pronounced trăns-mŏgˈrə-fī) is an AutoML library for building modular, reusable, strongly typed machine learning workflows on Apache Spark with minimal hand-tuning.
https://transmogrif.ai
BSD 3-Clause "New" or "Revised" License
2.24k stars, 392 forks

Use Spark job grouping to distinguish steps of the machine learning flow #467

Closed: nicodv closed this 4 years ago

nicodv commented 4 years ago

Related issues: N/A

Describe the proposed solution: Leverages Spark's ability to set a "job group" ID to distinguish individual steps of the machine learning flow, for example data IO, model IO, feature engineering, and cross-validation.
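A minimal Scala sketch of the job-group idea, assuming Spark's `SparkContext.setJobGroup` and `clearJobGroup` API. The enum `PipelineStep` and the helper `JobGroupSketch.withJobGroup` are hypothetical stand-ins for the PR's `OpStep` and its utility code, not the actual TransmogrifAI implementation. The step name is used as both the group ID and the job description, and the description is what Spark shows for each job in its UI.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical step names modeled on the examples in the PR description;
// the real OpStep enum may use different entries.
object PipelineStep extends Enumeration {
  val DataIO, FeatureEngineering, CrossValidation, ModelIO = Value
}

object JobGroupSketch {
  /** Runs `block` with the Spark job group and job description set to `step`.
    * setJobGroup stores both as thread-local properties of the SparkContext,
    * so only jobs submitted from the calling thread (or threads it starts
    * afterwards) are tagged. clearJobGroup unsets them when the block ends. */
  def withJobGroup[R](sc: SparkContext, step: PipelineStep.Value)(block: => R): R = {
    sc.setJobGroup(step.toString, step.toString)
    try block
    finally sc.clearJobGroup()
  }
}

object JobGroupDemo {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("job-group-demo"))
    try {
      // This count job is listed in the Spark UI with the description "FeatureEngineering".
      val n = JobGroupSketch.withJobGroup(sc, PipelineStep.FeatureEngineering) {
        sc.parallelize(1 to 1000).map(_ * 2).count()
      }
      println(s"counted $n rows")
    } finally sc.stop()
  }
}
```

Because `clearJobGroup` simply unsets the properties, nested calls do not restore the outer step in this sketch; a production helper would likely save and restore the previous group.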

OpSparkListener is extended to capture the currently active job group. In addition, the entry names of the new OpStep enum automatically appear in the Spark UI, which makes the purpose of each stage easier to interpret.
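On the listener side, Spark passes the submitting thread's local properties to `SparkListener.onJobStart`, so the active group can be read from `jobStart.properties`. The sketch below assumes the `spark.jobGroup.id` property key written by `setJobGroup`; `JobGroupListener` is a hypothetical class illustrating the idea, not the PR's `OpSparkListener` code.

```scala
import scala.collection.mutable
import org.apache.spark.scheduler.{SparkListener, SparkListenerJobStart}

/** Hypothetical listener that records the job group attached to each job
  * at the moment the job starts. */
class JobGroupListener extends SparkListener {
  // SparkContext.setJobGroup stores the group id under this local-property key.
  // The SparkContext constant holding the key is private[spark], hence the literal.
  private val JobGroupKey = "spark.jobGroup.id"

  private val groupsByJob = mutable.Map.empty[Int, Option[String]]
  @volatile private var active: Option[String] = None

  /** Job group of the most recently started job, if it had one. */
  def currentGroup: Option[String] = active

  /** Job group recorded for `jobId`; None if no group was set or the job is unknown. */
  def groupOf(jobId: Int): Option[String] = groupsByJob.synchronized {
    groupsByJob.getOrElse(jobId, None)
  }

  override def onJobStart(jobStart: SparkListenerJobStart): Unit = {
    // `properties` may be null, and the key is absent when no group was set.
    val group = Option(jobStart.properties).flatMap(p => Option(p.getProperty(JobGroupKey)))
    active = group
    groupsByJob.synchronized { groupsByJob(jobStart.jobId) = group }
  }
}
```

The listener can be registered with `sc.addSparkListener(new JobGroupListener)`. Since `currentGroup` only reflects the most recently started job, the per-job `groupOf` lookup is the more reliable record when several threads submit jobs concurrently.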

To this end:

Describe alternatives you've considered: Because a main goal is to expose the current step to the real-time SparkListener framework, and a SparkListener can already read the Spark job group, using job groups was the simplest way to accomplish this. Considered but not feasible:

Adding other steps, such as "sanity checker", "scoring", or "metrics", was also considered. However, these are not included here because doing so would:

Additional context: The extension to OpSparkListener enables more advanced handling of the metrics it collects; for example, metrics can be grouped by OpStep.
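Once each job or stage carries its group, collected metrics can be aggregated per step. Below is a plain-Scala sketch, assuming a hypothetical `StageMetric` record that already has the group attached (for example by a listener reading `spark.jobGroup.id`); the fields are illustrative and are not OpSparkListener's actual metric schema. Metrics recorded without a group are bucketed under `"Unassigned"`.

```scala
/** Hypothetical per-stage metric record with the job group already attached. */
final case class StageMetric(
    stageId: Int,
    jobGroup: Option[String],
    executorRunTimeMs: Long,
    recordsRead: Long)

/** Aggregated totals for one step. */
final case class StepTotals(executorRunTimeMs: Long, recordsRead: Long)

object MetricsByStep {
  val Unassigned = "Unassigned"

  /** Sums executor run time and records read for every job group (step). */
  def totalsByStep(metrics: Seq[StageMetric]): Map[String, StepTotals] =
    metrics
      .groupBy(_.jobGroup.getOrElse(Unassigned))
      .map { case (step, ms) =>
        step -> StepTotals(ms.map(_.executorRunTimeMs).sum, ms.map(_.recordsRead).sum)
      }
}
```

Keying on the group string keeps the aggregation independent of how steps are defined, so the same code works whether the group IDs come from an enum or from free-form names.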

codecov[bot] commented 4 years ago

Codecov Report

Merging #467 into master will increase coverage by 0.01%. The diff coverage is 100%.


@@            Coverage Diff            @@
##           master    #467      +/-   ##
=========================================
+ Coverage   86.99%     87%   +0.01%     
=========================================
  Files         344     345       +1     
  Lines       11576   11575       -1     
  Branches      370     593     +223     
=========================================
+ Hits        10070   10071       +1     
+ Misses       1506    1504       -2

Impacted Files Coverage Δ
...main/scala/com/salesforce/op/OpWorkflowModel.scala 93.9% <100%> (-0.15%) ↓
.../src/main/scala/com/salesforce/op/OpWorkflow.scala 88.11% <100%> (-0.85%) ↓
...sforce/op/stages/impl/selector/ModelSelector.scala 98.36% <100%> (+0.17%) ↑
...a/com/salesforce/op/utils/spark/JobGroupUtil.scala 100% <100%> (ø)
.../main/scala/com/salesforce/op/OpWorkflowCore.scala 95.45% <100%> (ø) ↑
...om/salesforce/op/utils/spark/OpSparkListener.scala 98.63% <100%> (+0.02%) ↑
...es/src/main/scala/com/salesforce/op/OpParams.scala 89.79% <0%> (+4.08%) ↑


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Powered by Codecov. Last update 8087834...818ce67.