This is a long one, so please take your time to review haha
Added an initial data preparation step to the training pipeline. This results in a general refactor of the training pipeline:
- **Dataprep**: data preprocessing & train-test split (previously part of train). The train set output is connected as input to the training step, and the test set output as input to the evaluation step.
- **Train**: model training only. The output (the model) is connected as input to the evaluation step.
- **Evaluation**: model evaluation (previously part of train) using the test set from dataprep and the model from training.
- **Register**: checks whether the new model is better than the old one (previously part of evaluation) and registers it.
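The step wiring above can be sketched framework-agnostically. All names and bodies below are illustrative placeholders, not the actual pipeline code; the point is only how each step's outputs feed the next step's inputs:

```python
# Sketch of the refactored pipeline wiring: dataprep -> train -> evaluate -> register.
# The "model" here is just a mean and the metric a mean absolute error, purely
# to make the data flow between steps concrete.
from typing import List, Tuple

def dataprep(dataset: List[float], test_fraction: float = 0.2) -> Tuple[List[float], List[float]]:
    """Preprocess and split: train set feeds training, test set feeds evaluation."""
    split = int(len(dataset) * (1 - test_fraction))
    return dataset[:split], dataset[split:]

def train(train_set: List[float]) -> float:
    """Model training only; returns the 'model' (here: the mean of the train set)."""
    return sum(train_set) / len(train_set)

def evaluate(model: float, test_set: List[float]) -> float:
    """Evaluate the trained model on the held-out test set (mean absolute error)."""
    return sum(abs(x - model) for x in test_set) / len(test_set)

def register(metric: float, previous_metric: float) -> bool:
    """Register only if the new model is at least as good as the old one."""
    return metric <= previous_metric  # lower error is better in this sketch

# Wiring, mirroring the step list above:
train_set, test_set = dataprep([1.0, 2.0, 3.0, 4.0, 5.0])
model = train(train_set)
metric = evaluate(model, test_set)
registered = register(metric, previous_metric=3.0)
```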
Full training pipeline looks like this:
Also:
- Renamed `utils` to `aml_utils` to avoid a name collision.
- Added the dataset as a visible pipeline input (instead of silently accessing it from the train step).
- Changed the registration method so the model is linked to its training run in the registry.
- Added generic functionality to log images produced during evaluation.
- Fixed a bug in run-cancelling behaviour when the model should not be registered.
- Added a `>=` conditional in `is_model_better` (it previously always returned True). This way the default behaviour is to register, but we can manually tweak the tag values in the model registry if we want to test the cancelling behaviour.
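The `>=` check in `is_model_better` might look roughly like this (metric name and signature are illustrative; the actual implementation reads the comparison values from tags in the model registry, and assumes a higher-is-better metric):

```python
def is_model_better(new_metric: float, old_metric: float) -> bool:
    # >= means ties favour the new model, so the default path registers.
    # Manually lowering/raising the old model's metric tag in the registry
    # lets us force the "do not register" branch to test run cancelling.
    return new_metric >= old_metric
```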
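The generic image-logging change could take a shape like the following helper (function name and output location are hypothetical, not the actual code): any evaluation step that produces image bytes writes them to a run-visible folder, so new plots can be logged without touching the pipeline definition.

```python
import os

def log_evaluation_image(image_bytes: bytes, name: str, output_dir: str = "outputs/images") -> str:
    """Write an image produced during evaluation to a run-visible folder.

    Works for any image source (e.g. a rendered matplotlib figure saved to
    bytes), keeping the evaluation step decoupled from any one plot type.
    """
    os.makedirs(output_dir, exist_ok=True)
    path = os.path.join(output_dir, name)
    with open(path, "wb") as f:
        f.write(image_bytes)
    return path
```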
Closes #27