Open SdgJlbl opened 5 years ago
For performance reasons, we might need to distinguish between cleaning/ preprocessing steps, independent of the dataset, and training steps which need to be repeated every time. scikit-learn provides a good implementation of a cross-validation scheme.
For pipelines producing metrics as their last step and including a train-test-split, we want to get cross-validated metrics.