peopledoc / mlvtools

Public repository for versioning machine learning data

Cross-validation #51

Open SdgJlbl opened 5 years ago

SdgJlbl commented 5 years ago

For pipelines that include a train-test split and produce metrics as their last step, we want to obtain cross-validated metrics.

SdgJlbl commented 5 years ago

For performance reasons, we might need to distinguish between cleaning/preprocessing steps, which are independent of the dataset, and training steps, which need to be repeated every time. scikit-learn provides a good implementation of a cross-validation scheme.
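
To make the idea concrete, here is a minimal sketch of how this could look with scikit-learn's `cross_validate`. It is not mlvtools code: the `clean` function, the synthetic data, and the choice of `LogisticRegression` are all assumptions made for illustration. Cleaning runs once, outside the loop, and only the training step is repeated per fold.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_validate


def clean(X):
    """Hypothetical split-independent cleaning step: runs once, not per fold."""
    return np.nan_to_num(X)


# Synthetic data standing in for the pipeline's real input (assumption).
X_raw, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Cleaning happens once, outside the cross-validation loop.
X = clean(X_raw)

# The training step is refit and scored on each of the 5 folds.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_validate(
    LogisticRegression(max_iter=1000), X, y, cv=cv, scoring=["accuracy", "f1"]
)

# Cross-validated metrics: one value per fold, averaged for reporting.
mean_accuracy = scores["test_accuracy"].mean()
mean_f1 = scores["test_f1"].mean()
print(f"accuracy: {mean_accuracy:.3f}, f1: {mean_f1:.3f}")
```

One caveat with this split: any preprocessing that learns statistics from the data (scaling, fitted imputation) would belong inside the per-fold part, for example wrapped in a scikit-learn `Pipeline`, to avoid leaking test-fold information into training.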