mljar / mljar-supervised

Python package for AutoML on Tabular Data with Feature Engineering, Hyper-Parameters Tuning, Explanations and Automatic Documentation
https://mljar.com
MIT License

[Request] Allow repeated k-fold cross validation #540

Open sbcalaff opened 2 years ago

sbcalaff commented 2 years ago

To reduce overfitting, I would like to request a new parameter, "n_repetitions", which would set the number of complete sets of folds to compute in repeated k-fold cross-validation.

Cross-validation example:

{
    "validation_type": "kfold",
    "k_folds": 5,
    "n_repetitions": 3, # new
    "shuffle": True,
    "stratify": True,
    "random_seed": 123
}

pplonski commented 2 years ago

@sbcalaff I have good news! It is already implemented. There is a parameter called repeats that controls the number of repetitions. Here is the code: https://github.com/mljar/mljar-supervised/blob/92706af75bd1859805a413768dc261d0572c3e06/supervised/validation/validator_kfold.py#L24-L28

(When repeats is used, you can't use the stacked ensemble.) Please let me know if it works for you.

I will keep this issue open to update the docs.
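
Until the docs are updated, here is a minimal sketch of how the requested settings might look with the existing key. It reuses the dictionary from the request and swaps "n_repetitions" for "repeats". It assumes the dictionary is passed to AutoML through validation_strategy and that stacking is switched off with stack_models=False. The helper build_automl and the variable fits_per_model are names made up for this illustration, not part of the package.

```python
# Validation settings from the request above, with the proposed
# "n_repetitions" key replaced by the existing "repeats" key.
validation_strategy = {
    "validation_type": "kfold",
    "k_folds": 5,
    "repeats": 3,  # number of complete sets of folds
    "shuffle": True,
    "stratify": True,
    "random_seed": 123,
}

# In repeated k-fold, every repetition runs a full set of k folds,
# so each model is expected to be fitted k_folds * repeats times.
fits_per_model = validation_strategy["k_folds"] * validation_strategy["repeats"]


def build_automl():
    """Hypothetical helper: create an AutoML object with the settings above.

    Assumes mljar-supervised is installed. The import lives inside the
    function so the rest of this snippet runs without the package.
    """
    from supervised.automl import AutoML

    return AutoML(
        validation_strategy=validation_strategy,
        stack_models=False,  # the stacked ensemble cannot be combined with repeats
    )
```

With these values, each model would be trained on 5 x 3 = 15 train/validation splits, so training time should grow roughly in proportion to repeats.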

sbcalaff commented 2 years ago

I searched the code but did not find it. Thank you in advance for updating the documentation.