Matbench imposes the 5-fold test split. How you handle the "training" data is up to you, and usually includes a validation procedure. In essence, for each fold you have training and test data. You should perform hyperparameter tuning with the "training" data only, so you make the validation split/fold yourself (in a nested fashion).
Just be sure never to use the provided test data, and do all processing, including validation, with the "training" data; a sketch of this nested setup is given below.
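For a concrete picture, here is a minimal sketch of that nested setup on one Matbench task, assuming the matbench API (`MatbenchBenchmark`, `get_train_and_val_data`, `get_test_data`, `record`). The random forest with magpie composition features is just a stand-in model; only the data-handling pattern is the point.

```python
# Minimal sketch: nested hyperparameter tuning inside each Matbench fold.
# Model/featurizer choices (random forest + magpie features) are placeholders.
from matbench.bench import MatbenchBenchmark
from matminer.featurizers.composition import ElementProperty
from pymatgen.core import Composition
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, KFold

featurizer = ElementProperty.from_preset("magpie")

def featurize(compositions):
    # Turn composition strings into numeric feature vectors.
    return [featurizer.featurize(Composition(c)) for c in compositions]

mb = MatbenchBenchmark(autoload=False, subset=["matbench_expt_gap"])

for task in mb.tasks:
    task.load()
    for fold in task.folds:
        # "Training" data provided by Matbench for this fold.
        train_inputs, train_outputs = task.get_train_and_val_data(fold)
        X_train = featurize(train_inputs)

        # Inner (nested) 5-fold split for hyperparameter tuning,
        # done entirely within the training data.
        inner_cv = KFold(n_splits=5, shuffle=True, random_state=0)
        search = GridSearchCV(
            RandomForestRegressor(random_state=0),
            param_grid={"n_estimators": [100, 300], "max_depth": [None, 20]},
            cv=inner_cv,
            scoring="neg_mean_absolute_error",
        )
        search.fit(X_train, train_outputs)

        # The held-out test inputs are touched only once, for prediction.
        test_inputs = task.get_test_data(fold, include_target=False)
        predictions = search.best_estimator_.predict(featurize(test_inputs))
        task.record(fold, predictions)
```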
Thank you for the clarification!
So, would it be fair to use traditional cross-validation instead of nested cross-validation? Or would traditional cross-validation leak information and inflate the results?
Since 5-fold cross-validation is applied in the benchmark, all data end up being used either for training or for tuning. However, the final result of a task is the MAE on the test set, averaged over the folds. Therefore, the hyperparameters cannot be tuned by hand against the benchmark data, since there is no isolated validation or test set within each fold.
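To make the scoring concrete, here is a minimal sketch of how that final per-task number is obtained, assuming the per-fold predictions from the sketch above were also kept in a hypothetical dict `predictions_per_fold` keyed by fold; `get_test_data(..., include_target=True)` is used only to expose the true test targets for scoring, never for tuning.

```python
# Minimal sketch of the final task score: mean test-set MAE over the 5 folds.
# `predictions_per_fold` is a hypothetical dict of fold -> predictions.
import numpy as np
from sklearn.metrics import mean_absolute_error

fold_maes = []
for fold in task.folds:
    _, y_true = task.get_test_data(fold, include_target=True)
    fold_maes.append(mean_absolute_error(y_true, predictions_per_fold[fold]))

print(f"Task score: {np.mean(fold_maes):.4f} MAE (mean over 5 folds)")
```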