Hi,

I was training a NN classifier using skorch. The f1 score shown in the logging output (for both training and validation) is ~0.1 higher than the one in my post-hoc evaluation, while the roc_auc score is comparable between the logging output and my post-hoc evaluation. Does anyone have an idea of the cause?

Is it a scoring difference, or just a generalization problem? I would expect the model's performance on the hold-out data in my post-hoc evaluation to be about as good as the validation score. Or does the CV in the training step produce data leakage that makes the validation score higher?

As background, my data is heavily imbalanced, and I'm not using class weighting for the scoring.

This notebook includes the complete code to reproduce it: https://github.com/chenyangkang/random/blob/main/test_nn_binomial_class_import%20copy.ipynb

Thanks!
Yangkang
It turned out to be an upsampling problem: when you upsample your training data and then run cross-validation on it, duplicated samples end up in both the training and validation folds, so both the training and validation scores are inflated.
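A minimal sketch of the effect, using scikit-learn and imbalanced-learn with an illustrative dataset and estimator rather than the original skorch setup from the notebook:

```python
# Illustrative sketch (assumes scikit-learn and imbalanced-learn are installed);
# the dataset and model are stand-ins, not the ones from the linked notebook.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from imblearn.over_sampling import RandomOverSampler
from imblearn.pipeline import Pipeline

# Heavily imbalanced binary problem (~5% positives).
X, y = make_classification(n_samples=2000, weights=[0.95, 0.05], random_state=0)
clf = LogisticRegression(max_iter=1000)

# Leaky setup: upsample the whole dataset, then cross-validate.
# Duplicated minority rows land in both the training and validation folds,
# so the reported f1 is inflated.
ros = RandomOverSampler(random_state=0)
X_up, y_up = ros.fit_resample(X, y)
leaky_f1 = cross_val_score(clf, X_up, y_up, cv=5, scoring="f1").mean()

# Leak-free setup: resample inside each CV fold only. imblearn's Pipeline
# applies the sampler to the training fold and leaves the validation fold alone.
pipe = Pipeline([("up", RandomOverSampler(random_state=0)), ("clf", clf)])
clean_f1 = cross_val_score(pipe, X, y, cv=5, scoring="f1").mean()

print(f"f1 with upsampling before CV (leaky): {leaky_f1:.3f}")
print(f"f1 with upsampling inside each fold:  {clean_f1:.3f}")
```

Resampling inside each fold keeps the validation fold untouched, so the CV score reflects what a post-hoc evaluation on genuinely held-out data would see.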