skorch-dev / skorch

A scikit-learn compatible neural network library that wraps PyTorch
BSD 3-Clause "New" or "Revised" License

Higher scoring values in logging than post-hoc evaluation #1072

Closed — chenyangkang closed this issue 2 weeks ago

chenyangkang commented 2 weeks ago

Hi,

I was training a NN classifier using skorch. The f1 score shown in the logging output (both training and validation) is ~0.1 higher than the one in my post-hoc evaluation. The roc_auc score, by contrast, is comparable between the logging output and my post-hoc evaluation. Does anyone have an idea what causes this?

Is it because of scoring differences, or just a model extrapolation problem? I would expect the model's performance on the hold-out data in my post-hoc evaluation to be about as good as the validation score. Or maybe the CV in the training step produces data leakage that makes the validation score higher?

As background, my data is heavily imbalanced, and I'm not using weighting for the scoring.
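As an aside on why f1 and roc_auc can disagree here: f1 depends on precision and therefore on class prevalence, while roc_auc is rank-based. A minimal sketch with made-up scores (not the author's data) shows that duplicating every positive sample raises f1 at a fixed threshold but leaves roc_auc exactly unchanged:

```python
# Hypothetical scores: f1 is prevalence-sensitive, roc_auc is rank-based.
import numpy as np
from sklearn.metrics import f1_score, roc_auc_score

y_true = np.array([0] * 90 + [1] * 10)  # 10% positive class
rng = np.random.default_rng(0)
# Positives score higher on average, but the classes overlap.
scores = np.where(y_true == 1,
                  rng.uniform(0.3, 1.0, size=100),
                  rng.uniform(0.0, 0.7, size=100))
pred = (scores > 0.5).astype(int)

f1_orig = f1_score(y_true, pred)
auc_orig = roc_auc_score(y_true, scores)

# Duplicate every positive sample 9 extra times (10x upsampling).
pos = y_true == 1
y_up = np.concatenate([y_true, np.repeat(y_true[pos], 9)])
s_up = np.concatenate([scores, np.repeat(scores[pos], 9)])
p_up = (s_up > 0.5).astype(int)

f1_up = f1_score(y_up, p_up)
auc_up = roc_auc_score(y_up, s_up)
print(f1_orig, f1_up)    # f1 goes up: precision improves with prevalence
print(auc_orig, auc_up)  # roc_auc is identical: the ranking is unchanged
```

This is why an upsampled validation set can report an inflated f1 while roc_auc still looks consistent with the post-hoc evaluation.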

This notebook has the complete code to reproduce it: https://github.com/chenyangkang/random/blob/main/test_nn_binomial_class_import%20copy.ipynb

Thanks!

Yangkang

chenyangkang commented 2 weeks ago

It turned out to be an upsampling problem. If you upsample your training data before cross-validation, copies of the same samples end up in both the training and validation folds, so both the training and validation scores are inflated.
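The leak described above can be demonstrated with a minimal sketch (not the author's notebook, and using a plain scikit-learn classifier rather than skorch): upsampling the minority class before `cross_val_score` puts duplicates of validation samples into the training folds, and a model that can memorize them reports a much higher f1 than the honest estimate on the original data.

```python
# Sketch of upsampling-before-CV leakage, assuming a toy imbalanced dataset.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = make_classification(
    n_samples=2000, n_features=20, weights=[0.95, 0.05],
    flip_y=0.05, random_state=0,  # label noise so the task is not trivial
)

# Leaky pipeline: upsample the minority class FIRST, then cross-validate.
# Each validation-fold duplicate almost surely has an identical copy in
# the training folds, which the model can memorize.
minority = np.where(y == 1)[0]
extra = rng.choice(minority, size=10 * len(minority), replace=True)
X_up = np.vstack([X, X[extra]])
y_up = np.concatenate([y, y[extra]])

clf = DecisionTreeClassifier(random_state=0)
leaky_f1 = cross_val_score(clf, X_up, y_up, cv=5, scoring="f1").mean()

# Honest estimate: cross-validate on the original data. Any resampling
# should happen inside each training fold only, never before the split.
honest_f1 = cross_val_score(clf, X, y, cv=5, scoring="f1").mean()

print(f"leaky f1:  {leaky_f1:.3f}")   # inflated
print(f"honest f1: {honest_f1:.3f}")  # realistic
```

The usual fix is to resample only inside each training fold, e.g. with a pipeline whose resampling step is fit per fold, so the validation fold stays untouched.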