Hi,

I was training a NN classifier using skorch. The f1 score shown in the logging output (for both training and validation) is ~0.1 higher than the one in my post-hoc evaluation, while the roc_auc score is comparable between the logging output and my post-hoc evaluation. Does anyone have an idea of the cause?

Is it a scoring difference, or just a generalization problem? I would expect the model's performance on the hold-out data in my post-hoc evaluation to be about as good as the validation score. Or does the CV in the training step produce data leakage that makes the validation score higher?

As background, my data is heavily imbalanced, and I'm not using class weighting for the scoring.

This notebook includes the complete code to reproduce it: https://github.com/chenyangkang/random/blob/main/test_nn_binomial_class_import%20copy.ipynb

Thanks!
Yangkang
It turned out to be an upsampling problem: when you upsample your training data and then run cross-validation on it, duplicated samples end up in both the training and validation folds, so both the training and validation scores are inflated.
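A minimal sketch of the effect, using scikit-learn and imbalanced-learn with an illustrative dataset and estimator rather than the original skorch setup from the notebook:

```python
# Illustrative sketch (assumes scikit-learn and imbalanced-learn are installed);
# the dataset and model are stand-ins, not the ones from the linked notebook.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from imblearn.over_sampling import RandomOverSampler
from imblearn.pipeline import Pipeline

# Heavily imbalanced binary problem (~5% positives).
X, y = make_classification(n_samples=2000, weights=[0.95, 0.05], random_state=0)
clf = LogisticRegression(max_iter=1000)

# Leaky setup: upsample the whole dataset, then cross-validate.
# Duplicated minority rows land in both the training and validation folds,
# so the reported f1 is inflated.
ros = RandomOverSampler(random_state=0)
X_up, y_up = ros.fit_resample(X, y)
leaky_f1 = cross_val_score(clf, X_up, y_up, cv=5, scoring="f1").mean()

# Leak-free setup: resample inside each CV fold only. imblearn's Pipeline
# applies the sampler to the training fold and leaves the validation fold alone.
pipe = Pipeline([("up", RandomOverSampler(random_state=0)), ("clf", clf)])
clean_f1 = cross_val_score(pipe, X, y, cv=5, scoring="f1").mean()

print(f"f1 with upsampling before CV (leaky): {leaky_f1:.3f}")
print(f"f1 with upsampling inside each fold:  {clean_f1:.3f}")
```

Resampling inside each fold keeps the validation fold untouched, so the CV score reflects what a post-hoc evaluation on genuinely held-out data would see.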