zzzqzhou / RAM-DSIR

[ECCV'22] Generalizable Medical Image Segmentation via Random Amplitude Mixup and Domain-Specific Image Restoration

Are the reported results from the best epoch or the last epoch? #7

Closed wangminj closed 7 months ago

wangminj commented 8 months ago

Hi,

Thanks for your great work!

Are the reported results from the best epoch or the last epoch? Thank you.

zzzqzhou commented 8 months ago

Thanks for reaching out. We use the last-epoch checkpoint for evaluation.

wangminj commented 8 months ago

Thank you very much for your prompt response. Additionally, I noticed that the validation results from the final epoch during training differ from the values obtained when testing with model_last. Why does this happen, and which values should be considered the final results? (Two screenshots attached.)

zzzqzhou commented 8 months ago

Sorry for the late reply. This is because we do not freeze batch normalization when testing. It is a trick that can improve the final result on the test domain. In the training code, however, we freeze batch normalization when evaluating. You can also freeze batch normalization by adding the --freeze_bn argument to the testing scripts; the test result will then be the same as the result from the final epoch during training.
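
For readers wondering why the two numbers can differ, here is a minimal pure-Python sketch of the idea. It is not the repository's code. It assumes that "freezing" means normalizing with the running statistics stored during training, and "not freezing" means normalizing with the statistics of the current test batch (in PyTorch this roughly corresponds to batch-norm layers in eval() versus train() mode). The function name `batch_norm`, the `freeze_bn` parameter, and all numbers are hypothetical and chosen only for illustration.

```python
import math
from statistics import fmean, pvariance


def batch_norm(batch, running_mean, running_var, freeze_bn, eps=1e-5):
    """Normalize one feature over a batch (learned scale/shift omitted).

    Hypothetical helper for illustration only, not taken from RAM-DSIR.
    """
    if freeze_bn:
        # Frozen: reuse the statistics accumulated on the training domains.
        mean, var = running_mean, running_var
    else:
        # Not frozen: recompute statistics from the current (test) batch.
        mean, var = fmean(batch), pvariance(batch)
    return [(x - mean) / math.sqrt(var + eps) for x in batch]


# Made-up running statistics, standing in for what training would store.
running_mean, running_var = 0.0, 1.0

# Made-up activations from a shifted test domain.
test_batch = [2.0, 3.0, 4.0, 5.0]

frozen = batch_norm(test_batch, running_mean, running_var, freeze_bn=True)
unfrozen = batch_norm(test_batch, running_mean, running_var, freeze_bn=False)

print("frozen:  ", frozen)
print("unfrozen:", unfrozen)
```

With frozen statistics the shifted test activations stay off-center, while the unfrozen variant re-centers them around zero. The layers that follow therefore see different inputs in the two settings, which is one plausible reason the metrics from the testing script and from the end-of-training evaluation do not match unless --freeze_bn is passed.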