kbressem / prostate158


eval_loss metric #12

Closed · majoreks · closed 3 months ago

majoreks commented 3 months ago

Hello! Thank you for providing this project. We've been trying to adapt it to our dataset and train it to segment prostate cancer. Something we've noticed, however, is that our eval_loss metric behaves strangely. For one of our experiments the metrics look as follows:

[screenshot: training/validation metric curves]

What puzzles us is that even though val_mean_dice is improving, or at least changing, eval_loss is completely flat after a few epochs. The model then performs decently on a test set it hasn't seen before.
We have also run an experiment with the repository code as is, and its metrics ended up looking quite similar:

[screenshot: metric curves for the unmodified repository code]

Note that for the first experiment (on our dataset) we made some changes: we wrapped the model in DataParallel, changed the batch size to better fit our hardware, and changed the spacing to [0.5, 0.5, 1.5]; the model, loss function, and optimizer were untouched. One of our guesses was that empty images were somehow corrupting the metrics by producing NaNs or infinities, but an experiment without empty masks ended up looking quite similar as well.
We've been wondering whether you have encountered a similar issue in your experiments and could point us to a problem we might have introduced, or to some direction that could explain this behavior. Thanks!

kbressem commented 3 months ago

It's been some time since I trained the models and I don't remember this exact issue. However, empty images can mess with the val loss. Also make sure the data does not get converted to discrete values before the val loss is calculated. In the MONAI workflows the val loss is treated as a metric, so postprocessing is applied before the output is passed to it; the predictions get converted to discrete values, which breaks the loss calculation and leads to a flat line. So maybe catch the output before it gets sent to the loss function.
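To illustrate the mechanism (a minimal, self-contained sketch with random stand-in tensors, assuming a sigmoid-based DiceLoss and the AsDiscrete(threshold=...) signature of recent MONAI versions, not the repo's exact code): once the predictions have been thresholded to hard 0/1 masks, the loss only reacts when a voxel flips class, so growing model confidence no longer moves it.

```python
import torch
from monai.losses import DiceLoss
from monai.transforms import Activations, AsDiscrete, Compose

loss_fn = DiceLoss(sigmoid=True)  # expects raw logits
# typical val post-transforms: sigmoid + threshold -> hard 0/1 masks
post = Compose([Activations(sigmoid=True), AsDiscrete(threshold=0.5)])

labels = (torch.rand(1, 1, 16, 16, 16) > 0.5).float()
for confidence in (0.5, 1.0, 3.0):  # model growing more confident over "epochs"
    # logits that agree with the labels, scaled by the model's confidence
    logits = (labels * 2 - 1) * confidence + 0.1 * torch.randn_like(labels)
    raw = loss_fn(logits, labels).item()       # keeps improving with confidence
    binary = post(logits[0]).unsqueeze(0)      # same 0/1 mask every iteration
    flat = loss_fn(binary, labels).item()      # barely changes -> flat curve
    print(f"confidence={confidence}: loss(raw)={raw:.4f}  loss(discrete)={flat:.4f}")
```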

majoreks commented 3 months ago

Indeed, removing the AsDiscrete transformations from the postprocessing of the SupervisedEvaluator seems to have done the trick, thanks a lot for the advice!
To preserve the metrics that are already in place, we've added another ValidationHandler to train_handlers that applies only the EnsureTyped transformation. Running two validation handlers slows down training a bit, but this way we have an easier time controlling the loss metric while the other metrics still see the data transformed as in get_val_post_transforms. I suppose it should also be possible to modify the output_transform of the other metrics instead; the sketch below shows the same idea.
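For reference, a rough sketch of that last idea (random tensors standing in for real validation output, and monai.metrics used directly rather than the engine/handler plumbing): keep the raw predictions for the loss, and discretize a copy only on the metric path.

```python
import torch
from monai.losses import DiceLoss
from monai.metrics import DiceMetric
from monai.transforms import Activations, AsDiscrete, Compose

pred = torch.randn(2, 1, 16, 16, 16)                  # raw network output
label = (torch.rand(2, 1, 16, 16, 16) > 0.5).float()

# loss path: raw logits, no post-processing applied
val_loss = DiceLoss(sigmoid=True)(pred, label)

# metric path: discretize a copy of the predictions only here
discretize = Compose([Activations(sigmoid=True), AsDiscrete(threshold=0.5)])
dice = DiceMetric(include_background=True)
dice(torch.stack([discretize(p) for p in pred]), label)

print(f"val_loss={val_loss.item():.4f}  mean_dice={dice.aggregate().item():.4f}")
```

In the handler-based setup this would amount to moving the Activations/AsDiscrete steps from the evaluator's postprocessing into the dice metric's output_transform.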

kbressem commented 3 months ago

Good to hear that it works now! Happy to help