majoreks closed this issue 3 months ago
It's been some time since I trained the models and I don't remember this exact issue. However, empty images can mess with the val loss. Also make sure the data does not get converted to discrete values before the val loss is calculated. In the MONAI workflows the val loss is treated as a metric, so postprocessing is applied before the output is passed to the metrics; that means the values get converted to discrete labels, which breaks the loss calculation and leads to a flat line. So try to catch the output before it gets sent to the loss function.
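As a toy illustration of the flat-line effect described above (plain Python, not the project's actual code): a soft Dice loss computed on raw probabilities keeps changing as the model improves, but once the predictions are thresholded to 0/1 (which is what `AsDiscrete` does), the same loss collapses to a constant.

```python
# Why discretizing predictions before the loss flattens the curve.
# Toy soft-Dice on plain Python lists; threshold at 0.5 mimics AsDiscrete.

def soft_dice_loss(pred, target, eps=1e-6):
    """1 - Dice, computed on (possibly continuous) prediction values."""
    inter = sum(p * t for p, t in zip(pred, target))
    denom = sum(pred) + sum(target)
    return 1.0 - (2.0 * inter + eps) / (denom + eps)

def discretize(pred, thr=0.5):
    """Mimics AsDiscrete: probabilities -> hard 0/1 labels."""
    return [1.0 if p >= thr else 0.0 for p in pred]

target  = [1.0, 1.0, 0.0, 0.0]
epoch_1 = [0.6, 0.7, 0.4, 0.3]    # early, uncertain predictions
epoch_2 = [0.9, 0.95, 0.1, 0.05]  # later, more confident predictions

# Loss on raw probabilities keeps improving...
print(soft_dice_loss(epoch_1, target))  # ~0.35
print(soft_dice_loss(epoch_2, target))  # ~0.075
# ...but after discretization both epochs give the identical flat value.
print(soft_dice_loss(discretize(epoch_1), target))  # ~0.0
print(soft_dice_loss(discretize(epoch_2), target))  # ~0.0
```

Once every prediction that clears the threshold is mapped to exactly 1.0, the per-epoch differences that the loss is supposed to track are erased, which is exactly the flat `eval_loss` curve reported below.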
Indeed, removing the `AsDiscrete` transformations from the postprocessing of `SupervisedEvaluator` seems to have done the trick, thank you a lot for the advice!
To preserve the metrics that are already in place, we've added another `ValidationHandler` to `train_handlers` that has only the `EnsureTyped` transformation. It slows down the training a bit by executing two validation handlers, but this way we have an easier time controlling the loss metric alongside the other metrics that need the data transformed as in `get_val_post_transforms`. I suppose it should be possible to modify the `output_transform` of the other metrics instead.
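The alternative mentioned here, giving each metric its own `output_transform` rather than running a second validation pass, could look roughly like the following. This is a hypothetical plain-Python sketch of the routing pattern, not the actual MONAI/Ignite API; the names `engine_output`, `loss_transform`, and `dice_transform` are illustrative.

```python
# Route one validation output to two metrics: the loss metric sees the
# raw probabilities, while only the Dice metric sees discretized labels.

def discretize(probs, thr=0.5):
    """Stand-in for AsDiscrete: probabilities -> hard 0/1 labels."""
    return [1.0 if p >= thr else 0.0 for p in probs]

# One forward pass produces raw probabilities plus the ground-truth label.
engine_output = {"pred": [0.8, 0.3, 0.9], "label": [1.0, 0.0, 1.0]}

def loss_transform(out):
    """output_transform for the loss metric: pass predictions through raw."""
    return out["pred"], out["label"]

def dice_transform(out):
    """output_transform for the Dice metric: discretize first."""
    return discretize(out["pred"]), out["label"]

pred_for_loss, _ = loss_transform(engine_output)
pred_for_dice, _ = dice_transform(engine_output)
print(pred_for_loss)  # [0.8, 0.3, 0.9] -- continuous, loss stays informative
print(pred_for_dice)  # [1.0, 0.0, 1.0] -- discrete, as Dice expects
```

The design trade-off is that this keeps a single validation run (no duplicated inference), at the cost of moving the discretization out of the shared postprocessing chain and into each metric's transform.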
Good to hear that it works now! Happy to help
Hello! Thank you for providing this project; we've been trying to adapt it to our dataset and train it for segmenting prostate cancer. Something we've noticed, however, is that our `eval_loss` metric seems to behave strangely. For one of our experiments, the metrics look as follows: what puzzles us is that even though `val_mean_dice` is improving, or at least changing, `eval_loss` is completely flat after a few epochs. The model then performs decently on a test set it hasn't seen before.

We have also run an experiment exactly as in the repository of the project, and the metrics ended up looking quite similar. Note that for the first experiment (on our dataset) we made some changes: for example, we used `DataParallel` on the model, changed the batch size to better fit our hardware, and changed the spacing to [0.5, 0.5, 1.5]; the model, loss function, and optimizer were untouched. One of our guesses was that empty images were somehow messing with the metrics by producing NaNs or infinities, but an experiment without empty masks looked quite similar in the end.

We've been wondering whether you have perhaps encountered a similar issue in your experiments and could point us to a problem we might have introduced, or to some direction that could explain this behaviour. Thanks!
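The "empty masks" guess above is plausible in principle: with an all-background label, a Dice-style denominator collapses to roughly zero, so the result is dominated by the smoothing epsilon (or becomes NaN without one). A toy illustration in plain Python, not the training code:

```python
# How an empty (all-background) mask can distort a Dice-style loss.

def soft_dice_loss(pred, target, eps=1e-6):
    """1 - Dice with an epsilon guard against the 0/0 case."""
    inter = sum(p * t for p, t in zip(pred, target))
    denom = sum(pred) + sum(target)
    return 1.0 - (2.0 * inter + eps) / (denom + eps)

empty_target = [0.0, 0.0, 0.0, 0.0]

# Perfect prediction on an empty mask: 0/0 rescued only by eps -> loss ~0.
print(soft_dice_loss([0.0, 0.0, 0.0, 0.0], empty_target))  # ~0.0
# A single small false positive: the loss jumps almost to 1.0.
print(soft_dice_loss([0.1, 0.0, 0.0, 0.0], empty_target))  # ~0.99999
# Without the eps guard, the first case would be a 0/0 division
# (NaN in tensor code), which would then poison any epoch average.
```

So empty masks make the per-sample loss nearly binary (close to 0 or close to 1), which can swamp or flatten the averaged validation loss even when the overall segmentation is improving.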