jcohenadad closed this issue 1 year ago
With https://github.com/ivadomed/model_seg_mouse-sc_wm-gm_t1/commit/9c60213ed37a3617113856aa4b8bfdf4b7e2a708, I get better results with only 2 models (39-40, green) vs. 4 models (39-42, blue):
Maybe an issue with the majority voting code?
Here are the individual predictions (one per model):
Looking at the individual predictions, it does make sense that the 39-40 ensemble would produce better predictions than the 39-42 ensemble in this case.
In light of these results, it might be more interesting to create another aggregation method that would take the max across segmentations (as opposed to majority vote).
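To make the difference between the two aggregation strategies concrete, here is a minimal sketch. It is not the code from this repository: the function names `majority_vote` and `max_aggregate` are hypothetical, and it assumes each model outputs a binary mask (one class at a time) of identical shape. It also assumes ties are resolved as background, which matters when the number of models is even.

```python
import numpy as np


def majority_vote(masks, threshold=0.5):
    """Keep a voxel only if strictly more than `threshold` of the models label it.

    Assumption: with an even number of models, a tie (e.g. 2 of 4) is
    treated as background because the comparison is strict.
    """
    stacked = np.stack(masks).astype(bool)
    return (stacked.mean(axis=0) > threshold).astype(np.uint8)


def max_aggregate(masks):
    """Keep a voxel if at least one model labels it (voxel-wise max, i.e. union)."""
    stacked = np.stack(masks)
    return stacked.max(axis=0).astype(np.uint8)


# Toy 1D "segmentations" from four hypothetical models.
masks = [
    np.array([1, 1, 0, 0]),
    np.array([1, 1, 0, 0]),
    np.array([1, 0, 1, 0]),
    np.array([1, 0, 0, 0]),
]

print(majority_vote(masks))  # [1 0 0 0]
print(max_aggregate(masks))  # [1 1 1 0]
```

In this toy case the second voxel is found by two of the four models: the strict majority vote drops it, while the max keeps it. The max is more sensitive but also keeps every false positive from every model, so it would need to be checked against the individual predictions.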
Five best models: 39, 40, 41, 42, 44
The idea would be to train several models, each with a different randomized train/validation split, and then aggregate their inferences.
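A rough sketch of how the randomized splits could be generated, one per model, so that each run is reproducible from its seed. The helper `random_splits`, the 20% validation fraction, and the subject names are all assumptions for illustration; the actual training pipeline may handle the split differently.

```python
import random


def random_splits(subjects, n_models, val_fraction=0.2, base_seed=0):
    """Return one train/validation split per model, each driven by its own seed."""
    splits = []
    for i in range(n_models):
        seed = base_seed + i
        rng = random.Random(seed)  # independent, reproducible generator per model
        shuffled = list(subjects)
        rng.shuffle(shuffled)
        n_val = max(1, round(len(shuffled) * val_fraction))
        splits.append(
            {
                "seed": seed,
                "val": sorted(shuffled[:n_val]),
                "train": sorted(shuffled[n_val:]),
            }
        )
    return splits


# Hypothetical subject IDs; replace with the real dataset listing.
subjects = [f"sub-{i:02d}" for i in range(10)]
for split in random_splits(subjects, n_models=5):
    print(split["seed"], split["val"])
```

Each model would then be trained on its own `train` list, and the resulting predictions combined with whichever aggregation method is chosen.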
Possibly relevant: https://colab.research.google.com/github/Project-MONAI/tutorials/blob/main/modules/cross_validation_models_ensemble.ipynb#scrollTo=ZWSGnqA12FgX