neheller / kits21

The official repository of the 2021 Kidney and Kidney Tumor Segmentation Challenge
MIT License

Inconsistent results on Baseline model #60

Closed roman-mishchenko closed 1 year ago

roman-mishchenko commented 1 year ago

Hi, I've noticed that my results with the baseline model (described here) are inconsistent with those reported in this repository.

I used the unmodified KiTS21 dataset and ran predictions with the 3d_fullres baseline model without changing its weights. However, after running sample_segmentations and evaluate_predictions, I got results that differ from those described in this repository and in the paper. I left all source code unchanged, including the random seeds.

Do you know any particular reason why this can happen?

Model                   Dice_kidney  Dice_masses  Dice_tumor  Dice_average  SurfDice_kidney  SurfDice_masses  SurfDice_tumor  SurfDice_average
3d_fullres (reported)   0.9666       0.8618       0.8493      0.8926        0.9336           0.7532           0.7371          0.8080
my 3d_fullres           0.9732       0.9149       0.9188      0.9356        0.9454           0.8450           0.8485          0.8796

neheller commented 1 year ago

@FabianIsensee would be the expert on this. Any ideas?

FabianIsensee commented 1 year ago

The baseline is trained as a 5-fold cross-validation, so when reproducing the results you need to respect the splits: identify which cases were in the validation set of fold 0 and run prediction on those with the fold 0 model only, then move on to fold 1, etc. You cannot just run nnUNet_predict on all cases with all five folds, because every case was in the training set of four of the five models, so you are essentially predicting training data.

Best, Fabian
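A minimal sketch of the fold-respecting procedure described above, assuming the splits come as a list with one dict per fold holding "train" and "val" case IDs (the shape nnU-Net v1 typically uses in splits_final.pkl). The toy splits, the folder names input_fold{k}/pred_fold{k} and the task name "TaskXXX_KiTS2021" are hypothetical placeholders; the -i/-o/-t/-m/-f flags follow nnU-Net v1's nnUNet_predict, so check them against your installed version. Copying each fold's validation images into its input folder is left out.

```python
from typing import Dict, List

Split = Dict[str, List[str]]  # {"train": [...], "val": [...]}


def make_toy_splits(n_cases: int, n_folds: int) -> List[Split]:
    """Toy k-fold splits for illustration only (NOT the real KiTS21 splits)."""
    cases = [f"case_{i:05d}" for i in range(n_cases)]
    splits = []
    for fold in range(n_folds):
        val = cases[fold::n_folds]  # every n_folds-th case goes to this fold's validation set
        train = [c for c in cases if c not in val]
        splits.append({"train": train, "val": val})
    return splits


def validation_fold_of(splits: List[Split]) -> Dict[str, int]:
    """Map each case ID to the single fold whose validation set contains it."""
    case_to_fold: Dict[str, int] = {}
    for fold, split in enumerate(splits):
        for case in split["val"]:
            if case in case_to_fold:
                raise ValueError(f"{case} is validated in folds {case_to_fold[case]} and {fold}")
            case_to_fold[case] = fold
    return case_to_fold


def training_exposure(splits: List[Split], case: str) -> int:
    """Count how many fold models were trained on `case`."""
    return sum(case in split["train"] for split in splits)


def predict_commands(n_folds: int, task: str) -> List[str]:
    """One prediction call per fold, each using only that fold's model.

    Assumes input_fold{k} (hypothetical name) holds only the images of
    fold k's validation cases.
    """
    return [
        f"nnUNet_predict -i input_fold{k} -o pred_fold{k} -t {task} -m 3d_fullres -f {k}"
        for k in range(n_folds)
    ]


splits = make_toy_splits(n_cases=10, n_folds=5)
case_to_fold = validation_fold_of(splits)
print(case_to_fold["case_00007"])               # fold whose model must predict this case → 2
print(training_exposure(splits, "case_00007"))  # fold models that trained on it → 4
for cmd in predict_commands(5, "TaskXXX_KiTS2021"):  # placeholder task name
    print(cmd)
```

The exposure count of 4 is the leakage Fabian describes: an ensemble of all five folds has seen each case during training in four of its five members.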

roman-mishchenko commented 1 year ago

Got it. So after merging the predictions from all 5 folds, I run evaluate_predictions?
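The merge step can be sketched as copying every per-fold prediction into one folder and refusing duplicates, since with correct splits each case is predicted by exactly one fold. This assumes predictions are .nii.gz files named by case ID and that per-fold outputs sit in separate folders (hypothetical layout); how evaluate_predictions is then pointed at the merged folder is not shown here.

```python
import shutil
from pathlib import Path
from typing import Dict, Iterable


def merge_fold_predictions(fold_dirs: Iterable[Path], merged_dir: Path) -> int:
    """Copy every per-fold .nii.gz prediction into one merged folder.

    Raises ValueError if the same case file appears in two fold folders,
    which would mean the validation sets overlap. Returns the number of
    files copied.
    """
    merged_dir.mkdir(parents=True, exist_ok=True)
    seen: Dict[str, Path] = {}
    for fold_dir in fold_dirs:
        for pred in sorted(fold_dir.glob("*.nii.gz")):
            if pred.name in seen:
                raise ValueError(f"{pred.name} predicted by both {seen[pred.name]} and {fold_dir}")
            seen[pred.name] = fold_dir
            shutil.copy2(pred, merged_dir / pred.name)
    return len(seen)
```

For example, merge_fold_predictions([Path(f"pred_fold{k}") for k in range(5)], Path("pred_merged")) would gather the five hypothetical fold outputs into one folder for evaluation.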

FabianIsensee commented 1 year ago

yes, exactly