Inconsistent results on Baseline model

roman-mishchenko commented 1 year ago

Hi, I've noticed that my results with the baseline model (described here) are inconsistent with those reported in this repository.

I used the unchanged KiTS21 dataset and ran predictions with the 3d_fullres baseline model, using the provided weights as they are. However, after running sample_segmentations and evaluate_predictions, my results differ from those reported in this repository and in the paper. I left all source code unchanged, including the random seeds.

Do you know any particular reason why this can happen?

| | Dice_kidney | Dice_masses | Dice_tumor | Dice_average | SurfDice_kidney | SurfDice_masses | SurfDice_tumor | SurfDice_average |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 3d_fullres | 0.9666 | 0.8618 | 0.8493 | 0.8926 | 0.9336 | 0.7532 | 0.7371 | 0.8080 |
| my 3d_fullres | 0.9732 | 0.9149 | 0.9188 | 0.9356 | 0.9454 | 0.8450 | 0.8485 | 0.8796 |

neheller commented 1 year ago

@FabianIsensee would be the expert on this. Any ideas?

FabianIsensee commented 1 year ago

The baseline is trained as a 5-fold cross-validation. So when reproducing the results, you need to respect the splits: identify which cases were in the validation set of fold 0 and run prediction on those with fold 0 only, then move on to fold 1, and so on. You cannot just run nnUNet_predict, because then you are essentially predicting training data: each case was in the training set of four of the five folds.

Best, Fabian
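A minimal sketch of the per-fold procedure described above, not the repository's own tooling. It assumes the splits are available as a list with one dict per fold, each holding `train` and `val` case identifiers (the layout nnU-Net v1 uses in its `splits_final.pkl`), and that `nnUNet_predict` accepts the `-i`, `-o`, `-t`, `-m` and `-f` options as in nnU-Net v1. The helper names and the task name passed in are hypothetical placeholders. Each fold's input folder is assumed to contain only that fold's validation images.

```python
from typing import Dict, List, Sequence


def validation_cases_per_fold(
    splits: Sequence[Dict[str, Sequence[str]]],
) -> Dict[int, List[str]]:
    """Map each fold index to the sorted case IDs of its validation set.

    Assumes `splits` is a list with one dict per fold, each with a "val" key.
    """
    return {
        fold: sorted(str(case) for case in split["val"])
        for fold, split in enumerate(splits)
    }


def check_disjoint(val_by_fold: Dict[int, List[str]]) -> None:
    """Raise if a case appears in the validation set of more than one fold."""
    seen = set()
    for fold, cases in val_by_fold.items():
        overlap = seen.intersection(cases)
        if overlap:
            raise ValueError(f"fold {fold} repeats cases: {sorted(overlap)}")
        seen.update(cases)


def predict_command(
    fold: int,
    input_dir: str,
    output_dir: str,
    task: str,
    model: str = "3d_fullres",
) -> List[str]:
    """Build the nnUNet_predict call that uses the weights of one fold only.

    `input_dir` should hold only the validation images of `fold`.
    """
    return [
        "nnUNet_predict",
        "-i", input_dir,
        "-o", output_dir,
        "-t", task,
        "-m", model,
        "-f", str(fold),
    ]
```

The returned command list can be passed to `subprocess.run(cmd, check=True)` once per fold, after the validation images of that fold have been copied or linked into its input folder.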

roman-mishchenko commented 1 year ago

Got it. So after merging the predictions from all 5 folds, I run evaluate_predictions.

FabianIsensee commented 1 year ago

yes, exactly
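The merge step confirmed above could look roughly like the following sketch. It assumes each fold's predictions were written to a separate folder as `.nii.gz` files named after their case; the function name and folder layout are hypothetical. Because the validation sets of the folds do not overlap, a file name that shows up twice indicates a mistake in the split handling, so the sketch stops instead of overwriting.

```python
import shutil
from pathlib import Path
from typing import Iterable, List


def merge_fold_predictions(
    fold_dirs: Iterable[Path],
    merged_dir: Path,
    pattern: str = "*.nii.gz",
) -> List[str]:
    """Copy the predictions of every fold into one folder.

    Raises FileExistsError if the same file name is produced by two folds
    (or is already present in `merged_dir`).
    """
    merged_dir = Path(merged_dir)
    merged_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for fold_dir in fold_dirs:
        for src in sorted(Path(fold_dir).glob(pattern)):
            dst = merged_dir / src.name
            if dst.exists():
                raise FileExistsError(f"{src.name} already merged; check the splits")
            shutil.copy2(src, dst)
            copied.append(src.name)
    return sorted(copied)
```

The merged folder then holds exactly one prediction per case and can be used as the input for the evaluation step.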

neheller / kits21

Inconsistent results on Baseline model #60