voreille / hecktor


The validation Dice is similar whether 10% or 100% of the training set is used, with the same validation set. #9

Closed szhang963 closed 3 years ago

szhang963 commented 3 years ago

Hello! I found an uncommon result. The validation results were similar when I used different numbers of patient cases, i.e. 10% vs. 100% of the training dataset. I have tested several codebases, including this repository (3D dense_vnet: Dice 0.5914 with 23 training cases vs. 0.6233 with 180 training cases), and reached the same conclusion each time, especially in 2D, where the results are almost identical. This did not happen when I built the dataset by randomly splitting all slices of all patient cases into train and validation sets, i.e. shuffling slices instead of shuffling patient cases. Why is that? Could it be the data distribution? I would appreciate your help. Thanks!

voreille commented 3 years ago

Hi,

Could you be more precise? In particular, what do you mean by "This did not happen when I built the dataset by randomly splitting all slices [...]"? What did you observe in that case? Thank you.

Best, Valentin

szhang963 commented 3 years ago

Hi, I extracted all 2D slices containing a lesion from the 3D training volumes, about 6,400 slices in total. I then randomly split these slices into an 80% training set and a 20% validation set to train my 2D U-Net. With the same hyperparameters and the same validation set, training on 10% vs. 100% of the training slices gave a Dice of 0.71 vs. 0.86 on PET data (0.58 vs. 0.86 on CT data). That's all. Thank you.
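For illustration, here is a minimal sketch of the slice-level split described above; the function and variable names are hypothetical and not taken from this repository.

```python
import random

def split_slices(slice_ids, train_fraction=0.8, seed=42):
    """Pool all 2D slices regardless of patient, then split them at random."""
    rng = random.Random(seed)
    ids = list(slice_ids)
    rng.shuffle(ids)
    n_train = int(train_fraction * len(ids))
    return ids[:n_train], ids[n_train:]

# Roughly 6400 (patient_id, slice_index) pairs standing in for the lesion slices.
slice_ids = [(f"patient_{p:03d}", k) for p in range(180) for k in range(36)]
train_slices, val_slices = split_slices(slice_ids)
# Note: slices from the same patient volume can land in both sets (leakage).
```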

voreille commented 3 years ago

Hi,

I am not really sure I understand what you describe here. First, did you mix all the slices together without taking into account the volume they came from? In that case, I would expect the validation DSC to be higher, since the model is trained on slices that can be very close to the ones in the validation set.

Second, in your first comment, I am not sure I understand what is uncommon about your results. Could you clarify? Thank you.

Best, Valentin

szhang963 commented 3 years ago

Hi, on the first point, I understand why my type of split behaves that way, but what I mainly care about is the comparison between 10% and 100% of the training data. With the slice-level split there is a clear difference in Dice, which is the expected behaviour (less training data, lower Dice), presumably because randomly splitting slices gives the training and validation sets a consistent data distribution.

Second, as you said, the model is then trained on slices that can be very close to the ones in the validation set. However, if I instead split the dataset by patient case at random and then extract slices from those volumes, exactly as in your dataset_split.csv file, I get the uncommon result: with only 10% of the training data, or even less, the Dice is almost as high as with all of the training data.

In short, less training data should give lower accuracy, but the HECKTOR dataset does not behave that way, and I don't know why. Thank you.
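For contrast, a minimal sketch of a patient-level split in the spirit of the dataset_split.csv file mentioned above (names and patient counts are illustrative, not the repository's actual code): patients are assigned to train/validation first, and slices are extracted afterwards, so no patient contributes slices to both sets.

```python
import random

def split_patients(patient_ids, train_fraction=0.8, seed=42):
    """Assign whole patients to train/val before any 2D slices are extracted."""
    rng = random.Random(seed)
    ids = list(patient_ids)
    rng.shuffle(ids)
    n_train = int(train_fraction * len(ids))
    return ids[:n_train], ids[n_train:]

# Illustrative patient IDs and count; the actual assignment would follow dataset_split.csv.
patient_ids = [f"patient_{p:03d}" for p in range(200)]
train_patients, val_patients = split_patients(patient_ids)

# The comparison described above: keep val_patients fixed and train on either
# all of train_patients or only a ~10% random subset of them.
small_train = random.Random(0).sample(train_patients, len(train_patients) // 10)
```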

szhang963 commented 3 years ago

Hi, I realised that the issue mentioned above seems to be a common phenomenon across many medical imaging datasets.