jannisborn / covid19_ultrasound

Open source lung ultrasound (LUS) data collection initiative for COVID-19.
https://www.mdpi.com/2076-3417/11/2/672

How does the data split work? #87

Closed: TIKO1993 closed this issue 3 years ago

TIKO1993 commented 3 years ago

Hi, I would like to thank you for this wonderful work. I just have a question about how to split the dataset. After following the explanation in the README, I got a dataset of 3 classes (covid, pneumonia, regular). I started working with the dataset and split it randomly into 80% for training and 20% for testing. I'm asking whether this is a good way to split the dataset, because it seems that the test set will also contain images taken from the videos used for training.

jannisborn commented 3 years ago

Hi @TIKO1993, thanks for your interest in our work. I think the README is pretty clear about this (see https://github.com/jannisborn/covid19_ultrasound/tree/master/pocovidnet#cross-validation-splitting).

If you randomly split the images into 80/20 train/test, your split will be meaningless: frames from the same video end up in both the training and the test set. Your performance will be almost perfect, but it will mean nothing. You have to split on the video level, not on the image level; that's why we provide the code to split the data correctly for you.
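To make the difference concrete, here is a minimal sketch of a video-level hold-out split. It assumes every extracted frame can be mapped back to the ID of the video it came from; the `split_by_video` function and its `(frame_name, video_id)` input format are hypothetical and are not the repository's splitting code. The sketch also does not balance the classes across the split.

```python
import random
from collections import defaultdict


def split_by_video(frames, test_fraction=0.2, seed=0):
    """Split frames into train/test so that all frames of a video land on the same side.

    `frames` is a list of (frame_name, video_id) pairs. This input format is a
    hypothetical one chosen for illustration, not the repository's own.
    """
    # Group frame names by the video they were extracted from.
    by_video = defaultdict(list)
    for frame, video in frames:
        by_video[video].append(frame)

    # Shuffle the videos (not the frames) reproducibly, then hold out a fraction of them.
    videos = sorted(by_video)
    random.Random(seed).shuffle(videos)
    n_test = max(1, round(len(videos) * test_fraction))
    test_videos = set(videos[:n_test])

    train = [f for v in videos if v not in test_videos for f in by_video[v]]
    test = [f for v in videos if v in test_videos for f in by_video[v]]
    return train, test


# Usage: 10 hypothetical videos with 5 frames each.
frames = [(f"vid{v}_frame{i}.png", f"vid{v}") for v in range(10) for i in range(5)]
train, test = split_by_video(frames)
train_ids = {name.split("_")[0] for name in train}
test_ids = {name.split("_")[0] for name in test}
print(len(train), len(test), train_ids & test_ids)  # 40 10 set()
```

Because whole videos are held out, no test frame has a near-duplicate neighbor from the same video in the training set, which is exactly the leak a frame-level random 80/20 split introduces.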

TIKO1993 commented 3 years ago

Sir, could you please explain more? So I split the data into 5 folds, train the model on 4 folds and keep 1 fold for testing?

jannisborn commented 3 years ago

If you follow the readme, things are being set up for you correctly.
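The 5-fold scheme asked about above also has to be built on the video level: each fold serves as the test set exactly once while the model is trained on the remaining four, and the results are then averaged over the five rounds. Below is a minimal sketch, assuming each video has an ID and one of the three class labels; `video_folds` and `cross_validation_rounds` are hypothetical helpers, not the repository's cross-validation script. Videos of each class are spread round-robin over the folds so that every fold keeps a similar class mix.

```python
import random
from collections import defaultdict


def video_folds(video_labels, k=5, seed=0):
    """Assign each video to one of k folds, spreading each class evenly.

    `video_labels` maps a (hypothetical) video ID to its class label.
    Returns a list of k sets of video IDs.
    """
    by_class = defaultdict(list)
    for video, label in video_labels.items():
        by_class[label].append(video)

    rng = random.Random(seed)
    folds = [set() for _ in range(k)]
    for label in sorted(by_class):
        videos = sorted(by_class[label])
        rng.shuffle(videos)
        # Deal the shuffled videos of this class out to the folds in turn.
        for i, video in enumerate(videos):
            folds[i % k].add(video)
    return folds


def cross_validation_rounds(folds):
    """Yield (train_videos, test_videos) pairs; each fold is the test set exactly once."""
    for i, test in enumerate(folds):
        train = set().union(*(f for j, f in enumerate(folds) if j != i))
        yield train, test


# Usage: 10 hypothetical videos per class.
video_labels = {f"{c}_{i}": c for c in ("covid", "pneumonia", "regular") for i in range(10)}
for train, test in cross_validation_rounds(video_folds(video_labels)):
    # Train on frames from `train` videos, evaluate on frames from `test` videos.
    print(len(train), len(test))  # 24 6 on each of the 5 rounds
```

Frames are only assigned to a fold through their video, so the leak from the original 80/20 frame split cannot occur in any round.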

TIKO1993 commented 3 years ago

It's OK, thank you.