lindawangg / COVID-Net

COVID-Net Open Source Initiative

Confirmation on the data splits and benchmark results with {train,test}_COVIDx2.txt #53

Closed saimunur closed 4 years ago

saimunur commented 4 years ago

Hi Linda,

Thanks for providing clear guidelines for the COVIDx dataset and COVID-Net. I recently compiled the dataset following the guide at https://github.com/lindawangg/COVID-Net/blob/master/docs/COVIDx.md. However, I noticed that the test class distribution differs slightly from the one presented at https://github.com/lindawangg/COVID-Net#results. I used the https://github.com/lindawangg/COVID-Net/blob/master/create_COVIDx_v3.ipynb script together with the train_COVIDx2.txt and test_COVIDx2.txt files. For your reference, I observed the following data distribution:

COVID19 -> Train (223 images), Test (31 images)
Normal -> Train (7966 images), Test (885 images)
Pneumonia -> Train (5451 images), Test (594 images)
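For anyone wanting to reproduce a per-class tally like the one above, here is a minimal sketch. It assumes each line of a split file is whitespace-separated with the class label in the third field (patient id, filename, label); that layout, the `class_counts` helper, and the `label_index` parameter are assumptions for illustration, not something confirmed in this thread. Filenames containing spaces would break this simple parsing.

```python
from collections import Counter


def class_counts(lines, label_index=2):
    """Count entries per class label in a split listing.

    Assumes whitespace-separated fields with the label at `label_index`
    (hypothetical layout: patient id, filename, label).
    """
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) <= label_index:
            continue  # skip blank or malformed lines
        counts[fields[label_index]] += 1
    return counts


# Illustrative in-memory sample; the entries are made up.
sample = [
    "1 a.png normal",
    "2 b.png pneumonia",
    "3 c.png normal",
    "",
]
print(class_counts(sample))
```

To run it on a real file, pass the open file object, for example `class_counts(open("test_COVIDx2.txt"))`, and adjust `label_index` if the file has a different column order.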

Kindly confirm whether the distribution is correct.

Furthermore, do you have any benchmark results for the above data distribution? The benchmark presented at https://github.com/lindawangg/COVID-Net#results uses fewer test samples. Which version of the data split do you recommend for comparison with COVID-Net? Kindly advise.

I look forward to your answers. Thank you.

Regards, Saimun

saimunur commented 4 years ago

Hi Linda,

Sorry for posting the above. I just found that I had been using the {train,test}_split_v3.txt files produced by the dataset generator script, which caused all the confusion. Please ignore this post. I am marking it as 'closed'.
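A mix-up like this can be caught by checking which entries differ between two split listings. The sketch below is one possible way to do that, assuming whitespace-separated lines with the image filename in the second field; the `split_difference` helper and the `name_index` parameter are hypothetical names, and the column layout is an assumption.

```python
def filenames(lines, name_index=1):
    """Collect the filename field from each well-formed line."""
    names = set()
    for line in lines:
        fields = line.split()
        if len(fields) > name_index:
            names.add(fields[name_index])
    return names


def split_difference(lines_a, lines_b, name_index=1):
    """Return (only_in_a, only_in_b) as sets of filenames."""
    a = filenames(lines_a, name_index)
    b = filenames(lines_b, name_index)
    return a - b, b - a


# Illustrative in-memory listings; the entries are made up.
listing_a = ["1 a.png normal", "2 b.png pneumonia"]
listing_b = ["1 a.png normal", "3 c.png normal"]
only_a, only_b = split_difference(listing_a, listing_b)
print(sorted(only_a), sorted(only_b))
```

With real files, the two listings could be, for example, the open file objects for test_COVIDx2.txt and test_split_v3.txt; two empty result sets would mean both files list the same images.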

Thank you.