lindawangg / COVID-Net

COVID-Net Open Source Initiative

Updated patient count for COVIDx8 and fixed bug in eval script #154

Closed. mayaliliya closed this 3 years ago.

mayaliliya commented 3 years ago

Made a few changes:

  1. Fixed the eval script, which printed sensitivity instead of PPV
  2. Updated the patient count for COVIDx8
  3. Resampled one of the pneumonia images in the COVIDx8 test set to ensure the full test set is from RSNA
  4. Updated the CXR-2 model numbers accordingly
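Item 1 hinges on which sum of the confusion matrix each metric divides by. Below is a minimal sketch, assuming the usual `sklearn.metrics.confusion_matrix` layout (rows are true labels, columns are predictions). `per_class_metrics` is a hypothetical helper, not the code in the repo's eval script, and the matrix is made-up example data.

```python
import numpy as np


def per_class_metrics(matrix):
    """Return (sensitivity, ppv) per class from a confusion matrix.

    Assumes rows are true labels and columns are predictions,
    the convention used by sklearn.metrics.confusion_matrix.
    """
    m = np.asarray(matrix, dtype=float)
    true_pos = np.diag(m)
    # Sensitivity (recall): TP divided by every sample that truly belongs
    # to the class, i.e. the row sum.
    sensitivity = true_pos / m.sum(axis=1)
    # PPV (precision): TP divided by every sample predicted as the class,
    # i.e. the column sum.
    ppv = true_pos / m.sum(axis=0)
    return sensitivity, ppv


# Hypothetical 2-class matrix, not results from this PR.
cm = [[90, 10],
      [20, 80]]
sens, ppv = per_class_metrics(cm)
print("Sensitivity:", sens)
print("PPV:", ppv)
```

Dividing by the wrong sum silently reports one metric under the other's name, which is one way a sensitivity/PPV mix-up can happen. A class that is never predicted gives a zero column sum, so real evaluation code should guard that division.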

haydengunraj commented 3 years ago

I built the dataset from fresh downloads and I get the same test results you did, but I get very different image counts when creating the dataset. Specifically, I get:

Final stats
Train count:  {'negative': 353, 'normal': 7966, 'pneumonia': 5475, 'COVID-19': 4649}
Test count:  {'negative': 20, 'normal': 885, 'pneumonia': 594, 'COVID-19': 274}
Total length of train:  18443
Total length of test:  1773
Length of final test set :  374

whereas the notebook in the repo has:

Final stats
Train count:  {'negative': 353, 'normal': 7966, 'pneumonia': 5475, 'COVID-19': 2158}
Test count:  {'negative': 20, 'normal': 885, 'pneumonia': 594, 'COVID-19': 291}
Total length of train:  15952
Total length of test:  1790
Length of final test set :  391

I'm not sure where all the extra COVID-19 images are coming from, and I don't love the fact that both notebooks report "Length of final test set" < 400 even though the test set has exactly 400 images.
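To narrow down where the two builds diverge, here is a quick sketch in plain Python. It copies the per-class counts from the two "Final stats" outputs, checks that each reported total equals the sum of its per-class counts, and isolates the classes that differ. It only compares the printed numbers; it does not inspect the notebook or the underlying images, and `count_diff` is a hypothetical helper.

```python
# Per-class counts copied from the two "Final stats" outputs.
fresh = {
    "train": {"negative": 353, "normal": 7966, "pneumonia": 5475, "COVID-19": 4649},
    "test": {"negative": 20, "normal": 885, "pneumonia": 594, "COVID-19": 274},
}
repo = {
    "train": {"negative": 353, "normal": 7966, "pneumonia": 5475, "COVID-19": 2158},
    "test": {"negative": 20, "normal": 885, "pneumonia": 594, "COVID-19": 291},
}


def count_diff(a, b):
    """Return {split: {class: a - b}} for every class whose count differs."""
    return {
        split: {
            cls: a[split][cls] - b[split][cls]
            for cls in a[split]
            if a[split][cls] != b[split][cls]
        }
        for split in a
    }


for split in ("train", "test"):
    # Each reported total should equal the sum of its per-class counts.
    print(split, "total (fresh / repo):",
          sum(fresh[split].values()), sum(repo[split].values()))

print(count_diff(fresh, repo))
```

On these numbers both totals are internally consistent, and every class except COVID-19 matches: the fresh build has 2,491 more COVID-19 training images and 17 fewer COVID-19 test images than the notebook in the repo.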

mayaliliya commented 3 years ago

Noted: The difference in COVID-19 image counts comes from an update to the SIRM dataset, which now includes data from BMICV. Images from BMICV are not accounted for in COVIDx8 but will be included in future COVIDx releases.

For now, the updated dataset numbers do not affect train and test set curation for COVIDx8, so the release can proceed on the usual timeline, with the counts updated in the next COVIDx version.