mayaliliya commented 3 years ago

Made a few changes:

The eval script printed sensitivity instead of PPV, so fixed that
Patient count for COVIDx8
Resampled one of the pneumonia images in the COVIDx8 test set to ensure the full test set is from RSNA
Updated CXR-2 model numbers accordingly
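
For context on the first fix: sensitivity and PPV come from the same confusion matrix but normalize along different axes, which makes them easy to swap. Below is a minimal sketch, assuming rows are true labels and columns are predicted labels. The function name and the example matrix are hypothetical and are not taken from the repo's eval script.

```python
def sensitivity_and_ppv(matrix):
    """Per-class sensitivity and PPV from a square confusion matrix.

    Assumption: matrix[i][j] is the number of samples whose true class is i
    and whose predicted class is j.
    """
    sensitivity, ppv = [], []
    for c in range(len(matrix)):
        true_pos = matrix[c][c]
        # Sensitivity (recall): TP over all samples that truly belong to class c (row sum).
        actual = sum(matrix[c])
        # PPV (precision): TP over all samples predicted as class c (column sum).
        predicted = sum(row[c] for row in matrix)
        sensitivity.append(true_pos / actual if actual else float("nan"))
        ppv.append(true_pos / predicted if predicted else float("nan"))
    return sensitivity, ppv


# Made-up two-class matrix, for illustration only.
example = [[90, 10],
           [30, 70]]
sens, ppv = sensitivity_and_ppv(example)
print("Sens:", sens)
print("PPV: ", ppv)
```

Printing the row-normalized values under a "PPV" label is the kind of mix-up this PR fixes; the two only coincide when row and column sums happen to match.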

haydengunraj commented 3 years ago

I built the dataset from fresh downloads and I get the same test results you did, but I get very different image counts when creating the dataset. Specifically, I get:

Final stats
Train count:  {'negative': 353, 'normal': 7966, 'pneumonia': 5475, 'COVID-19': 4649}
Test count:  {'negative': 20, 'normal': 885, 'pneumonia': 594, 'COVID-19': 274}
Total length of train:  18443
Total length of test:  1773
Length of final test set :  374

whereas the notebook in the repo has:

Final stats
Train count:  {'negative': 353, 'normal': 7966, 'pneumonia': 5475, 'COVID-19': 2158}
Test count:  {'negative': 20, 'normal': 885, 'pneumonia': 594, 'COVID-19': 291}
Total length of train:  15952
Total length of test:  1790
Length of final test set :  391

I'm not sure where all the extra COVID-19 images are coming from, and I don't love the fact that both notebooks report "Length of final test set" < 400 even though the test set has exactly 400 images.
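
To narrow down where the counts diverge, the two "Final stats" outputs above can be compared class by class. This is a quick sketch that uses only the numbers quoted in this comment; the variable and function names are just for illustration.

```python
# Counts from a fresh build of the dataset (first "Final stats" block above).
fresh = {
    "train": {"negative": 353, "normal": 7966, "pneumonia": 5475, "COVID-19": 4649},
    "test": {"negative": 20, "normal": 885, "pneumonia": 594, "COVID-19": 274},
}
# Counts from the notebook in the repo (second "Final stats" block above).
repo = {
    "train": {"negative": 353, "normal": 7966, "pneumonia": 5475, "COVID-19": 2158},
    "test": {"negative": 20, "normal": 885, "pneumonia": 594, "COVID-19": 291},
}


def count_diff(a, b):
    """Return {label: a - b} for every label whose count differs."""
    return {label: a[label] - b[label] for label in a if a[label] != b[label]}


for split in ("train", "test"):
    print(split, "totals:", sum(fresh[split].values()), "vs", sum(repo[split].values()))
    print(split, "diff (fresh - repo):", count_diff(fresh[split], repo[split]))
```

Only the COVID-19 class differs: 2491 more images in train and 17 fewer in test. The per-class counts sum to the reported totals in both outputs, so the totals themselves are consistent.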

mayaliliya commented 3 years ago

Noted: The difference in COVID-19 image numbers is the result of an update to the SIRM dataset, which now includes data from BMICV. Images from BMICV are not accounted for in COVIDx8 but will be in future COVIDx releases.

As of right now, the updated dataset numbers do not affect the train and test set curation for COVIDx8, so it can proceed per the usual timeline, and the numbers can be updated in the next COVIDx version.

lindawangg / COVID-Net