Closed: mayaliliya closed this issue 3 years ago
I built the dataset from fresh downloads and I get the same test results you did, but I get very different image counts when creating the dataset. Specifically, I get:
Final stats
Train count: {'negative': 353, 'normal': 7966, 'pneumonia': 5475, 'COVID-19': 4649}
Test count: {'negative': 20, 'normal': 885, 'pneumonia': 594, 'COVID-19': 274}
Total length of train: 18443
Total length of test: 1773
Length of final test set : 374
whereas the notebook in the repo has:
Final stats
Train count: {'negative': 353, 'normal': 7966, 'pneumonia': 5475, 'COVID-19': 2158}
Test count: {'negative': 20, 'normal': 885, 'pneumonia': 594, 'COVID-19': 291}
Total length of train: 15952
Total length of test: 1790
Length of final test set : 391
I'm not sure where all the extra COVID-19 images are coming from, and I don't love the fact that both notebooks report "Length of final test set" < 400 even though the test set has exactly 400 images.
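To make the gap concrete, here is a quick arithmetic check over the two printouts above. It is only a sketch: the dictionaries are copied by hand from the "Final stats" output, and the names `fresh`, `repo`, and `diff_counts` are made up for illustration, not taken from the notebook.

```python
# Counts copied from the two "Final stats" printouts in this issue.
fresh = {
    "train": {"negative": 353, "normal": 7966, "pneumonia": 5475, "COVID-19": 4649},
    "test": {"negative": 20, "normal": 885, "pneumonia": 594, "COVID-19": 274},
}
repo = {
    "train": {"negative": 353, "normal": 7966, "pneumonia": 5475, "COVID-19": 2158},
    "test": {"negative": 20, "normal": 885, "pneumonia": 594, "COVID-19": 291},
}


def diff_counts(a, b):
    """Return the per-class difference a - b, keeping only classes that differ."""
    return {cls: a[cls] - b[cls] for cls in a if a[cls] != b[cls]}


train_diff = diff_counts(fresh["train"], repo["train"])
test_diff = diff_counts(fresh["test"], repo["test"])

# Totals recomputed from the per-class counts, to compare with the printed totals.
fresh_totals = {split: sum(counts.values()) for split, counts in fresh.items()}
repo_totals = {split: sum(counts.values()) for split, counts in repo.items()}

print("train diff:", train_diff)
print("test diff:", test_diff)
print("fresh totals:", fresh_totals)
print("repo totals:", repo_totals)
```

Only the COVID-19 class differs: 2491 more train images and 17 fewer test images in the fresh build. The recomputed totals match the printed "Total length" lines in both cases.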
Noted. The difference in COVID-19 image counts comes from an update to the SIRM dataset, which now includes data from BMICV. Images from BMICV are not accounted for in COVIDx8 but will be in future COVIDx releases.
For now, the updated dataset numbers do not affect the train and test set curation for COVIDx8. Work can proceed on the usual timeline, and the numbers will be updated in the next COVIDx version.
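One way to confirm which source the extra COVID-19 images come from is to tally the label files by source. The sketch below assumes each line is space-separated as `patient_id filename label source`. That layout, the helper name `count_by_source`, and the sample rows are all hypothetical and used only for illustration; they are not real dataset entries.

```python
from collections import Counter


def count_by_source(lines, label="COVID-19"):
    """Count rows with the given label, grouped by the source column.

    Assumes four whitespace-separated fields per line:
    patient_id, filename, label, source (an assumed layout).
    """
    counts = Counter()
    for line in lines:
        parts = line.split()
        if len(parts) != 4:
            continue  # skip blank or malformed rows
        _patient_id, _filename, cls, source = parts
        if cls == label:
            counts[source] += 1
    return counts


# Hypothetical rows, not real dataset entries.
old_rows = [
    "p1 a.png COVID-19 sirm",
    "p2 b.png normal rsna",
]
new_rows = [
    "p1 a.png COVID-19 sirm",
    "p2 b.png normal rsna",
    "p3 c.png COVID-19 bmicv",
    "p4 d.png COVID-19 bmicv",
]

old_counts = count_by_source(old_rows)
new_counts = count_by_source(new_rows)

# Counter subtraction keeps only sources whose count increased.
extra = new_counts - old_counts
print("extra COVID-19 images by source:", dict(extra))
```

Running the same tally over the real label files from the repo build and from a fresh build would show whether the increase is confined to a single source.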