lindawangg / COVID-Net

COVID-Net Open Source Initiative

create_covidx no longer compatible with old datasets #127

Closed Electro1111 closed 3 years ago

Electro1111 commented 3 years ago

In addition to the issue I posted here: https://github.com/lindawangg/COVID-Net/issues/126#issue-802926486

create_covidx.ipynb also doesn't seem to work for earlier versions such as COVIDx5, because some files listed in train_covidx5.txt and test_covidx5.txt are missing from data/train and data/test after running create_covidx.ipynb.

Specifically,

{'COVID-19(216).png', 'COVID-19(106).png', 'COVID-19(94).png', 'COVID-19(107).png', 'COVID-19(77).png', 'COVID-19(129).png', 'COVID-19(119).png', 'COVID-19(215).png', 'COVID-19(213).png', 'COVID-19(116).png', 'COVID-19(214).png', 'COVID-19(95).png', 'COVID-19(70).png', 'COVID-19(81).png', 'COVID-19(87).png', 'COVID-19(131).png', 'COVID-19(72).png'}

are missing from the data/test folder and

{'COVID-19(69).png', 'COVID-19(99).png', 'COVID-19(76).png', 'COVID-19(108).png', 'COVID-19(71).png', 'COVID-19(85).png', 'COVID-19(83).png', 'COVID-19(74).png', 'COVID-19(130).png', 'COVID-19(92).png', 'COVID-19(82).png', 'COVID-19(84).png', 'COVID-19(132).png', 'COVID-19(80).png', 'COVID-19(75).png', 'COVID-19(93).png', 'COVID-19(120).png', 'COVID-19(118).png', 'COVID-19(89).png', 'COVID-19(91).png', 'COVID-19(109).png', 'COVID-19(79).png', 'COVID-19(114).png', 'COVID-19(90).png', 'COVID-19(98).png', 'COVID-19(78).png', 'COVID-19(121).png', 'COVID-19(133).png', 'COVID-19(115).png'}

are missing from data/train.
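
A minimal sketch of how such a check could be reproduced, comparing the names in a split file against the contents of an image folder. The helper names (`listed_filenames`, `missing_files`) and the `column` parameter are hypothetical, and the split-file layout is an assumption: each line is taken to be whitespace-separated with the image filename in the second field. Adjust `column` if your split files differ.

```python
from pathlib import Path


def listed_filenames(split_file, column=1):
    """Collect the image filenames named in a split file.

    Assumption: each line is whitespace-separated and the filename sits at
    index `column` (hypothetical default; verify against your split file).
    """
    names = set()
    for line in Path(split_file).read_text().splitlines():
        fields = line.split()
        if len(fields) > column:
            names.add(fields[column])
    return names


def missing_files(split_file, image_dir, column=1):
    """Return filenames listed in the split file but absent from image_dir."""
    present = {p.name for p in Path(image_dir).iterdir() if p.is_file()}
    return listed_filenames(split_file, column) - present


if __name__ == "__main__":
    # Paths follow the names used in this thread; they are skipped if absent.
    for split, folder in [("train_covidx5.txt", "data/train"),
                          ("test_covidx5.txt", "data/test")]:
        if Path(split).exists() and Path(folder).is_dir():
            print(folder, sorted(missing_files(split, folder)))
```

An empty result for both folders would mean the generated data matches the split files.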

Electro1111 commented 3 years ago

OK, so I figured out the issue, but I'm not sure what to do about it.

It seems that when the images are re-saved into the data folder, they are renamed.

In COVIDx7A, and in the data folder generated by the current version of create_covidx.ipynb, these files are named as shown here, without the '-19' that they had in COVIDx5 and earlier versions of the code:

{'COVID(106).png', 'COVID(107).png', 'COVID(116).png', 'COVID(119).png', 'COVID(129).png', 'COVID(131).png', 'COVID(213).png', 'COVID(214).png', 'COVID(215).png', 'COVID(216).png', 'COVID(70).png', 'COVID(72).png', 'COVID(77).png', 'COVID(81).png', 'COVID(87).png', 'COVID(94).png', 'COVID(95).png'}
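
To confirm that a set of missing files is fully explained by this renaming, the new names can be mapped back to the old pattern and compared. This is only a sketch: the mapping from `COVID(N).png` to `COVID-19(N).png` is inferred from the two lists in this thread, and `to_legacy_name` and `explained_by_rename` are hypothetical helper names.

```python
import re

# Matches the newer naming seen in this thread, e.g. "COVID(106).png".
_NEW_NAME = re.compile(r"^COVID\((\d+)\)\.png$")


def to_legacy_name(name):
    """Map 'COVID(106).png' to 'COVID-19(106).png'; leave other names unchanged."""
    match = _NEW_NAME.match(name)
    return f"COVID-19({match.group(1)}).png" if match else name


def explained_by_rename(missing, present):
    """Return the missing legacy names that exist in `present` under the new naming."""
    legacy_present = {to_legacy_name(name) for name in present}
    return set(missing) & legacy_present
```

If `explained_by_rename(missing, present)` equals the full missing set, the files are all there and only the names differ.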

baranaldemir commented 3 years ago

@Electro1111 Did you find a solution for this problem? I have the same issue. I mean other than renaming the files, of course.

Electro1111 commented 3 years ago

Hello!

The issue seems to be with the versions of the COVID Kaggle dataset, so the easiest solution is to download the old version. Version 1 of the Kaggle dataset should work for COVIDx5 and earlier, I believe, since COVIDx5 was committed on 10/24 and version 2 of the Kaggle dataset says it came out 3 months ago, which would mean roughly December.

It is also possible that changes were made to this repo, particularly to create_covidx.ipynb, to make it work with the new names (I am not sure), so to be safe it might be better to revert to an old version of this repo as well. I think the version from the commits on Nov 3, 2020 would be the one you want.

To summarize:

Download version 1 of the COVID Kaggle dataset instead of version 4 (which is the current version).

Clone the Nov 3, 2020 version of this repository.

Let me know if this helps!

haydengunraj commented 3 years ago

Closing this now, and also adding that the current dataset is available in a prepared form on Kaggle.