lindawangg / COVID-Net

COVID-Net Open Source Initiative

New COVIDx8B dataset is created in a way where files like COVID(1).png or COVID(2).png are in the labels file, but not in the actual data. #158

Closed chododom closed 3 years ago

chododom commented 3 years ago

It seems that the file listed as COVID(1).png in the labels file is actually named COVID-1.png in the data, and likewise for the others.

chododom commented 3 years ago

I have solved this by correcting the file names in my scripts as follows:

with open(labels_file_path, 'r') as labels_file:
    for line in labels_file:
        split = line.split(' ')
        # The file name is the third field from the end of each label line.
        if split[-3].startswith('COVID('):
            # COVID(1).png -> COVID-1.png
            name = split[-3].replace('(', '-')
            corrected_filename = name.replace(')', '')

All other files seem to be present.
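
For anyone wanting to reuse the renaming idea, here is a minimal sketch of it as two small functions. The names `correct_filename` and `correct_label_line` are hypothetical and not part of the COVID-Net repo, and the default `field_index=-3` only mirrors the `split[-3]` used in the snippet above, so it is an assumption about the labels file layout.

```python
import re

# Matches names such as "COVID(1).png"; captures the number and the extension.
_PAREN_NAME = re.compile(r'^COVID\((\d+)\)(\.\w+)$')


def correct_filename(name):
    """Map 'COVID(n).ext' to 'COVID-n.ext'; return any other name unchanged."""
    match = _PAREN_NAME.match(name)
    if match is None:
        return name
    return 'COVID-{}{}'.format(match.group(1), match.group(2))


def correct_label_line(line, field_index=-3):
    """Rewrite the file name field of one space-separated label line.

    field_index=-3 is an assumption taken from the snippet above.
    """
    fields = line.rstrip('\n').split(' ')
    if len(fields) < abs(field_index):
        # Too few fields to contain a file name: leave the line as it is.
        return ' '.join(fields)
    fields[field_index] = correct_filename(fields[field_index])
    return ' '.join(fields)
```

Applying `correct_label_line` to each line of the labels file and writing the results to a new file would avoid touching the images themselves. As noted further down the thread, using Version 3 of the database makes this workaround unnecessary.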

mayaliliya commented 3 years ago

Hi @chododom, the COVIDx8 dataset was created with Version 3 of the COVID-19 Radiography Database which can be downloaded from here: https://www.kaggle.com/tawsifurrahman/covid19-radiography-database/version/3

Please use this version of the dataset for COVIDx8 as the images do not map directly between Versions 3 and 4. We will make note of this versioning in the COVIDx docs moving forward, thank you for bringing it up!

chododom commented 3 years ago

Ohh okay, thank you @mayaliliya. Maybe it would be good to include this in the guide to setting up the dataset, since cloning the repo gets the latest version :)

khanbhai0078 commented 3 years ago

Where did you add this code?