Closed AFAgarap closed 3 years ago
I ran into similar issues when trying to run the create_COVIDx code.
First, the xlrd package, used by Pandas to open Excel files, no longer supports the .xlsx file format, so the code that opens the metadata file in the 3rd cell does not work. The xlrd developers suggest installing the openpyxl package and adapting the read function to use this engine, like below:
sirm_csv = pd.read_excel(sirm_csvpath, engine='openpyxl')
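A minimal sketch of how the engine choice could be made explicit rather than hard-coded. The helper names `excel_engine_for` and `read_metadata` are hypothetical and not part of the notebook; the sketch assumes pandas and openpyxl are installed.

```python
from pathlib import Path


def excel_engine_for(path):
    """Pick a pandas Excel engine from the file extension (hypothetical helper)."""
    suffix = Path(path).suffix.lower()
    if suffix == ".xlsx":
        return "openpyxl"  # xlrd 2.0 and later no longer read .xlsx
    if suffix == ".xls":
        return "xlrd"  # legacy binary format, still handled by xlrd
    raise ValueError(f"Unsupported Excel extension: {suffix!r}")


def read_metadata(path):
    """Read a metadata spreadsheet with the engine matching its extension."""
    import pandas as pd  # imported lazily so excel_engine_for has no dependencies

    return pd.read_excel(path, engine=excel_engine_for(path))
```

With this, `read_metadata(sirm_csvpath)` would behave like the line above for an .xlsx file, assuming `sirm_csvpath` points at the metadata spreadsheet.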
I also got an error running the line:
imagename = patientid.split('(')[0] + ' (' + patientid.split('(')[1] + '.' + row['FORMAT'].lower()
As far as I could see, the metadata file 'COVID.metadata.xlsx' stores the user ID string without parentheses, while the actual image filenames have them. The code above tries to split the string on a parenthesis character that does not exist in the metadata file.
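The failure can be reproduced in isolation, and one possible way around it is to parse the ID more tolerantly. This is only a sketch: the function name `build_imagename` and the example IDs are assumptions for illustration, and the real IDs in the metadata file may look different.

```python
import re


def build_imagename(patientid, fmt):
    """Return '<prefix> (<number>).<ext>' whether or not the ID has parentheses.

    Hypothetical helper; assumes the ID ends in a number, optionally wrapped
    in parentheses.
    """
    match = re.fullmatch(r"\s*(.*?)\s*\(?\s*(\d+)\s*\)?\s*", patientid)
    if match is None:
        raise ValueError(f"Unrecognised patient id: {patientid!r}")
    prefix, number = match.groups()
    return f"{prefix} ({number}).{fmt.lower()}"


# The original expression indexes the second piece of a split on '(',
# which does not exist when the ID has no parenthesis.
try:
    "COVID 12".split("(")[1]
except IndexError as err:
    print("original approach fails:", err)

print(build_imagename("COVID-19(12)", "PNG"))  # old-style ID with parentheses
print(build_imagename("COVID 12", "PNG"))      # assumed new-style ID without them
```

Both calls produce a name of the form `prefix (number).png`, which is the shape the original line was building.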
@AFAgarap I don't believe that removing the line you mentioned would help you (or us). That line removes the extra space characters from the file names so that they can be opened (the actual file names have this space between the 'COVID-19' string and the user ID). I think this is the cause of your issue in the 6th cell: if you remove the filename-editing line, you will not be able to open the images later.
Anyway, some changes are needed in the code, and I also suggest updating it.
Sorry, I wasn't able to update here. But I finally got it working. I changed something in their code, particularly in the filename.
The issue is caused by an update of the Kaggle dataset.
I solved it by renaming the files to "COVID-19" (instead of the new "COVID") so that the numbering in parentheses is added by the OS.
I also edited "COVID.metadata.xlsx" using =CONCATENATE("COVID-19(", E2,")"), where column E holds 1 to 1199, so that the old scripts still work.
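For anyone who prefers not to edit the spreadsheet by hand, the same strings can be generated in Python. This is a sketch only: `old_style_ids` is a hypothetical name, and writing the values back into the metadata file is left out.

```python
def old_style_ids(count, prefix="COVID-19"):
    """Mirror =CONCATENATE("COVID-19(", E2, ")") for E = 1..count."""
    return [f"{prefix}({n})" for n in range(1, count + 1)]


ids = old_style_ids(1199)
print(ids[0], ids[-1])  # COVID-19(1) COVID-19(1199)
```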
Exactly as @LucasPMoreira said
And installed Pillow to remove the cv2.cvtColor error
Cell 4 was the actual problem for me. I solved it by removing the line I referenced and replacing it with the following:
imagename = "COVID ({}).png".format(imagename.rsplit(".png")[0].split("COVID ")[1])
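To make the replacement line easier to follow, here it is split into steps. The wrapper name `reparenthesize` and the sample input are assumptions for illustration; the sketch assumes the incoming name looks like "COVID <number>.png".

```python
def reparenthesize(imagename):
    """Apply the replacement line step by step (hypothetical wrapper)."""
    stem = imagename.rsplit(".png")[0]   # drop the ".png" extension
    number = stem.split("COVID ")[1]     # raises IndexError if "COVID " is absent
    return "COVID ({}).png".format(number)


print(reparenthesize("COVID 12.png"))  # COVID (12).png
```

Note that any name without the "COVID " prefix raises an IndexError, so the line only fits rows that follow this naming pattern.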
That's the only part I changed. It's for both the generation of binary and multi classification datasets notebooks.
And installed Pillow to remove the cv2.cvtColor error
I didn't experience any problem with OpenCV with regards to Pillow since I already had Pillow installed even before this.
This issue has been resolved with the release of the COVIDx V7A and V7B datasets, where in addition to a larger patient cohort the generation scripts have been modified based on changes to file structures in the other databases.
Resolved in 11635f7662284ca7b3075e814b33fd93bc94c127
Issue Template
Before posting, have you looked at the FAQ page?
Yes. My question is not addressed there.
Description
Please include a summary of the issue. The dataset generation notebooks might be out-of-date (create_COVIDx.ipynb and create_COVIDx_binary.ipynb). When I ran the notebooks, both failed even though I changed the directory of the dataset folders.
Please include the steps to reproduce. I followed the steps in COVIDx.md.
List any additional libraries that are affected. None
Steps to Reproduce
I followed the steps in data generation.
Expected behavior
The one in the notebooks
Actual behavior
When I remove the following line,
The 4th cell of create_COVIDx_binary.ipynb passes with the following output,
This is okay, right? But when I get to the 6th cell, this is the output,
Environment