mikevoets / jama16-retina-replication

JAMA 2016; 316(22) Replication Study
https://doi.org/10.1371/journal.pone.0217541
MIT License

Regarding EyePACS test image true labels #19

Closed. gayalkuruppu closed this issue 1 year ago

gayalkuruppu commented 1 year ago

In preprocess_eyepacs.py, the code uses a testLabels.csv file that is not available in the Kaggle dataset. Instead, there is a newer CSV file called sampleSubmission.csv, which does not contain the true labels for the images (all labels are class 0). In the discussion tab on Kaggle, a file called retinopathy_solution.csv has been provided, which appears to contain the true labels. However, some of the comments about these labels are negative, so I have doubts about their authenticity. Can you provide the true labels for the test images?

A preview of sampleSubmission.csv

Screenshot 2023-02-13 at 10 44 59

A preview of retinopathy_solution.csv

Screenshot 2023-02-13 at 10 46 57

mikevoets commented 1 year ago

Hi there @gayalkuruppu, first of all, thanks for your interest in this project! Yes, we did get access to the test labels for the EyePACS data set, and they are in the repository at https://github.com/mikevoets/jama16-retina-replication/tree/master/vendor/eyepacs.

Note that you only need to place the EyePACS data set in the data directory as-is from Kaggle (i.e. without the test labels). When you run the eyepacs.sh Bash script, it grabs the test labels from the vendor folder and includes them in the TFRecord distribution files for training and testing:

  1. Run $ ./eyepacs.sh to decompress and preprocess the Kaggle EyePACS data set, and redistribute this set into a training and test set. Run with the --only_gradable flag if you want to train and evaluate with gradable images only. NB: This is a large data set, so this may take hours to finish.
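If you want to check the Kaggle retinopathy_solution.csv file against the labels in the vendor folder yourself, a minimal sketch along the following lines might help. It is not part of this repository: the function names are hypothetical, and it assumes both CSV files have an `image` column and an integer `level` column, which you should verify against the actual files (extra columns are ignored).

```python
import csv
from collections import Counter


def read_labels(path, image_col="image", level_col="level"):
    """Read a label CSV into a {image_name: grade} dict.

    The column names are assumptions; pass different ones if the
    header of your file differs.
    """
    with open(path, newline="") as f:
        return {row[image_col]: int(row[level_col]) for row in csv.DictReader(f)}


def compare_labels(labels_a, labels_b):
    """Summarise how two {image_name: grade} mappings differ."""
    shared = labels_a.keys() & labels_b.keys()
    # Images present in both files whose grades disagree.
    mismatched = sorted(img for img in shared if labels_a[img] != labels_b[img])
    return {
        "only_in_a": len(labels_a.keys() - labels_b.keys()),
        "only_in_b": len(labels_b.keys() - labels_a.keys()),
        "shared": len(shared),
        "mismatched": mismatched,
    }


def grade_counts(labels):
    """Count images per grade; a file with only grade 0 is a placeholder."""
    return Counter(labels.values())
```

To use it, call `read_labels` once on each file, pass the two results to `compare_labels`, and inspect the `mismatched` list. Running `grade_counts` on sampleSubmission.csv should show a single class, which confirms it holds placeholder values rather than true labels.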

Hope that helps you!