tensorflow / datasets

TFDS is a collection of datasets ready to use with TensorFlow, Jax, ...
https://www.tensorflow.org/datasets
Apache License 2.0

Trouble Loading Diabetic Retinopathy Detection Dataset #558

Status: Closed (dwan1545 closed this issue 5 years ago)

dwan1545 commented 5 years ago

What I need help with / What I was wondering: I am confused about how to access the Diabetic Retinopathy Detection dataset from TensorFlow Datasets.

What I've tried so far: I have created the builder object as follows:

builder = tfds.builder("diabetic_retinopathy_detection")

But then when I go to the next step:

builder.download_and_prepare()

I get a slew of errors about wrong directories. I have attached the error log from running the above command (log.txt).

It would be nice if... the additional steps needed to access each dataset were documented. It is currently not clear what has to be done (e.g., downloading particular files, placing them into particular directory structures, etc.).

Conchylicultor commented 5 years ago

This dataset was implemented before tfds supported using the Kaggle API to download Kaggle datasets. Indeed, the current solution is badly documented. You should download the files from Kaggle and extract them into ~/tensorflow_datasets/downloads/manual/diabetic_retinopathy_detection/

The folder should contain files like:

sample/...
synthetic/...
train/...
test/...
sampleSubmission.csv
trainLabels.csv
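
Before running the builder, it can help to confirm that the manual folder actually has this layout. The following is a minimal sketch using only the Python standard library. The helper name missing_manual_entries is hypothetical, and the expected entry names are copied from the listing above rather than checked against the tfds source, so they may differ between versions.

```python
from pathlib import Path

# Entry names copied from the listing in this thread (assumption: not verified
# against the tfds source, and possibly different in other tfds versions).
EXPECTED_ENTRIES = (
    "sample",
    "synthetic",
    "train",
    "test",
    "sampleSubmission.csv",
    "trainLabels.csv",
)


def missing_manual_entries(manual_dir):
    """Return the expected entries that do not exist inside manual_dir."""
    root = Path(manual_dir).expanduser()
    return [name for name in EXPECTED_ENTRIES if not (root / name).exists()]
```

Calling missing_manual_entries("~/tensorflow_datasets/downloads/manual/diabetic_retinopathy_detection") returns an empty list when every entry from the listing is present, and otherwise the names that still need to be extracted.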

dwan1545 commented 5 years ago

Thank you for the response, I was able to make that work. For those interested in the future: I used the Kaggle API to download all the zip files into a directory of my choice (on an external hard drive) and unzipped them with 7zip. I then placed all the files in my_dir/manual/diabetic_retinopathy_detection, following the file structure given by Conchylicultor. Finally, I ran the following commands to build the dataset:

builder = tfds.builder("diabetic_retinopathy_detection")
builder.download_and_prepare(download_dir=my_dir)

One more question before I close this: the download_and_prepare() function starts building a huge dataset on my machine in ~/tensorflow_datasets/diabetic_retinopathy_detection/original. Is there any way I can change this location to my external hard drive, so it doesn't destroy my Mac's storage? I'm looking through the tfds documentation and can't seem to find the appropriate variable to change.

dwan1545 commented 5 years ago

Ah, I answered my question. The key is to initialize the builder with the right data_dir as follows:

builder = tfds.builder(name="diabetic_retinopathy_detection", data_dir=my_dir)

Then when calling download_and_prepare, the function will work in the specified directory.
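
Putting the two settings from this thread together, a rough sketch of the whole flow might look like the following. The helper names plan_dirs and prepare are hypothetical, and the manual-folder location (download_dir/manual/diabetic_retinopathy_detection) is only the layout reported in this thread, so it may differ in other tfds versions. tfds is imported lazily so the path helper can be used without it installed.

```python
from pathlib import Path

DATASET = "diabetic_retinopathy_detection"


def plan_dirs(base_dir):
    """Map one base directory to the locations used in this thread."""
    base = Path(base_dir).expanduser()
    return {
        "data_dir": base,      # where the prepared dataset is written
        "download_dir": base,  # passed to download_and_prepare()
        # Manual files location as reported in this thread (assumption).
        "manual_dir": base / "manual" / DATASET,
    }


def prepare(base_dir):
    """Build the dataset under base_dir instead of ~/tensorflow_datasets."""
    import tensorflow_datasets as tfds  # lazy import; requires tfds installed

    dirs = plan_dirs(base_dir)
    builder = tfds.builder(DATASET, data_dir=str(dirs["data_dir"]))
    builder.download_and_prepare(download_dir=str(dirs["download_dir"]))
    return builder
```

With the Kaggle files already extracted into the manual folder, prepare(my_dir) should keep both the downloads and the generated dataset on the external drive.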

Closing this now, thanks for the help!