tensorflow / datasets

TFDS is a collection of datasets ready to use with TensorFlow, Jax, ...
https://www.tensorflow.org/datasets
Apache License 2.0
4.29k stars, 1.54k forks

curated_breast_imaging_ddsm download and use #1566

Closed JobCollins closed 4 years ago

JobCollins commented 4 years ago

What I need help with / What I was wondering

I have tried following the instructions at https://www.tensorflow.org/datasets/catalog/curated_breast_imaging_ddsm to download the curated breast imaging dataset, but I have not been able to. For one, the download link does not work, and the documentation on it is scanty.

What I've tried so far

I have tried loading the data directly:

  import tensorflow_datasets as tfds

  # Split the 'train' split 80/20 into training and validation examples.
  (train_examples, validation_examples), info = tfds.load(
      'curated_breast_imaging_ddsm',
      with_info=True,
      as_supervised=True,
      split=['train[:80%]', 'train[80%:]'],
  )

but I got an assertion error

  AssertionError: Manual directory /root/tensorflow_datasets/downloads/manual/curated_breast_imaging_ddsm does not exist. Create it and download/extract dataset artifacts in there. Additional instructions: You can download the images from https://wiki.cancerimagingarchive.net/display/Public/CBIS-DDSM Please look at the source file (cbis_ddsm.py) to see the instructions on how to conver them into png (using dcmj2pnm).

The download link above does not work.

It would be nice if...

There were clear documentation of how to manually download and set up the dataset for use. Also, I would appreciate it if someone could help me understand how to load this dataset, or point me to a similar TF dataset that does not require a manual download.

Eshan-Agarwal commented 4 years ago

This dataset requires manual downloading; you can download it here.

Or, for a dataset that does not require manual downloading, you can use this to load the data:

  import tensorflow_datasets as tfds

  dataset, info = tfds.load(name="mnist", with_info=True, split="train")
  print(info)

JobCollins commented 4 years ago

Thanks @Eshan-Agarwal! I have realized that the link works on a different network from the one I used before. The data is a mammoth 163GB. How can I use the curated_breast_imaging_ddsm.py file to download a portion of the data?

Eshan-Agarwal commented 4 years ago

I think curated_breast_imaging_ddsm.py is for testing or generating some fake examples after you manually download the data. Is it mentioned anywhere that you can use the curated_breast_imaging_ddsm.py file to download a portion of the data?

vijayphoenix commented 4 years ago

curated_breast_imaging_ddsm.py is only for preparing the manually downloaded data so that it can be used with the tfds pipeline. @JobCollins I found this link on their site. Download a subset of the data from the above link and follow the instructions at https://www.tensorflow.org/datasets/catalog/curated_breast_imaging_ddsm to prepare the data.

JobCollins commented 4 years ago

Awesome @vijayphoenix! Let me take a look at the link, and then I will let you know whether it worked for me. Thanks!!

Eshan-Agarwal commented 4 years ago

@JobCollins please close this issue if your problem is solved.

Conchylicultor commented 4 years ago

Closing this. Please re-open if the issue wasn't solved

JobCollins commented 4 years ago

Sorry that this escaped me. Thank you @vijayphoenix, I was able to download the data and am currently converting it. I will re-open this when I need more help.

Mahabuburr commented 3 years ago

Please help, this is urgent. I need a mammogram dataset that includes train, test, and validation splits, but I cannot find one. When I go to this page (https://www.tensorflow.org/datasets/catalog/curated_breast_imaging_ddsm#curated_breast_imaging_ddsmpatches_default_config), I cannot work out how to download the train, test, and validation data. Please help me download this dataset urgently.

Conchylicultor commented 3 years ago

Please follow manual instructions from https://www.tensorflow.org/datasets/catalog/curated_breast_imaging_ddsm

Manual download instructions: This dataset requires you to download the source data manually into download_config.manual_dir (defaults to ~/tensorflow_datasets/downloads/manual/): You can download the images from https://wiki.cancerimagingarchive.net/display/Public/CBIS-DDSM
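Before calling tfds.load, it can help to confirm that the manual directory actually exists and contains files. The following is only a rough sanity check, not part of TFDS: the helper names `candidate_manual_dirs` and `find_manual_dir` are hypothetical, and the two locations checked are simply the ones that appear in this thread (the per-dataset subdirectory from the AssertionError above, and the documented default `~/tensorflow_datasets/downloads/manual/`). Which of the two your TFDS version expects is an assumption you should verify.

```python
from pathlib import Path
from typing import List, Optional


def candidate_manual_dirs(dataset_name: str,
                          data_dir: str = "~/tensorflow_datasets") -> List[Path]:
    """Return the manual-download locations mentioned in this thread."""
    base = Path(data_dir).expanduser() / "downloads" / "manual"
    # Per-dataset subdirectory (as in the AssertionError) first, then the base.
    return [base / dataset_name, base]


def find_manual_dir(dataset_name: str,
                    data_dir: str = "~/tensorflow_datasets") -> Optional[Path]:
    """Return the first candidate directory that exists and is non-empty."""
    for path in candidate_manual_dirs(dataset_name, data_dir):
        if path.is_dir() and any(path.iterdir()):
            return path
    return None


if __name__ == "__main__":
    found = find_manual_dir("curated_breast_imaging_ddsm")
    if found is None:
        print("No manual data found; download and convert the images first.")
    else:
        print(f"Manual data found in {found}")
```

If the data lives somewhere else, the `download_config.manual_dir` setting mentioned above can be pointed at that location instead of relying on the default.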

Because special software and libraries are needed to download and read the images contained in the dataset, TFDS assumes that the user has downloaded the original DICOM files and converted them to PNG.

The following commands (or equivalent) should be used to generate the PNG files, in order to guarantee reproducible results:

  find $DATASET_DCIM_DIR -name '*.dcm' | \
  xargs -n1 -P8 -I{} bash -c 'f={}; dcmj2pnm $f | convert - ${f/.dcm/.png}'

Resulting images should be put in manual_dir, like: <manual_dir>/Mass-Training_P_01981_RIGHT_MLO_1/1.3.6.../000000.png.
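With roughly 163GB of source data, it may be useful to check which files the conversion has already produced. The sketch below does not convert anything; it only mirrors the path rewriting done by `${f/.dcm/.png}` in the command above (replace the first `.dcm` in the path with `.png`) and reports which PNG targets are still missing. The function names are hypothetical and the directory you pass in is whatever you used as `$DATASET_DCIM_DIR`.

```python
from pathlib import Path
from typing import List, Tuple


def png_path_for(dcm_path: str) -> str:
    """Mirror bash's ${f/.dcm/.png}: replace the first '.dcm' in the path."""
    return dcm_path.replace(".dcm", ".png", 1)


def plan_conversion(dicom_root: str) -> List[Tuple[str, str]]:
    """List (source, target) pairs for every *.dcm entry under dicom_root."""
    root = Path(dicom_root)
    return [(str(p), png_path_for(str(p))) for p in sorted(root.rglob("*.dcm"))]


def missing_pngs(dicom_root: str) -> List[str]:
    """Return the target PNG paths that do not exist yet."""
    return [dst for _, dst in plan_conversion(dicom_root)
            if not Path(dst).exists()]
```

Calling `missing_pngs(...)` on the DICOM directory after the shell command finishes should return an empty list if every file was converted; any remaining entries are candidates for a re-run.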