tensorflow / datasets

TFDS is a collection of datasets ready to use with TensorFlow, Jax, ...
https://www.tensorflow.org/datasets
Apache License 2.0

Tensorflow Datasets Loading to GCS fails #740

Open captain-pool opened 5 years ago

captain-pool commented 5 years ago

Short description TFDS fails while loading datasets to GCS. For smaller datasets like MNIST there is no error and the code runs fine; however, for bigger datasets like "Coco2014", "imagenet", and "Open Images v4", the process stops and gives the following error in the JSON response.

Error: 500 Backend Error. Error Uploading to gs://bucket_name/path/to/store. Retrying Upload.

Environment information

Reproduction instructions

import tensorflow_datasets as tfds
ds = tfds.load("open_images_v4", data_dir="gs://bucket/path") # Fails
ds = tfds.load("coco2014", data_dir="gs://bucket/path") # Fails
ds = tfds.load("imagenet", data_dir="gs://bucket/path") # Fails
ds = tfds.load("mnist", data_dir="gs://bucket/path") # Works fine

Logs

2019-07-05 03:26:19.998078: I tensorflow/core/platform/cloud/retrying_utils.cc:73] The operation failed and will be automatically retried in 0.35809 seconds (attempt 1 out of 10), caused by: Unavailable: Upload to gs://rickdey1998/data_dir/downloads/extracted/ZIP.images.cocodataset.org_zips_val2014_pvoFgUgScNHF-B32eNKpggUpVZ5-ATNBD48vuO5_eA.zip.incomplete_71bc721e170542848983123a225ba268/val2014/COCO_val2014_000000047149.jpg failed, caused by: Not found: Error executing an HTTP request: HTTP response code 410 with body '{
 "error": {
  "errors": [
   {
    "domain": "global",
    "reason": "backendError",
    "message": "Backend Error"
   }
  ],
  "code": 500,
  "message": "Backend Error"
 }
}
'
         when resuming upload gs://rickdey1998/data_dir/downloads/extracted/ZIP.images.cocodataset.org_zips_val2014_pvoFgUgScNHF-B32eNKpggUpVZ5-ATNBD48vuO5_eA.zip.incomplete_71bc721e170542848983123a225ba268/val2014/COCO_val2014_000000047149.jpg
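The log above shows TFDS's retrying layer in action: each failed upload schedules an automatic retry after a small randomized delay (here 0.35809 seconds, attempt 1 out of 10). A minimal sketch of that kind of exponential backoff with jitter, written in plain Python; the names `retry_with_backoff` and `flaky_upload` are illustrative and not taken from the TFDS source:

```python
import random
import time

def retry_with_backoff(operation, max_attempts=10, base_delay=0.5):
    """Call `operation`, retrying on IOError with jittered exponential backoff.

    Mirrors the behavior visible in the log: each failure schedules a
    retry after a randomized, exponentially growing delay, up to
    `max_attempts` tries before the last error is surfaced.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except IOError as err:
            if attempt == max_attempts:
                raise  # out of retries; surface the last error
            delay = random.uniform(0, base_delay * 2 ** (attempt - 1))
            print(f"Attempt {attempt} failed ({err}); retrying in {delay:.5f} s")
            time.sleep(delay)

# Example: a stand-in upload that fails twice, then succeeds.
calls = {"n": 0}
def flaky_upload():
    calls["n"] += 1
    if calls["n"] < 3:
        raise IOError("500 Backend Error")
    return "uploaded"

print(retry_with_backoff(flaky_upload, base_delay=0.05))
```

In the issue, the retries never succeed because the backend keeps returning 410/500, so TFDS exhausts its attempts on the large, many-file datasets.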

Expected behavior Larger datasets load to GCS without errors, just like the smaller datasets do.

Steps Already Taken

CCs: @vbardiovskyg @srjoglekar246

@rsepassi @Conchylicultor @cyfra

cyfra commented 5 years ago

Thanks for the bug report. As you've noticed, it might be correlated with dataset size, or perhaps with the number of files created. Debugging will take a while (as many people from the team are on vacation).

The two workarounds I'd suggest are: