Acebee opened this issue 4 years ago
Thanks for reporting.
It seems that using tf.io.gfile together with Python's zipfile results in corruption of the data (for some reason, Windows only).
Related: https://github.com/tensorflow/tensorflow/issues/32975
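For reference, the failing pattern is roughly the one below (a simplified sketch, not the actual TFDS extractor code; archive_path is just a placeholder):

import zipfile
import tensorflow as tf

archive_path = "/path/to/downloaded_archive.zip"  # placeholder, e.g. a zip under the tfds downloads dir

# zipfile reads through the tf.io.gfile.GFile wrapper; on the affected
# setups the bytes read back appear to come out corrupted.
with tf.io.gfile.GFile(archive_path, "rb") as fobj:
    with zipfile.ZipFile(fobj) as archive:
        for name in archive.namelist():
            data = archive.read(name)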
I am actually seeing the same issue on Ubuntu 20.04, with the 'cats_vs_dogs' dataset. While fine-tuning EfficientNet B4 I get similar errors, and a separate investigation showed that the 'corrupted' file names change on every run.
Epoch 1/20
75/234 [========>.....................] - ETA: 43s - loss: 0.3529 - accuracy: 0.9261Corrupt JPEG data: 99 extraneous bytes before marker 0xd9
119/234 [==============>...............] - ETA: 31s - loss: 0.3379 - accuracy: 0.9352Corrupt JPEG data: 65 extraneous bytes before marker 0xd9
205/234 [=========================>....] - ETA: 7s - loss: 0.3345 - accuracy: 0.9424Corrupt JPEG data: 2226 extraneous bytes before marker 0xd9
215/234 [==========================>...] - ETA: 5s - loss: 0.3345 - accuracy: 0.9428Corrupt JPEG data: 239 extraneous bytes before marker 0xd9
227/234 [============================>.] - ETA: 1s - loss: 0.3343 - accuracy: 0.9434Corrupt JPEG data: 1153 extraneous bytes before marker 0xd9
229/234 [============================>.] - ETA: 1s - loss: 0.3343 - accuracy: 0.9435Corrupt JPEG data: 228 extraneous bytes before marker 0xd9
234/234 [==============================] - ETA: 0s - loss: 0.3342 - accuracy: 0.9437Corrupt JPEG data: 65 extraneous bytes before marker 0xd9
Short description
When I try to use the UCF101 dataset, the program reports something like this:
tensorflow.python.framework.errors_impl.OutOfRangeError: E:\tfdsdata\datasets\ucf\downloads\thumos14_files_UCF101_videosxm55JXkGdBSDxwckqpN5c7GNr_LXm9dTyoJdpxR_aas.zip; Unknown error
Environment information
Operating System: Win10
Python version: 3.7 (Conda)
tensorflow-datasets/tfds-nightly version: tensorflow-datasets 3.2.1
tensorflow/tf-nightly version: tensorflow-gpu 2.3.1
Does the issue still exist with the latest tfds-nightly package (pip install --upgrade tfds-nightly)? yes

Reproduction instructions
mnist_train = tfds.load(name="ucf101", data_dir="E:\\tfdsdata\\datasets\\ucf")
or just reproduce the problem like this:
Link to logs
Expected behavior
I looked into the extractor.py file and found the reason: when zipfile.ZipFile() tries to unzip a file that is wrapped by tf.io.gfile.GFile, it throws an exception.
I managed to work around the problem by not using the wrapped file, with something like this:
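(Rough, untested sketch of the idea; iter_zip is just an illustrative name, and the real extractor handles other archive formats too. The point is to hand zipfile a plain local path instead of a GFile object.)

import zipfile

def iter_zip(archive_path):
    # Let zipfile open and read the local file itself, instead of going
    # through a tf.io.gfile.GFile wrapper.
    with zipfile.ZipFile(archive_path) as archive:
        for member in archive.infolist():
            if member.is_dir():
                continue
            yield member.filename, archive.read(member.filename)

Reading from a plain path lets zipfile do its own seeks on the local file, which seems to avoid whatever goes wrong when it seeks through the GFile wrapper.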
Additional context