tensorflow / datasets

TFDS is a collection of datasets ready to use with TensorFlow, Jax, ...
https://www.tensorflow.org/datasets
Apache License 2.0

NonMatchingChecksumError while downloading 'multi_news' or 'cnn_dailymail' dataset #5232

Open singhniraj08 opened 8 months ago

singhniraj08 commented 8 months ago

Short description

Getting a NonMatchingChecksumError while downloading the multi_news or cnn_dailymail datasets.

Environment information

Reproduction instructions

Colab notebook: https://colab.sandbox.google.com/gist/singhniraj08/9f80bc167706b9b351b75e003dcad39c/untitled2.ipynb


Link to logs

NonMatchingChecksumError: Artifact https://drive.google.com/uc?export=download&id=1vRY2wM6rlOZrf9exGTm5pXj5ExlVwJ0C, downloaded to /root/tensorflow_datasets/downloads/ucexport_download_id_1vRY2wM6rlOZrf9exGTm5pXj5OT0RBXCg5OWBrYMJXysF1hdrkZtPhK-7JWdYi2HrYYc.tmp.c134b8c8d86c4764bad073c9d79db385/download, has wrong checksum:
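
One plausible cause of this error for drive.google.com URLs is that Drive serves an HTML page (for example a download-quota or confirmation page) instead of the archive, so the hash of what was saved cannot match the registered checksum. The helper below is a hypothetical heuristic, not part of TFDS: it only checks whether a downloaded file begins like an HTML document.

```python
# Byte prefixes that suggest an HTML page rather than a binary archive.
# This marker list is a heuristic assumption, not something TFDS checks.
_HTML_MARKERS = (b"<!doctype html", b"<html")


def looks_like_html(path, probe_bytes=1024):
    """Return True if the file at `path` appears to be an HTML document.

    Reads only the first `probe_bytes` bytes, strips leading whitespace,
    lowercases them, and compares against the known HTML prefixes.
    """
    with open(path, "rb") as f:
        head = f.read(probe_bytes).lstrip().lower()
    return head.startswith(_HTML_MARKERS)
```

If the downloaded file is still on disk, calling looks_like_html on it gives a quick hint as to whether Drive returned a web page instead of the dataset file.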

Expected behavior

Dataset should download without any issues.

Additional context

83here commented 8 months ago

Hello @singhniraj08, this is a persistent problem in TFDS (#3935) and there is no fix yet, although you can bypass the issue by downloading the dataset manually.
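
If you download the file manually, it is worth confirming that you got the same bytes TFDS expects before using it. A minimal sketch, assuming the dataset's registered checksums are stored as one tab-separated url, size, sha256 record per line (verify the actual checksums file in your TFDS version); parse_checksum_line, sha256_of and matches_registered are hypothetical helper names, not TFDS APIs.

```python
import hashlib
import os


def parse_checksum_line(line):
    """Split an assumed `url<TAB>size<TAB>sha256` record into its fields."""
    fields = line.rstrip("\n").split("\t")
    return fields[0], int(fields[1]), fields[2]


def sha256_of(path, chunk_size=1 << 20):
    """Hash the file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_registered(path, expected_size, expected_sha256):
    """Return True only if both the byte size and the SHA-256 digest match."""
    if os.path.getsize(path) != expected_size:
        return False  # cheap size check first, skips hashing on obvious mismatch
    return sha256_of(path) == expected_sha256
```

Checking the size first is a cheap way to reject an HTML page or truncated download without hashing the whole file.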

Thank you,

Rahulraj0308 commented 7 months ago

@singhniraj08, see https://www.tensorflow.org/datasets/overview#fixing_nonmatchingchecksumerror for how to work around it. As far as I know, this issue has not been fixed yet.