tensorflow / datasets

TFDS is a collection of datasets ready to use with TensorFlow, Jax, ...
https://www.tensorflow.org/datasets
Apache License 2.0

Amazon dataset URLs are invalid! #5044

Open xei opened 11 months ago

xei commented 11 months ago

The Amazon reviews dataset is not accessible at the following URL:

https://s3.amazonaws.com/amazon-reviews-pds/tsv/amazon_reviews_us_Books_v1_02.tsv.gz

As a result, TensorFlow Datasets cannot load the dataset:

DownloadError: Failed to get url https://s3.amazonaws.com/amazon-reviews-pds/tsv/amazon_reviews_us_Books_v1_02.tsv.gz. HTTP code: 403.
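
The 403 above can be checked without TFDS. The following is a minimal sketch using only the Python standard library; the function name `http_status` and its `opener` parameter are hypothetical helpers for illustration and are not part of TFDS. It sends a HEAD request so the archive itself is not downloaded, on the assumption that the server answers HEAD the same way it answers the GET that TFDS issues.

```python
import urllib.error
import urllib.request


def http_status(url, opener=urllib.request.urlopen, timeout=10):
    """Return the HTTP status code the server gives for a HEAD request to `url`.

    `opener` defaults to urllib's urlopen; it is a parameter only so the
    function can be exercised without network access (an assumption of this
    sketch, not something TFDS does).
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises HTTPError for 4xx/5xx responses; the numeric code is
        # the same value that appears in the DownloadError message.
        # Connection-level failures (urllib.error.URLError) are not caught here.
        return err.code
```

Calling `http_status(...)` on the URL from the report is expected to return 403 for as long as the server keeps rejecting the request, and the same call can be pointed at any candidate replacement URL before changing the dataset builder.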

marcenacp commented 11 months ago

@xei Thanks for spotting it!

It looks like the dataset changed and is now accessible at https://cseweb.ucsd.edu/~jmcauley/datasets/amazon_v2/

Can you confirm this? Would you have time to submit a fix? You would need to:

  • Edit tensorflow_datasets/datasets/amazon_us_reviews/amazon_us_reviews_dataset_builder.py to reflect the new changes.
  • Bump the version 0.1.0 -> 2.0.0.

xei commented 11 months ago

Sure. I've made a pull request (https://github.com/tensorflow/datasets/pull/5047). Could you please check it?