Open marton-avrios opened 2 years ago
Unfortunately, GCS is very slow when extracting many files.
If you can, try to extract locally, then copy to GCS with gsutil -m
to copy files in parallel.
From the TFDS side, we should try to parallelize extraction when writing on GCS.
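To illustrate what parallelizing the writes could look like, here is a minimal sketch using only the Python standard library. It is not TFDS's implementation: copy_tree_parallel and its max_workers parameter are hypothetical names, and it copies between local directories. For a gs:// destination the per-file copy would have to go through a GCS-aware call (tf.io.gfile.copy is one candidate, untested here).

```python
import shutil
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def copy_tree_parallel(src_dir, dst_dir, max_workers=16):
    """Copy every file under src_dir into dst_dir using a thread pool.

    Hypothetical helper for illustration only; not part of TFDS.
    """
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    # Collect the files first so the pool only handles file copies.
    files = [p for p in src_dir.rglob("*") if p.is_file()]

    def copy_one(src):
        # Preserve the relative layout under the destination root.
        dst = dst_dir / src.relative_to(src_dir)
        dst.parent.mkdir(parents=True, exist_ok=True)
        # Assumption: local copy. Swap in a GCS-aware copy for gs:// paths.
        shutil.copyfile(src, dst)
        return dst

    # Threads suit this workload because each copy is I/O bound.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(copy_one, files))
```

The point is that many small files are written concurrently rather than one after another, which is the same idea behind gsutil -m cp -r when the destination is a bucket.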
tfds build cnn_dailymail
works.
tfds build cnn_dailymail --data_dir="gs://my-bucket/tensorflow_datasets"
doesn't. It gets stuck.
Ubuntu 18.04, Python 3.6.9, TensorFlow 2.6.2, tfds_nightly 4.5.0.dev202201310107.
Output: