tensorflow / tensor2tensor

Library of deep learning models and datasets designed to make deep learning more accessible and accelerate ML research.
Apache License 2.0
15.5k stars 3.49k forks

Unable to download translate_ende_wmt32k using t2t-datagen #1016

Open pksubbarao opened 6 years ago

pksubbarao commented 6 years ago

Description

Downloading the dataset "translate_ende_wmt32k" with t2t-datagen results in the following error:

tensorflow.python.framework.errors_impl.NotFoundError: /tmp/t2t_datagen/training/news-commentary-v13.de-en.en; No such file or directory

I did not have this issue while downloading translate_ende_wmt_bpe32k.

PROBLEM=translate_ende_wmt32k
MODEL=transformer
HPARAMS=transformer_base_single_gpu

t2t-datagen --data_dir=$DATA_DIR --tmp_dir=$TMP_DIR --problem=$PROBLEM
100% completed
Traceback (most recent call last):
  File "/home/prashant/.local/bin/t2t-datagen", line 27, in <module>
    tf.app.run()
  File "/home/prashant/.local/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 125, in run
    _sys.exit(main(argv))
  File "/home/prashant/.local/bin/t2t-datagen", line 23, in main
    t2t_datagen.main(argv)
  File "/home/prashant/.local/lib/python2.7/site-packages/tensor2tensor/bin/t2t_datagen.py", line 190, in main
    generate_data_for_registered_problem(problem)
  File "/home/prashant/.local/lib/python2.7/site-packages/tensor2tensor/bin/t2t_datagen.py", line 240, in generate_data_for_registered_problem
    problem.generate_data(data_dir, tmp_dir, task_id)
  File "/home/prashant/.local/lib/python2.7/site-packages/tensor2tensor/data_generators/text_problems.py", line 294, in generate_data
    self.generate_encoded_samples(data_dir, tmp_dir, split)), paths)
  File "/home/prashant/.local/lib/python2.7/site-packages/tensor2tensor/data_generators/text_problems.py", line 254, in generate_encoded_samples
    generator = self.generate_samples(data_dir, tmp_dir, dataset_split)
  File "/home/prashant/.local/lib/python2.7/site-packages/tensor2tensor/data_generators/translate.py", line 55, in generate_samples
    tag))
  File "/home/prashant/.local/lib/python2.7/site-packages/tensor2tensor/data_generators/translate.py", line 148, in compile_data
    lang1_filepath, lang2_filepath):
  File "/home/prashant/.local/lib/python2.7/site-packages/tensor2tensor/data_generators/text_problems.py", line 552, in text2text_txt_iterator
    txt_line_iterator(source_txt_path), txt_line_iterator(target_txt_path)):
  File "/home/prashant/.local/lib/python2.7/site-packages/tensor2tensor/data_generators/text_problems.py", line 545, in txt_line_iterator
    for line in f:
  File "/home/prashant/.local/lib/python2.7/site-packages/tensorflow/python/lib/io/file_io.py", line 214, in next
    retval = self.readline()
  File "/home/prashant/.local/lib/python2.7/site-packages/tensorflow/python/lib/io/file_io.py", line 183, in readline
    self._preread_check()
  File "/home/prashant/.local/lib/python2.7/site-packages/tensorflow/python/lib/io/file_io.py", line 85, in _preread_check
    compat.as_bytes(self.__name), 1024 * 512, status)
  File "/home/prashant/.local/lib/python2.7/site-packages/tensorflow/python/framework/errors_impl.py", line 519, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.NotFoundError: /tmp/t2t_datagen/training/news-commentary-v13.de-en.en; No such file or directory

$TMP_DIR contains the downloaded files, but under a different path from the one the script is looking for: the archive was extracted to training-parallel-nc-v13/, while the script expects training/.

ls -l /tmp/t2t_datagen
total 110512
drwxrwxr-x 2 prashant prashant      4096 Feb 21  2018 training-parallel-nc-v13
-rw-rw-r-- 1 prashant prashant 113157482 Aug 23 14:17 training-parallel-nc-v13.tgz

ls -la /tmp/t2t_datagen/training-parallel-nc-v13
total 313256
drwxrwxr-x 2 prashant prashant     4096 Feb 21  2018 .
drwxrwxr-x 3 prashant prashant     4096 Aug 23 14:17 ..
-rw-r--r-- 1 prashant prashant 32894113 Feb 21  2018 news-commentary-v13.cs-en.cs
-rw-r--r-- 1 prashant prashant 29823721 Feb 21  2018 news-commentary-v13.cs-en.en
-rw-r--r-- 1 prashant prashant 48226262 Feb 21  2018 news-commentary-v13.de-en.de
-rw-r--r-- 1 prashant prashant 39610338 Feb 21  2018 news-commentary-v13.de-en.en
-rw-r--r-- 1 prashant prashant 34376953 Feb 21  2018 news-commentary-v13.ru-en.en
-rw-r--r-- 1 prashant prashant 69178183 Feb 21  2018 news-commentary-v13.ru-en.ru
-rw-r--r-- 1 prashant prashant 35525461 Feb 21  2018 news-commentary-v13.zh-en.en
-rw-r--r-- 1 prashant prashant 31113639 Feb 21  2018 news-commentary-v13.zh-en.zh
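The mismatch described above can be checked with a small script. The following is only a diagnostic sketch, not part of tensor2tensor: `find_corpus_file` and `link_expected_dir` are hypothetical helper names, and the symlink step is an untested workaround that assumes the only problem is the directory name (`training` versus `training-parallel-nc-v13`). Other corpora needed by the problem may still be missing.

```python
import os


def find_corpus_file(tmp_dir, filename):
    """Return every path under tmp_dir whose basename equals filename."""
    matches = []
    # os.walk does not follow symlinked directories by default.
    for root, _dirs, files in os.walk(tmp_dir):
        if filename in files:
            matches.append(os.path.join(root, filename))
    return sorted(matches)


def link_expected_dir(tmp_dir, expected_subdir, actual_subdir):
    """Symlink tmp_dir/expected_subdir to tmp_dir/actual_subdir.

    Does nothing if the expected path already exists or the actual
    directory is missing. Returns True only if a link was created.
    """
    expected = os.path.join(tmp_dir, expected_subdir)
    actual = os.path.abspath(os.path.join(tmp_dir, actual_subdir))
    if os.path.lexists(expected) or not os.path.isdir(actual):
        return False
    os.symlink(actual, expected)
    return True


if __name__ == "__main__":
    tmp_dir = "/tmp/t2t_datagen"  # value of $TMP_DIR in this report
    # Where does the file named in the error message actually live?
    print(find_corpus_file(tmp_dir, "news-commentary-v13.de-en.en"))
    # Assumption: linking the expected directory name is enough for the
    # path in the error message to resolve. This has not been verified
    # against t2t-datagen itself.
    print(link_expected_dir(tmp_dir, "training", "training-parallel-nc-v13"))
```

If the first call prints a path under training-parallel-nc-v13/, the download succeeded and only the lookup path is wrong, which matches the listing above.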

Environment information

OS: Ubuntu 16.04.4 

$ pip freeze | grep tensor
tensor2tensor==1.8.0
tensorboard==1.9.0
tensorflow==1.5.0
tensorflow-gpu==1.9.0
tensorflow-tensorboard==1.5.1

$ python -V
Python 2.7.12

### For bugs: reproduction and error logs

Steps to reproduce:

t2t-datagen --data_dir=$DATA_DIR --tmp_dir=$TMP_DIR --problem=$PROBLEM

Error logs:

...

JulianRMedina commented 6 years ago

I can also reproduce the error. I believe either the file is no longer available for download, or something else was messed up with recent changes for WMT18.

rsepassi commented 6 years ago

Yes, this has been fixed at head and will be part of the next release. Sorry for the break.

EthanPhan commented 6 years ago

I encountered this problem today. Can I ask when is the next release?

rsepassi commented 6 years ago

Release will be this week. Sorry for the delay. In the meantime you can do:

git clone https://github.com/tensorflow/tensor2tensor
cd tensor2tensor
pip install -e . --user

On Mon, Sep 3, 2018 at 5:35 AM EthanPhan notifications@github.com wrote:

I encountered this problem today. Can I ask when is the next release?

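To tell whether an environment still has the affected release, the installed version can be compared against 1.8.0, the version shown in the report. This is a rough sketch under stated assumptions: the helper names are hypothetical, the fix is assumed to ship in a release numbered higher than 1.8.0, and `importlib.metadata` requires Python 3.8 or later (the reporter's Python 2.7 would need a different lookup).

```python
def parse_version(text):
    """Turn '1.8.0' into (1, 8, 0), stopping at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def is_newer_release(installed, affected="1.8.0"):
    """True if `installed` is numerically greater than the affected version."""
    return parse_version(installed) > parse_version(affected)


def installed_version(package="tensor2tensor"):
    """Return the installed version string, or None if it cannot be found."""
    try:
        from importlib.metadata import PackageNotFoundError, version
    except ImportError:  # Python older than 3.8
        return None
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    if found is None:
        print("tensor2tensor not found in this environment")
    else:
        print(found, "newer than 1.8.0:", is_newer_release(found))
```

An editable install from a git checkout may still report the old version number if the version string has not been bumped at head, so a False result there does not prove the fix is absent.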

lachao commented 6 years ago

Hi, I just checked again and still face the same problem. Will the fixed version be released this week? :) Thanks for the answer and the help!