google / trax

Trax — Deep Learning with Clear Code and Speed
Apache License 2.0

Reformer imagenet64 gin config - dataset loading failure #1183

Open syzymon opened 4 years ago

syzymon commented 4 years ago

Description

The Imagenet64 dataset from tensor2tensor that this gin config uses seems to have loading issues: https://github.com/google/trax/blob/master/trax/supervised/configs/reformer_imagenet64.gin

I tried to run this config on Google Colab (https://colab.research.google.com/drive/1ysEQYOaIspHPBVu6S9jOxc7BkE2oDrh0) and ran into: tensorflow.python.framework.errors_impl.NotFoundError: /root/tensorflow_datasets/download/train_64x64; No such file or directory (a more detailed stack trace is provided below).
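The error means the t2t generator globbed a source directory that does not exist. A minimal preflight sketch can report this before the trainer is launched. It assumes the download directory from the error message (/root/tensorflow_datasets/download, which is ~/tensorflow_datasets/download when running as root on Colab) and the subdirectory names train_64x64 (taken from the error) and valid_64x64 (an assumption based on the *_64x64.tar naming). The helper missing_imagenet64_dirs is hypothetical, not part of trax or t2t.

```python
import os

# Subdirectories the t2t Imagenet64 generator is expected to read from.
# "train_64x64" comes from the NotFoundError; "valid_64x64" is an assumption.
EXPECTED_SUBDIRS = ("train_64x64", "valid_64x64")


def missing_imagenet64_dirs(dl_dir, subdirs=EXPECTED_SUBDIRS):
    """Hypothetical helper: list expected subdirectories that are absent or empty."""
    missing = []
    for name in subdirs:
        path = os.path.join(dl_dir, name)
        # An empty directory is treated as missing, since the generator
        # would find no image files to yield.
        if not os.path.isdir(path) or not os.listdir(path):
            missing.append(name)
    return missing
```

For example, a non-empty result from missing_imagenet64_dirs(os.path.expanduser("~/tensorflow_datasets/download")) would predict the NotFoundError reported here.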

For reference, gin configs that use other t2t datasets, such as this recent one, worked correctly in the same Colab: https://github.com/google/trax/blob/master/trax/supervised/configs/transformer_lm_cnndailymail.gin

A different gin config using imagenet224, also from t2t, failed in the same way as imagenet64.

Is this a known issue?

Environment information

OS: Google Colab

$ pip freeze | grep trax
trax==1.3.6

$ pip freeze | grep tensor
mesh-tensorflow==0.1.17
tensor2tensor==1.15.7
tensorboard==2.3.0
tensorboard-plugin-wit==1.7.0
tensorboardcolab==0.0.22
tensorflow==2.3.0
tensorflow-addons==0.8.3
tensorflow-datasets==4.0.1
tensorflow-estimator==2.3.0
tensorflow-gan==2.0.0
tensorflow-gcs-config==2.3.0
tensorflow-hub==0.9.0
tensorflow-metadata==0.24.0
tensorflow-privacy==0.2.2
tensorflow-probability==0.7.0
tensorflow-text==2.3.0

$ pip freeze | grep jax
jax==0.2.4
jaxlib==0.1.56+cuda101

$ python -V
Python 3.6.9

For bugs: reproduction and error logs

# Steps to reproduce (also available in attached colab): 
...
python -m trax.trainer --config_file='reformer_imagenet64.gin'
# Error logs:
...
2020-11-03 08:45:54.260503: E tensorflow/stream_executor/cuda/cuda_driver.cc:314] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected

I1103 08:45:54.260992 140307286194048 trainer_lib.py:733] No --output_dir specified
No --output_dir specified
I1103 08:45:54.261220 140307286194048 trainer_lib.py:733] Using default output_dir: /root/trax/ReformerLM_t2t_image_imagenet64_gen_flat_rev_20201103_0845
Using default output_dir: /root/trax/ReformerLM_t2t_image_imagenet64_gen_flat_rev_20201103_0845
2020-11-03 08:45:54.313886: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:328] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
W1103 08:45:54.327635 140307286194048 xla_bridge.py:131] No GPU/TPU found, falling back to CPU. (Set TF_CPP_MIN_LOG_LEVEL=0 and rerun for more info.)
I1103 08:45:54.328048 140307286194048 tf_inputs.py:958] No dataset directory provided. Downloading and generating dataset for t2t_image_imagenet64_gen_flat_rev inside data directory /root/tensorflow_datasets/ For large datasets it is better to prepare datasets manually!
I1103 08:45:55.534647 140307286194048 common_layers.py:57] Running in V2 mode, using Keras layers.
I1103 08:45:57.490262 140307286194048 gym_utils.py:358] Entry Point [tensor2tensor.envs.tic_tac_toe_env:TicTacToeEnv] registered with id [T2TEnv-TicTacToeEnv-v0]
Traceback (most recent call last):
  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.6/dist-packages/trax/trainer.py", line 171, in <module>
    app.run(main)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 300, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "/usr/local/lib/python3.6/dist-packages/trax/trainer.py", line 165, in main
    trainer_lib.train(output_dir=output_dir)
  File "/usr/local/lib/python3.6/dist-packages/gin/config.py", line 1078, in gin_wrapper
    utils.augment_exception_message_and_reraise(e, err_str)
  File "/usr/local/lib/python3.6/dist-packages/gin/utils.py", line 49, in augment_exception_message_and_reraise
    six.raise_from(proxy.with_traceback(exception.__traceback__), None)
  File "<string>", line 3, in raise_from
  File "/usr/local/lib/python3.6/dist-packages/gin/config.py", line 1055, in gin_wrapper
    return fn(*new_args, **new_kwargs)
  File "/usr/local/lib/python3.6/dist-packages/trax/supervised/trainer_lib.py", line 561, in train
    inputs = inputs()
  File "/usr/local/lib/python3.6/dist-packages/gin/config.py", line 1078, in gin_wrapper
    utils.augment_exception_message_and_reraise(e, err_str)
  File "/usr/local/lib/python3.6/dist-packages/gin/utils.py", line 49, in augment_exception_message_and_reraise
    six.raise_from(proxy.with_traceback(exception.__traceback__), None)
  File "<string>", line 3, in raise_from
  File "/usr/local/lib/python3.6/dist-packages/gin/config.py", line 1055, in gin_wrapper
    return fn(*new_args, **new_kwargs)
  File "/usr/local/lib/python3.6/dist-packages/trax/data/inputs.py", line 538, in batcher
    train_stream, eval_stream = data_streams()
  File "/usr/local/lib/python3.6/dist-packages/gin/config.py", line 1078, in gin_wrapper
    utils.augment_exception_message_and_reraise(e, err_str)
  File "/usr/local/lib/python3.6/dist-packages/gin/utils.py", line 49, in augment_exception_message_and_reraise
    six.raise_from(proxy.with_traceback(exception.__traceback__), None)
  File "<string>", line 3, in raise_from
  File "/usr/local/lib/python3.6/dist-packages/gin/config.py", line 1055, in gin_wrapper
    return fn(*new_args, **new_kwargs)
  File "/usr/local/lib/python3.6/dist-packages/trax/data/tf_inputs.py", line 80, in data_streams
    data_dir = download_and_prepare(dataset_name, data_dir)
  File "/usr/local/lib/python3.6/dist-packages/trax/data/tf_inputs.py", line 965, in download_and_prepare
    dataset_name[len('t2t_'):]).generate_data(data_dir, dl_dir)
  File "/usr/local/lib/python3.6/dist-packages/tensor2tensor/data_generators/imagenet.py", line 271, in generate_data
    self.dev_filepaths(data_dir, self.dev_shards, shuffled=True))
  File "/usr/local/lib/python3.6/dist-packages/tensor2tensor/data_generators/generator_utils.py", line 500, in generate_dataset_and_shuffle
    generate_files(train_gen, train_paths)
  File "/usr/local/lib/python3.6/dist-packages/tensor2tensor/data_generators/generator_utils.py", line 174, in generate_files
    for case in generator:
  File "/usr/local/lib/python3.6/dist-packages/tensor2tensor/data_generators/imagenet.py", line 85, in imagenet_pixelrnn_generator
    image_files = tf.gfile.Glob(images_filepath + "/*")
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/lib/io/file_io.py", line 350, in get_matching_files
    return get_matching_files_v2(filename)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/lib/io/file_io.py", line 409, in get_matching_files_v2
    compat.as_bytes(pattern))
tensorflow.python.framework.errors_impl.NotFoundError: /root/tensorflow_datasets/download/train_64x64; No such file or directory
  In call to configurable 'data_streams' (<function data_streams at 0x7f9b1f88fd90>)
  In call to configurable 'batcher' (<function batcher at 0x7f9b7388ee18>)
  In call to configurable 'train' (<function train at 0x7f9b1f60abf8>)
syzymon commented 4 years ago

It turns out that trax.data.tf_inputs.download_and_prepare does not download the imagenet64 dataset; it has to be downloaded manually, as stated in the docstring of the t2t imagenet.py data generator:

  """Image generator for Imagenet 64x64 downsampled images.

  It assumes that the data has been downloaded from
  http://image-net.org/small/*_32x32.tar or
  http://image-net.org/small/*_64x64.tar into tmp_dir.
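Since the archives have to be fetched by hand, a minimal sketch of unpacking already-downloaded tarballs into the tmp_dir the generator reads might look like this. It assumes each archive (e.g. train_64x64.tar) unpacks into a top-level directory of the same name, which matches the path in the NotFoundError but should be checked against your own copy. The function extract_imagenet64_tars is hypothetical, not a trax or t2t API.

```python
import os
import tarfile


def extract_imagenet64_tars(tar_paths, dl_dir):
    """Hypothetical helper: unpack manually downloaded tarballs into dl_dir.

    Assumes each archive named <name>.tar contains a top-level <name>/
    directory, e.g. train_64x64.tar -> train_64x64/.
    """
    os.makedirs(dl_dir, exist_ok=True)
    # Use the safer "data" extraction filter when this Python provides it.
    extract_kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    extracted = []
    for tar_path in tar_paths:
        with tarfile.open(tar_path) as tar:
            tar.extractall(dl_dir, **extract_kwargs)
        name = os.path.basename(tar_path)
        if name.endswith(".tar"):
            name = name[:-len(".tar")]
        target = os.path.join(dl_dir, name)
        # Fail early if the archive layout differs from the assumed one.
        if not os.path.isdir(target):
            raise FileNotFoundError(
                "%s did not unpack into %s; check the archive layout" % (tar_path, target))
        extracted.append(target)
    return extracted
```

With the default data directory from the log, dl_dir would be /root/tensorflow_datasets/download.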

One more issue I had to resolve before the reformer_imagenet64 gin config ran successfully was reading the image files in binary mode (also in t2t's imagenet_pixelrnn_generator), i.e. changing

  with tf.gfile.Open(filename, "r") as f:

to

  with tf.gfile.Open(filename, "rb") as f:

Is this change required to load images from the dataset?
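Why "rb" matters can be sketched with the standard library: in Python 3, text mode decodes file contents as text, while PNG data is raw bytes (its first byte, 0x89, is not valid UTF-8), and an encoded-image feature needs those bytes unchanged. The loop below is a simplified, stdlib-only stand-in for the generator loop, using open/glob instead of tf.gfile for self-containment; the name pixelrnn_style_generator is hypothetical and the feature keys are illustrative, modeled on t2t image problems rather than copied from imagenet.py.

```python
import glob
import os


def pixelrnn_style_generator(images_dir, size=64):
    """Hypothetical sketch: yield one example per image file as raw bytes."""
    for filename in sorted(glob.glob(os.path.join(images_dir, "*"))):
        # "rb" returns the undecoded PNG bytes. "r" would decode them as text,
        # which fails under UTF-8 on bytes such as the 0x89 header byte.
        with open(filename, "rb") as f:
            encoded_image = f.read()
        yield {
            "image/encoded": [encoded_image],
            "image/format": ["png"],
            "image/height": [size],
            "image/width": [size],
        }
```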