Training won't start

Description

Hello.

I'm trying to run tensor2tensor on Google Colab with a GPU runtime, but it hangs right after loading the dynamic library libcublas and never starts training.

!pip3 install --upgrade tensorflow-gpu
!pip3 install --upgrade tensor2tensor
!pip3 install pydub
!apt -qq install -y ffmpeg
!apt -qq install -y sox

from google.colab import drive
drive.mount('/content/gdrive/')

!t2t-trainer \
    --tmp_dir='/content/gdrive/My Drive/TCC/T2T LibriSpeech/tmp' \
    --problem='librispeech_clean_small' \
    --model='lstm_seq2seq' \
    --train_steps=100 \
    --hparams_set='lstm_seq2seq' \
    --data_dir='/content/gdrive/My Drive/TCC/T2T LibriSpeech/data/' \
    --output_dir='/content/gdrive/My Drive/TCC/T2T LibriSpeech/output' \
    --hparams="optimizer = rms_prop, learning_rate_schedule = rsqrt_decay" \
    --worker_gpu=1
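Because data_dir lives on Google Drive, it may help to see how much data the trainer has to pull from Drive before launching it. Below is a minimal stdlib sketch that counts the shards and sums their size. It assumes T2T's `<problem>-<split>*` file naming (the log below reads `librispeech_clean_small-train*`; the `dev` name for the eval split is an assumption), and `shard_stats` is a hypothetical helper, not part of tensor2tensor.

```python
import glob
import os

# Paths copied from the t2t-trainer command above.
DATA_DIR = '/content/gdrive/My Drive/TCC/T2T LibriSpeech/data/'
PROBLEM = 'librispeech_clean_small'


def shard_stats(data_dir, problem, split='train'):
    """Return (file_count, total_bytes) for files matching '<problem>-<split>*'."""
    pattern = os.path.join(data_dir, '%s-%s*' % (problem, split))
    paths = glob.glob(pattern)
    return len(paths), sum(os.path.getsize(p) for p in paths)


if __name__ == '__main__':
    # 'dev' as the eval-split file name is an assumption about T2T naming.
    for split in ('train', 'dev'):
        count, size = shard_stats(DATA_DIR, PROBLEM, split)
        print('%s: %d files, %.1f MB' % (split, count, size / 1e6))
```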

Here's the terminal output:

WARNING: Logging before flag parsing goes to stderr.
W0821 05:27:27.318480 139658590254976 deprecation_wrapper.py:119] From /usr/local/lib/python3.6/dist-packages/tensor2tensor/utils/expert_utils.py:68: The name tf.variable_scope is deprecated. Please use tf.compat.v1.variable_scope instead.

W0821 05:27:28.270877 139658590254976 lazy_loader.py:50] 
The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
  * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
  * https://github.com/tensorflow/addons
  * https://github.com/tensorflow/io (for I/O related ops)
If you depend on functionality not listed there, please file an issue.

W0821 05:27:29.911124 139658590254976 deprecation_wrapper.py:119] From /usr/local/lib/python3.6/dist-packages/tensor2tensor/utils/adafactor.py:27: The name tf.train.Optimizer is deprecated. Please use tf.compat.v1.train.Optimizer instead.

W0821 05:27:29.911762 139658590254976 deprecation_wrapper.py:119] From /usr/local/lib/python3.6/dist-packages/tensor2tensor/utils/multistep_optimizer.py:32: The name tf.train.AdamOptimizer is deprecated. Please use tf.compat.v1.train.AdamOptimizer instead.

W0821 05:27:29.923251 139658590254976 deprecation_wrapper.py:119] From /usr/local/lib/python3.6/dist-packages/mesh_tensorflow/ops.py:4237: The name tf.train.CheckpointSaverListener is deprecated. Please use tf.estimator.CheckpointSaverListener instead.

W0821 05:27:29.923428 139658590254976 deprecation_wrapper.py:119] From /usr/local/lib/python3.6/dist-packages/mesh_tensorflow/ops.py:4260: The name tf.train.SessionRunHook is deprecated. Please use tf.estimator.SessionRunHook instead.

W0821 05:27:29.952985 139658590254976 deprecation_wrapper.py:119] From /usr/local/lib/python3.6/dist-packages/tensor2tensor/rl/gym_utils.py:219: The name tf.logging.info is deprecated. Please use tf.compat.v1.logging.info instead.

W0821 05:27:29.984675 139658590254976 deprecation_wrapper.py:119] From /usr/local/lib/python3.6/dist-packages/tensor2tensor/utils/trainer_lib.py:109: The name tf.OptimizerOptions is deprecated. Please use tf.compat.v1.OptimizerOptions instead.

W0821 05:27:30.376930 139658590254976 deprecation_wrapper.py:119] From /usr/local/bin/t2t-trainer:32: The name tf.logging.set_verbosity is deprecated. Please use tf.compat.v1.logging.set_verbosity instead.

W0821 05:27:30.377137 139658590254976 deprecation_wrapper.py:119] From /usr/local/bin/t2t-trainer:32: The name tf.logging.INFO is deprecated. Please use tf.compat.v1.logging.INFO instead.

W0821 05:27:30.377253 139658590254976 deprecation_wrapper.py:119] From /usr/local/bin/t2t-trainer:33: The name tf.app.run is deprecated. Please use tf.compat.v1.app.run instead.

W0821 05:27:30.378021 139658590254976 deprecation_wrapper.py:119] From /usr/local/lib/python3.6/dist-packages/tensor2tensor/utils/hparams_lib.py:49: The name tf.gfile.Exists is deprecated. Please use tf.io.gfile.exists instead.

I0821 05:27:30.379424 139658590254976 hparams_lib.py:64] Loading hparams from existing json /content/gdrive/My Drive/TCC/T2T LibriSpeech/output/hparams.json
W0821 05:27:30.379606 139658590254976 deprecation_wrapper.py:119] From /usr/local/lib/python3.6/dist-packages/tensor2tensor/utils/hparams_lib.py:65: The name tf.gfile.Open is deprecated. Please use tf.io.gfile.GFile instead.

I0821 05:27:30.381827 139658590254976 hparams_lib.py:85] Overwrite key batch_size: 1024 -> 1000
I0821 05:27:30.381956 139658590254976 hparams_lib.py:85] Overwrite key learning_rate_schedule: legacy -> rsqrt_decay
I0821 05:27:30.382053 139658590254976 hparams_lib.py:85] Overwrite key optimizer: adam -> rms_prop
I0821 05:27:30.382306 139658590254976 hparams_lib.py:55] Overriding hparams in lstm_seq2seq with optimizer = rms_prop, learning_rate_schedule = rsqrt_decay
W0821 05:27:30.382661 139658590254976 deprecation_wrapper.py:119] From /usr/local/lib/python3.6/dist-packages/tensor2tensor/utils/trainer_lib.py:780: The name tf.set_random_seed is deprecated. Please use tf.compat.v1.set_random_seed instead.

W0821 05:27:30.383641 139658590254976 deprecation_wrapper.py:119] From /usr/local/lib/python3.6/dist-packages/tensor2tensor/utils/trainer_lib.py:121: The name tf.GraphOptions is deprecated. Please use tf.compat.v1.GraphOptions instead.

W0821 05:27:30.383837 139658590254976 deprecation_wrapper.py:119] From /usr/local/lib/python3.6/dist-packages/tensor2tensor/utils/trainer_lib.py:127: The name tf.GPUOptions is deprecated. Please use tf.compat.v1.GPUOptions instead.

W0821 05:27:30.384021 139658590254976 deprecation.py:323] From /usr/local/lib/python3.6/dist-packages/tensor2tensor/utils/trainer_lib.py:240: RunConfig.__init__ (from tensorflow.contrib.learn.python.learn.estimators.run_config) is deprecated and will be removed in a future version.
Instructions for updating:
When switching to tf.estimator.Estimator, use tf.estimator.RunConfig instead.
I0821 05:27:30.384210 139658590254976 trainer_lib.py:263] Configuring DataParallelism to replicate the model.
I0821 05:27:30.384292 139658590254976 devices.py:76] schedule=continuous_train_and_eval
I0821 05:27:30.384358 139658590254976 devices.py:77] worker_gpu=1
I0821 05:27:30.384418 139658590254976 devices.py:78] sync=False
W0821 05:27:30.384511 139658590254976 deprecation_wrapper.py:119] From /usr/local/lib/python3.6/dist-packages/tensor2tensor/utils/devices.py:139: The name tf.logging.warn is deprecated. Please use tf.compat.v1.logging.warn instead.

W0821 05:27:30.384588 139658590254976 devices.py:141] Schedule=continuous_train_and_eval. Assuming that training is running on a single machine.
I0821 05:27:30.385225 139658590254976 devices.py:170] datashard_devices: ['gpu:0']
I0821 05:27:30.385297 139658590254976 devices.py:171] caching_devices: None
I0821 05:27:30.385771 139658590254976 devices.py:172] ps_devices: ['gpu:0']
I0821 05:27:30.386448 139658590254976 estimator.py:209] Using config: {'_task_type': None, '_task_id': 0, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f04606f64a8>, '_master': '', '_num_ps_replicas': 0, '_num_worker_replicas': 0, '_environment': 'local', '_is_chief': True, '_evaluation_master': '', '_train_distribute': None, '_eval_distribute': None, '_experimental_max_worker_delay_secs': None, '_device_fn': None, '_tf_config': gpu_options {
  per_process_gpu_memory_fraction: 1.0
}
, '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_secs': None, '_log_step_count_steps': 100, '_protocol': None, '_session_config': gpu_options {
  per_process_gpu_memory_fraction: 0.95
}
allow_soft_placement: true
graph_options {
  optimizer_options {
    global_jit_level: OFF
  }
}
isolate_session_state: true
, '_save_checkpoints_steps': 1000, '_keep_checkpoint_max': 20, '_keep_checkpoint_every_n_hours': 10000, '_model_dir': '/content/gdrive/My Drive/TCC/T2T LibriSpeech/output', 'use_tpu': False, 't2t_device_info': {'num_async_replicas': 1}, 'data_parallelism': <tensor2tensor.utils.expert_utils.Parallelism object at 0x7f04606f6518>}
W0821 05:27:30.386659 139658590254976 model_fn.py:630] Estimator's model_fn (<function T2TModel.make_estimator_model_fn.<locals>.wrapping_model_fn at 0x7f046076bd08>) includes params argument, but params are not passed to Estimator.
W0821 05:27:30.387291 139658590254976 trainer_lib.py:724] ValidationMonitor only works with --schedule=train_and_evaluate
I0821 05:27:30.399102 139658590254976 estimator_training.py:186] Not using Distribute Coordinator.
I0821 05:27:30.399344 139658590254976 training.py:612] Running training and evaluation locally (non-distributed).
I0821 05:27:30.399628 139658590254976 training.py:700] Start train and evaluate loop. The evaluate will happen after every checkpoint. Checkpoint frequency is determined based on RunConfig arguments: save_checkpoints_steps 1000 or save_checkpoints_secs None.
W0821 05:27:30.411955 139658590254976 deprecation.py:323] From /usr/local/lib/python3.6/dist-packages/tensorflow/python/training/training_util.py:236: Variable.initialized_value (from tensorflow.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Use Variable.read_value. Variables in 2.X are initialized automatically both in eager and graph (inside tf.defun) contexts.
I0821 05:27:30.422870 139658590254976 problem.py:644] Reading data files from /content/gdrive/My Drive/TCC/T2T LibriSpeech/data/librispeech_clean_small-train*
I0821 05:27:30.446713 139658590254976 problem.py:670] partition: 0 num_data_files: 100
W0821 05:27:30.448990 139658590254976 deprecation.py:323] From /usr/local/lib/python3.6/dist-packages/tensor2tensor/data_generators/problem.py:680: parallel_interleave (from tensorflow.python.data.experimental.ops.interleave_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.interleave(map_func, cycle_length, block_length, num_parallel_calls=tf.data.experimental.AUTOTUNE)` instead. If sloppy execution is desired, use `tf.data.Options.experimental_determinstic`.
W0821 05:27:30.634850 139658590254976 deprecation.py:323] From /usr/local/lib/python3.6/dist-packages/tensor2tensor/layers/common_audio.py:92: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.cast` instead.
W0821 05:27:30.758503 139658590254976 deprecation.py:323] From /usr/local/lib/python3.6/dist-packages/tensor2tensor/layers/common_audio.py:115: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.cast` instead.
W0821 05:27:30.951467 139658590254976 deprecation.py:323] From /usr/local/lib/python3.6/dist-packages/tensor2tensor/utils/data_reader.py:275: tf_record_iterator (from tensorflow.python.lib.io.tf_record) is deprecated and will be removed in a future version.
Instructions for updating:
Use eager execution and: 
`tf.data.TFRecordDataset(path)`
W0821 05:27:31.428636 139658590254976 deprecation.py:323] From /usr/local/lib/python3.6/dist-packages/tensor2tensor/utils/data_reader.py:395: DatasetV1.output_shapes (from tensorflow.python.data.ops.dataset_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.compat.v1.data.get_output_shapes(dataset)`.
W0821 05:27:31.429009 139658590254976 deprecation_wrapper.py:119] From /usr/local/lib/python3.6/dist-packages/tensor2tensor/utils/data_reader.py:398: The name tf.logging.warning is deprecated. Please use tf.compat.v1.logging.warning instead.

W0821 05:27:31.429161 139658590254976 data_reader.py:399] Shapes are not fully defined. Assuming batch_size means tokens.
W0821 05:27:31.484967 139658590254976 deprecation.py:323] From /usr/local/lib/python3.6/dist-packages/tensorflow/python/data/experimental/ops/grouping.py:193: add_dispatch_support.<locals>.wrapper (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
W0821 05:27:31.532430 139658590254976 deprecation_wrapper.py:119] From /usr/local/lib/python3.6/dist-packages/tensor2tensor/utils/data_reader.py:231: The name tf.summary.scalar is deprecated. Please use tf.compat.v1.summary.scalar instead.

I0821 05:27:31.598924 139658590254976 estimator.py:1145] Calling model_fn.
I0821 05:27:31.611495 139658590254976 t2t_model.py:2172] Setting T2TModel mode to 'train'
W0821 05:27:31.684887 139658590254976 deprecation_wrapper.py:119] From /usr/local/lib/python3.6/dist-packages/tensor2tensor/utils/t2t_model.py:243: The name tf.summary.text is deprecated. Please use tf.compat.v1.summary.text instead.

I0821 05:27:32.464440 139658590254976 api.py:255] Using variable initializer: uniform_unit_scaling
I0821 05:27:33.088927 139658590254976 t2t_model.py:2172] Transforming feature 'inputs' with speech_recognition_modality.bottom
W0821 05:27:33.090807 139658590254976 deprecation.py:323] From /usr/local/lib/python3.6/dist-packages/tensor2tensor/layers/modalities.py:439: conv2d (from tensorflow.python.layers.convolutional) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.keras.layers.Conv2D` instead.
I0821 05:27:33.388660 139658590254976 t2t_model.py:2172] Transforming feature 'targets' with symbol_modality_256_128.targets_bottom
I0821 05:27:33.406166 139658590254976 t2t_model.py:2172] Building model body
W0821 05:27:33.421513 139658590254976 deprecation_wrapper.py:119] From /usr/local/lib/python3.6/dist-packages/tensor2tensor/models/lstm.py:33: The name tf.nn.rnn_cell.DropoutWrapper is deprecated. Please use tf.compat.v1.nn.rnn_cell.DropoutWrapper instead.

W0821 05:27:33.421710 139658590254976 deprecation.py:323] From /usr/local/lib/python3.6/dist-packages/tensor2tensor/models/lstm.py:34: LSTMCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.
Instructions for updating:
This class is equivalent as tf.keras.layers.LSTMCell, and will be replaced by that in Tensorflow 2.0.
W0821 05:27:33.432483 139658590254976 deprecation.py:323] From /usr/local/lib/python3.6/dist-packages/tensor2tensor/models/lstm.py:62: MultiRNNCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.
Instructions for updating:
This class is equivalent as tf.keras.layers.StackedRNNCells, and will be replaced by that in Tensorflow 2.0.
W0821 05:27:33.432950 139658590254976 deprecation.py:323] From /usr/local/lib/python3.6/dist-packages/tensor2tensor/models/lstm.py:67: dynamic_rnn (from tensorflow.python.ops.rnn) is deprecated and will be removed in a future version.
Instructions for updating:
Please use `keras.layers.RNN(cell)`, which is equivalent to this API
W0821 05:27:33.783363 139658590254976 deprecation.py:506] From /usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/rnn_cell_impl.py:961: calling Zeros.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
I0821 05:27:34.804927 139658590254976 t2t_model.py:2172] Transforming body output with symbol_modality_256_128.top
W0821 05:27:34.924930 139658590254976 deprecation_wrapper.py:119] From /usr/local/lib/python3.6/dist-packages/tensor2tensor/utils/learning_rate.py:107: The name tf.train.get_or_create_global_step is deprecated. Please use tf.compat.v1.train.get_or_create_global_step instead.

I0821 05:27:34.932349 139658590254976 optimize.py:327] Trainable Variables Total size: 1677440
I0821 05:27:34.932615 139658590254976 optimize.py:327] Non-trainable variables Total size: 5
I0821 05:27:34.932751 139658590254976 optimize.py:182] Using optimizer rms_prop
I0821 05:27:34.934037 139658590254976 optimize.py:78] Clipping gradients, norm: 2.00000
W0821 05:27:36.683928 139658590254976 deprecation.py:506] From /usr/local/lib/python3.6/dist-packages/tensorflow/python/training/rmsprop.py:119: calling Ones.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
I0821 05:27:36.894444 139658590254976 estimator.py:1147] Done calling model_fn.
I0821 05:27:36.896018 139658590254976 basic_session_run_hooks.py:541] Create CheckpointSaverHook.
I0821 05:27:37.520785 139658590254976 monitored_session.py:240] Graph was finalized.
2019-08-21 05:27:37.521225: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-08-21 05:27:37.526665: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcuda.so.1
2019-08-21 05:27:37.690962: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-08-21 05:27:37.691522: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x13459c0 executing computations on platform CUDA. Devices:
2019-08-21 05:27:37.691557: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): Tesla T4, Compute Capability 7.5
2019-08-21 05:27:37.693788: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2200000000 Hz
2019-08-21 05:27:37.694004: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x1344a00 executing computations on platform Host. Devices:
2019-08-21 05:27:37.694041: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): <undefined>, <undefined>
2019-08-21 05:27:37.694265: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-08-21 05:27:37.694626: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties: 
name: Tesla T4 major: 7 minor: 5 memoryClockRate(GHz): 1.59
pciBusID: 0000:00:04.0
2019-08-21 05:27:37.695007: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2019-08-21 05:27:37.696442: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
2019-08-21 05:27:37.697717: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcufft.so.10.0
2019-08-21 05:27:37.698160: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcurand.so.10.0
2019-08-21 05:27:37.699885: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusolver.so.10.0
2019-08-21 05:27:37.701143: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusparse.so.10.0
2019-08-21 05:27:37.704909: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2019-08-21 05:27:37.705064: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-08-21 05:27:37.705594: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-08-21 05:27:37.706305: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0
2019-08-21 05:27:37.706401: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2019-08-21 05:27:37.707871: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-08-21 05:27:37.707903: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187]      0 
2019-08-21 05:27:37.707917: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0:   N 
2019-08-21 05:27:37.708211: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-08-21 05:27:37.708615: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-08-21 05:27:37.708965: W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:40] Overriding allow_growth setting because the TF_FORCE_GPU_ALLOW_GROWTH environment variable is set. Original config value was 0.
2019-08-21 05:27:37.709011: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14325 MB memory) -> physical GPU (device: 0, name: Tesla T4, pci bus id: 0000:00:04.0, compute capability: 7.5)
W0821 05:27:37.711665 139658590254976 deprecation.py:323] From /usr/local/lib/python3.6/dist-packages/tensorflow/python/training/saver.py:1276: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
I0821 05:27:37.713894 139658590254976 saver.py:1280] Restoring parameters from /content/gdrive/My Drive/TCC/T2T LibriSpeech/output/model.ckpt-0
W0821 05:27:38.503897 139658590254976 deprecation.py:323] From /usr/local/lib/python3.6/dist-packages/tensorflow/python/training/saver.py:1066: get_checkpoint_mtimes (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file utilities to get mtimes.
2019-08-21 05:27:38.591958: W tensorflow/compiler/jit/mark_for_compilation_pass.cc:1412] (One-time warning): Not using XLA:CPU for cluster because envvar TF_XLA_FLAGS=--tf_xla_cpu_global_jit was not set.  If you want XLA:CPU, either set that envvar, or use experimental_jit_scope to enable XLA:CPU.  To confirm that XLA is active, pass --vmodule=xla_compilation_cache=1 (as a proper command-line flag, not via TF_XLA_FLAGS) or set the envvar XLA_FLAGS=--xla_hlo_profile.
I0821 05:27:38.599505 139658590254976 session_manager.py:500] Running local_init_op.
I0821 05:27:38.638172 139658590254976 session_manager.py:502] Done running local_init_op.
I0821 05:27:40.499942 139658590254976 basic_session_run_hooks.py:606] Saving checkpoints for 0 into /content/gdrive/My Drive/TCC/T2T LibriSpeech/output/model.ckpt.
2019-08-21 05:27:42.511778: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
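The output stops right after checkpoint 0 is saved, while the input pipeline is reading shards from Drive. One thing worth ruling out (an assumption, not a confirmed cause) is slow Drive reads starving that pipeline. This minimal stdlib sketch times a raw sequential read of the first training shard. `time_sequential_read` is a hypothetical helper, and it measures only file I/O, not TFRecord parsing or the audio preprocessing.

```python
import glob
import os
import time

# data_dir and problem name copied from the t2t-trainer command above.
DATA_DIR = '/content/gdrive/My Drive/TCC/T2T LibriSpeech/data/'
PROBLEM = 'librispeech_clean_small'


def time_sequential_read(path, chunk_size=1 << 20):
    """Read a file end to end in chunks; return (bytes_read, seconds)."""
    start = time.monotonic()
    total = 0
    with open(path, 'rb') as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total, time.monotonic() - start


if __name__ == '__main__':
    shards = sorted(glob.glob(os.path.join(DATA_DIR, PROBLEM + '-train*')))
    if shards:
        n_bytes, secs = time_sequential_read(shards[0])
        rate = n_bytes / secs / 1e6 if secs > 0 else float('inf')
        print('%s: %d bytes in %.1f s (%.1f MB/s)' % (shards[0], n_bytes, secs, rate))
    else:
        print('no training shards found under', DATA_DIR)
```

Only the first read of a file is informative, since later reads may be served from a local cache. If the rate turns out to be very low, copying data_dir to the Colab VM's local disk and pointing --data_dir there would take Drive out of the picture.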

Any ideas on how to solve it?

tensorflow / tensor2tensor

Training won't start #1668