zihangdai / xlnet

XLNet: Generalized Autoregressive Pretraining for Language Understanding
Apache License 2.0

XLNet stuck for Text Classification task #35

Open aisheh90 opened 5 years ago

aisheh90 commented 5 years ago

Hello,

I want to use the Text Classification task on our own data:

1- In BERT, the data is formatted as (id, label, etc.). As I understand it, XLNet requires the data to be formatted the same way. Is that correct?

2- I tried to run XLNet. At first it worked and saved some files, but then it reached this point (the last output printed, as the screenshot shows) and has been stuck for hours without printing any update, without saving any files (checkpoints), and without showing any error.

I am using a Google Colab GPU.

Thanks,

[Screenshot: image_2019_06_23T15_46_46_465Z]

zihangdai commented 5 years ago

Could you provide a more detailed description of the experiment environment so that I can look into this?
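For anyone following up on this request, the basic environment details can be collected with a short script. This is only a minimal sketch: the helper name `collect_env_info` is hypothetical and not part of the XLNet repository, and it reports just the Python version, the platform string, and the installed TensorFlow version (if any). GPU model and driver details would still need to be added by hand.

```python
import platform
import sys


def collect_env_info():
    """Gather basic environment details worth attaching to a bug report.

    Hypothetical helper for illustration; not part of the XLNet codebase.
    """
    info = {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
    }
    try:
        # TensorFlow is optional here; it may be missing or fail to load.
        import tensorflow as tf
        info["tensorflow"] = tf.__version__
    except Exception:
        info["tensorflow"] = "not available"
    return info


if __name__ == "__main__":
    for key, value in collect_env_info().items():
        print(f"{key}: {value}")
```

Pasting the printed lines into the issue, together with the exact command used to launch `run_classifier.py`, should cover most of what is being asked for here.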

CharlieBickerton commented 5 years ago

Having not managed to get classification tasks to run on TPU, I am currently attempting it on GPU. I will let you know how I get on, @aisheh90.

CharlieBickerton commented 5 years ago

44 @aisheh90 Here's a link to a working GPU Colab doing classification on IMDB.

manrajgrover commented 5 years ago

@zihangdai I'm also facing this exact issue with a custom dataset. Sharing my logs here. Kindly let me know if you need more information.

I0628 13:27:40.125477 140458947311424 model_utils.py:36] Single device mode.
I0628 13:27:40.955644 140458947311424 run_classifier.py:703] Get model function
I0628 13:27:40.955801 140458947311424 run_classifier.py:705] Get model function completed
I0628 13:27:40.955867 140458947311424 run_classifier.py:721] Use normal estimator
I0628 13:27:40.955993 140458947311424 estimator.py:201] Using config: {'_model_dir': '/mnt/data/testuser/datauser/project/data/task1/exp', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': 1, '_save_checkpoints_secs': None, '_session_config': allow_soft_placement: true
, '_keep_checkpoint_max': 0, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7fbeafbd04e0>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1, '_tpu_config': TPUConfig(iterations_per_loop=1, num_shards=1, num_cores_per_replica=None, per_host_input_for_training=3, tpu_job_name=None, initial_infeed_sleep_secs=None, input_partition_dims=None), '_cluster': None}
W0628 13:27:40.956146 140458947311424 estimator.py:1924] Estimator's model_fn (<function get_model_fn.<locals>.model_fn at 0x7fbeafbd47b8>) includes params argument, but params are not passed to Estimator.
I0628 13:27:40.956591 140458947311424 run_classifier.py:726] Done estimating
I0628 13:27:40.956651 140458947311424 run_classifier.py:729] Starting training process
I0628 13:27:40.956712 140458947311424 run_classifier.py:733] Use tfrecord file /mnt/data/testuser/datauser/project/data/task1/output/spiece.model.len-280.train.tf_record
I0628 13:27:44.928856 140458947311424 run_classifier.py:737] Num of train samples: 31132
I0628 13:27:44.930318 140458947311424 run_classifier.py:424] Do not overwrite tfrecord /mnt/data/testuser/datauser/project/data/task1/output/spiece.model.len-280.train.tf_record exists.
I0628 13:27:44.930620 140458947311424 run_classifier.py:481] Input tfrecord file /mnt/data/testuser/datauser/project/data/task1/output/spiece.model.len-280.train.tf_record
I0628 13:27:44.930716 140458947311424 run_classifier.py:749] Training process begins
W0628 13:27:44.944874 140458947311424 deprecation.py:323] From /home/testuser/anaconda3/envs/dependency/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
W0628 13:27:44.968578 140458947311424 deprecation.py:323] From xlnet/run_classifier.py:526: map_and_batch (from tensorflow.contrib.data.python.ops.batching) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.experimental.map_and_batch(...)`.
I0628 13:27:44.989415 140458947311424 estimator.py:1111] Calling model_fn.
I0628 13:27:44.997832 140458947311424 modeling.py:451] memory input None
I0628 13:27:44.997921 140458947311424 modeling.py:453] Use float type <dtype: 'float32'>
W0628 13:27:45.052411 140458947311424 deprecation.py:323] From /home/testuser/datauser/project/xlnet/modeling.py:533: dropout (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.dropout instead.
W0628 13:27:45.053383 140458947311424 deprecation.py:506] From /home/testuser/anaconda3/envs/dependency/lib/python3.6/site-packages/tensorflow/python/keras/layers/core.py:143: calling dropout (from tensorflow.python.ops.nn_ops) with keep_prob is deprecated and will be removed in a future version.
Instructions for updating:
Please use `rate` instead of `keep_prob`. Rate should be set to `rate = 1 - keep_prob`.
W0628 13:27:45.288335 140458947311424 deprecation.py:323] From /home/testuser/datauser/project/xlnet/modeling.py:67: dense (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.dense instead.
I0628 13:27:51.747468 140458947311424 run_classifier.py:549] #params: 361319426
I0628 13:27:51.747665 140458947311424 model_utils.py:71] Initialize from the ckpt /mnt/data/testuser/datauser/project/data/xlnet/xlnet_model.ckpt
I0628 13:27:52.705267 140458947311424 model_utils.py:85] **** Global Variables ****
I0628 13:27:52.705481 140458947311424 model_utils.py:91]   name = model/transformer/r_w_bias:0, shape = (24, 16, 64), *INIT_FROM_CKPT*
I0628 13:27:52.705564 140458947311424 model_utils.py:91]   name = model/transformer/r_r_bias:0, shape = (24, 16, 64), *INIT_FROM_CKPT*
I0628 13:27:52.705631 140458947311424 model_utils.py:91]   name = model/transformer/word_embedding/lookup_table:0, shape = (32000, 1024), *INIT_FROM_CKPT*
I0628 13:27:52.705696 140458947311424 model_utils.py:91]   name = model/transformer/r_s_bias:0, shape = (24, 16, 64), *INIT_FROM_CKPT*
I0628 13:27:52.705755 140458947311424 model_utils.py:91]   name = model/transformer/seg_embed:0, shape = (24, 2, 16, 64), *INIT_FROM_CKPT*
I0628 13:27:52.705827 140458947311424 model_utils.py:91]   name = model/transformer/layer_0/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0628 13:27:52.705899 140458947311424 model_utils.py:91]   name = model/transformer/layer_0/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0628 13:27:52.705973 140458947311424 model_utils.py:91]   name = model/transformer/layer_0/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0628 13:27:52.706045 140458947311424 model_utils.py:91]   name = model/transformer/layer_0/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0628 13:27:52.706144 140458947311424 model_utils.py:91]   name = model/transformer/layer_0/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0628 13:27:52.706224 140458947311424 model_utils.py:91]   name = model/transformer/layer_0/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0628 13:27:52.706292 140458947311424 model_utils.py:91]   name = model/transformer/layer_0/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0628 13:27:52.706357 140458947311424 model_utils.py:91]   name = model/transformer/layer_0/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
I0628 13:27:52.706441 140458947311424 model_utils.py:91]   name = model/transformer/layer_0/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
I0628 13:27:52.706507 140458947311424 model_utils.py:91]   name = model/transformer/layer_0/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
I0628 13:27:52.706573 140458947311424 model_utils.py:91]   name = model/transformer/layer_0/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
I0628 13:27:52.706640 140458947311424 model_utils.py:91]   name = model/transformer/layer_0/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0628 13:27:52.706701 140458947311424 model_utils.py:91]   name = model/transformer/layer_0/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0628 13:27:52.706766 140458947311424 model_utils.py:91]   name = model/transformer/layer_1/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0628 13:27:52.706836 140458947311424 model_utils.py:91]   name = model/transformer/layer_1/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0628 13:27:52.706907 140458947311424 model_utils.py:91]   name = model/transformer/layer_1/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0628 13:27:52.706978 140458947311424 model_utils.py:91]   name = model/transformer/layer_1/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0628 13:27:52.707052 140458947311424 model_utils.py:91]   name = model/transformer/layer_1/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0628 13:27:52.707155 140458947311424 model_utils.py:91]   name = model/transformer/layer_1/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0628 13:27:52.707222 140458947311424 model_utils.py:91]   name = model/transformer/layer_1/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0628 13:27:52.707316 140458947311424 model_utils.py:91]   name = model/transformer/layer_1/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
I0628 13:27:52.707398 140458947311424 model_utils.py:91]   name = model/transformer/layer_1/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
I0628 13:27:52.707463 140458947311424 model_utils.py:91]   name = model/transformer/layer_1/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
I0628 13:27:52.707527 140458947311424 model_utils.py:91]   name = model/transformer/layer_1/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
I0628 13:27:52.707591 140458947311424 model_utils.py:91]   name = model/transformer/layer_1/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0628 13:27:52.707654 140458947311424 model_utils.py:91]   name = model/transformer/layer_1/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0628 13:27:52.707717 140458947311424 model_utils.py:91]   name = model/transformer/layer_2/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0628 13:27:52.707787 140458947311424 model_utils.py:91]   name = model/transformer/layer_2/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0628 13:27:52.707858 140458947311424 model_utils.py:91]   name = model/transformer/layer_2/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0628 13:27:52.707931 140458947311424 model_utils.py:91]   name = model/transformer/layer_2/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0628 13:27:52.708003 140458947311424 model_utils.py:91]   name = model/transformer/layer_2/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0628 13:27:52.708097 140458947311424 model_utils.py:91]   name = model/transformer/layer_2/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0628 13:27:52.708174 140458947311424 model_utils.py:91]   name = model/transformer/layer_2/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0628 13:27:52.708245 140458947311424 model_utils.py:91]   name = model/transformer/layer_2/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
I0628 13:27:52.708310 140458947311424 model_utils.py:91]   name = model/transformer/layer_2/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
I0628 13:27:52.708388 140458947311424 model_utils.py:91]   name = model/transformer/layer_2/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
I0628 13:27:52.708452 140458947311424 model_utils.py:91]   name = model/transformer/layer_2/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
I0628 13:27:52.708516 140458947311424 model_utils.py:91]   name = model/transformer/layer_2/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0628 13:27:52.708576 140458947311424 model_utils.py:91]   name = model/transformer/layer_2/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0628 13:27:52.708642 140458947311424 model_utils.py:91]   name = model/transformer/layer_3/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0628 13:27:52.708711 140458947311424 model_utils.py:91]   name = model/transformer/layer_3/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0628 13:27:52.708782 140458947311424 model_utils.py:91]   name = model/transformer/layer_3/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0628 13:27:52.708853 140458947311424 model_utils.py:91]   name = model/transformer/layer_3/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0628 13:27:52.708926 140458947311424 model_utils.py:91]   name = model/transformer/layer_3/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0628 13:27:52.709000 140458947311424 model_utils.py:91]   name = model/transformer/layer_3/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0628 13:27:52.709068 140458947311424 model_utils.py:91]   name = model/transformer/layer_3/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0628 13:27:52.709137 140458947311424 model_utils.py:91]   name = model/transformer/layer_3/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
I0628 13:27:52.709208 140458947311424 model_utils.py:91]   name = model/transformer/layer_3/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
I0628 13:27:52.709275 140458947311424 model_utils.py:91]   name = model/transformer/layer_3/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
I0628 13:27:52.709338 140458947311424 model_utils.py:91]   name = model/transformer/layer_3/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
I0628 13:27:52.709401 140458947311424 model_utils.py:91]   name = model/transformer/layer_3/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0628 13:27:52.709462 140458947311424 model_utils.py:91]   name = model/transformer/layer_3/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0628 13:27:52.709526 140458947311424 model_utils.py:91]   name = model/transformer/layer_4/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0628 13:27:52.709598 140458947311424 model_utils.py:91]   name = model/transformer/layer_4/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0628 13:27:52.709668 140458947311424 model_utils.py:91]   name = model/transformer/layer_4/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0628 13:27:52.709744 140458947311424 model_utils.py:91]   name = model/transformer/layer_4/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0628 13:27:52.709819 140458947311424 model_utils.py:91]   name = model/transformer/layer_4/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0628 13:27:52.709890 140458947311424 model_utils.py:91]   name = model/transformer/layer_4/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0628 13:27:52.709951 140458947311424 model_utils.py:91]   name = model/transformer/layer_4/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0628 13:27:52.710015 140458947311424 model_utils.py:91]   name = model/transformer/layer_4/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
I0628 13:27:52.710094 140458947311424 model_utils.py:91]   name = model/transformer/layer_4/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
I0628 13:27:52.710169 140458947311424 model_utils.py:91]   name = model/transformer/layer_4/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
I0628 13:27:52.710237 140458947311424 model_utils.py:91]   name = model/transformer/layer_4/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
I0628 13:27:52.710303 140458947311424 model_utils.py:91]   name = model/transformer/layer_4/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0628 13:27:52.710376 140458947311424 model_utils.py:91]   name = model/transformer/layer_4/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0628 13:27:52.710438 140458947311424 model_utils.py:91]   name = model/transformer/layer_5/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0628 13:27:52.710506 140458947311424 model_utils.py:91]   name = model/transformer/layer_5/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0628 13:27:52.710578 140458947311424 model_utils.py:91]   name = model/transformer/layer_5/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0628 13:27:52.710649 140458947311424 model_utils.py:91]   name = model/transformer/layer_5/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0628 13:27:52.710720 140458947311424 model_utils.py:91]   name = model/transformer/layer_5/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0628 13:27:52.710794 140458947311424 model_utils.py:91]   name = model/transformer/layer_5/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0628 13:27:52.710855 140458947311424 model_utils.py:91]   name = model/transformer/layer_5/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0628 13:27:52.710919 140458947311424 model_utils.py:91]   name = model/transformer/layer_5/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
I0628 13:27:52.710983 140458947311424 model_utils.py:91]   name = model/transformer/layer_5/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
I0628 13:27:52.711053 140458947311424 model_utils.py:91]   name = model/transformer/layer_5/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
I0628 13:27:52.711144 140458947311424 model_utils.py:91]   name = model/transformer/layer_5/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
I0628 13:27:52.711214 140458947311424 model_utils.py:91]   name = model/transformer/layer_5/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0628 13:27:52.711279 140458947311424 model_utils.py:91]   name = model/transformer/layer_5/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0628 13:27:52.711347 140458947311424 model_utils.py:91]   name = model/transformer/layer_6/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0628 13:27:52.711426 140458947311424 model_utils.py:91]   name = model/transformer/layer_6/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0628 13:27:52.711496 140458947311424 model_utils.py:91]   name = model/transformer/layer_6/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0628 13:27:52.711565 140458947311424 model_utils.py:91]   name = model/transformer/layer_6/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0628 13:27:52.711637 140458947311424 model_utils.py:91]   name = model/transformer/layer_6/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0628 13:27:52.711709 140458947311424 model_utils.py:91]   name = model/transformer/layer_6/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0628 13:27:52.711770 140458947311424 model_utils.py:91]   name = model/transformer/layer_6/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0628 13:27:52.711833 140458947311424 model_utils.py:91]   name = model/transformer/layer_6/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
I0628 13:27:52.711894 140458947311424 model_utils.py:91]   name = model/transformer/layer_6/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
I0628 13:27:52.711958 140458947311424 model_utils.py:91]   name = model/transformer/layer_6/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
I0628 13:27:52.712022 140458947311424 model_utils.py:91]   name = model/transformer/layer_6/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
I0628 13:27:52.712090 140458947311424 model_utils.py:91]   name = model/transformer/layer_6/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0628 13:27:52.712154 140458947311424 model_utils.py:91]   name = model/transformer/layer_6/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0628 13:27:52.712218 140458947311424 model_utils.py:91]   name = model/transformer/layer_7/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0628 13:27:52.712289 140458947311424 model_utils.py:91]   name = model/transformer/layer_7/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0628 13:27:52.712359 140458947311424 model_utils.py:91]   name = model/transformer/layer_7/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0628 13:27:52.712428 140458947311424 model_utils.py:91]   name = model/transformer/layer_7/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0628 13:27:52.712499 140458947311424 model_utils.py:91]   name = model/transformer/layer_7/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0628 13:27:52.712572 140458947311424 model_utils.py:91]   name = model/transformer/layer_7/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0628 13:27:52.712634 140458947311424 model_utils.py:91]   name = model/transformer/layer_7/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0628 13:27:52.712697 140458947311424 model_utils.py:91]   name = model/transformer/layer_7/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
I0628 13:27:52.712762 140458947311424 model_utils.py:91]   name = model/transformer/layer_7/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
I0628 13:27:52.712825 140458947311424 model_utils.py:91]   name = model/transformer/layer_7/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
I0628 13:27:52.712896 140458947311424 model_utils.py:91]   name = model/transformer/layer_7/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
I0628 13:27:52.712959 140458947311424 model_utils.py:91]   name = model/transformer/layer_7/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0628 13:27:52.713020 140458947311424 model_utils.py:91]   name = model/transformer/layer_7/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0628 13:27:52.713099 140458947311424 model_utils.py:91]   name = model/transformer/layer_8/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0628 13:27:52.713178 140458947311424 model_utils.py:91]   name = model/transformer/layer_8/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0628 13:27:52.713254 140458947311424 model_utils.py:91]   name = model/transformer/layer_8/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0628 13:27:52.713327 140458947311424 model_utils.py:91]   name = model/transformer/layer_8/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0628 13:27:52.713412 140458947311424 model_utils.py:91]   name = model/transformer/layer_8/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0628 13:27:52.713484 140458947311424 model_utils.py:91]   name = model/transformer/layer_8/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0628 13:27:52.713544 140458947311424 model_utils.py:91]   name = model/transformer/layer_8/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0628 13:27:52.713607 140458947311424 model_utils.py:91]   name = model/transformer/layer_8/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
I0628 13:27:52.713669 140458947311424 model_utils.py:91]   name = model/transformer/layer_8/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
I0628 13:27:52.713732 140458947311424 model_utils.py:91]   name = model/transformer/layer_8/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
I0628 13:27:52.713795 140458947311424 model_utils.py:91]   name = model/transformer/layer_8/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
I0628 13:27:52.713855 140458947311424 model_utils.py:91]   name = model/transformer/layer_8/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0628 13:27:52.713912 140458947311424 model_utils.py:91]   name = model/transformer/layer_8/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0628 13:27:52.713965 140458947311424 model_utils.py:91]   name = model/transformer/layer_9/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0628 13:27:52.714022 140458947311424 model_utils.py:91]   name = model/transformer/layer_9/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0628 13:27:52.714084 140458947311424 model_utils.py:91]   name = model/transformer/layer_9/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0628 13:27:52.714145 140458947311424 model_utils.py:91]   name = model/transformer/layer_9/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0628 13:27:52.714203 140458947311424 model_utils.py:91]   name = model/transformer/layer_9/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0628 13:27:52.714259 140458947311424 model_utils.py:91]   name = model/transformer/layer_9/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0628 13:27:52.714311 140458947311424 model_utils.py:91]   name = model/transformer/layer_9/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0628 13:27:52.714363 140458947311424 model_utils.py:91]   name = model/transformer/layer_9/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
I0628 13:27:52.714417 140458947311424 model_utils.py:91]   name = model/transformer/layer_9/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
I0628 13:27:52.714470 140458947311424 model_utils.py:91]   name = model/transformer/layer_9/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
I0628 13:27:52.714525 140458947311424 model_utils.py:91]   name = model/transformer/layer_9/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
I0628 13:27:52.714592 140458947311424 model_utils.py:91]   name = model/transformer/layer_9/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0628 13:27:52.714647 140458947311424 model_utils.py:91]   name = model/transformer/layer_9/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0628 13:27:52.714709 140458947311424 model_utils.py:91]   name = model/transformer/layer_10/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0628 13:27:52.714781 140458947311424 model_utils.py:91]   name = model/transformer/layer_10/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0628 13:27:52.714853 140458947311424 model_utils.py:91]   name = model/transformer/layer_10/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0628 13:27:52.714925 140458947311424 model_utils.py:91]   name = model/transformer/layer_10/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0628 13:27:52.714999 140458947311424 model_utils.py:91]   name = model/transformer/layer_10/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0628 13:27:52.715080 140458947311424 model_utils.py:91]   name = model/transformer/layer_10/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0628 13:27:52.715150 140458947311424 model_utils.py:91]   name = model/transformer/layer_10/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0628 13:27:52.715215 140458947311424 model_utils.py:91]   name = model/transformer/layer_10/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
I0628 13:27:52.715279 140458947311424 model_utils.py:91]   name = model/transformer/layer_10/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
I0628 13:27:52.715344 140458947311424 model_utils.py:91]   name = model/transformer/layer_10/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
I0628 13:27:52.715408 140458947311424 model_utils.py:91]   name = model/transformer/layer_10/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
I0628 13:27:52.715472 140458947311424 model_utils.py:91]   name = model/transformer/layer_10/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0628 13:27:52.715533 140458947311424 model_utils.py:91]   name = model/transformer/layer_10/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0628 13:27:52.715598 140458947311424 model_utils.py:91]   name = model/transformer/layer_11/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0628 13:27:52.715669 140458947311424 model_utils.py:91]   name = model/transformer/layer_11/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0628 13:27:52.715740 140458947311424 model_utils.py:91]   name = model/transformer/layer_11/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0628 13:27:52.715813 140458947311424 model_utils.py:91]   name = model/transformer/layer_11/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0628 13:27:52.715886 140458947311424 model_utils.py:91]   name = model/transformer/layer_11/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0628 13:27:52.715962 140458947311424 model_utils.py:91]   name = model/transformer/layer_11/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0628 13:27:52.716024 140458947311424 model_utils.py:91]   name = model/transformer/layer_11/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0628 13:27:52.716106 140458947311424 model_utils.py:91]   name = model/transformer/layer_11/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
I0628 13:27:52.716178 140458947311424 model_utils.py:91]   name = model/transformer/layer_11/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
I0628 13:27:52.716248 140458947311424 model_utils.py:91]   name = model/transformer/layer_11/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
I0628 13:27:52.716314 140458947311424 model_utils.py:91]   name = model/transformer/layer_11/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
I0628 13:27:52.716390 140458947311424 model_utils.py:91]   name = model/transformer/layer_11/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0628 13:27:52.716456 140458947311424 model_utils.py:91]   name = model/transformer/layer_11/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0628 13:27:52.716520 140458947311424 model_utils.py:91]   name = model/transformer/layer_12/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0628 13:27:52.716592 140458947311424 model_utils.py:91]   name = model/transformer/layer_12/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0628 13:27:52.716662 140458947311424 model_utils.py:91]   name = model/transformer/layer_12/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0628 13:27:52.716734 140458947311424 model_utils.py:91]   name = model/transformer/layer_12/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0628 13:27:52.716806 140458947311424 model_utils.py:91]   name = model/transformer/layer_12/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0628 13:27:52.716879 140458947311424 model_utils.py:91]   name = model/transformer/layer_12/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0628 13:27:52.716939 140458947311424 model_utils.py:91]   name = model/transformer/layer_12/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0628 13:27:52.717002 140458947311424 model_utils.py:91]   name = model/transformer/layer_12/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
I0628 13:27:52.717071 140458947311424 model_utils.py:91]   name = model/transformer/layer_12/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
I0628 13:27:52.717143 140458947311424 model_utils.py:91]   name = model/transformer/layer_12/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
I0628 13:27:52.717208 140458947311424 model_utils.py:91]   name = model/transformer/layer_12/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
I0628 13:27:52.717270 140458947311424 model_utils.py:91]   name = model/transformer/layer_12/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0628 13:27:52.717329 140458947311424 model_utils.py:91]   name = model/transformer/layer_12/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0628 13:27:52.717393 140458947311424 model_utils.py:91]   name = model/transformer/layer_13/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0628 13:27:52.717464 140458947311424 model_utils.py:91]   name = model/transformer/layer_13/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0628 13:27:52.717535 140458947311424 model_utils.py:91]   name = model/transformer/layer_13/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0628 13:27:52.717605 140458947311424 model_utils.py:91]   name = model/transformer/layer_13/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0628 13:27:52.717678 140458947311424 model_utils.py:91]   name = model/transformer/layer_13/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0628 13:27:52.717752 140458947311424 model_utils.py:91]   name = model/transformer/layer_13/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0628 13:27:52.717812 140458947311424 model_utils.py:91]   name = model/transformer/layer_13/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0628 13:27:52.717878 140458947311424 model_utils.py:91]   name = model/transformer/layer_13/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
I0628 13:27:52.717941 140458947311424 model_utils.py:91]   name = model/transformer/layer_13/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
I0628 13:27:52.718005 140458947311424 model_utils.py:91]   name = model/transformer/layer_13/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
I0628 13:27:52.718074 140458947311424 model_utils.py:91]   name = model/transformer/layer_13/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
I0628 13:27:52.718143 140458947311424 model_utils.py:91]   name = model/transformer/layer_13/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0628 13:27:52.718206 140458947311424 model_utils.py:91]   name = model/transformer/layer_13/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0628 13:27:52.718274 140458947311424 model_utils.py:91]   name = model/transformer/layer_14/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0628 13:27:52.718345 140458947311424 model_utils.py:91]   name = model/transformer/layer_14/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0628 13:27:52.718416 140458947311424 model_utils.py:91]   name = model/transformer/layer_14/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0628 13:27:52.718488 140458947311424 model_utils.py:91]   name = model/transformer/layer_14/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0628 13:27:52.718561 140458947311424 model_utils.py:91]   name = model/transformer/layer_14/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0628 13:27:52.718636 140458947311424 model_utils.py:91]   name = model/transformer/layer_14/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0628 13:27:52.718697 140458947311424 model_utils.py:91]   name = model/transformer/layer_14/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0628 13:27:52.718762 140458947311424 model_utils.py:91]   name = model/transformer/layer_14/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
I0628 13:27:52.718826 140458947311424 model_utils.py:91]   name = model/transformer/layer_14/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
I0628 13:27:52.718890 140458947311424 model_utils.py:91]   name = model/transformer/layer_14/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
I0628 13:27:52.718954 140458947311424 model_utils.py:91]   name = model/transformer/layer_14/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
I0628 13:27:52.719018 140458947311424 model_utils.py:91]   name = model/transformer/layer_14/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0628 13:27:52.719083 140458947311424 model_utils.py:91]   name = model/transformer/layer_14/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0628 13:27:52.719153 140458947311424 model_utils.py:91]   name = model/transformer/layer_15/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0628 13:27:52.719224 140458947311424 model_utils.py:91]   name = model/transformer/layer_15/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0628 13:27:52.719295 140458947311424 model_utils.py:91]   name = model/transformer/layer_15/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0628 13:27:52.719365 140458947311424 model_utils.py:91]   name = model/transformer/layer_15/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0628 13:27:52.719438 140458947311424 model_utils.py:91]   name = model/transformer/layer_15/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0628 13:27:52.719513 140458947311424 model_utils.py:91]   name = model/transformer/layer_15/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0628 13:27:52.719575 140458947311424 model_utils.py:91]   name = model/transformer/layer_15/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0628 13:27:52.719638 140458947311424 model_utils.py:91]   name = model/transformer/layer_15/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
I0628 13:27:52.719702 140458947311424 model_utils.py:91]   name = model/transformer/layer_15/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
I0628 13:27:52.719768 140458947311424 model_utils.py:91]   name = model/transformer/layer_15/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
I0628 13:27:52.719830 140458947311424 model_utils.py:91]   name = model/transformer/layer_15/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
I0628 13:27:52.719893 140458947311424 model_utils.py:91]   name = model/transformer/layer_15/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0628 13:27:52.719953 140458947311424 model_utils.py:91]   name = model/transformer/layer_15/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0628 13:27:52.720015 140458947311424 model_utils.py:91]   name = model/transformer/layer_16/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0628 13:27:52.720093 140458947311424 model_utils.py:91]   name = model/transformer/layer_16/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0628 13:27:52.720170 140458947311424 model_utils.py:91]   name = model/transformer/layer_16/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0628 13:27:52.720245 140458947311424 model_utils.py:91]   name = model/transformer/layer_16/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0628 13:27:52.720318 140458947311424 model_utils.py:91]   name = model/transformer/layer_16/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0628 13:27:52.720392 140458947311424 model_utils.py:91]   name = model/transformer/layer_16/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0628 13:27:52.720454 140458947311424 model_utils.py:91]   name = model/transformer/layer_16/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0628 13:27:52.720520 140458947311424 model_utils.py:91]   name = model/transformer/layer_16/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
I0628 13:27:52.720583 140458947311424 model_utils.py:91]   name = model/transformer/layer_16/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
I0628 13:27:52.720646 140458947311424 model_utils.py:91]   name = model/transformer/layer_16/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
I0628 13:27:52.720709 140458947311424 model_utils.py:91]   name = model/transformer/layer_16/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
I0628 13:27:52.720773 140458947311424 model_utils.py:91]   name = model/transformer/layer_16/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0628 13:27:52.720832 140458947311424 model_utils.py:91]   name = model/transformer/layer_16/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0628 13:27:52.720893 140458947311424 model_utils.py:91]   name = model/transformer/layer_17/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0628 13:27:52.720965 140458947311424 model_utils.py:91]   name = model/transformer/layer_17/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0628 13:27:52.721036 140458947311424 model_utils.py:91]   name = model/transformer/layer_17/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0628 13:27:52.721116 140458947311424 model_utils.py:91]   name = model/transformer/layer_17/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0628 13:27:52.721191 140458947311424 model_utils.py:91]   name = model/transformer/layer_17/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0628 13:27:52.721266 140458947311424 model_utils.py:91]   name = model/transformer/layer_17/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0628 13:27:52.721326 140458947311424 model_utils.py:91]   name = model/transformer/layer_17/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0628 13:27:52.721390 140458947311424 model_utils.py:91]   name = model/transformer/layer_17/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
I0628 13:27:52.721454 140458947311424 model_utils.py:91]   name = model/transformer/layer_17/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
I0628 13:27:52.721519 140458947311424 model_utils.py:91]   name = model/transformer/layer_17/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
I0628 13:27:52.721584 140458947311424 model_utils.py:91]   name = model/transformer/layer_17/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
I0628 13:27:52.721648 140458947311424 model_utils.py:91]   name = model/transformer/layer_17/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0628 13:27:52.721708 140458947311424 model_utils.py:91]   name = model/transformer/layer_17/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0628 13:27:52.721771 140458947311424 model_utils.py:91]   name = model/transformer/layer_18/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0628 13:27:52.721840 140458947311424 model_utils.py:91]   name = model/transformer/layer_18/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0628 13:27:52.721916 140458947311424 model_utils.py:91]   name = model/transformer/layer_18/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0628 13:27:52.721988 140458947311424 model_utils.py:91]   name = model/transformer/layer_18/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0628 13:27:52.722064 140458947311424 model_utils.py:91]   name = model/transformer/layer_18/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0628 13:27:52.722145 140458947311424 model_utils.py:91]   name = model/transformer/layer_18/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0628 13:27:52.722203 140458947311424 model_utils.py:91]   name = model/transformer/layer_18/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0628 13:27:52.722268 140458947311424 model_utils.py:91]   name = model/transformer/layer_18/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
I0628 13:27:52.722333 140458947311424 model_utils.py:91]   name = model/transformer/layer_18/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
I0628 13:27:52.722398 140458947311424 model_utils.py:91]   name = model/transformer/layer_18/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
I0628 13:27:52.722462 140458947311424 model_utils.py:91]   name = model/transformer/layer_18/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
I0628 13:27:52.722527 140458947311424 model_utils.py:91]   name = model/transformer/layer_18/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0628 13:27:52.722587 140458947311424 model_utils.py:91]   name = model/transformer/layer_18/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0628 13:27:52.722649 140458947311424 model_utils.py:91]   name = model/transformer/layer_19/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0628 13:27:52.722718 140458947311424 model_utils.py:91]   name = model/transformer/layer_19/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0628 13:27:52.722787 140458947311424 model_utils.py:91]   name = model/transformer/layer_19/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0628 13:27:52.722857 140458947311424 model_utils.py:91]   name = model/transformer/layer_19/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0628 13:27:52.722930 140458947311424 model_utils.py:91]   name = model/transformer/layer_19/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0628 13:27:52.723004 140458947311424 model_utils.py:91]   name = model/transformer/layer_19/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0628 13:27:52.723073 140458947311424 model_utils.py:91]   name = model/transformer/layer_19/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0628 13:27:52.723142 140458947311424 model_utils.py:91]   name = model/transformer/layer_19/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
I0628 13:27:52.723206 140458947311424 model_utils.py:91]   name = model/transformer/layer_19/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
I0628 13:27:52.723270 140458947311424 model_utils.py:91]   name = model/transformer/layer_19/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
I0628 13:27:52.723335 140458947311424 model_utils.py:91]   name = model/transformer/layer_19/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
I0628 13:27:52.723399 140458947311424 model_utils.py:91]   name = model/transformer/layer_19/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0628 13:27:52.723458 140458947311424 model_utils.py:91]   name = model/transformer/layer_19/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0628 13:27:52.723520 140458947311424 model_utils.py:91]   name = model/transformer/layer_20/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0628 13:27:52.723590 140458947311424 model_utils.py:91]   name = model/transformer/layer_20/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0628 13:27:52.723659 140458947311424 model_utils.py:91]   name = model/transformer/layer_20/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0628 13:27:52.723736 140458947311424 model_utils.py:91]   name = model/transformer/layer_20/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0628 13:27:52.723809 140458947311424 model_utils.py:91]   name = model/transformer/layer_20/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0628 13:27:52.723885 140458947311424 model_utils.py:91]   name = model/transformer/layer_20/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0628 13:27:52.723946 140458947311424 model_utils.py:91]   name = model/transformer/layer_20/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0628 13:27:52.724009 140458947311424 model_utils.py:91]   name = model/transformer/layer_20/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
I0628 13:27:52.724077 140458947311424 model_utils.py:91]   name = model/transformer/layer_20/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
I0628 13:27:52.724148 140458947311424 model_utils.py:91]   name = model/transformer/layer_20/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
I0628 13:27:52.724212 140458947311424 model_utils.py:91]   name = model/transformer/layer_20/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
I0628 13:27:52.724277 140458947311424 model_utils.py:91]   name = model/transformer/layer_20/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0628 13:27:52.724337 140458947311424 model_utils.py:91]   name = model/transformer/layer_20/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0628 13:27:52.724399 140458947311424 model_utils.py:91]   name = model/transformer/layer_21/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0628 13:27:52.724468 140458947311424 model_utils.py:91]   name = model/transformer/layer_21/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0628 13:27:52.724540 140458947311424 model_utils.py:91]   name = model/transformer/layer_21/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0628 13:27:52.724610 140458947311424 model_utils.py:91]   name = model/transformer/layer_21/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0628 13:27:52.724683 140458947311424 model_utils.py:91]   name = model/transformer/layer_21/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0628 13:27:52.724756 140458947311424 model_utils.py:91]   name = model/transformer/layer_21/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0628 13:27:52.724819 140458947311424 model_utils.py:91]   name = model/transformer/layer_21/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0628 13:27:52.724881 140458947311424 model_utils.py:91]   name = model/transformer/layer_21/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
I0628 13:27:52.724944 140458947311424 model_utils.py:91]   name = model/transformer/layer_21/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
I0628 13:27:52.725009 140458947311424 model_utils.py:91]   name = model/transformer/layer_21/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
I0628 13:27:52.725079 140458947311424 model_utils.py:91]   name = model/transformer/layer_21/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
I0628 13:27:52.725147 140458947311424 model_utils.py:91]   name = model/transformer/layer_21/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0628 13:27:52.725208 140458947311424 model_utils.py:91]   name = model/transformer/layer_21/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0628 13:27:52.725272 140458947311424 model_utils.py:91]   name = model/transformer/layer_22/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0628 13:27:52.725340 140458947311424 model_utils.py:91]   name = model/transformer/layer_22/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0628 13:27:52.725413 140458947311424 model_utils.py:91]   name = model/transformer/layer_22/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0628 13:27:52.725483 140458947311424 model_utils.py:91]   name = model/transformer/layer_22/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0628 13:27:52.725561 140458947311424 model_utils.py:91]   name = model/transformer/layer_22/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0628 13:27:52.725636 140458947311424 model_utils.py:91]   name = model/transformer/layer_22/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0628 13:27:52.725696 140458947311424 model_utils.py:91]   name = model/transformer/layer_22/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0628 13:27:52.725760 140458947311424 model_utils.py:91]   name = model/transformer/layer_22/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
I0628 13:27:52.725824 140458947311424 model_utils.py:91]   name = model/transformer/layer_22/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
I0628 13:27:52.725886 140458947311424 model_utils.py:91]   name = model/transformer/layer_22/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
I0628 13:27:52.725949 140458947311424 model_utils.py:91]   name = model/transformer/layer_22/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
I0628 13:27:52.726010 140458947311424 model_utils.py:91]   name = model/transformer/layer_22/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0628 13:27:52.726076 140458947311424 model_utils.py:91]   name = model/transformer/layer_22/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0628 13:27:52.726144 140458947311424 model_utils.py:91]   name = model/transformer/layer_23/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0628 13:27:52.726216 140458947311424 model_utils.py:91]   name = model/transformer/layer_23/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0628 13:27:52.726287 140458947311424 model_utils.py:91]   name = model/transformer/layer_23/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0628 13:27:52.726359 140458947311424 model_utils.py:91]   name = model/transformer/layer_23/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0628 13:27:52.726432 140458947311424 model_utils.py:91]   name = model/transformer/layer_23/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0628 13:27:52.726503 140458947311424 model_utils.py:91]   name = model/transformer/layer_23/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0628 13:27:52.726565 140458947311424 model_utils.py:91]   name = model/transformer/layer_23/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0628 13:27:52.726629 140458947311424 model_utils.py:91]   name = model/transformer/layer_23/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
I0628 13:27:52.726693 140458947311424 model_utils.py:91]   name = model/transformer/layer_23/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
I0628 13:27:52.726757 140458947311424 model_utils.py:91]   name = model/transformer/layer_23/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
I0628 13:27:52.726819 140458947311424 model_utils.py:91]   name = model/transformer/layer_23/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
I0628 13:27:52.726880 140458947311424 model_utils.py:91]   name = model/transformer/layer_23/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0628 13:27:52.726938 140458947311424 model_utils.py:91]   name = model/transformer/layer_23/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0628 13:27:52.726999 140458947311424 model_utils.py:91]   name = model/sequnece_summary/summary/kernel:0, shape = (1024, 1024)
I0628 13:27:52.727067 140458947311424 model_utils.py:91]   name = model/sequnece_summary/summary/bias:0, shape = (1024,)
I0628 13:27:52.727130 140458947311424 model_utils.py:91]   name = model/classification_project/logit/kernel:0, shape = (1024, 2)
I0628 13:27:52.727193 140458947311424 model_utils.py:91]   name = model/classification_project/logit/bias:0, shape = (2,)
W0628 13:27:52.735434 140458947311424 deprecation.py:323] From /home/testuser/anaconda3/envs/dependency/lib/python3.6/site-packages/tensorflow/python/training/learning_rate_decay_v2.py:321: div (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Deprecated in favor of operator or tf.math.divide.
W0628 13:27:52.815898 140458947311424 deprecation.py:323] From /home/testuser/anaconda3/envs/dependency/lib/python3.6/site-packages/tensorflow/python/ops/math_ops.py:3066: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
I0628 13:28:07.320057 140458947311424 estimator.py:1113] Done calling model_fn.
I0628 13:28:07.321415 140458947311424 basic_session_run_hooks.py:527] Create CheckpointSaverHook.
I0628 13:28:12.920827 140458947311424 monitored_session.py:222] Graph was finalized.
2019-06-28 13:28:12.921327: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-06-28 13:28:17.940826: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x5600db3a20d0 executing computations on platform CUDA. Devices:
2019-06-28 13:28:17.940971: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): Tesla K80, Compute Capability 3.7
2019-06-28 13:28:17.941005: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (1): Tesla K80, Compute Capability 3.7
2019-06-28 13:28:17.951001: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2500035000 Hz
2019-06-28 13:28:17.958930: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x5600d114f7a0 executing computations on platform Host. Devices:
2019-06-28 13:28:17.958999: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): <undefined>, <undefined>
2019-06-28 13:28:17.959732: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties: 
name: Tesla K80 major: 3 minor: 7 memoryClockRate(GHz): 0.8235
pciBusID: 0000:83:00.0
totalMemory: 11.17GiB freeMemory: 11.10GiB
2019-06-28 13:28:17.960225: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 1 with properties: 
name: Tesla K80 major: 3 minor: 7 memoryClockRate(GHz): 0.8235
pciBusID: 0000:84:00.0
totalMemory: 11.17GiB freeMemory: 11.10GiB
2019-06-28 13:28:17.960726: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0, 1
2019-06-28 13:28:17.964213: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-06-28 13:28:17.964262: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      0 1 
2019-06-28 13:28:17.964283: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0:   N Y 
2019-06-28 13:28:17.964300: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 1:   Y N 
2019-06-28 13:28:17.965268: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10801 MB memory) -> physical GPU (device: 0, name: Tesla K80, pci bus id: 0000:83:00.0, compute capability: 3.7)
2019-06-28 13:28:17.965834: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 10801 MB memory) -> physical GPU (device: 1, name: Tesla K80, pci bus id: 0000:84:00.0, compute capability: 3.7)
W0628 13:28:17.976674 140458947311424 deprecation.py:323] From /home/testuser/anaconda3/envs/dependency/lib/python3.6/site-packages/tensorflow/python/training/saver.py:1266: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
I0628 13:28:17.980594 140458947311424 saver.py:1270] Restoring parameters from /mnt/data/testuser/datauser/project/data/task1/exp/model.ckpt-0
W0628 13:28:46.735411 140458947311424 deprecation.py:323] From /home/testuser/anaconda3/envs/dependency/lib/python3.6/site-packages/tensorflow/python/training/saver.py:1070: get_checkpoint_mtimes (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file utilities to get mtimes.
I0628 13:28:48.019851 140458947311424 session_manager.py:491] Running local_init_op.
I0628 13:28:48.396723 140458947311424 session_manager.py:493] Done running local_init_op.
I0628 13:29:01.599510 140458947311424 basic_session_run_hooks.py:594] Saving checkpoints for 0 into /mnt/data/testuser/datauser/project/data/task1/exp/model.ckpt.
2019-06-28 13:29:43.285319: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally

kimiyoung commented 5 years ago

Does it work if you reduce the batch size, the sequence length, or anything else that lowers memory usage?

aisheh90 commented 5 years ago

To reduce memory usage, and just to confirm that it runs at all, I used the following values: batch size = 8, sequence length = 50, train_steps = 50, warmup_steps = 10, save_steps = 10, iterations = 10. But I still get the same issue! (Colab GPU memory: 11-12 GB)

aisheh90 commented 5 years ago

I found the reason for this issue, and it's not related to memory. The script either could not find the data or the data was not in the correct format. Once I fed the data in correctly, it worked fine!

manrajgrover commented 5 years ago

@kimiyoung Reducing memory usage does help. Thank you!

GeetDsa commented 5 years ago

So, what is the correct data format?

manrajgrover commented 5 years ago

@GeetDsa For classification, you need to write a class that processes the data for you and returns a list of InputExample objects. See the linked code for an example: https://github.com/zihangdai/xlnet/blob/master/run_classifier.py#L273

oakkas commented 4 years ago

@manrajgrover how can I write a class for single sentence classification with labels of 0 and 1?
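
A minimal sketch of what such a processor could look like for single-sentence classification with labels 0 and 1. It is not taken from the repository. The `InputExample` class below is a simplified stand-in for the one in run_classifier.py so that the snippet runs on its own, `BinarySentenceProcessor` is a hypothetical name, and the file layout (`train.tsv` and `dev.tsv` with `label<TAB>sentence` per line and no header) is an assumption to adapt to your own data.

```python
import csv
import os


class InputExample:
    """Simplified stand-in for InputExample in run_classifier.py (assumption)."""

    def __init__(self, guid, text_a, text_b=None, label=None):
        self.guid = guid
        self.text_a = text_a
        self.text_b = text_b  # stays None for single-sentence tasks
        self.label = label


class BinarySentenceProcessor:
    """Hypothetical processor for single-sentence, two-class data.

    Assumes <data_dir>/train.tsv and <data_dir>/dev.tsv, one example per
    line, formatted as: label<TAB>sentence (no header row).
    """

    def get_labels(self):
        # Labels are kept as strings, matching what is read from the file.
        return ["0", "1"]

    def get_train_examples(self, data_dir):
        return self._create_examples(os.path.join(data_dir, "train.tsv"), "train")

    def get_dev_examples(self, data_dir):
        return self._create_examples(os.path.join(data_dir, "dev.tsv"), "dev")

    def _create_examples(self, path, set_type):
        examples = []
        with open(path, encoding="utf-8", newline="") as f:
            reader = csv.reader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
            for i, row in enumerate(reader):
                if len(row) < 2:
                    continue  # skip blank or malformed lines
                label, text = row[0].strip(), row[1].strip()
                if label not in self.get_labels():
                    raise ValueError(
                        "unexpected label %r on line %d of %s" % (label, i + 1, path)
                    )
                examples.append(
                    InputExample(
                        guid="%s-%d" % (set_type, i),
                        text_a=text,
                        text_b=None,
                        label=label,
                    )
                )
        # Fail loudly instead of starting a training run with no data.
        if not examples:
            raise ValueError("no examples found in %s" % path)
        return examples
```

The explicit error on an empty file is deliberate, since the hang discussed above turned out to be caused by data that was not being read. To use something like this with the repository, you would presumably subclass the `DataProcessor` base class in run_classifier.py and register the class under a task name there. Check the linked source for the exact hooks, as they are not shown in this thread.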