guotong1988 / BERT-GPU

multi-gpu pre-training in one machine for BERT from scratch without horovod (Data Parallelism)
Apache License 2.0

Error: tensorflow.python.framework.errors_impl.InvalidArgumentError (No OpKernel registered for 'NcclAllReduce') #20

Closed shuxiaobo closed 4 years ago

shuxiaobo commented 4 years ago

When I run the GPU v2 script (run_pretraining_gpu_v2.py) with num_gpu = 8 on the default sample.tfrecord, the following error is raised:

tensorflow.python.framework.errors_impl.InvalidArgumentError: No OpKernel was registered to support Op 'NcclAllReduce' used by node NcclAllReduce (defined at data/shuxiaobo/BERT-multi-gpu/run_pretraining_gpu_v2.py:480) with these attrs: [num_devices=6, reduction="sum", shared_name="c0", T=DT_FLOAT]
Registered devices: [CPU, XLA_CPU]
Registered kernels:
  <no registered kernels>

     [[NcclAllReduce]]

Does anyone have any idea what causes this?
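
A note on reading the message above: the "Registered devices: [CPU, XLA_CPU]" line lists the device types known to the running TensorFlow process, and no GPU type appears in it. That would be consistent with a CPU-only TensorFlow build or with the GPUs not being visible to the process, although the thread does not confirm the cause. The sketch below is a small stdlib-only helper that pulls that list out of an error string and checks it for a GPU entry. The function names `registered_devices` and `has_gpu` are hypothetical and used only for illustration, and `SAMPLE` copies only the relevant lines of the error.

```python
import re


def registered_devices(error_text):
    """Return the device types from a 'Registered devices: [...]' line.

    Returns an empty list when the line is not present in the text.
    """
    match = re.search(r"Registered devices:\s*\[([^\]]*)\]", error_text)
    if match is None:
        return []
    return [name.strip() for name in match.group(1).split(",") if name.strip()]


def has_gpu(devices):
    """True if any listed device type is a GPU type (e.g. GPU or XLA_GPU)."""
    return any(name == "GPU" or name.endswith("_GPU") for name in devices)


# Relevant lines copied from the error message in this issue.
SAMPLE = (
    "Registered devices: [CPU, XLA_CPU]\n"
    "Registered kernels:\n"
    "  <no registered kernels>\n"
)

if __name__ == "__main__":
    devices = registered_devices(SAMPLE)
    if has_gpu(devices):
        print("GPU device type is registered:", devices)
    else:
        print("No GPU device type is registered:", devices)
```

On a machine with TensorFlow 1.x installed, a more direct check is to call `tf.test.is_gpu_available()` in the same environment that runs the training script.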

guotong1988 commented 4 years ago

Please post the full stack trace.

guotong1988 commented 4 years ago

Please upgrade to TensorFlow 1.14.

shuxiaobo commented 4 years ago

Yes, I am already using tensorflow==1.14. Here is the full trace:


I1209 19:40:33.324634 140591702722304 run_pretraining_gpu_v2.py:186]   name = bert/encoder/layer_5/attention/output/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_5/intermediate/dense/kernel:0, shape = (768, 3072), *INIT_FROM_CKPT*
I1209 19:40:33.324860 140591702722304 run_pretraining_gpu_v2.py:186]   name = bert/encoder/layer_5/intermediate/dense/kernel:0, shape = (768, 3072), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_5/intermediate/dense/bias:0, shape = (3072,), *INIT_FROM_CKPT*
I1209 19:40:33.325103 140591702722304 run_pretraining_gpu_v2.py:186]   name = bert/encoder/layer_5/intermediate/dense/bias:0, shape = (3072,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_5/output/dense/kernel:0, shape = (3072, 768), *INIT_FROM_CKPT*
I1209 19:40:33.325333 140591702722304 run_pretraining_gpu_v2.py:186]   name = bert/encoder/layer_5/output/dense/kernel:0, shape = (3072, 768), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_5/output/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*
I1209 19:40:33.325585 140591702722304 run_pretraining_gpu_v2.py:186]   name = bert/encoder/layer_5/output/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_5/output/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*
I1209 19:40:33.325816 140591702722304 run_pretraining_gpu_v2.py:186]   name = bert/encoder/layer_5/output/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_5/output/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*
I1209 19:40:33.326046 140591702722304 run_pretraining_gpu_v2.py:186]   name = bert/encoder/layer_5/output/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_6/attention/self/query/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*
I1209 19:40:33.326277 140591702722304 run_pretraining_gpu_v2.py:186]   name = bert/encoder/layer_6/attention/self/query/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_6/attention/self/query/bias:0, shape = (768,), *INIT_FROM_CKPT*
I1209 19:40:33.326528 140591702722304 run_pretraining_gpu_v2.py:186]   name = bert/encoder/layer_6/attention/self/query/bias:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_6/attention/self/key/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*
I1209 19:40:33.326762 140591702722304 run_pretraining_gpu_v2.py:186]   name = bert/encoder/layer_6/attention/self/key/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_6/attention/self/key/bias:0, shape = (768,), *INIT_FROM_CKPT*
I1209 19:40:33.327003 140591702722304 run_pretraining_gpu_v2.py:186]   name = bert/encoder/layer_6/attention/self/key/bias:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_6/attention/self/value/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*
I1209 19:40:33.327236 140591702722304 run_pretraining_gpu_v2.py:186]   name = bert/encoder/layer_6/attention/self/value/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_6/attention/self/value/bias:0, shape = (768,), *INIT_FROM_CKPT*
I1209 19:40:33.327488 140591702722304 run_pretraining_gpu_v2.py:186]   name = bert/encoder/layer_6/attention/self/value/bias:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_6/attention/output/dense/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*
I1209 19:40:33.327719 140591702722304 run_pretraining_gpu_v2.py:186]   name = bert/encoder/layer_6/attention/output/dense/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_6/attention/output/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*
I1209 19:40:33.327962 140591702722304 run_pretraining_gpu_v2.py:186]   name = bert/encoder/layer_6/attention/output/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_6/attention/output/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*
I1209 19:40:33.328192 140591702722304 run_pretraining_gpu_v2.py:186]   name = bert/encoder/layer_6/attention/output/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_6/attention/output/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*
I1209 19:40:33.328415 140591702722304 run_pretraining_gpu_v2.py:186]   name = bert/encoder/layer_6/attention/output/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_6/intermediate/dense/kernel:0, shape = (768, 3072), *INIT_FROM_CKPT*
I1209 19:40:33.328659 140591702722304 run_pretraining_gpu_v2.py:186]   name = bert/encoder/layer_6/intermediate/dense/kernel:0, shape = (768, 3072), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_6/intermediate/dense/bias:0, shape = (3072,), *INIT_FROM_CKPT*
I1209 19:40:33.328895 140591702722304 run_pretraining_gpu_v2.py:186]   name = bert/encoder/layer_6/intermediate/dense/bias:0, shape = (3072,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_6/output/dense/kernel:0, shape = (3072, 768), *INIT_FROM_CKPT*
I1209 19:40:33.329134 140591702722304 run_pretraining_gpu_v2.py:186]   name = bert/encoder/layer_6/output/dense/kernel:0, shape = (3072, 768), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_6/output/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*
I1209 19:40:33.329376 140591702722304 run_pretraining_gpu_v2.py:186]   name = bert/encoder/layer_6/output/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_6/output/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*
I1209 19:40:33.329592 140591702722304 run_pretraining_gpu_v2.py:186]   name = bert/encoder/layer_6/output/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_6/output/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*
I1209 19:40:33.329800 140591702722304 run_pretraining_gpu_v2.py:186]   name = bert/encoder/layer_6/output/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_7/attention/self/query/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*
I1209 19:40:33.330018 140591702722304 run_pretraining_gpu_v2.py:186]   name = bert/encoder/layer_7/attention/self/query/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_7/attention/self/query/bias:0, shape = (768,), *INIT_FROM_CKPT*
I1209 19:40:33.330240 140591702722304 run_pretraining_gpu_v2.py:186]   name = bert/encoder/layer_7/attention/self/query/bias:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_7/attention/self/key/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*
I1209 19:40:33.330461 140591702722304 run_pretraining_gpu_v2.py:186]   name = bert/encoder/layer_7/attention/self/key/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_7/attention/self/key/bias:0, shape = (768,), *INIT_FROM_CKPT*
I1209 19:40:33.330687 140591702722304 run_pretraining_gpu_v2.py:186]   name = bert/encoder/layer_7/attention/self/key/bias:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_7/attention/self/value/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*
I1209 19:40:33.330904 140591702722304 run_pretraining_gpu_v2.py:186]   name = bert/encoder/layer_7/attention/self/value/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_7/attention/self/value/bias:0, shape = (768,), *INIT_FROM_CKPT*
I1209 19:40:33.331131 140591702722304 run_pretraining_gpu_v2.py:186]   name = bert/encoder/layer_7/attention/self/value/bias:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_7/attention/output/dense/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*
I1209 19:40:33.331343 140591702722304 run_pretraining_gpu_v2.py:186]   name = bert/encoder/layer_7/attention/output/dense/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_7/attention/output/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*
I1209 19:40:33.331578 140591702722304 run_pretraining_gpu_v2.py:186]   name = bert/encoder/layer_7/attention/output/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_7/attention/output/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*
I1209 19:40:33.331790 140591702722304 run_pretraining_gpu_v2.py:186]   name = bert/encoder/layer_7/attention/output/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_7/attention/output/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*
I1209 19:40:33.332001 140591702722304 run_pretraining_gpu_v2.py:186]   name = bert/encoder/layer_7/attention/output/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_7/intermediate/dense/kernel:0, shape = (768, 3072), *INIT_FROM_CKPT*
I1209 19:40:33.332210 140591702722304 run_pretraining_gpu_v2.py:186]   name = bert/encoder/layer_7/intermediate/dense/kernel:0, shape = (768, 3072), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_7/intermediate/dense/bias:0, shape = (3072,), *INIT_FROM_CKPT*
I1209 19:40:33.332444 140591702722304 run_pretraining_gpu_v2.py:186]   name = bert/encoder/layer_7/intermediate/dense/bias:0, shape = (3072,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_7/output/dense/kernel:0, shape = (3072, 768), *INIT_FROM_CKPT*
I1209 19:40:33.332658 140591702722304 run_pretraining_gpu_v2.py:186]   name = bert/encoder/layer_7/output/dense/kernel:0, shape = (3072, 768), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_7/output/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*
I1209 19:40:33.332886 140591702722304 run_pretraining_gpu_v2.py:186]   name = bert/encoder/layer_7/output/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_7/output/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*
I1209 19:40:33.333094 140591702722304 run_pretraining_gpu_v2.py:186]   name = bert/encoder/layer_7/output/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_7/output/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*
I1209 19:40:33.333303 140591702722304 run_pretraining_gpu_v2.py:186]   name = bert/encoder/layer_7/output/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_8/attention/self/query/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*
I1209 19:40:33.333528 140591702722304 run_pretraining_gpu_v2.py:186]   name = bert/encoder/layer_8/attention/self/query/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_8/attention/self/query/bias:0, shape = (768,), *INIT_FROM_CKPT*
I1209 19:40:33.333754 140591702722304 run_pretraining_gpu_v2.py:186]   name = bert/encoder/layer_8/attention/self/query/bias:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_8/attention/self/key/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*
I1209 19:40:33.333970 140591702722304 run_pretraining_gpu_v2.py:186]   name = bert/encoder/layer_8/attention/self/key/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_8/attention/self/key/bias:0, shape = (768,), *INIT_FROM_CKPT*
I1209 19:40:33.334192 140591702722304 run_pretraining_gpu_v2.py:186]   name = bert/encoder/layer_8/attention/self/key/bias:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_8/attention/self/value/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*
I1209 19:40:33.334407 140591702722304 run_pretraining_gpu_v2.py:186]   name = bert/encoder/layer_8/attention/self/value/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_8/attention/self/value/bias:0, shape = (768,), *INIT_FROM_CKPT*
I1209 19:40:33.334646 140591702722304 run_pretraining_gpu_v2.py:186]   name = bert/encoder/layer_8/attention/self/value/bias:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_8/attention/output/dense/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*
I1209 19:40:33.334859 140591702722304 run_pretraining_gpu_v2.py:186]   name = bert/encoder/layer_8/attention/output/dense/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_8/attention/output/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*
I1209 19:40:33.335081 140591702722304 run_pretraining_gpu_v2.py:186]   name = bert/encoder/layer_8/attention/output/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_8/attention/output/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*
I1209 19:40:33.335297 140591702722304 run_pretraining_gpu_v2.py:186]   name = bert/encoder/layer_8/attention/output/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_8/attention/output/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*
I1209 19:40:33.335515 140591702722304 run_pretraining_gpu_v2.py:186]   name = bert/encoder/layer_8/attention/output/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_8/intermediate/dense/kernel:0, shape = (768, 3072), *INIT_FROM_CKPT*
I1209 19:40:33.335725 140591702722304 run_pretraining_gpu_v2.py:186]   name = bert/encoder/layer_8/intermediate/dense/kernel:0, shape = (768, 3072), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_8/intermediate/dense/bias:0, shape = (3072,), *INIT_FROM_CKPT*
I1209 19:40:33.335944 140591702722304 run_pretraining_gpu_v2.py:186]   name = bert/encoder/layer_8/intermediate/dense/bias:0, shape = (3072,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_8/output/dense/kernel:0, shape = (3072, 768), *INIT_FROM_CKPT*
I1209 19:40:33.336178 140591702722304 run_pretraining_gpu_v2.py:186]   name = bert/encoder/layer_8/output/dense/kernel:0, shape = (3072, 768), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_8/output/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*
I1209 19:40:33.336403 140591702722304 run_pretraining_gpu_v2.py:186]   name = bert/encoder/layer_8/output/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_8/output/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*
I1209 19:40:33.336619 140591702722304 run_pretraining_gpu_v2.py:186]   name = bert/encoder/layer_8/output/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_8/output/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*
I1209 19:40:33.336827 140591702722304 run_pretraining_gpu_v2.py:186]   name = bert/encoder/layer_8/output/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_9/attention/self/query/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*
I1209 19:40:33.337044 140591702722304 run_pretraining_gpu_v2.py:186]   name = bert/encoder/layer_9/attention/self/query/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_9/attention/self/query/bias:0, shape = (768,), *INIT_FROM_CKPT*
I1209 19:40:33.337268 140591702722304 run_pretraining_gpu_v2.py:186]   name = bert/encoder/layer_9/attention/self/query/bias:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_9/attention/self/key/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*
I1209 19:40:33.337491 140591702722304 run_pretraining_gpu_v2.py:186]   name = bert/encoder/layer_9/attention/self/key/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_9/attention/self/key/bias:0, shape = (768,), *INIT_FROM_CKPT*
I1209 19:40:33.337710 140591702722304 run_pretraining_gpu_v2.py:186]   name = bert/encoder/layer_9/attention/self/key/bias:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_9/attention/self/value/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*
I1209 19:40:33.337929 140591702722304 run_pretraining_gpu_v2.py:186]   name = bert/encoder/layer_9/attention/self/value/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_9/attention/self/value/bias:0, shape = (768,), *INIT_FROM_CKPT*
I1209 19:40:33.338150 140591702722304 run_pretraining_gpu_v2.py:186]   name = bert/encoder/layer_9/attention/self/value/bias:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_9/attention/output/dense/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*
I1209 19:40:33.338362 140591702722304 run_pretraining_gpu_v2.py:186]   name = bert/encoder/layer_9/attention/output/dense/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_9/attention/output/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*
I1209 19:40:33.338592 140591702722304 run_pretraining_gpu_v2.py:186]   name = bert/encoder/layer_9/attention/output/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_9/attention/output/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*
I1209 19:40:33.338804 140591702722304 run_pretraining_gpu_v2.py:186]   name = bert/encoder/layer_9/attention/output/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_9/attention/output/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*
I1209 19:40:33.339011 140591702722304 run_pretraining_gpu_v2.py:186]   name = bert/encoder/layer_9/attention/output/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_9/intermediate/dense/kernel:0, shape = (768, 3072), *INIT_FROM_CKPT*
I1209 19:40:33.339223 140591702722304 run_pretraining_gpu_v2.py:186]   name = bert/encoder/layer_9/intermediate/dense/kernel:0, shape = (768, 3072), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_9/intermediate/dense/bias:0, shape = (3072,), *INIT_FROM_CKPT*
I1209 19:40:33.339454 140591702722304 run_pretraining_gpu_v2.py:186]   name = bert/encoder/layer_9/intermediate/dense/bias:0, shape = (3072,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_9/output/dense/kernel:0, shape = (3072, 768), *INIT_FROM_CKPT*
I1209 19:40:33.339662 140591702722304 run_pretraining_gpu_v2.py:186]   name = bert/encoder/layer_9/output/dense/kernel:0, shape = (3072, 768), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_9/output/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*
I1209 19:40:33.339871 140591702722304 run_pretraining_gpu_v2.py:186]   name = bert/encoder/layer_9/output/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_9/output/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*
I1209 19:40:33.340068 140591702722304 run_pretraining_gpu_v2.py:186]   name = bert/encoder/layer_9/output/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_9/output/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*
I1209 19:40:33.340259 140591702722304 run_pretraining_gpu_v2.py:186]   name = bert/encoder/layer_9/output/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_10/attention/self/query/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*
I1209 19:40:33.340462 140591702722304 run_pretraining_gpu_v2.py:186]   name = bert/encoder/layer_10/attention/self/query/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_10/attention/self/query/bias:0, shape = (768,), *INIT_FROM_CKPT*
I1209 19:40:33.340672 140591702722304 run_pretraining_gpu_v2.py:186]   name = bert/encoder/layer_10/attention/self/query/bias:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_10/attention/self/key/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*
I1209 19:40:33.340872 140591702722304 run_pretraining_gpu_v2.py:186]   name = bert/encoder/layer_10/attention/self/key/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_10/attention/self/key/bias:0, shape = (768,), *INIT_FROM_CKPT*
I1209 19:40:33.341082 140591702722304 run_pretraining_gpu_v2.py:186]   name = bert/encoder/layer_10/attention/self/key/bias:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_10/attention/self/value/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*
I1209 19:40:33.341281 140591702722304 run_pretraining_gpu_v2.py:186]   name = bert/encoder/layer_10/attention/self/value/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_10/attention/self/value/bias:0, shape = (768,), *INIT_FROM_CKPT*
I1209 19:40:33.341506 140591702722304 run_pretraining_gpu_v2.py:186]   name = bert/encoder/layer_10/attention/self/value/bias:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_10/attention/output/dense/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*
I1209 19:40:33.341710 140591702722304 run_pretraining_gpu_v2.py:186]   name = bert/encoder/layer_10/attention/output/dense/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_10/attention/output/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*
I1209 19:40:33.341921 140591702722304 run_pretraining_gpu_v2.py:186]   name = bert/encoder/layer_10/attention/output/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_10/attention/output/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*
I1209 19:40:33.342117 140591702722304 run_pretraining_gpu_v2.py:186]   name = bert/encoder/layer_10/attention/output/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_10/attention/output/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*
I1209 19:40:33.342328 140591702722304 run_pretraining_gpu_v2.py:186]   name = bert/encoder/layer_10/attention/output/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_10/intermediate/dense/kernel:0, shape = (768, 3072), *INIT_FROM_CKPT*
I1209 19:40:33.342533 140591702722304 run_pretraining_gpu_v2.py:186]   name = bert/encoder/layer_10/intermediate/dense/kernel:0, shape = (768, 3072), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_10/intermediate/dense/bias:0, shape = (3072,), *INIT_FROM_CKPT*
I1209 19:40:33.342742 140591702722304 run_pretraining_gpu_v2.py:186]   name = bert/encoder/layer_10/intermediate/dense/bias:0, shape = (3072,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_10/output/dense/kernel:0, shape = (3072, 768), *INIT_FROM_CKPT*
I1209 19:40:33.342943 140591702722304 run_pretraining_gpu_v2.py:186]   name = bert/encoder/layer_10/output/dense/kernel:0, shape = (3072, 768), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_10/output/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*
I1209 19:40:33.343151 140591702722304 run_pretraining_gpu_v2.py:186]   name = bert/encoder/layer_10/output/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_10/output/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*
I1209 19:40:33.343347 140591702722304 run_pretraining_gpu_v2.py:186]   name = bert/encoder/layer_10/output/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_10/output/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*
I1209 19:40:33.343556 140591702722304 run_pretraining_gpu_v2.py:186]   name = bert/encoder/layer_10/output/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_11/attention/self/query/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*
I1209 19:40:33.343757 140591702722304 run_pretraining_gpu_v2.py:186]   name = bert/encoder/layer_11/attention/self/query/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_11/attention/self/query/bias:0, shape = (768,), *INIT_FROM_CKPT*
I1209 19:40:33.343960 140591702722304 run_pretraining_gpu_v2.py:186]   name = bert/encoder/layer_11/attention/self/query/bias:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_11/attention/self/key/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*
I1209 19:40:33.344160 140591702722304 run_pretraining_gpu_v2.py:186]   name = bert/encoder/layer_11/attention/self/key/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_11/attention/self/key/bias:0, shape = (768,), *INIT_FROM_CKPT*
I1209 19:40:33.344367 140591702722304 run_pretraining_gpu_v2.py:186]   name = bert/encoder/layer_11/attention/self/key/bias:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_11/attention/self/value/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*
I1209 19:40:33.344584 140591702722304 run_pretraining_gpu_v2.py:186]   name = bert/encoder/layer_11/attention/self/value/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_11/attention/self/value/bias:0, shape = (768,), *INIT_FROM_CKPT*
I1209 19:40:33.344799 140591702722304 run_pretraining_gpu_v2.py:186]   name = bert/encoder/layer_11/attention/self/value/bias:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_11/attention/output/dense/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*
I1209 19:40:33.345004 140591702722304 run_pretraining_gpu_v2.py:186]   name = bert/encoder/layer_11/attention/output/dense/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_11/attention/output/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*
I1209 19:40:33.345210 140591702722304 run_pretraining_gpu_v2.py:186]   name = bert/encoder/layer_11/attention/output/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_11/attention/output/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*
I1209 19:40:33.345407 140591702722304 run_pretraining_gpu_v2.py:186]   name = bert/encoder/layer_11/attention/output/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_11/attention/output/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*
I1209 19:40:33.345617 140591702722304 run_pretraining_gpu_v2.py:186]   name = bert/encoder/layer_11/attention/output/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_11/intermediate/dense/kernel:0, shape = (768, 3072), *INIT_FROM_CKPT*
I1209 19:40:33.345813 140591702722304 run_pretraining_gpu_v2.py:186]   name = bert/encoder/layer_11/intermediate/dense/kernel:0, shape = (768, 3072), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_11/intermediate/dense/bias:0, shape = (3072,), *INIT_FROM_CKPT*
I1209 19:40:33.346021 140591702722304 run_pretraining_gpu_v2.py:186]   name = bert/encoder/layer_11/intermediate/dense/bias:0, shape = (3072,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_11/output/dense/kernel:0, shape = (3072, 768), *INIT_FROM_CKPT*
I1209 19:40:33.346211 140591702722304 run_pretraining_gpu_v2.py:186]   name = bert/encoder/layer_11/output/dense/kernel:0, shape = (3072, 768), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_11/output/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*
I1209 19:40:33.346419 140591702722304 run_pretraining_gpu_v2.py:186]   name = bert/encoder/layer_11/output/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_11/output/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*
I1209 19:40:33.346625 140591702722304 run_pretraining_gpu_v2.py:186]   name = bert/encoder/layer_11/output/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_11/output/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*
I1209 19:40:33.346820 140591702722304 run_pretraining_gpu_v2.py:186]   name = bert/encoder/layer_11/output/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/pooler/dense/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*
I1209 19:40:33.347020 140591702722304 run_pretraining_gpu_v2.py:186]   name = bert/pooler/dense/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/pooler/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*
I1209 19:40:33.347229 140591702722304 run_pretraining_gpu_v2.py:186]   name = bert/pooler/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = cls/predictions/transform/dense/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*
I1209 19:40:33.347424 140591702722304 run_pretraining_gpu_v2.py:186]   name = cls/predictions/transform/dense/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*
INFO:tensorflow:  name = cls/predictions/transform/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*
I1209 19:40:33.347643 140591702722304 run_pretraining_gpu_v2.py:186]   name = cls/predictions/transform/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = cls/predictions/transform/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*
I1209 19:40:33.347839 140591702722304 run_pretraining_gpu_v2.py:186]   name = cls/predictions/transform/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = cls/predictions/transform/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*
I1209 19:40:33.348041 140591702722304 run_pretraining_gpu_v2.py:186]   name = cls/predictions/transform/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = cls/predictions/output_bias:0, shape = (30522,), *INIT_FROM_CKPT*
I1209 19:40:33.348239 140591702722304 run_pretraining_gpu_v2.py:186]   name = cls/predictions/output_bias:0, shape = (30522,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = cls/seq_relationship/output_weights:0, shape = (2, 768), *INIT_FROM_CKPT*
I1209 19:40:33.348448 140591702722304 run_pretraining_gpu_v2.py:186]   name = cls/seq_relationship/output_weights:0, shape = (2, 768), *INIT_FROM_CKPT*
INFO:tensorflow:  name = cls/seq_relationship/output_bias:0, shape = (2,), *INIT_FROM_CKPT*
I1209 19:40:33.348661 140591702722304 run_pretraining_gpu_v2.py:186]   name = cls/seq_relationship/output_bias:0, shape = (2,), *INIT_FROM_CKPT*
WARNING:tensorflow:Efficient allreduce is not supported for 1 IndexedSlices
W1209 19:40:37.624574 140602746058560 cross_device_ops.py:751] Efficient allreduce is not supported for 1 IndexedSlices
INFO:tensorflow:Reduce to /replica:0/task:0/device:GPU:0 then broadcast to ('/replica:0/task:0/device:GPU:0', '/replica:0/task:0/device:GPU:1', '/replica:0/task:0/device:GPU:2', '/replica:0/task:0/device:GPU:3', '/replica:0/task:0/device:GPU:4', '/replica:0/task:0/device:GPU:5').
I1209 19:40:37.625795 140602746058560 cross_device_ops.py:393] Reduce to /replica:0/task:0/device:GPU:0 then broadcast to ('/replica:0/task:0/device:GPU:0', '/replica:0/task:0/device:GPU:1', '/replica:0/task:0/device:GPU:2', '/replica:0/task:0/device:GPU:3', '/replica:0/task:0/device:GPU:4', '/replica:0/task:0/device:GPU:5').
INFO:tensorflow:batch_all_reduce: 1 all-reduces with algorithm = nccl,num_packs = 6, agg_small_grads_max_bytes = 0 and agg_small_grads_max_group = 10
I1209 19:40:37.654547 140602746058560 cross_device_ops.py:720] batch_all_reduce: 1 all-reduces with algorithm = nccl,num_packs = 6, agg_small_grads_max_bytes = 0 and agg_small_grads_max_group = 10
INFO:tensorflow:batch_all_reduce: 1 all-reduces with algorithm = nccl,num_packs = 6, agg_small_grads_max_bytes = 0 and agg_small_grads_max_group = 10
I1209 19:40:37.778326 140602746058560 cross_device_ops.py:720] batch_all_reduce: 1 all-reduces with algorithm = nccl,num_packs = 6, agg_small_grads_max_bytes = 0 and agg_small_grads_max_group = 10
INFO:tensorflow:batch_all_reduce: 1 all-reduces with algorithm = nccl,num_packs = 6, agg_small_grads_max_bytes = 0 and agg_small_grads_max_group = 10
I1209 19:40:37.871515 140602746058560 cross_device_ops.py:720] batch_all_reduce: 1 all-reduces with algorithm = nccl,num_packs = 6, agg_small_grads_max_bytes = 0 and agg_small_grads_max_group = 10
INFO:tensorflow:batch_all_reduce: 1 all-reduces with algorithm = nccl,num_packs = 6, agg_small_grads_max_bytes = 0 and agg_small_grads_max_group = 10
I1209 19:40:37.965023 140602746058560 cross_device_ops.py:720] batch_all_reduce: 1 all-reduces with algorithm = nccl,num_packs = 6, agg_small_grads_max_bytes = 0 and agg_small_grads_max_group = 10
INFO:tensorflow:batch_all_reduce: 1 all-reduces with algorithm = nccl,num_packs = 6, agg_small_grads_max_bytes = 0 and agg_small_grads_max_group = 10
I1209 19:40:38.058165 140602746058560 cross_device_ops.py:720] batch_all_reduce: 1 all-reduces with algorithm = nccl,num_packs = 6, agg_small_grads_max_bytes = 0 and agg_small_grads_max_group = 10
INFO:tensorflow:batch_all_reduce: 1 all-reduces with algorithm = nccl,num_packs = 6, agg_small_grads_max_bytes = 0 and agg_small_grads_max_group = 10
I1209 19:40:38.152199 140602746058560 cross_device_ops.py:720] batch_all_reduce: 1 all-reduces with algorithm = nccl,num_packs = 6, agg_small_grads_max_bytes = 0 and agg_small_grads_max_group = 10
INFO:tensorflow:batch_all_reduce: 1 all-reduces with algorithm = nccl,num_packs = 6, agg_small_grads_max_bytes = 0 and agg_small_grads_max_group = 10
I1209 19:40:38.245916 140602746058560 cross_device_ops.py:720] batch_all_reduce: 1 all-reduces with algorithm = nccl,num_packs = 6, agg_small_grads_max_bytes = 0 and agg_small_grads_max_group = 10
INFO:tensorflow:batch_all_reduce: 1 all-reduces with algorithm = nccl,num_packs = 6, agg_small_grads_max_bytes = 0 and agg_small_grads_max_group = 10
I1209 19:40:38.339321 140602746058560 cross_device_ops.py:720] batch_all_reduce: 1 all-reduces with algorithm = nccl,num_packs = 6, agg_small_grads_max_bytes = 0 and agg_small_grads_max_group = 10
INFO:tensorflow:Done calling model_fn.
I1209 19:41:30.312126 140592222807808 estimator.py:1147] Done calling model_fn.
INFO:tensorflow:Done calling model_fn.
I1209 19:41:30.315484 140592214415104 estimator.py:1147] Done calling model_fn.
INFO:tensorflow:Done calling model_fn.
I1209 19:41:30.318616 140591727900416 estimator.py:1147] Done calling model_fn.
INFO:tensorflow:Done calling model_fn.
I1209 19:41:30.322615 140591719507712 estimator.py:1147] Done calling model_fn.
INFO:tensorflow:Done calling model_fn.
I1209 19:41:30.325594 140591711115008 estimator.py:1147] Done calling model_fn.
INFO:tensorflow:Done calling model_fn.
I1209 19:41:30.328356 140591702722304 estimator.py:1147] Done calling model_fn.
INFO:tensorflow:Reduce to /replica:0/task:0/device:CPU:0 then broadcast to ('/replica:0/task:0/device:CPU:0',).
I1209 19:41:30.330210 140602746058560 cross_device_ops.py:393] Reduce to /replica:0/task:0/device:CPU:0 then broadcast to ('/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Create CheckpointSaverHook.
I1209 19:41:33.816081 140602746058560 basic_session_run_hooks.py:541] Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
I1209 19:42:06.394142 140602746058560 monitored_session.py:240] Graph was finalized.
Traceback (most recent call last):
  File "/home/shuxiaobo/python3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1356, in _do_call
    return fn(*args)
  File "/home/shuxiaobo/python3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1339, in _run_fn
    self._extend_graph()
  File "/home/shuxiaobo/python3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1374, in _extend_graph
    tf_session.ExtendSession(self._session)
tensorflow.python.framework.errors_impl.InvalidArgumentError: No OpKernel was registered to support Op 'NcclAllReduce' used by {{node NcclAllReduce}}with these attrs: [num_devices=6, reduction="sum", shared_name="c0", T=DT_FLOAT]
Registered devices: [CPU, XLA_CPU]
Registered kernels:
  <no registered kernels>

     [[NcclAllReduce]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/data/shuxiaobo/BERT-multi-gpu/run_pretraining_gpu_v2.py", line 504, in <module>
    tf.app.run()
  File "/home/shuxiaobo/python3/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 40, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "/home/shuxiaobo/python3/lib/python3.6/site-packages/absl/app.py", line 299, in run
    _run_main(main, args)
  File "/home/shuxiaobo/python3/lib/python3.6/site-packages/absl/app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "/data/shuxiaobo/BERT-multi-gpu/run_pretraining_gpu_v2.py", line 480, in main
    estimator.train(input_fn = train_input_fn, max_steps = FLAGS.num_train_steps)
  File "/home/shuxiaobo/python3/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 367, in train
    loss = self._train_model(input_fn, hooks, saving_listeners)
  File "/home/shuxiaobo/python3/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1156, in _train_model
    return self._train_model_distributed(input_fn, hooks, saving_listeners)
  File "/home/shuxiaobo/python3/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1219, in _train_model_distributed
    self._config._train_distribute, input_fn, hooks, saving_listeners)
  File "/home/shuxiaobo/python3/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1329, in _actual_train_model_distributed
    saving_listeners)
  File "/home/shuxiaobo/python3/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1480, in _train_with_estimator_spec
    log_step_count_steps=log_step_count_steps) as mon_sess:
  File "/home/shuxiaobo/python3/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 584, in MonitoredTrainingSession
    stop_grace_period_secs=stop_grace_period_secs)
  File "/home/shuxiaobo/python3/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1007, in __init__
    stop_grace_period_secs=stop_grace_period_secs)
  File "/home/shuxiaobo/python3/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 725, in __init__
    self._sess = _RecoverableSession(self._coordinated_creator)
  File "/home/shuxiaobo/python3/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1200, in __init__
    _WrappedSession.__init__(self, self._create_session())
  File "/home/shuxiaobo/python3/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1205, in _create_session
    return self._sess_creator.create_session()
  File "/home/shuxiaobo/python3/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 871, in create_session
    self.tf_sess = self._session_creator.create_session()
  File "/home/shuxiaobo/python3/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 647, in create_session
    init_fn=self._scaffold.init_fn)
  File "/home/shuxiaobo/python3/lib/python3.6/site-packages/tensorflow/python/training/session_manager.py", line 296, in prepare_session
    sess.run(init_op, feed_dict=init_feed_dict)
  File "/home/shuxiaobo/python3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 950, in run
    run_metadata_ptr)
  File "/home/shuxiaobo/python3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1173, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/shuxiaobo/python3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1350, in _do_run
    run_metadata)
  File "/home/shuxiaobo/python3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1370, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: No OpKernel was registered to support Op 'NcclAllReduce' used by node NcclAllReduce (defined at data/shuxiaobo/BERT-multi-gpu/run_pretraining_gpu_v2.py:480) with these attrs: [num_devices=6, reduction="sum", shared_name="c0", T=DT_FLOAT]
Registered devices: [CPU, XLA_CPU]
Registered kernels:
  <no registered kernels>

     [[NcclAllReduce]]
guotong1988 commented 4 years ago

Are you training on CPU? The log lists only CPU and XLA_CPU under "Registered devices".

shuxiaobo commented 4 years ago
I1209 21:02:09.292431 140486286108480 basic_session_run_hooks.py:541] Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
I1209 21:02:40.306236 140486286108480 monitored_session.py:240] Graph was finalized.
2019-12-09 21:02:40.310599: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties: 
name: Tesla V100-PCIE-16GB-LS major: 7 minor: 0 memoryClockRate(GHz): 1.297
pciBusID: 0000:02:00.0
2019-12-09 21:02:40.311754: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 1 with properties: 
name: Tesla V100-PCIE-32GB major: 7 minor: 0 memoryClockRate(GHz): 1.38
pciBusID: 0000:03:00.0
2019-12-09 21:02:40.312856: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 2 with properties: 
name: Tesla V100-PCIE-32GB major: 7 minor: 0 memoryClockRate(GHz): 1.38
pciBusID: 0000:83:00.0
2019-12-09 21:02:40.313949: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 3 with properties: 
name: Tesla V100-PCIE-32GB major: 7 minor: 0 memoryClockRate(GHz): 1.38
pciBusID: 0000:84:00.0
2019-12-09 21:02:40.314131: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcudart.so.10.0'; dlerror: libcudart.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-9.0/lib64
2019-12-09 21:02:40.314214: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcublas.so.10.0'; dlerror: libcublas.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-9.0/lib64
2019-12-09 21:02:40.314286: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcufft.so.10.0'; dlerror: libcufft.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-9.0/lib64
2019-12-09 21:02:40.314356: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcurand.so.10.0'; dlerror: libcurand.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-9.0/lib64
2019-12-09 21:02:40.314422: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcusolver.so.10.0'; dlerror: libcusolver.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-9.0/lib64
2019-12-09 21:02:40.314488: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcusparse.so.10.0'; dlerror: libcusparse.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-9.0/lib64
2019-12-09 21:02:40.314517: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2019-12-09 21:02:40.314532: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1663] Cannot dlopen some GPU libraries. Skipping registering GPU devices...
2019-12-09 21:02:40.314654: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-12-09 21:02:40.314672: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187]      0 1 2 3 
2019-12-09 21:02:40.314686: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0:   N Y N N 
2019-12-09 21:02:40.314697: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 1:   Y N N N 
2019-12-09 21:02:40.314709: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 2:   N N N Y 
2019-12-09 21:02:40.314719: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 3:   N N Y N 
Traceback (most recent call last):
  File "/home/shuxiaobo/python3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1356, in _do_call
    return fn(*args)
  File "/home/shuxiaobo/python3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1339, in _run_fn
    self._extend_graph()
  File "/home/shuxiaobo/python3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1374, in _extend_graph
    tf_session.ExtendSession(self._session)
tensorflow.python.framework.errors_impl.InvalidArgumentError: No OpKernel was registered to support Op 'NcclAllReduce' used by {{node NcclAllReduce}}with these attrs: [num_devices=6, reduction="sum", shared_name="c0", T=DT_FLOAT]
Registered devices: [CPU, XLA_CPU, XLA_GPU]
Registered kernels:
  device='GPU'

     [[NcclAllReduce]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/shuxiaobo/BERT-multi-gpu/run_pretraining_gpu_v2.py", line 504, in <module>
    tf.app.run()
  File "/home/shuxiaobo/python3/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 40, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "/home/shuxiaobo/python3/lib/python3.6/site-packages/absl/app.py", line 299, in run
    _run_main(main, args)
  File "/home/shuxiaobo/python3/lib/python3.6/site-packages/absl/app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "/home/shuxiaobo/BERT-multi-gpu/run_pretraining_gpu_v2.py", line 480, in main
    estimator.train(input_fn = train_input_fn, max_steps = FLAGS.num_train_steps)
  File "/home/shuxiaobo/python3/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 367, in train
    loss = self._train_model(input_fn, hooks, saving_listeners)
  File "/home/shuxiaobo/python3/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1156, in _train_model
    return self._train_model_distributed(input_fn, hooks, saving_listeners)
  File "/home/shuxiaobo/python3/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1219, in _train_model_distributed
    self._config._train_distribute, input_fn, hooks, saving_listeners)
  File "/home/shuxiaobo/python3/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1329, in _actual_train_model_distributed
    saving_listeners)
  File "/home/shuxiaobo/python3/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1480, in _train_with_estimator_spec
    log_step_count_steps=log_step_count_steps) as mon_sess:
  File "/home/shuxiaobo/python3/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 584, in MonitoredTrainingSession
    stop_grace_period_secs=stop_grace_period_secs)
  File "/home/shuxiaobo/python3/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1007, in __init__
    stop_grace_period_secs=stop_grace_period_secs)
  File "/home/shuxiaobo/python3/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 725, in __init__
    self._sess = _RecoverableSession(self._coordinated_creator)
  File "/home/shuxiaobo/python3/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1200, in __init__
    _WrappedSession.__init__(self, self._create_session())
  File "/home/shuxiaobo/python3/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1205, in _create_session
    return self._sess_creator.create_session()
  File "/home/shuxiaobo/python3/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 871, in create_session
    self.tf_sess = self._session_creator.create_session()
  File "/home/shuxiaobo/python3/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 647, in create_session
    init_fn=self._scaffold.init_fn)
  File "/home/shuxiaobo/python3/lib/python3.6/site-packages/tensorflow/python/training/session_manager.py", line 296, in prepare_session
    sess.run(init_op, feed_dict=init_feed_dict)
  File "/home/shuxiaobo/python3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 950, in run
    run_metadata_ptr)
  File "/home/shuxiaobo/python3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1173, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/shuxiaobo/python3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1350, in _do_run
    run_metadata)
  File "/home/shuxiaobo/python3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1370, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: No OpKernel was registered to support Op 'NcclAllReduce' used by node NcclAllReduce (defined at /BERT-multi-gpu/run_pretraining_gpu_v2.py:480) with these attrs: [num_devices=6, reduction="sum", shared_name="c0", T=DT_FLOAT]
Registered devices: [CPU, XLA_CPU, XLA_GPU]
Registered kernels:
  device='GPU'

     [[NcclAllReduce]]

No. I just moved to another server and tested tensorflow-gpu with another project, where it runs well, but this one still raises the error.

guotong1988 commented 4 years ago

I know this error. It is not a problem with my code. You need to install CUDA 10 correctly.
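The log above shows TensorFlow asking for the `.so.10.0` CUDA libraries while LD_LIBRARY_PATH points at /usr/local/cuda-9.0/lib64. A minimal sketch of a check for that mismatch follows. The function name `find_missing_cuda_libs` is hypothetical, and the library list is copied from the "Could not dlopen library" lines in the log. It only tests whether the files exist in the listed directories; the dynamic linker also consults other locations, so treat the result as a rough hint.

```python
import os

# Library file names copied from the "Could not dlopen library" log lines.
CUDA_10_LIBS = [
    "libcudart.so.10.0",
    "libcublas.so.10.0",
    "libcufft.so.10.0",
    "libcurand.so.10.0",
    "libcusolver.so.10.0",
    "libcusparse.so.10.0",
]


def find_missing_cuda_libs(search_path, libs=CUDA_10_LIBS):
    """Return the entries of `libs` not found in any directory of `search_path`.

    `search_path` uses the LD_LIBRARY_PATH format: directories joined by
    os.pathsep (':' on Linux). Hypothetical helper; it checks file
    existence only and does not try to load anything.
    """
    dirs = [d for d in search_path.split(os.pathsep) if d]
    missing = []
    for lib in libs:
        if not any(os.path.exists(os.path.join(d, lib)) for d in dirs):
            missing.append(lib)
    return missing


if __name__ == "__main__":
    path = os.environ.get("LD_LIBRARY_PATH", "")
    print("LD_LIBRARY_PATH:", path or "(empty)")
    for lib in find_missing_cuda_libs(path):
        print("missing:", lib)
```

If every library is reported missing while the path names a cuda-9.0 directory, the installed TensorFlow build and the CUDA toolkit on the path probably do not match.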

guotong1988 commented 4 years ago

TensorFlow is falling back to the CPU because it cannot load the CUDA libraries.
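One way to spot this situation in a long training log is to scan it for the two messages that appear in the output above: "Could not dlopen library" and "Skipping registering GPU devices". The sketch below does that with the standard library only. The function name `summarize_gpu_log` is hypothetical, and the message patterns are taken from this thread's log, so other TensorFlow versions may word them differently.

```python
import re

# Patterns taken from the log lines quoted in this thread.
DLOPEN_FAIL = re.compile(r"Could not dlopen library '([^']+)'")
SKIP_GPU = "Skipping registering GPU devices"


def summarize_gpu_log(lines):
    """Return (missing_libraries, gpu_registration_skipped) for log lines.

    Hypothetical helper: collects each library named in a dlopen failure
    once, in order of first appearance, and notes whether TensorFlow
    reported that it skipped registering GPU devices.
    """
    missing = []
    skipped = False
    for line in lines:
        match = DLOPEN_FAIL.search(line)
        if match and match.group(1) not in missing:
            missing.append(match.group(1))
        if SKIP_GPU in line:
            skipped = True
    return missing, skipped


if __name__ == "__main__":
    # Two lines copied from the log in this thread.
    sample = [
        "2019-12-09 21:02:40.314131: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcudart.so.10.0'; dlerror: libcudart.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-9.0/lib64",
        "2019-12-09 21:02:40.314532: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1663] Cannot dlopen some GPU libraries. Skipping registering GPU devices...",
    ]
    libs, skipped = summarize_gpu_log(sample)
    print("missing libraries:", libs)
    print("GPU registration skipped:", skipped)
```

When the second value is true, the later NcclAllReduce failure is expected: with no GPU device registered, no device is available to run the GPU kernel for that op.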

shuxiaobo commented 4 years ago
2019-12-09 21:16:39.061066: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2019-12-09 21:16:39.061106: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
2019-12-09 21:16:39.061129: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcufft.so.10.0
2019-12-09 21:16:39.061149: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcurand.so.10.0
2019-12-09 21:16:39.061178: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusolver.so.10.0
2019-12-09 21:16:39.061217: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusparse.so.10.0
2019-12-09 21:16:39.061254: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2019-12-09 21:16:39.069434: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0, 1, 2, 3
2019-12-09 21:16:39.069589: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-12-09 21:16:39.069609: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187]      0 1 2 3 
2019-12-09 21:16:39.069621: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0:   N Y N N 
2019-12-09 21:16:39.069641: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 1:   Y N N N 
2019-12-09 21:16:39.069651: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 2:   N N N Y 
2019-12-09 21:16:39.069660: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 3:   N N Y N 
2019-12-09 21:16:39.076271: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14925 MB memory) -> physical GPU (device: 0, name: Tesla V100-PCIE-16GB-LS, pci bus id: 0000:02:00.0, compute capability: 7.0)
2019-12-09 21:16:39.077359: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 30458 MB memory) -> physical GPU (device: 1, name: Tesla V100-PCIE-32GB, pci bus id: 0000:03:00.0, compute capability: 7.0)
2019-12-09 21:16:39.078469: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:2 with 30458 MB memory) -> physical GPU (device: 2, name: Tesla V100-PCIE-32GB, pci bus id: 0000:83:00.0, compute capability: 7.0)
2019-12-09 21:16:39.079576: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:3 with 30458 MB memory) -> physical GPU (device: 3, name: Tesla V100-PCIE-32GB, pci bus id: 0000:84:00.0, compute capability: 7.0)
WARNING:tensorflow:From /home/shuxiaobo/python3/lib/python3.6/site-packages/tensorflow/python/training/saver.py:1276: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
W1209 21:16:39.080965 140300370384704 deprecation.py:323] From /home/shuxiaobo/python3/lib/python3.6/site-packages/tensorflow/python/training/saver.py:1276: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
INFO:tensorflow:Restoring parameters from output2/model.ckpt-0
I1209 21:16:39.082738 140300370384704 saver.py:1280] Restoring parameters from output2/model.ckpt-0
2019-12-09 21:16:46.355335: W tensorflow/compiler/jit/mark_for_compilation_pass.cc:1412] (One-time warning): Not using XLA:CPU for cluster because envvar TF_XLA_FLAGS=--tf_xla_cpu_global_jit was not set.  If you want XLA:CPU, either set that envvar, or use experimental_jit_scope to enable XLA:CPU.  To confirm that XLA is active, pass --vmodule=xla_compilation_cache=1 (as a proper command-line flag, not via TF_XLA_FLAGS) or set the envvar XLA_FLAGS=--xla_hlo_profile.
WARNING:tensorflow:From /home/shuxiaobo/python3/lib/python3.6/site-packages/tensorflow/python/training/saver.py:1066: get_checkpoint_mtimes (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file utilities to get mtimes.
W1209 21:16:52.061076 140300370384704 deprecation.py:323] From /home/shuxiaobo/python3/lib/python3.6/site-packages/tensorflow/python/training/saver.py:1066: get_checkpoint_mtimes (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file utilities to get mtimes.
INFO:tensorflow:Running local_init_op.
I1209 21:16:55.574278 140300370384704 session_manager.py:500] Running local_init_op.
INFO:tensorflow:Done running local_init_op.
I1209 21:16:56.657786 140300370384704 session_manager.py:502] Done running local_init_op.
INFO:tensorflow:Saving checkpoints for 0 into output2/model.ckpt.
I1209 21:17:39.632050 140300370384704 basic_session_run_hooks.py:606] Saving checkpoints for 0 into output2/model.ckpt.
2019-12-09 21:18:34.684387: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
INFO:tensorflow:loss = 74.11139, step = 0
I1209 21:18:38.587360 140300370384704 basic_session_run_hooks.py:262] loss = 74.11139, step = 0
INFO:tensorflow:global_step/sec: 0.246814
I1209 21:19:19.102671 140300370384704 basic_session_run_hooks.py:692] global_step/sec: 0.246814

Thanks! The code runs well now. It seems I needed to reinstall CUDA on my server.