2020-06-15 13:25:26.120816: I tensorflow/core/common_runtime/bfc_allocator.cc:917] 1 Chunks of size 134217728 totalling 128.00MiB
2020-06-15 13:25:26.120856: I tensorflow/core/common_runtime/bfc_allocator.cc:917] 1 Chunks of size 221800704 totalling 211.53MiB
2020-06-15 13:25:26.120889: I tensorflow/core/common_runtime/bfc_allocator.cc:921] Sum Total of in-use chunks: 10.05GiB
2020-06-15 13:25:26.120924: I tensorflow/core/common_runtime/bfc_allocator.cc:923] total_region_allocated_bytes_: 10813302272 memory_limit_: 10813302375 available bytes: 103 curr_region_allocation_bytes_: 17179869184
2020-06-15 13:25:26.120964: I tensorflow/core/common_runtime/bfc_allocator.cc:929] Stats:
Limit: 10813302375
InUse: 10796987136
MaxInUse: 10797147392
NumAllocs: 3723
MaxAllocSize: 221800704
2020-06-15 13:25:26.121263: W tensorflow/core/common_runtime/bfc_allocator.cc:424] ****************************************************************************************************
2020-06-15 13:25:26.121329: W tensorflow/core/framework/op_kernel.cc:1651] OP_REQUIRES failed at transpose_op.cc:198 : Resource exhausted: OOM when allocating tensor with shape[512,896,2,16] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
2020-06-15 13:25:26.328082: W tensorflow/core/kernels/data/cache_dataset_ops.cc:824] The calling iterator did not fully read the dataset being cached. In order to avoid unexpected truncation of the dataset, the partially cached contents of the dataset will be discarded. This can happen if you have an input pipeline similar to `dataset.cache().take(k).repeat()`. You should use `dataset.take(k).cache().repeat()` instead.
Traceback (most recent call last):
File "/data/home/zhushanfeng/anaconda3/envs/TXLNet3/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1365, in _do_call
return fn(*args)
File "/data/home/zhushanfeng/anaconda3/envs/TXLNet3/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1350, in _run_fn
target_list, run_metadata)
File "/data/home/zhushanfeng/anaconda3/envs/TXLNet3/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1443, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: 2 root error(s) found.
(0) Resource exhausted: OOM when allocating tensor with shape[512,1408,2,16] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[{{node model/transformer/layer_12/rel_attn_1/einsum_3/transpose_2}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
[[model/transformer/StopGradient_17/_95]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
(1) Resource exhausted: OOM when allocating tensor with shape[512,1408,2,16] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[{{node model/transformer/layer_12/rel_attn_1/einsum_3/transpose_2}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
0 successful operations.
0 derived errors ignored.
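Side note: the hint in the traceback suggests passing report_tensor_allocations_upon_oom so the runtime dumps the live tensors when OOM happens. With the TF 1.15 session API that could be wired in roughly like this (a sketch only; train_op stands in for whatever op train_gpu.py actually runs):

```python
import tensorflow as tf  # TF 1.x API

# Ask the runtime to list allocated tensors when an OOM occurs,
# as suggested by the hint in the error message above.
run_options = tf.RunOptions(report_tensor_allocations_upon_oom=True)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # train_op is a placeholder for the actual training op in train_gpu.py.
    sess.run(train_op, options=run_options)
```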
Well, I tried smaller batch sizes (8 and 2) in both data_utils.py and train_gpu.py, and I got the same error. Whatever the batch size, curr_region_allocation_bytes_ is always 17179869184, so after trying 4 different batch sizes I am tired and don't want to reduce seq_len. I wonder whether there is a memory leak.
If anyone has succeeded in training on an 11GB GPU, please tell me your running spec; most environments I have seen in the issues use 32GB GPUs.
My env: Python 3 + TF 1.15 + 4x 2080 Ti (11GB each)
I want to train XLNet from scratch, but I got OOM with batch sizes 32, 16, 8, and 2 at seq_len=512.
Here is the spec I run:
python data_utils.py --bsz_per_host=16 --num_core_per_host=8 --seq_len=512 --reuse_len=256 --input_glob=data/pubmed_300w_line.txt --save_dir=untfrec16 --num_passes=20 --bi_data=True --sp_path=data/pub3m_cased.model --mask_alpha=6 --mask_beta=1 --num_predict=85 --uncased=False --use_tpu=False
python train_gpu.py --record_info_dir=/data/untfrec16/tfrecords/ --model_dir=/data/model/ --train_batch_size=16 --seq_len=512 --reuse_len=256 --mem_len=384 --perm_size=256 --n_layer=24 --d_model=1024 --d_embed=1024 --n_head=16 --d_head=64 --d_inner=4096 --untie_r=True --mask_alpha=6 --mask_beta=1 --num_predict=85 --save_steps=10000
And I got the error above.
With batch size 2, you should set num_core_per_host to 1...
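For reference, a back-of-the-envelope fp32 estimate for the train_gpu.py config above (n_layer=24, d_model=1024, n_head=16, d_head=64, d_inner=4096). These are simplifying assumptions, not exact XLNet sizes: embeddings, relative-position parameters, and activation memory are ignored, so the real footprint is larger still:

```python
# Rough fp32 memory estimate for the config in the train_gpu.py command.
# All numbers are back-of-the-envelope lower bounds, not exact XLNet sizes.
n_layer, d_model, n_head, d_head, d_inner = 24, 1024, 16, 64, 4096

# Per-layer attention weights: Q, K, V, O projections
# (relative-position projections add more; ignored for a lower bound).
attn = 4 * d_model * n_head * d_head
# Per-layer feed-forward weights: two d_model x d_inner matrices.
ffn = 2 * d_model * d_inner

params = n_layer * (attn + ffn)       # transformer body only, no embeddings
weights_gb = params * 4 / 2**30       # ~1.1 GiB of fp32 weights
adam_gb = weights_gb * 3              # ~3.4 GiB with two Adam moment buffers

# The single tensor that failed to allocate: shape [512, 1408, 2, 16] float32
# (1408 presumably combines seq_len=512 with the cached memory context).
oom_tensor_mb = 512 * 1408 * 2 * 16 * 4 / 2**20   # 88 MiB for one activation
```

So even before activations, optimizer state alone eats roughly a third of an 11GB card, and each attention-sized activation like the one in the OOM message adds tens of MiB per layer.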