Open yuanhuang0825 opened 4 years ago
I ran into the same problem when I tried to fine-tune a model on my GPU with only 4 GB of memory. I had to reduce the per_gpu_train_batch_size to one and could only use the layoutlm base model.
You need a GPU with more memory.
Reducing the batch size from 16 to 8 solved this issue in my case.
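If shrinking the batch hurts training, gradient accumulation can keep the effective batch size while only a small micro-batch sits on the GPU at a time. Below is a minimal sketch of the arithmetic. It assumes the script exposes a `--gradient_accumulation_steps` option like the Hugging Face example scripts do (check `python run_seq_labeling.py --help` to confirm), and the helper name `accumulation_plan` is made up for illustration:

```python
def accumulation_plan(target_batch_size, per_gpu_batch_size, n_gpu=1):
    """Return how many micro-batches to accumulate per optimizer step.

    Hypothetical helper: it only does the arithmetic, it does not touch
    the training script or the GPU.
    """
    per_step = per_gpu_batch_size * n_gpu
    if per_step <= 0:
        raise ValueError("per_gpu_batch_size and n_gpu must be positive")
    if target_batch_size % per_step != 0:
        raise ValueError(
            "target_batch_size must be a multiple of per_gpu_batch_size * n_gpu"
        )
    return target_batch_size // per_step


# Keep an effective batch of 16 while only 1 example is on the GPU at a time.
steps = accumulation_plan(target_batch_size=16, per_gpu_batch_size=1)
print("gradient accumulation steps:", steps)
```

Accumulation trades memory for time: each optimizer step needs more forward/backward passes, but peak activation memory follows the micro-batch size only.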
Model I am using: LayoutLM. What's the problem?
Selected optimization level O1: Insert automatic casts around Pytorch functions and Tensor methods.
Defaults for this optimization level are:
enabled : True
opt_level : O1
cast_model_type : None
patch_torch_functions : True
keep_batchnorm_fp32 : None
master_weights : None
loss_scale : dynamic
Processing user overrides (additional kwargs that are not None)...
After processing overrides, optimization options are:
enabled : True
opt_level : O1
cast_model_type : None
patch_torch_functions : True
keep_batchnorm_fp32 : None
master_weights : None
loss_scale : dynamic
Warning: multi_tensor_applier fused unscale kernel is unavailable, possibly because apex was installed without --cuda_ext --cpp_ext. Using Python fallback. Original ImportError was: ModuleNotFoundError("No module named 'amp_C'",)
Iteration: 0%| | 0/14 [00:01<?, ?it/s]
Epoch: 0%| | 0/100 [00:01<?, ?it/s]
Traceback (most recent call last):
File "run_seq_labeling.py", line 811, in <module>
main()
File "run_seq_labeling.py", line 704, in main
args, train_dataset, model, tokenizer, labels, pad_token_label_id
File "run_seq_labeling.py", line 219, in train
outputs = model(**inputs)
File "/home/yuan/anaconda3/envs/layoutlm/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/home/yuan/淵/layoutlm/unilm/layoutlm/examples/seq_labeling/layoutlm.py", line 578, in forward
head_mask=head_mask,
File "/home/yuan/anaconda3/envs/layoutlm/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/home/yuan/淵/layoutlm/unilm/layoutlm/examples/seq_labeling/layoutlm.py", line 535, in forward
embedding_output, extended_attention_mask, head_mask=head_mask
File "/home/yuan/anaconda3/envs/layoutlm/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/home/yuan/anaconda3/envs/layoutlm/lib/python3.6/site-packages/transformers/modeling_bert.py", line 407, in forward
hidden_states, attention_mask, head_mask[i], encoder_hidden_states, encoder_attention_mask
File "/home/yuan/anaconda3/envs/layoutlm/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/home/yuan/anaconda3/envs/layoutlm/lib/python3.6/site-packages/transformers/modeling_bert.py", line 368, in forward
self_attention_outputs = self.attention(hidden_states, attention_mask, head_mask)
File "/home/yuan/anaconda3/envs/layoutlm/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/home/yuan/anaconda3/envs/layoutlm/lib/python3.6/site-packages/transformers/modeling_bert.py", line 314, in forward
hidden_states, attention_mask, head_mask, encoder_hidden_states, encoder_attention_mask
File "/home/yuan/anaconda3/envs/layoutlm/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/home/yuan/anaconda3/envs/layoutlm/lib/python3.6/site-packages/transformers/modeling_bert.py", line 251, in forward
context_layer = torch.matmul(attention_probs, value_layer)
File "/home/yuan/anaconda3/envs/layoutlm/lib/python3.6/site-packages/apex/amp/wrap.py", line 27, in wrapper
kwargs)
File "/home/yuan/anaconda3/envs/layoutlm/lib/python3.6/site-packages/apex/amp/utils.py", line 81, in casted_args
new_args.append(cast_fn(x))
File "/home/yuan/anaconda3/envs/layoutlm/lib/python3.6/site-packages/apex/amp/utils.py", line 63, in maybe_half
return x.half()
RuntimeError: CUDA out of memory. Tried to allocate 96.00 MiB (GPU 0; 3.82 GiB total capacity; 2.76 GiB already allocated; 8.25 MiB free; 2.79 GiB reserved in total by PyTorch)
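
To see how close the run was to fitting, you can pull the numbers out of the error message itself. This is a rough sketch that assumes the message wording shown above (it differs between PyTorch versions); `parse_oom` is a hypothetical helper, not part of PyTorch:

```python
import re

UNIT_TO_MIB = {"MiB": 1.0, "GiB": 1024.0}


def parse_oom(message):
    """Extract the sizes (in MiB) from a 'CUDA out of memory' message.

    Assumes the wording of the message quoted in this thread.
    """
    sizes = {}
    requested = re.search(r"Tried to allocate (\d+(?:\.\d+)?) (MiB|GiB)", message)
    if requested:
        sizes["requested"] = float(requested.group(1)) * UNIT_TO_MIB[requested.group(2)]
    fields = re.finditer(
        r"(\d+(?:\.\d+)?) (MiB|GiB) (total capacity|already allocated|free|reserved)",
        message,
    )
    for match in fields:
        sizes[match.group(3)] = float(match.group(1)) * UNIT_TO_MIB[match.group(2)]
    return sizes


msg = (
    "RuntimeError: CUDA out of memory. Tried to allocate 96.00 MiB "
    "(GPU 0; 3.82 GiB total capacity; 2.76 GiB already allocated; "
    "8.25 MiB free; 2.79 GiB reserved in total by PyTorch)"
)
info = parse_oom(msg)
shortfall = info["requested"] - info["free"]
print("shortfall in MiB:", shortfall)
```

Here the request (96 MiB) is far larger than what is free (8.25 MiB), and PyTorch has already reserved most of the card, so small tweaks are unlikely to help. A smaller batch, shorter sequences, or a bigger GPU are the realistic options.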