microsoft / unilm

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
https://aka.ms/GeneralAI
MIT License

run_seq_labeling error #242

Open yuanhuang0825 opened 4 years ago

yuanhuang0825 commented 4 years ago

Model I am using: LayoutLM. What is the problem here?


Selected optimization level O1: Insert automatic casts around Pytorch functions and Tensor methods.

Defaults for this optimization level are:
enabled                : True
opt_level              : O1
cast_model_type        : None
patch_torch_functions  : True
keep_batchnorm_fp32    : None
master_weights         : None
loss_scale             : dynamic
Processing user overrides (additional kwargs that are not None)...
After processing overrides, optimization options are:
enabled                : True
opt_level              : O1
cast_model_type        : None
patch_torch_functions  : True
keep_batchnorm_fp32    : None
master_weights         : None
loss_scale             : dynamic
Warning: multi_tensor_applier fused unscale kernel is unavailable, possibly because apex was installed without --cuda_ext --cpp_ext. Using Python fallback. Original ImportError was: ModuleNotFoundError("No module named 'amp_C'",)
Iteration:   0%|          | 0/14 [00:01<?, ?it/s]
Epoch:   0%|          | 0/100 [00:01<?, ?it/s]
Traceback (most recent call last):
  File "run_seq_labeling.py", line 811, in <module>
    main()
  File "run_seq_labeling.py", line 704, in main
    args, train_dataset, model, tokenizer, labels, pad_token_label_id
  File "run_seq_labeling.py", line 219, in train
    outputs = model(**inputs)
  File "/home/yuan/anaconda3/envs/layoutlm/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/yuan/淵/layoutlm/unilm/layoutlm/examples/seq_labeling/layoutlm.py", line 578, in forward
    head_mask=head_mask,
  File "/home/yuan/anaconda3/envs/layoutlm/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/yuan/淵/layoutlm/unilm/layoutlm/examples/seq_labeling/layoutlm.py", line 535, in forward
    embedding_output, extended_attention_mask, head_mask=head_mask
  File "/home/yuan/anaconda3/envs/layoutlm/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/yuan/anaconda3/envs/layoutlm/lib/python3.6/site-packages/transformers/modeling_bert.py", line 407, in forward
    hidden_states, attention_mask, head_mask[i], encoder_hidden_states, encoder_attention_mask
  File "/home/yuan/anaconda3/envs/layoutlm/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/yuan/anaconda3/envs/layoutlm/lib/python3.6/site-packages/transformers/modeling_bert.py", line 368, in forward
    self_attention_outputs = self.attention(hidden_states, attention_mask, head_mask)
  File "/home/yuan/anaconda3/envs/layoutlm/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/yuan/anaconda3/envs/layoutlm/lib/python3.6/site-packages/transformers/modeling_bert.py", line 314, in forward
    hidden_states, attention_mask, head_mask, encoder_hidden_states, encoder_attention_mask
  File "/home/yuan/anaconda3/envs/layoutlm/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/yuan/anaconda3/envs/layoutlm/lib/python3.6/site-packages/transformers/modeling_bert.py", line 251, in forward
    context_layer = torch.matmul(attention_probs, value_layer)
  File "/home/yuan/anaconda3/envs/layoutlm/lib/python3.6/site-packages/apex/amp/wrap.py", line 27, in wrapper
    kwargs)
  File "/home/yuan/anaconda3/envs/layoutlm/lib/python3.6/site-packages/apex/amp/utils.py", line 81, in casted_args
    new_args.append(cast_fn(x))
  File "/home/yuan/anaconda3/envs/layoutlm/lib/python3.6/site-packages/apex/amp/utils.py", line 63, in maybe_half
    return x.half()
RuntimeError: CUDA out of memory. Tried to allocate 96.00 MiB (GPU 0; 3.82 GiB total capacity; 2.76 GiB already allocated; 8.25 MiB free; 2.79 GiB reserved in total by PyTorch)
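For context, the failed 96 MiB allocation is exactly the size the half-precision attention-probability tensor would have under common settings. A rough back-of-the-envelope check, assuming LayoutLM-base with 12 attention heads, max_seq_length of 512, and a batch size of 16 (none of these values are confirmed in the report above):

# Rough size of the tensor apex tries to cast to fp16 at the failing
# torch.matmul(attention_probs, value_layer) call.
# Assumed values: batch size 16, 12 attention heads (LayoutLM-base), sequence length 512.
batch_size, num_heads, seq_len, bytes_per_fp16 = 16, 12, 512, 2
attn_probs_bytes = batch_size * num_heads * seq_len * seq_len * bytes_per_fp16
print(f"{attn_probs_bytes / 2**20:.2f} MiB")  # 96.00 MiB, matching "Tried to allocate 96.00 MiB"

Since this tensor scales linearly with batch size, halving the batch roughly halves the activation memory at this step, which is consistent with the fixes reported in the comments below.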

r000bin commented 4 years ago

I ran into the same problem when I tried to fine-tune a model on my GPU with only 4 GB of memory. I had to reduce per_gpu_train_batch_size to 1 and could only use the LayoutLM base model.
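If a per-GPU batch of 1 hurts convergence, gradient accumulation can restore the effective batch size without raising peak memory. Below is a minimal, self-contained sketch of the idea; it is illustrative only and not the actual training loop in run_seq_labeling.py (the transformers-style gradient accumulation option, if the script exposes it, does the same thing):

import torch
from torch import nn

# Gradient accumulation: each forward/backward pass holds activations for a
# micro-batch of 1, while the optimizer still sees an effective batch of 16.
model = nn.Linear(10, 2)                      # stand-in for LayoutLM
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
loss_fn = nn.CrossEntropyLoss()
accumulation_steps = 16                       # effective batch = 16 with micro-batch of 1

data = [(torch.randn(1, 10), torch.randint(0, 2, (1,))) for _ in range(32)]

optimizer.zero_grad()
for step, (x, y) in enumerate(data):
    loss = loss_fn(model(x), y) / accumulation_steps  # scale so gradients average over the effective batch
    loss.backward()                                   # gradients accumulate in .grad
    if (step + 1) % accumulation_steps == 0:
        optimizer.step()
        optimizer.zero_grad()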

cloudfool commented 4 years ago

You need a GPU with more memory.

ruifcruz commented 3 years ago

Reducing the batch size from 16 to 8 solved the issue in my case.
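Before settling on a batch size, it can help to check how much memory the card actually has to work with. A quick probe with standard torch.cuda calls (on older PyTorch releases memory_reserved may still be named memory_cached):

import torch

# Print total, currently allocated, and reserved (cached) GPU memory for device 0.
props = torch.cuda.get_device_properties(0)
print(f"total    : {props.total_memory / 2**30:.2f} GiB")
print(f"allocated: {torch.cuda.memory_allocated(0) / 2**30:.2f} GiB")
print(f"reserved : {torch.cuda.memory_reserved(0) / 2**30:.2f} GiB")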