naver / sqlova

RuntimeError: CUDA out of memory. #23

Closed · guotong1988 closed this issue 5 years ago

guotong1988 commented 5 years ago
BERT-type: uncased_L-12_H-768_A-12
Batch_size = 32
BERT parameters:
learning rate: 1e-05
Fine-tune BERT: True
vocab size: 30522
hidden_size: 768
num_hidden_layer: 12
num_attention_heads: 12
hidden_act: gelu
intermediate_size: 3072
hidden_dropout_prob: 0.1
attention_probs_dropout_prob: 0.1
max_position_embeddings: 512
type_vocab_size: 2
initializer_range: 0.02
Load pre-trained parameters.
Seq-to-SQL: the number of final BERT layers to be used: 2
Seq-to-SQL: the size of hidden dimension = 100
Seq-to-SQL: LSTM encoding layer size = 2
Seq-to-SQL: dropout rate = 0.3
Seq-to-SQL: learning rate = 0.001

Traceback (most recent call last):
  File "train.py", line 603, in <module>
    dset_name='train')
  File "train.py", line 239, in train
    num_out_layers_n=num_target_layers, num_out_layers_h=num_target_layers)
  File "/data4/tong.guo/sqlova-master/sqlova/utils/utils_wikisql.py", line 817, in get_wemb_bert
    nlu_tt, t_to_tt_idx, tt_to_t_idx = get_bert_output(model_bert, tokenizer, nlu_t, headers, max_seq_length)
  File "/data4/tong.guo/sqlova-master/sqlova/utils/utils_wikisql.py", line 751, in get_bert_output
    all_encoder_layer, pooled_output = model_bert(all_input_ids, all_segment_ids, all_input_mask)
  File "/data4/tong.guo/Py36/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/data4/tong.guo/sqlova-master/bert/modeling.py", line 396, in forward
    all_encoder_layers = self.encoder(embedding_output, extended_attention_mask)
  File "/data4/tong.guo/Py36/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/data4/tong.guo/sqlova-master/bert/modeling.py", line 326, in forward
    hidden_states = layer_module(hidden_states, attention_mask)
  File "/data4/tong.guo/Py36/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/data4/tong.guo/sqlova-master/bert/modeling.py", line 311, in forward
    attention_output = self.attention(hidden_states, attention_mask)
  File "/data4/tong.guo/Py36/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/data4/tong.guo/sqlova-master/bert/modeling.py", line 272, in forward
    self_output = self.self(input_tensor, attention_mask)
  File "/data4/tong.guo/Py36/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/data4/tong.guo/sqlova-master/bert/modeling.py", line 226, in forward
    attention_scores = torch.matmul(query_layer, key_layer.transpose(-1, -2))
RuntimeError: CUDA out of memory. Tried to allocate 10.50 MiB (GPU 0; 11.17 GiB total capacity; 10.59 GiB already allocated; 5.69 MiB free; 257.72 MiB cached)
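For context, the allocation figures in the error can be cross-checked from inside the training script before the failing call. The snippet below is a hypothetical helper, not part of sqlova, and uses standard `torch.cuda` queries (`memory_cached()` was renamed `memory_reserved()` in later PyTorch releases):

```python
# Hypothetical helper (not part of sqlova) to print GPU memory usage from the
# training script, e.g. right before get_wemb_bert is called in train().
import torch

def report_gpu_memory(device=0):
    gib = 1024 ** 3
    allocated = torch.cuda.memory_allocated(device) / gib  # memory held by live tensors
    cached = torch.cuda.memory_cached(device) / gib        # memory held by the caching allocator
    print("GPU %d: %.2f GiB allocated, %.2f GiB cached" % (device, allocated, cached))
```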
whwang299 commented 5 years ago

Hi @guotong1988

Decrease the batch size and increase the number of gradient-accumulation steps. For example, --bS 8 --accumulate_gradients 4 should be a safe choice for a GPU with ~12 GB of RAM.
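For readers unfamiliar with the pattern, here is a minimal sketch of gradient accumulation in a generic PyTorch loop; it is not sqlova's actual train.py, and model, optimizer, loss_fn, and loader are placeholders. With a micro-batch of 8 and 4 accumulation steps, the optimizer still updates over an effective batch of 32, but only 8 examples are resident on the GPU at any time.

```python
# Minimal gradient-accumulation sketch (placeholder names, not sqlova's train.py).
accumulate_gradients = 4                 # micro-batches per optimizer step
optimizer.zero_grad()
for step, (inputs, targets) in enumerate(loader):   # loader yields micro-batches of size 8
    loss = loss_fn(model(inputs), targets)
    # Scale so the accumulated gradient matches a single batch of size 8 * 4 = 32.
    (loss / accumulate_gradients).backward()
    if (step + 1) % accumulate_gradients == 0:
        optimizer.step()                 # one update per 4 micro-batches
        optimizer.zero_grad()
```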

guotong1988 commented 5 years ago

Thank you!!