guotong1988 closed this issue 5 years ago.
BERT-type: uncased_L-12_H-768_A-12
Batch_size = 32
BERT parameters:
learning rate: 1e-05
Fine-tune BERT: True
vocab size: 30522
hidden_size: 768
num_hidden_layer: 12
num_attention_heads: 12
hidden_act: gelu
intermediate_size: 3072
hidden_dropout_prob: 0.1
attention_probs_dropout_prob: 0.1
max_position_embeddings: 512
type_vocab_size: 2
initializer_range: 0.02
Load pre-trained parameters.
Seq-to-SQL: the number of final BERT layers to be used: 2
Seq-to-SQL: the size of hidden dimension = 100
Seq-to-SQL: LSTM encoding layer size = 2
Seq-to-SQL: dropout rate = 0.3
Seq-to-SQL: learning rate = 0.001

Traceback (most recent call last):
  File "train.py", line 603, in <module>
    dset_name='train')
  File "train.py", line 239, in train
    num_out_layers_n=num_target_layers, num_out_layers_h=num_target_layers)
  File "/data4/tong.guo/sqlova-master/sqlova/utils/utils_wikisql.py", line 817, in get_wemb_bert
    nlu_tt, t_to_tt_idx, tt_to_t_idx = get_bert_output(model_bert, tokenizer, nlu_t, headers, max_seq_length)
  File "/data4/tong.guo/sqlova-master/sqlova/utils/utils_wikisql.py", line 751, in get_bert_output
    all_encoder_layer, pooled_output = model_bert(all_input_ids, all_segment_ids, all_input_mask)
  File "/data4/tong.guo/Py36/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/data4/tong.guo/sqlova-master/bert/modeling.py", line 396, in forward
    all_encoder_layers = self.encoder(embedding_output, extended_attention_mask)
  File "/data4/tong.guo/Py36/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/data4/tong.guo/sqlova-master/bert/modeling.py", line 326, in forward
    hidden_states = layer_module(hidden_states, attention_mask)
  File "/data4/tong.guo/Py36/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/data4/tong.guo/sqlova-master/bert/modeling.py", line 311, in forward
    attention_output = self.attention(hidden_states, attention_mask)
  File "/data4/tong.guo/Py36/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/data4/tong.guo/sqlova-master/bert/modeling.py", line 272, in forward
    self_output = self.self(input_tensor, attention_mask)
  File "/data4/tong.guo/Py36/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/data4/tong.guo/sqlova-master/bert/modeling.py", line 226, in forward
    attention_scores = torch.matmul(query_layer, key_layer.transpose(-1, -2))
RuntimeError: CUDA out of memory. Tried to allocate 10.50 MiB (GPU 0; 11.17 GiB total capacity; 10.59 GiB already allocated; 5.69 MiB free; 257.72 MiB cached)
Hi @guotong1988
Decrease the batch size and increase the number of gradient accumulation steps. For example, --bS 8 --accumulate_gradients 4 would be a safe choice for a GPU with ~12 GB of RAM.
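For reference, gradient accumulation in a generic PyTorch training loop looks roughly like the sketch below. This is only a minimal illustration of the idea behind --bS 8 --accumulate_gradients 4 (micro-batches of 8, weights updated every 4 micro-batches, so the effective batch size stays 32); the tiny linear model and random data are placeholders, not sqlova's actual train.py.

```python
# Hedged sketch of gradient accumulation with a reduced batch size.
# Placeholders only: the model, loss, and random tensors stand in for the real
# BERT + Seq-to-SQL pipeline. The point is that only bS examples live on the
# GPU at once, while the optimizer step sees gradients from bS * 4 examples.
import torch
import torch.nn as nn

bS = 8                    # micro-batch size that fits in GPU memory
accumulate_gradients = 4  # 8 * 4 = 32, the original effective batch size

model = nn.Linear(768, 1)                        # stand-in for the real model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.MSELoss()

optimizer.zero_grad()
for step in range(100):
    x = torch.randn(bS, 768)                     # stand-in for a micro-batch
    y = torch.randn(bS, 1)
    loss = criterion(model(x), y)
    (loss / accumulate_gradients).backward()     # accumulate scaled gradients
    if (step + 1) % accumulate_gradients == 0:
        optimizer.step()                         # one update per 4 micro-batches
        optimizer.zero_grad()
```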
--bS 8 --accumulate_gradients 4
Thank you!!