snap-stanford / GreaseLM

[ICLR 2022 spotlight]GreaseLM: Graph REASoning Enhanced Language Models for Question Answering
MIT License
228 stars 40 forks source link

RuntimeError: CUDA out of memory. Tried to allocate 144.00 MiB #2

Open tarun3300 opened 2 years ago

tarun3300 commented 2 years ago

When we are trying to run the greaselm.py we are getting this issue even if we run the batch size minimum of 8

we tried from 128-8 every time, It throws the error with different memory size as free , after some epochs. can you help us here in solving this issue and run the code

logits, _ = model(*[x[a:b] for x in input_data])
  File "/home/t/tgudela/.conda/envs/greaselm2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/scratch/users/tgudela/greaseLM/GreaseLM-main/modeling/modeling_greaselm.py", line 85, in forward
    logits, attn = self.lmgnn(lm_inputs, concept_ids,
  File "/home/t/tgudela/.conda/envs/greaselm2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/scratch/users/tgudela/greaseLM/GreaseLM-main/modeling/modeling_greaselm.py", line 217, in forward
    outputs, gnn_output = self.mp(input_ids, token_type_ids, attention_mask, output_mask, gnn_input, adj, node_type_ids, node_scores, special_nodes_mask, output_hidden_$
  File "/home/t/tgudela/.conda/envs/greaselm2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/scratch/users/tgudela/greaseLM/GreaseLM-main/modeling/modeling_greaselm.py", line 411, in forward
    encoder_outputs, _X = self.encoder(embedding_output,
  File "/home/t/tgudela/.conda/envs/greaselm2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/scratch/users/tgudela/greaseLM/GreaseLM-main/modeling/modeling_greaselm.py", line 815, in forward
    _X = self.gnn_layers[gnn_layer_index](_X, edge_index, edge_type, _node_type, _node_feature_extra)
  File "/home/t/tgudela/.conda/envs/greaselm2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/scratch/users/tgudela/greaseLM/GreaseLM-main/modeling/modeling_gnn.py", line 91, in forward
    aggr_out = self.propagate(edge_index, x=x, edge_attr=edge_embeddings) #[N, emb_dim]
  File "/home/t/tgudela/.conda/envs/greaselm2/lib/python3.8/site-packages/torch_geometric/nn/conv/message_passing.py", line 261, in propagate
    coll_dict = self.__collect__(self.__user_args__, edge_index, size,
  File "/home/t/tgudela/.conda/envs/greaselm2/lib/python3.8/site-packages/torch_geometric/nn/conv/message_passing.py", line 171, in _collect_
    data = self.__lift__(data, edge_index,
  File "/home/t/tgudela/.conda/envs/greaselm2/lib/python3.8/site-packages/torch_geometric/nn/conv/message_passing.py", line 141, in _lift_
    return src.index_select(self.node_dim, index)
RuntimeError: CUDA out of memory. Tried to allocate 144.00 MiB (GPU 0; 15.78 GiB total capacity; 14.28 GiB already allocated; 133.50 MiB free; 14.39 GiB reserved in tot
anas-rz commented 2 years ago

Faced the same issue. While training.

XikunZhang commented 2 years ago

Can you try further decreasing the mini-batch size?

weiyifan1023 commented 2 years ago

Can you try further decreasing the mini-batch size?

This method doesn't work for me.

Lucency0331 commented 9 months ago

When we are trying to run the greaselm.py we are getting this issue even if we run the batch size minimum of 8

we tried from 128-8 every time, It throws the error with different memory size as free , after some epochs. can you help us here in solving this issue and run the code

logits, _ = model(*[x[a:b] for x in input_data])
File "/home/t/tgudela/.conda/envs/greaselm2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
  result = self.forward(*input, **kwargs)
File "/scratch/users/tgudela/greaseLM/GreaseLM-main/modeling/modeling_greaselm.py", line 85, in forward
  logits, attn = self.lmgnn(lm_inputs, concept_ids,
File "/home/t/tgudela/.conda/envs/greaselm2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
  result = self.forward(*input, **kwargs)
File "/scratch/users/tgudela/greaseLM/GreaseLM-main/modeling/modeling_greaselm.py", line 217, in forward
  outputs, gnn_output = self.mp(input_ids, token_type_ids, attention_mask, output_mask, gnn_input, adj, node_type_ids, node_scores, special_nodes_mask, output_hidden_$
File "/home/t/tgudela/.conda/envs/greaselm2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
  result = self.forward(*input, **kwargs)
File "/scratch/users/tgudela/greaseLM/GreaseLM-main/modeling/modeling_greaselm.py", line 411, in forward
  encoder_outputs, _X = self.encoder(embedding_output,
File "/home/t/tgudela/.conda/envs/greaselm2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
  result = self.forward(*input, **kwargs)
File "/scratch/users/tgudela/greaseLM/GreaseLM-main/modeling/modeling_greaselm.py", line 815, in forward
  _X = self.gnn_layers[gnn_layer_index](_X, edge_index, edge_type, _node_type, _node_feature_extra)
File "/home/t/tgudela/.conda/envs/greaselm2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
  result = self.forward(*input, **kwargs)
File "/scratch/users/tgudela/greaseLM/GreaseLM-main/modeling/modeling_gnn.py", line 91, in forward
  aggr_out = self.propagate(edge_index, x=x, edge_attr=edge_embeddings) #[N, emb_dim]
File "/home/t/tgudela/.conda/envs/greaselm2/lib/python3.8/site-packages/torch_geometric/nn/conv/message_passing.py", line 261, in propagate
  coll_dict = self.__collect__(self.__user_args__, edge_index, size,
File "/home/t/tgudela/.conda/envs/greaselm2/lib/python3.8/site-packages/torch_geometric/nn/conv/message_passing.py", line 171, in _collect_
  data = self.__lift__(data, edge_index,
File "/home/t/tgudela/.conda/envs/greaselm2/lib/python3.8/site-packages/torch_geometric/nn/conv/message_passing.py", line 141, in _lift_
  return src.index_select(self.node_dim, index)
RuntimeError: CUDA out of memory. Tried to allocate 144.00 MiB (GPU 0; 15.78 GiB total capacity; 14.28 GiB already allocated; 133.50 MiB free; 14.39 GiB reserved in tot

oh, the totally same problem, have u guys solved this?