Closed smith-co closed 2 years ago
Please try a smaller batch_size or try another GPU with larger memory.
@AlanSwift I already tried a smaller batch size. What I find surprising is:
It's the same dataset, but GGNN and GraphSAGE fail to run while GCN and GAT work.
So GGNN/GraphSAGE needs more resources for some reason? I am super interested to know why.
We haven't investigated the memory efficiency of DGL :). It seems that GGNN and GraphSAGE need more GPU memory.
@AlanSwift I get this OOM error at runtime for GGNN:
File "/mnt/volume1/anaconda3/envs/ggnn/lib/python3.9/site-packages/graph4nlp_cu111-0.4.0-py3.9.egg/graph4nlp/pytorch/models/graph2seq.py", line 226, in forward
return self.encoder_decoder(batch_graph=batch_graph, oov_dict=oov_dict, tgt_seq=tgt_seq)
File "/mnt/volume1/anaconda3/envs/ggnn/lib/python3.9/site-packages/graph4nlp_cu111-0.4.0-py3.9.egg/graph4nlp/pytorch/models/graph2seq.py", line 173, in encoder_decoder
batch_graph = self.gnn_encoder(batch_graph)
File "/mnt/volume1/anaconda3/envs/ggnn/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/mnt/volume1/anaconda3/envs/ggnn/lib/python3.9/site-packages/graph4nlp_cu111-0.4.0-py3.9.egg/graph4nlp/pytorch/modules/graph_embedding/ggnn.py", line 557, in forward
h = self.models(dgl_graph, (feat_in, feat_out), etypes, edge_weight)
File "/mnt/volume1/anaconda3/envs/ggnn/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/mnt/volume1/anaconda3/envs/ggnn/lib/python3.9/site-packages/graph4nlp_cu111-0.4.0-py3.9.egg/graph4nlp/pytorch/modules/graph_embedding/ggnn.py", line 442, in forward
return self.model(graph, node_feats, etypes, edge_weight)
File "/mnt/volume1/anaconda3/envs/ggnn/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/mnt/volume1/anaconda3/envs/ggnn/lib/python3.9/site-packages/graph4nlp_cu111-0.4.0-py3.9.egg/graph4nlp/pytorch/modules/graph_embedding/ggnn.py", line 210, in forward
graph_in.apply_edges(
File "/mnt/volume1/anaconda3/envs/ggnn/lib/python3.9/site-packages/dgl_cu111-0.7a210520-py3.9-linux-x86_64.egg/dgl/heterograph.py", line 4300, in apply_edges
edata = core.invoke_edge_udf(g, eid, etype, func)
File "/mnt/volume1/anaconda3/envs/ggnn/lib/python3.9/site-packages/dgl_cu111-0.7a210520-py3.9-linux-x86_64.egg/dgl/core.py", line 85, in invoke_edge_udf
return func(ebatch)
File "/mnt/volume1/anaconda3/envs/ggnn/lib/python3.9/site-packages/graph4nlp_cu111-0.4.0-py3.9.egg/graph4nlp/pytorch/modules/graph_embedding/ggnn.py", line 212, in <lambda>
"W_e*h": self.linears_in[i](edges.src["h"])
RuntimeError: CUDA out of memory. Tried to allocate 1.12 GiB (GPU 3; 14.76 GiB total capacity; 11.83 GiB already allocated; 447.75 MiB free; 12.95 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Any idea?
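The error message itself suggests one mitigation: when reserved memory is much larger than allocated memory, fragmentation may be the culprit, and `max_split_size_mb` can help. A minimal sketch of setting that allocator option (the value `128` is an arbitrary assumption for illustration, not a recommendation from this thread):

```python
import os

# PYTORCH_CUDA_ALLOC_CONF must be set before the first CUDA allocation;
# setting it before importing torch is the safest ordering.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch  # noqa: E402  (imported after configuring the allocator)
```

If that is not enough, the usual trade of time for memory still applies: a smaller batch size, or accumulating gradients over several small batches before each optimizer step.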
@AlanSwift I came across this discussion on the DGL forum: Memory consumption of the GGNN module
It seems that DGL sacrifices memory efficiency for time efficiency. We will pay attention to this problem. Thank you for letting us know!
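The traceback points at a per-edge UDF (`self.linears_in[i](edges.src["h"])`), which gathers source-node features onto every edge and then applies the linear layer, so the intermediate tensor scales with the number of edges. A hedged, framework-agnostic sketch in plain PyTorch (not graph4nlp's actual code) of why transforming per node first is cheaper when edges greatly outnumber nodes:

```python
import torch

# Toy sizes: many more edges than nodes (hypothetical numbers).
N, E, d = 1_000, 20_000, 64
W = torch.nn.Linear(d, d, bias=False)   # stands in for one W_e of GGNN
h = torch.randn(N, d)                   # node features
src = torch.randint(0, N, (E,))         # source node of each edge

# Edge-UDF style (as in the traceback): gather to edges, then transform.
# The matmul runs over an (E, d) tensor.
msg_per_edge = W(h[src])

# Node-side alternative: transform once per node, then gather.
# The matmul runs over an (N, d) tensor; only the gather is O(E).
msg_per_node = W(h)[src]

assert torch.allclose(msg_per_edge, msg_per_node, atol=1e-5)
```

Because the weight does not depend on the individual edge, both orderings produce the same messages. In GGNN the weight does depend on the edge *type*, so a real fix would apply the transform once per node per edge type; that refactor is an assumption here, not something the thread confirms graph4nlp has done.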
@AlanSwift can you please provide me with a fix/suggestion 🙏
@AlanSwift, this is interesting. I also faced the same problem. I am wondering, do you have any solution to this?
@AlanSwift do you have a plan to address the GGNN implementation limitation?
Currently, this is not on my plan, since it is related to DGL.
❓ Questions and Help
I am running the NMT example on the same dataset with GNN variants:
While execution succeeds with GCN, I get an Out-of-Memory (OOM) error for GGNN and GraphSAGE. Can anyone help me with this?