wengong-jin / hgraph2graph

Hierarchical Generation of Molecular Graphs using Structural Motifs
MIT License
379 stars, 110 forks

Getting error when run vae_train.py #17

Open SejeongPark8354 opened 3 years ago

SejeongPark8354 commented 3 years ago

First of all, thank you for your great research on molecule generation. I am currently training on my ZINC dataset with your vae_train.py (in the generation folder). When I run the code, I get the error below. It occurs only occasionally, so I think it depends on the batch. Is there a solution to this problem?

  warnings.warn(warning.format(ret))
Model #Params: 160850K
[50] Beta: 0.100, KL: 19.11, loss: 57.167, Word: 10.76, 52.60, Topo: 80.77, Assm: 56.73, PNorm: 175.70, GNorm: 18.64
[100] Beta: 0.100, KL: 9.08, loss: 42.075, Word: 14.69, 59.69, Topo: 93.39, Assm: 75.03, PNorm: 236.81, GNorm: 14.60
[150] Beta: 0.100, KL: 9.66, loss: 39.316, Word: 16.71, 62.58, Topo: 96.62, Assm: 77.06, PNorm: 293.82, GNorm: 17.42
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [32,0,0], thread: [84,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [32,0,0], thread: [85,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [32,0,0], thread: [86,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [32,0,0], thread: [87,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [32,0,0], thread: [88,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [32,0,0], thread: [89,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [32,0,0], thread: [90,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [32,0,0], thread: [91,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [32,0,0], thread: [92,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [32,0,0], thread: [93,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [32,0,0], thread: [94,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [32,0,0], thread: [95,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [32,0,0], thread: [96,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [32,0,0], thread: [97,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [32,0,0], thread: [98,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [32,0,0], thread: [99,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [32,0,0], thread: [100,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [32,0,0], thread: [101,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [32,0,0], thread: [102,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [32,0,0], thread: [103,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [32,0,0], thread: [104,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [32,0,0], thread: [105,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [32,0,0], thread: [106,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [32,0,0], thread: [107,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [32,0,0], thread: [108,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [32,0,0], thread: [109,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [32,0,0], thread: [110,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [32,0,0], thread: [111,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [32,0,0], thread: [112,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [32,0,0], thread: [113,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [32,0,0], thread: [114,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [32,0,0], thread: [115,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [32,0,0], thread: [116,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [32,0,0], thread: [117,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [32,0,0], thread: [118,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [32,0,0], thread: [119,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [32,0,0], thread: [120,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [32,0,0], thread: [121,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
Traceback (most recent call last):
  File "vae_train.py", line 81, in <module>
    loss, kl_div, wacc, iacc, tacc, sacc = model(*batch, beta=beta)
  File "/home/sejeong/anaconda3/envs/PSJ/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/sejeong/hgraph2graph/generation/poly_hgraph/hgnn.py", line 88, in forward
    root_vecs, tree_vecs, _, graph_vecs = self.encoder(tree_tensors, graph_tensors)
  File "/home/sejeong/anaconda3/envs/PSJ/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/sejeong/hgraph2graph/generation/poly_hgraph/encoder.py", line 130, in forward
    hatom,_ = self.graph_encoder(*tensors)
  File "/home/sejeong/anaconda3/envs/PSJ/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/sejeong/hgraph2graph/generation/poly_hgraph/encoder.py", line 30, in forward
    h = self.rnn(fmess, bgraph)
  File "/home/sejeong/anaconda3/envs/PSJ/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/sejeong/hgraph2graph/generation/poly_hgraph/rnn.py", line 105, in forward
    h,c = self.LSTM(fmess, h_nei, c_nei)
  File "/home/sejeong/hgraph2graph/generation/poly_hgraph/rnn.py", line 92, in LSTM
    c = i * u + (f * c_nei).sum(dim=1)
RuntimeError: CUDA error: device-side assert triggered

jks17 commented 3 years ago

I am also getting the above issue. Did you manage to find a fix, @SejeongPark8354?

marshallcase commented 2 years ago

I'm getting a very similar issue when running train_generator.py:

Namespace(anneal_iter=25000, anneal_rate=0.9, atom_vocab=<hgraph.vocab.Vocab object at 0x000001C10639ED48>, batch_size=20, clip_norm=5.0, depthG=15, depthT=15, diterG=3, diterT=1, dropout=0.0, embed_size=250, epoch=20, hidden_size=125, kl_anneal_iter=2000, latent_size=32, load_model=None, lr=0.001, max_beta=1.0, print_iter=50, rnn_type='LSTM', save_dir='ckpt/cyclic_truncated_pretrained', save_iter=5000, seed=7, step_beta=0.001, train='train_processed/cyclic_truncated_processed/', vocab='data/chembl/cyclic_peptide_vocab_truncated.txt', warmup=10000)
C:\Users\Marshall\Anaconda3\envs\hgraph-rdkit\lib\site-packages\torch\nn\_reduction.py:42: UserWarning: size_average and reduce args will be deprecated, please use reduction='sum' instead.
  warnings.warn(warning.format(ret))
Model #Params: 1318K
  0%|▏                                                                              | 2/1000 [00:32<4:01:50, 14.54s/it]C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\Indexing.cu:975: block: [146,0,0], thread: [40,0,0] Assertion “srcIndex < srcSelectDimSize” failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\Indexing.cu:975: block: [146,0,0], thread: [41,0,0] Assertion “srcIndex < srcSelectDimSize” failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\Indexing.cu:975: block: [146,0,0], thread: [42,0,0] Assertion “srcIndex < srcSelectDimSize” failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\Indexing.cu:975: block: [146,0,0], thread: [43,0,0] Assertion “srcIndex < srcSelectDimSize” failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\Indexing.cu:975: block: [146,0,0], thread: [44,0,0] Assertion “srcIndex < srcSelectDimSize” failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\Indexing.cu:975: block: [146,0,0], thread: [45,0,0] Assertion “srcIndex < srcSelectDimSize” failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\Indexing.cu:975: block: [146,0,0], thread: [46,0,0] Assertion “srcIndex < srcSelectDimSize” failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\Indexing.cu:975: block: [146,0,0], thread: [47,0,0] Assertion “srcIndex < srcSelectDimSize” failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\Indexing.cu:975: block: [146,0,0], thread: [48,0,0] Assertion “srcIndex < srcSelectDimSize” failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\Indexing.cu:975: block: [146,0,0], thread: [49,0,0] Assertion “srcIndex < srcSelectDimSize” failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\Indexing.cu:975: block: [146,0,0], thread: [50,0,0] Assertion “srcIndex < srcSelectDimSize” failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\Indexing.cu:975: block: [146,0,0], thread: [51,0,0] Assertion “srcIndex < srcSelectDimSize” failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\Indexing.cu:975: block: [146,0,0], thread: [52,0,0] Assertion “srcIndex < srcSelectDimSize” failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\Indexing.cu:975: block: [146,0,0], thread: [53,0,0] Assertion “srcIndex < srcSelectDimSize” failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\Indexing.cu:975: block: [146,0,0], thread: [54,0,0] Assertion “srcIndex < srcSelectDimSize” failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\Indexing.cu:975: block: [146,0,0], thread: [55,0,0] Assertion “srcIndex < srcSelectDimSize” failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\Indexing.cu:975: block: [146,0,0], thread: [56,0,0] Assertion “srcIndex < srcSelectDimSize” failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\Indexing.cu:975: block: [146,0,0], thread: [57,0,0] Assertion “srcIndex < srcSelectDimSize” failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\Indexing.cu:975: block: [146,0,0], thread: [58,0,0] Assertion “srcIndex < srcSelectDimSize” failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\Indexing.cu:975: block: [146,0,0], thread: [59,0,0] Assertion “srcIndex < srcSelectDimSize” failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\Indexing.cu:975: block: [148,0,0], thread: [44,0,0] Assertion “srcIndex < srcSelectDimSize” failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\Indexing.cu:975: block: [148,0,0], thread: [45,0,0] Assertion “srcIndex < srcSelectDimSize” failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\Indexing.cu:975: block: [148,0,0], thread: [46,0,0] Assertion “srcIndex < srcSelectDimSize” failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\Indexing.cu:975: block: [148,0,0], thread: [47,0,0] Assertion “srcIndex < srcSelectDimSize” failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\Indexing.cu:975: block: [148,0,0], thread: [48,0,0] Assertion “srcIndex < srcSelectDimSize” failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\Indexing.cu:975: block: [148,0,0], thread: [49,0,0] Assertion “srcIndex < srcSelectDimSize” failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\Indexing.cu:975: block: [148,0,0], thread: [50,0,0] Assertion “srcIndex < srcSelectDimSize” failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\Indexing.cu:975: block: [148,0,0], thread: [51,0,0] Assertion “srcIndex < srcSelectDimSize” failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\Indexing.cu:975: block: [148,0,0], thread: [52,0,0] Assertion “srcIndex < srcSelectDimSize” failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\Indexing.cu:975: block: [148,0,0], thread: [53,0,0] Assertion “srcIndex < srcSelectDimSize” failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\Indexing.cu:975: block: [148,0,0], thread: [54,0,0] Assertion “srcIndex < srcSelectDimSize” failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\Indexing.cu:975: block: [148,0,0], thread: [55,0,0] Assertion “srcIndex < srcSelectDimSize” failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\Indexing.cu:975: block: [148,0,0], thread: [56,0,0] Assertion “srcIndex < srcSelectDimSize” failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\Indexing.cu:975: block: [148,0,0], thread: [57,0,0] Assertion “srcIndex < srcSelectDimSize” failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\Indexing.cu:975: block: [148,0,0], thread: [58,0,0] Assertion “srcIndex < srcSelectDimSize” failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\Indexing.cu:975: block: [148,0,0], thread: [59,0,0] Assertion “srcIndex < srcSelectDimSize” failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\Indexing.cu:975: block: [148,0,0], thread: [60,0,0] Assertion “srcIndex < srcSelectDimSize” failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\Indexing.cu:975: block: [148,0,0], thread: [61,0,0] Assertion “srcIndex < srcSelectDimSize” failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\Indexing.cu:975: block: [148,0,0], thread: [62,0,0] Assertion “srcIndex < srcSelectDimSize” failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\Indexing.cu:975: block: [148,0,0], thread: [63,0,0] Assertion “srcIndex < srcSelectDimSize” failed.
  0%|▏                                                                              | 2/1000 [00:35<4:56:07, 17.80s/it]
Traceback (most recent call last):
  File "train_generator.py", line 92, in <module>
    loss, kl_div, wacc, iacc, tacc, sacc = model(*batch, beta=beta)
  File "C:\Users\Marshall\Anaconda3\envs\hgraph-rdkit\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\Marshall\hgraph2graph-master\hgraph\hgnn.py", line 55, in forward
    root_vecs, tree_vecs, _, graph_vecs = self.encoder(tree_tensors, graph_tensors)
  File "C:\Users\Marshall\Anaconda3\envs\hgraph-rdkit\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\Marshall\hgraph2graph-master\hgraph\encoder.py", line 129, in forward
    tensors = self.embed_graph(graph_tensors)
  File "C:\Users\Marshall\hgraph2graph-master\hgraph\encoder.py", line 114, in embed_graph
    fpos = self.E_apos.index_select(index=fmess[:, 3], dim=0)
RuntimeError: CUDA error: device-side assert triggered

marshallcase commented 2 years ago

Actually, I think I figured it out. There's a parameter defined in mol_graph.py, MAX_POS = 20, which limits the size of the E_apos and E_pos matrices. In the encoder, a position index in fmess that falls outside that size is then out of range, which is why you get the error.

I think it's an issue of molecule size and graph complexity. In the paper, there's a footnote attached to this sentence: "The number of possible attachments are limited because the number of attaching atoms between two motifs is small and the attaching points must be consecutive."

Footnote 3: "In our experiments, the number of possible attachments are usually less than 20 for polymers and small molecules."
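
The failure mode can be sketched without a GPU. The snippet below is only an illustration, assuming the position table behaves like an identity matrix with MAX_POS rows (as the names E_apos and MAX_POS suggest). `make_position_table` and `select_rows` are hypothetical helpers, not functions from the repository, and plain lists stand in for tensors. On CPU, the same out-of-range lookup produces a readable IndexError instead of a device-side assert.

```python
MAX_POS = 20  # value quoted above from mol_graph.py


def make_position_table(size):
    """Build a size x size one-hot table (a stand-in for E_apos)."""
    return [[1.0 if r == c else 0.0 for c in range(size)] for r in range(size)]


def select_rows(table, indices):
    """Mimic a row lookup along dim 0: every index must be < len(table)."""
    for idx in indices:
        if not 0 <= idx < len(table):
            raise IndexError(
                f"position index {idx} out of range for table with {len(table)} rows"
            )
    return [table[idx] for idx in indices]


table = make_position_table(MAX_POS)

# Valid indices are 0 .. MAX_POS - 1, so this lookup succeeds.
ok_rows = select_rows(table, [0, 5, 19])

# Index 20 is already out of range for a 20-row table.
try:
    select_rows(table, [0, 20])
    failed = False
    message = ""
except IndexError as err:
    failed = True
    message = str(err)
```

Any molecule whose attachment position index reaches MAX_POS would hit this condition, which would be consistent with the error appearing only on some batches.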

Bunnybeibei commented 1 year ago

I agree with the advice above. I first set "os.environ['CUDA_LAUNCH_BLOCKING'] = '1'" to locate the bug, and found that the problem is in "fpos = self.E_apos.index_select(index=fmess[:, 3], dim=0)". I then inspected the index values to see where the error comes from: the maximum value of fmess[:, 3] is 22, while self.E_apos only has 20 rows. So I increased MAX_POS in mol_graph.py, which solved the problem. I don't think this change affects the model, although it may waste some memory.
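
The diagnosis described above can be sketched roughly as follows. This is an illustrative outline, not code from the repository: `required_table_size` is a hypothetical helper, and the lists stand in for the values of fmess[:, 3] collected from each batch (with real tensors one could read the maximum with something like `fmess[:, 3].max().item()`). CUDA_LAUNCH_BLOCKING has to be set before CUDA is initialized, so the safest place is before torch is imported.

```python
import os

# Make CUDA kernel launches synchronous so the Python traceback points at the
# operation that actually failed. Set this before torch is imported.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"


def required_table_size(position_columns):
    """Return the smallest table size that keeps every position index in range."""
    largest = max(max(col) for col in position_columns if len(col) > 0)
    return largest + 1  # indices are zero-based


# Stand-in data: position indices seen in three batches (hypothetical values).
batches = [[0, 3, 7], [1, 22, 4], [2, 19]]

MAX_POS = 20  # current value in mol_graph.py, as reported in this thread
needed = required_table_size(batches)
needs_increase = needed > MAX_POS
```

With a largest index of 22, the table needs at least 23 rows, so MAX_POS would have to be raised to 23 or more for this data.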