wengong-jin / hgraph2graph

Hierarchical Generation of Molecular Graphs using Structural Motifs
MIT License
355 stars, 107 forks

Regarding structural motifs (bee hives) #23

Closed (hwidong-na closed this issue 3 years ago)

hwidong-na commented 3 years ago

Hi,

I have a question about the tree decomposition function below, specifically about structural motifs (bee hives).

https://github.com/wengong-jin/hgraph2graph/blob/e396dbaf43f9d4ac2ee2568a4d5f93ad9e78b767/hgraph/mol_graph.py#L54-L83

The code above does not seem to extract structural motifs. For example, converting a polymer SMILES string with mol_graph

echo "Cc1cc2c(cc1C)c1cc(-c3cc4c5nn(C)nc5c5cc(-c6cc7c8cc(C)c(C)cc8c8ccsc8c7s6)sc5c4s3)sc1c1sccc21" | python hgraph/mol_graph.py

yields only bonds and single rings, not bee hives.

[(0, 1), (6, 7), (10, 11), (16, 17), (22, 23), (28, 29), (30, 31), (1, 6, 5, 4, 3, 2), (9, 8, 46, 45, 10), (12, 11, 44, 43, 13), (15, 14, 19, 18, 16), (21, 20, 42, 41, 22), (24, 23, 40, 39, 25), (27, 28, 30, 32, 33, 26), (35, 34, 38, 37, 36), (48, 47, 51, 50, 49), (3, 51, 47, 46, 8, 4), (13, 43, 42, 20, 19, 14), (25, 39, 38, 34, 33, 26)] {0: ('CC', 'C[CH3:1]'), 1: ('CC', 'C[CH3:1]'), 2: ('CC', 'C[CH3:1]'), 3: ('CN', 'C[NH2:1]'), 4: ('CC', 'C[CH3:1]'), 5: ('CC', 'C[CH3:1]'), 6: ('CC', 'C[CH3:1]'), 7: ('C1=CC=CC=C1', 'C1=CC=[CH:1]C=C1'), 8: ('C1=CSC=C1', 'C1=C[CH:1]=[CH:1]S1'), 9: ('C1=CSC=C1', 'C1=CS[CH:1]=C1'), 10: ('C1=N[NH]N=C1', 'N1=[CH:1][CH:1]=N[NH]1'), 11: ('C1=CSC=C1', 'C1=C[CH:1]=[CH:1]S1'), 12: ('C1=CSCC1', 'C1=[CH:1]SCC1'), 13: ('C1=CCCC=C1', 'C1=C[CH2:1][CH2:1]C=C1'), 14: ('C1=CSCC1', 'C1=C[CH2:1][CH2:1]S1'), 15: ('C1=CSC=C1', 'C1=C[CH:1]=[CH:1]S1'), 16: ('C1=CC=CC=C1', 'C1=C[CH:1]=[CH:1]C=C1'), 17: ('C1=CCCC=C1', 'C1=C[CH:1]=[CH:1]CC1'), 18: ('C1=CC=CC=C1', 'C1=CC=[CH:1][CH:1]=C1')}

Any suggestions?

wengong-jin commented 3 years ago

Hi,

Please use the code in the polymers/ directory for polymer generation. The tree decomposition there is different and will give you bee hives.
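To make the difference concrete, here is a minimal sketch of how the single rings listed in the output above could be grouped into fused ring systems, which is one plausible reading of "bee hives". It is not the decomposition used in polymers/; the function name `merge_fused_rings` and the rule "two rings are fused if they share at least two atoms" are assumptions made for illustration. The cluster list is copied from the output posted earlier in this thread.

```python
from itertools import combinations

# Clusters copied from the mol_graph.py output above (atom indices).
CLUSTERS = [
    (0, 1), (6, 7), (10, 11), (16, 17), (22, 23), (28, 29), (30, 31),
    (1, 6, 5, 4, 3, 2), (9, 8, 46, 45, 10), (12, 11, 44, 43, 13),
    (15, 14, 19, 18, 16), (21, 20, 42, 41, 22), (24, 23, 40, 39, 25),
    (27, 28, 30, 32, 33, 26), (35, 34, 38, 37, 36), (48, 47, 51, 50, 49),
    (3, 51, 47, 46, 8, 4), (13, 43, 42, 20, 19, 14), (25, 39, 38, 34, 33, 26),
]


def merge_fused_rings(clusters, min_shared=2):
    """Hypothetical helper: group rings sharing >= min_shared atoms.

    Returns (bonds, systems), where bonds are the two-atom clusters and
    systems are sorted atom-index lists, one per fused ring system.
    """
    bonds = [tuple(c) for c in clusters if len(c) == 2]
    rings = [set(c) for c in clusters if len(c) > 2]

    # Union-find over ring indices.
    parent = list(range(len(rings)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Two rings are treated as fused when they share an edge (>= 2 atoms).
    for i, j in combinations(range(len(rings)), 2):
        if len(rings[i] & rings[j]) >= min_shared:
            parent[find(i)] = find(j)

    # Collect the atoms of every ring under its representative.
    systems = {}
    for i, ring in enumerate(rings):
        systems.setdefault(find(i), set()).update(ring)
    return bonds, sorted(sorted(atoms) for atoms in systems.values())


bonds, systems = merge_fused_rings(CLUSTERS)
for atoms in systems:
    print(len(atoms), atoms)
```

On the clusters above, the twelve single rings collapse into three fused systems, while the seven bond clusters are left untouched. Rings that only touch through a connecting bond, such as those joined by (10, 11), stay in separate systems.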

hwidong-na commented 3 years ago

Training the VAE raises the following error.

hgraph2graph/polymers$ python vae_train.py --train train_processed/ --vocab ../data/polymers/inter_vocab.txt --save_dir ckpt/tmp

Model #Params: 5708K
Traceback (most recent call last):
  File "vae_train.py", line 80, in <module>
    loss, kl_div, wacc, iacc, tacc, sacc = model(*batch, beta=beta)
  File "/home/leona/anaconda3/envs/motif/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
hgraph2graph/polymers/poly_hgraph/hgnn.py", line 77, in forward
    loss, wacc, iacc, tacc, sacc = self.decoder((root_vecs, root_vecs, root_vecs), graphs, tensors, orders)
  File "/home/leona/anaconda3/envs/motif/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889,
    result = self.forward(*input, **kwargs)
hgraph2graph/polymers/poly_hgraph/decoder.py", line 229, in forward
    clab, ilab = self.vocab[ tree_batch.nodes[yid]['label'] ]
hgraph2graph/polymers/poly_hgraph/vocab.py", line 43, in __getitem__
    return self.hmap[x[0]], self.vmap[x]
KeyError: 'O=C1NC(=O)C2=C3C1=CC=C1C(=O)NC(=O)C(=C13)C=C2'

hwidong-na commented 3 years ago

Solved: vocab.txt must be generated with the same RDKit version as the one used for the rest of the pipeline (preprocessing and training).
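A mismatch like this could be caught before training by checking every motif label in the data against the vocabulary. The sketch below is not part of hgraph2graph: the helper names `load_vocab_pairs` and `missing_labels` are hypothetical, and it assumes the vocabulary file has two whitespace-separated columns per line (motif SMILES, then the SMILES with attachment points), matching the pairs printed in the output earlier in this thread.

```python
def load_vocab_pairs(lines):
    """Parse vocab lines of the assumed form '<motif> <attached motif>'."""
    pairs = set()
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:  # skip blank or malformed lines
            pairs.add((fields[0], fields[1]))
    return pairs


def missing_labels(labels, pairs):
    """Return labels that a pair-vocabulary lookup would fail on."""
    known_motifs = {motif for motif, _ in pairs}
    missing = []
    for motif, attached in labels:
        if motif not in known_motifs:
            missing.append((motif, attached, "motif not in vocab"))
        elif (motif, attached) not in pairs:
            missing.append((motif, attached, "pair not in vocab"))
    return missing


# Example vocabulary lines, taken from the pairs shown in the output above.
vocab_lines = [
    "C1=CC=CC=C1 C1=CC=[CH:1]C=C1",
    "C1=CSC=C1 C1=CS[CH:1]=C1",
]

# Labels to check. The last entry uses the SMILES from the KeyError; its
# second field is a placeholder, since the traceback does not show it.
labels = [
    ("C1=CSC=C1", "C1=CS[CH:1]=C1"),
    ("C1=CSC=C1", "C1=C[CH:1]=[CH:1]S1"),
    ("O=C1NC(=O)C2=C3C1=CC=C1C(=O)NC(=O)C(=C13)C=C2",
     "O=C1NC(=O)C2=C3C1=CC=C1C(=O)NC(=O)C(=C13)C=C2"),
]

for motif, attached, reason in missing_labels(labels, load_vocab_pairs(vocab_lines)):
    print(f"{reason}: {motif}")
```

If the check reports missing motifs, regenerating the vocabulary in the same environment that produced the training data is the fix described above. Printing `rdkit.__version__` in both environments is a quick way to confirm whether the versions differ.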