Generation example not working

cristianregep commented 4 years ago

I downloaded the package and ran the suggested process from the generation folder:

python get_vocab.py --min_frequency 100 --ncpu 8 < ../data/polymers/all.txt > ../data/polymers/vocab.txt
python preprocess.py --train ../data/polymers/train.txt --vocab data/polymers/vocab.txt --ncpu 8

I get the following error:

"""
Traceback (most recent call last):
  File "/home/cristian/anaconda3/envs/hgraph/lib/python3.6/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/home/cristian/anaconda3/envs/hgraph/lib/python3.6/multiprocessing/pool.py", line 44, in mapstar
    return list(map(*args))
  File "preprocess.py", line 19, in tensorize
    x = MolGraph.tensorize(mol_batch, vocab, common_atom_vocab)
  File "/home/cristian/Work/hgraph2graph/generation/poly_hgraph/mol_graph.py", line 168, in tensorize
    tree_tensors, tree_batchG = MolGraph.tensorize_graph([x.mol_tree for x in mol_batch], vocab)
  File "/home/cristian/Work/hgraph2graph/generation/poly_hgraph/mol_graph.py", line 209, in tensorize_graph
    fnode[v] = vocab[attr]
  File "/home/cristian/Work/hgraph2graph/generation/poly_hgraph/vocab.py", line 43, in __getitem__
    return self.hmap[x[0]], self.vmap[x]
KeyError: ('C1=CSC=N1', 'N1=[CH:2]S[CH:2]=[CH:1]1')
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "preprocess.py", line 49, in <module>
    all_data = pool.map(func, batches)
  File "/home/cristian/anaconda3/envs/hgraph/lib/python3.6/multiprocessing/pool.py", line 266, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/home/cristian/anaconda3/envs/hgraph/lib/python3.6/multiprocessing/pool.py", line 644, in get
    raise self._value
KeyError: ('C1=CSC=N1', 'N1=[CH:2]S[CH:2]=[CH:1]1')

cristianregep commented 4 years ago

I traced the issue to preprocess.py: it loads the motifs from the vocab file instead of loading the original motifs that pass the min_frequency threshold in get_vocab.py:

MolGraph.load_fragments([x[0] for x in vocab])

I worked around this by saving the original fragments to a separate file after get_vocab.py and then loading them in preprocess.py. I think the molecules are not split in the same way because the starting fragments differ.
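
A minimal sketch of the workaround described above, for readers who want to try it. Everything here is an assumption made for illustration: the helper names (select_fragments, save_fragments, read_fragments) are hypothetical and do not come from the hgraph2graph repository, the one-fragment-per-line file layout is invented, and the ">= min_frequency" rule is only a guess at how get_vocab.py filters.

```python
# Hypothetical helpers for illustration; none of these names exist in hgraph2graph.

def select_fragments(fragment_counts, min_frequency):
    """Keep fragments seen at least min_frequency times (assumed filtering rule)."""
    return sorted(f for f, n in fragment_counts.items() if n >= min_frequency)


def save_fragments(fragments, path):
    """Write one fragment SMILES per line (assumed file layout)."""
    with open(path, "w") as handle:
        for fragment in fragments:
            handle.write(fragment + "\n")


def read_fragments(path):
    """Read the fragment list back, skipping blank lines."""
    with open(path) as handle:
        return [line.strip() for line in handle if line.strip()]
```

Under these assumptions, get_vocab.py would call save_fragments once after filtering, and preprocess.py would presumably pass the result of read_fragments to MolGraph.load_fragments in place of [x[0] for x in vocab].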

wengong-jin commented 4 years ago

Hi,

I fixed this issue and now it should be able to run. Thank you!

HayeonLee commented 4 years ago

Hi, when I tried to run the generation example, I got a similar error, shown below. Could you check it? @wengong-jin

code:
python preprocess.py --train ../data/polymers/train.txt --vocab ../data/polymers/inter_vocab.txt --ncpu 8

error:
Traceback (most recent call last):
  File "preprocess.py", line 48, in <module>
    all_data = pool.map(func, batches)
  File "/st2/hayeon/anaconda3/envs/metasamp/lib/python3.6/multiprocessing/pool.py", line 266, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/st2/hayeon/anaconda3/envs/metasamp/lib/python3.6/multiprocessing/pool.py", line 644, in get
    raise self._value
KeyError: ('CN1C(=O)C2=C3C(=C(F)C=C4C(=O)N(C)C(=O)C(=C43)C(F)=C2)C1=O', 'CN1C(=O)C2=CC(F)=C3C(=O)N(C)C(=O)C4=C3C2=C(C1=O)C(F)=[CH:1]4')

wengong-jin commented 4 years ago

Hi,

I tried running the same command and there was no error. I suggest running get_vocab.py and checking whether its output differs from data/polymers/inter_vocab.txt. If they differ (I would be surprised), please try rerunning preprocess.py with the new vocab and see if it succeeds.
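
One way to carry out the comparison suggested above is a small set-based diff, sketched here. It assumes each vocab line holds whitespace-separated fields and that line order does not matter. The function names are hypothetical and not part of the repository.

```python
# Hypothetical helper script; not part of hgraph2graph.

def read_vocab_entries(path):
    """Return the set of entries, each a tuple of whitespace-separated fields."""
    with open(path) as handle:
        return {tuple(line.split()) for line in handle if line.strip()}


def diff_vocab(generated_path, shipped_path):
    """Return (only_in_generated, only_in_shipped) as sorted lists of entries."""
    generated = read_vocab_entries(generated_path)
    shipped = read_vocab_entries(shipped_path)
    return sorted(generated - shipped), sorted(shipped - generated)
```

For example, diff_vocab("my_vocab.txt", "../data/polymers/inter_vocab.txt") returns two empty lists when the files hold the same entries. Here my_vocab.txt is a placeholder for wherever the get_vocab.py output was redirected.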

nikhilmittal444 commented 4 years ago

Hi, I had the same trouble. I found that the string used as the lookup key differs from the entries available in the vocab. In my case the SMILES representations differed at the double bond: C(O) versus C(=O). Could you tell me how the problem can be resolved?
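
To see this kind of mismatch without a screenshot, a small diagnostic along the following lines may help. It is a sketch under stated assumptions: the vocab file is assumed to hold two whitespace-separated SMILES per line (a motif and its annotated variant, matching the shape of the pair in the KeyError), and the function names are hypothetical. It lists the variants stored for the same motif so they can be compared by eye against the failing key.

```python
# Hypothetical diagnostic; the two-column vocab layout is an assumption.

def load_vocab_pairs(path):
    """Read (motif, variant) pairs, ignoring lines with fewer than two fields."""
    pairs = []
    with open(path) as handle:
        for line in handle:
            fields = line.split()
            if len(fields) >= 2:
                pairs.append((fields[0], fields[1]))
    return pairs


def explain_missing_key(pairs, key):
    """Classify a failing key and list the stored variants of the same motif."""
    known = set(pairs)
    if key in known:
        return "present", []
    variants = sorted(variant for motif, variant in known if motif == key[0])
    if variants:
        return "motif known, variant missing", variants
    return "motif missing", []
```

Passing the tuple printed in the KeyError as key shows whether the motif itself is absent or whether only the annotated variant is spelled differently.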

wengong-jin / hgraph2graph

Generation example not working #3