wengong-jin / hgraph2graph

Hierarchical Generation of Molecular Graphs using Structural Motifs
MIT License

Some motifs in generated vocabulary are not parseable for rdkit #45

Open suyufeng opened 1 year ago

suyufeng commented 1 year ago

I was trying to build our own customized language models. I found that the pattern "C1=CC=CCNCCcc[cH:1]CC=CCCCC=CCCC=CCCCCC=C1", generated by "get_vocab.py", cannot be parsed by RDKit.

So when I ran "preprocess.py", it raised an error in hgraph2graph/hgraph/vocab.py, line 65, in count_inters:

inters = [a for a in mol.GetAtoms() if a.GetAtomMapNum() > 0]
AttributeError: 'NoneType' object has no attribute 'GetAtoms'

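This happens because the function vocab.py::count_inters first converts the SMILES string to a mol object on line 64 (mol = Chem.MolFromSmiles(s)). For a SMILES string RDKit cannot parse, Chem.MolFromSmiles returns None instead of raising an exception, so the call to mol.GetAtoms() on line 65 fails.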
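One possible stopgap, sketched here under assumptions rather than as the project's fix, is to drop vocabulary entries that RDKit cannot parse before running "preprocess.py". The helper names `filter_vocab_lines` and `_rdkit_parser` are hypothetical, and the sketch assumes every whitespace-separated field on a vocabulary line is a SMILES string; adjust that check if your vocabulary file has other columns. The parser is injectable, and RDKit is only imported when no parser is passed in.

```python
from typing import Callable, Iterable, List, Optional, Tuple


def _rdkit_parser() -> Callable[[str], object]:
    # Imported lazily so the filtering logic can run without RDKit installed.
    from rdkit import Chem
    # Chem.MolFromSmiles returns None (rather than raising) on unparseable input.
    return Chem.MolFromSmiles


def filter_vocab_lines(
    lines: Iterable[str],
    parse: Optional[Callable[[str], object]] = None,
) -> Tuple[List[str], List[str]]:
    """Hypothetical helper: split vocabulary lines into (kept, dropped).

    ASSUMPTION: each whitespace-separated field on a line is a SMILES string.
    A line is kept only if the parser returns a non-None result for every
    field. Blank lines are ignored.
    """
    if parse is None:
        parse = _rdkit_parser()
    kept: List[str] = []
    dropped: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if all(parse(field) is not None for field in line.split()):
            kept.append(line)
        else:
            dropped.append(line)
    return kept, dropped
```

For example, `kept, dropped = filter_vocab_lines(open("vocab.txt"))` (the file name is a placeholder) leaves the parseable entries in `kept`, which could be written to a new file and passed to "preprocess.py" in place of the original. Dropping motifs may cause molecules that contain them to fail later in preprocessing, so those molecules may also need to be removed from the training set.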

I would appreciate it if someone could provide a solution.