wengong-jin / hgraph2graph

Hierarchical Generation of Molecular Graphs using Structural Motifs
MIT License
367 stars 108 forks source link

error with the latest version of rdkit (2021.03.1) #20

Closed byooooo closed 3 years ago

byooooo commented 3 years ago

Hello,

I just wanted to share that the code does not run properly if using the latest version of rdkit (2021.03.1). I tried to run the examples from the readme file python get_vocab.py < data/qed/mols.txt > vocab.txt and got the following error:

multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/Users/brian/opt/anaconda3/envs/hiervae/lib/python3.6/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/Users/brian/opt/anaconda3/envs/hiervae/lib/python3.6/multiprocessing/pool.py", line 44, in mapstar
    return list(map(*args))
  File "get_vocab.py", line 10, in process
    hmol = MolGraph(s)
  File "/Users/brian/Desktop/git/hgraph2graph/hgraph/mol_graph.py", line 22, in __init__
    self.order = self.label_tree()
  File "/Users/brian/Desktop/git/hgraph2graph/hgraph/mol_graph.py", line 112, in label_tree
    cmol, inter_label = get_inter_label(mol, cls, inter_atoms)
  File "/Users/brian/Desktop/git/hgraph2graph/hgraph/chemutils.py", line 142, in get_inter_label
    new_mol = get_clique_mol(mol, atoms)
  File "/Users/brian/Desktop/git/hgraph2graph/hgraph/chemutils.py", line 111, in get_clique_mol
    smiles = Chem.MolFragmentToSmiles(mol, atoms, kekuleSmiles=True)
rdkit.Chem.rdchem.AtomKekulizeException: non-ring atom 7 marked aromatic
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "get_vocab.py", line 29, in <module>
    vocab_list = pool.map(process, batches)
  File "/Users/brian/opt/anaconda3/envs/hiervae/lib/python3.6/multiprocessing/pool.py", line 266, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/Users/brian/opt/anaconda3/envs/hiervae/lib/python3.6/multiprocessing/pool.py", line 644, in get
    raise self._value
rdkit.Chem.rdchem.AtomKekulizeException: non-ring atom 7 marked aromatic

I also faced a similar error when running the preprocessing script.

This Kekulize error has been mentioned in the latest RDKit release update:

https://github.com/rdkit/rdkit/releases

"Setting the kekuleSmiles argument (doKekule in C++) to MolToSmiles will now cause the molecule to be kekulized before SMILES generation. Note that this can lead to an exception being thrown. Previously this argument would only write kekulized SMILES if the molecule had already been kekulized (#2788)"

So far, my workaround has been to downgrade RDkit to an older version. I seem to no longer run into these errors after testing rdkit=2018.09.1.0, although it would be nice to know what version of rdkit was used in the original implementation.

byooooo commented 3 years ago

hi @wengong-jin, could you share what version of rdkit you used? I also run into the same key error during preprocessing that others have reported in #18 and #3 after running the get_vocab script successfully.

byooooo commented 3 years ago

More precise error: image

byooooo commented 3 years ago

@wengong-jin kindly shared his conda env with me:

numpy 1.19.1 pypi_0 pypi python 3.7.6 h0371630_2 pytorch 1.5.1 rdkit 2019.03.4.0 py37hc20afe1_1 rdkit

After rerunning the vocab and preprocessing with this environment (also with networkx) I no longer ran into the KeyError. There seem to be some subtle differences in fragment generation for rdkit=2018.09.1.0 and rdkit=2019.03.4.0 version, but i think it's probably best to further investigate to confirm. Will mark this issue resolved.

Nokimann commented 3 years ago

rdkit version: 2021.03.3 (didn't check in other versions)

Doing python get_vocab.py --ncpu 16 < aromatic.txt > vocab.txt with aromatic SMILES, an error occurs because of Chem.Kekulize function.

In chemutils.py,

Before:

def get_mol(smiles):
    mol = Chem.MolFromSmiles(smiles)
    if mol is not None: Chem.Kekulize(mol)
    return mol

After: adding clearAromaticFlags=True

def get_mol(smiles):
    mol = Chem.MolFromSmiles(smiles)
    if mol is not None: Chem.Kekulize(mol, clearAromaticFlags=True)
    return mol

Solved.