Closed byooooo closed 3 years ago
hi @wengong-jin, could you share what version of rdkit you used? I also run into the same key error during preprocessing that others have reported in #18 and #3 after running the get_vocab script successfully.
More precise error:
@wengong-jin kindly shared his conda env with me:
numpy 1.19.1 pypi_0 pypi python 3.7.6 h0371630_2 pytorch 1.5.1 rdkit 2019.03.4.0 py37hc20afe1_1 rdkit
After rerunning the vocab and preprocessing with this environment (also with networkx) I no longer ran into the KeyError. There seem to be some subtle differences in fragment generation for rdkit=2018.09.1.0 and rdkit=2019.03.4.0 version, but i think it's probably best to further investigate to confirm. Will mark this issue resolved.
rdkit version: 2021.03.3 (didn't check in other versions)
Doing
python get_vocab.py --ncpu 16 < aromatic.txt > vocab.txt
with aromatic SMILES,
an error occurs because of Chem.Kekulize function.
In chemutils.py,
Before:
def get_mol(smiles):
mol = Chem.MolFromSmiles(smiles)
if mol is not None: Chem.Kekulize(mol)
return mol
After: adding clearAromaticFlags=True
def get_mol(smiles):
mol = Chem.MolFromSmiles(smiles)
if mol is not None: Chem.Kekulize(mol, clearAromaticFlags=True)
return mol
Solved.
Hello,
I just wanted to share that the code does not run properly if using the latest version of rdkit (2021.03.1). I tried to run the examples from the readme file
python get_vocab.py < data/qed/mols.txt > vocab.txt
and got the following error:I also faced a similar error when running the preprocessing script.
This Kekulize error has been mentioned in the latest RDKit release update:
https://github.com/rdkit/rdkit/releases
"Setting the kekuleSmiles argument (doKekule in C++) to MolToSmiles will now cause the molecule to be kekulized before SMILES generation. Note that this can lead to an exception being thrown. Previously this argument would only write kekulized SMILES if the molecule had already been kekulized (#2788)"
So far, my workaround has been to downgrade RDkit to an older version. I seem to no longer run into these errors after testing rdkit=2018.09.1.0, although it would be nice to know what version of rdkit was used in the original implementation.