rdkit / knime-rdkit

The RDKit nodes for the KNIME Analytics Platform

SMILES Canonicalization With Very Long SMILES Values Crashed KNIME #50

Open manuelschwarze opened 5 years ago

manuelschwarze commented 5 years ago

A colleague found that KNIME crashes, apparently because of a crash inside RDKit, when very long SMILES values are passed into the RDKit Canonicalize SMILES node. The SMILES in which this occurred link together Cyclooctine units. The correct behavior would be to report an error for such a SMILES if the atom count is too large; it should of course not crash.

greglandrum commented 5 years ago

Agreed. That is bad behavior. Which version of the nodes are being used? Which version of KNIME and which OS? Can you provide a SMILES that allows the problem to be reproduced?

manuelschwarze commented 5 years ago

Hi Greg,

We are using KNIME 3.3.4 with the NIBR build of the RDKit Nodes. I deployed a hotfix in April, I believe, so it should include the March edition of the RDKit binaries. To reproduce, I would suggest focusing only on the RDKit binaries rather than on the KNIME version. I think it would be reproducible with the latest KNIME as well, if the cause of the freeze is really in the RDKit library.

Sorry, but I cannot give you a SMILES for legal reasons. Bernd Rohde said you would be able to build one yourself based on the information I gave you: it basically repeats the same pattern n times, which makes it long enough to have more than 1000 atoms. Hope that helps a little.

-Manuel
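
The construction described in this comment (one pattern repeated n times until the molecule exceeds 1000 atoms) can be sketched in plain Python. This is only an illustration: the SMILES from the report was not shared, so the cyclooctene unit is a stand-in, and `build_repeated_smiles` and `count_simple_atoms` are hypothetical helper names, not part of RDKit or the KNIME nodes.

```python
# Hypothetical helpers for illustration; nothing here comes from RDKit
# or from the KNIME nodes.

def build_repeated_smiles(unit: str, repeats: int) -> str:
    """Concatenate `unit` `repeats` times.

    In SMILES, adjacent atoms are bonded, so each copy is linked to the
    next one by a single bond. The unit must close every ring it opens.
    """
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    return unit * repeats


def count_simple_atoms(smiles: str) -> int:
    """Naive atom count without a parser.

    Only valid when every atom is a single-letter symbol (C, N, O, ...)
    written outside brackets; two-letter symbols such as Cl would be
    counted twice.
    """
    return sum(ch.isalpha() for ch in smiles)


unit = "C1CCCC=CCC1"  # assumed stand-in: one cyclooctene ring (8 carbons)
long_smiles = build_repeated_smiles(unit, 200)
n_atoms = count_simple_atoms(long_smiles)
```

With 200 repeats of an 8-carbon unit, the string describes 1600 atoms, comfortably above the 1000-atom threshold mentioned in the report.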

greglandrum commented 5 years ago

Unfortunately, that doesn't help. We already run tests like that as part of the normal RDKit test suite, and it's not a general problem. Here's an example of generating canonical SMILES for 1600 atoms (it's "cyclooctene", which may be what you meant?). Using the RDKit SMILES generator:

In [3]: rdBase.rdkitVersion
Out[3]: '2018.03.2'

In [8]: m = Chem.MolFromSmiles('C1CCCC=CCC1'*200)

In [9]: s = Chem.MolToSmiles(m)

In [10]: len(s)
Out[10]: 2202

and using the AvalonTools SMILES generator:

In [11]: s2 = pyAvalonTools.GetCanonSmiles(m)
2: clearing ring closure buffers

In [12]: len(s2)
Out[12]: 2360
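
For anyone who wants to rerun the RDKit part of the session above, it can be written as a self-contained script. This is a minimal sketch, assuming the RDKit Python package is installed. `canonical_smiles_with_limit` and its `max_atoms` parameter are hypothetical names, not part of RDKit or the KNIME nodes. The size check illustrates the behavior requested in the original report (an error instead of a crash); note that it runs after parsing, so it only guards the canonicalization step.

```python
from rdkit import Chem


def canonical_smiles_with_limit(smiles: str, max_atoms: int = 2000) -> str:
    """Return canonical SMILES, raising ValueError for bad or oversized input.

    `max_atoms` is an arbitrary illustrative limit, not an RDKit setting.
    """
    mol = Chem.MolFromSmiles(smiles)  # returns None if parsing fails
    if mol is None:
        raise ValueError("could not parse SMILES")
    n_atoms = mol.GetNumAtoms()
    if n_atoms > max_atoms:
        raise ValueError(f"{n_atoms} atoms exceeds the limit of {max_atoms}")
    return Chem.MolToSmiles(mol)


if __name__ == "__main__":
    # Same input as in the session above: 200 linked cyclooctene rings.
    smi = "C1CCCC=CCC1" * 200
    print(len(canonical_smiles_with_limit(smi)))
```

Raising `ValueError` keeps the failure at the Python level, where a caller can catch it and mark the row as failed instead of aborting the whole run.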