openforcefield / openff-toolkit

The Open Forcefield Toolkit provides implementations of the SMIRNOFF format, parameterization engine, and other tools. Documentation available at http://open-forcefield-toolkit.readthedocs.io
http://openforcefield.org
MIT License
313 stars 92 forks source link

RDKit `from_smiles` allows implicit H smiles when explicit H is True #936

Open SimonBoothroyd opened 3 years ago

SimonBoothroyd commented 3 years ago

Describe the bug

The RDKit toolkit wrapper does not raise the expected exception when creating a molecule from '[NH+:1]([C@:2]([C:3](=[O:4])[O-:10])([C:5]([C:6]([C:7]([C:8]([NH+:9]([H:22])[H:23])([H:20])[H:21])([H:18])[H:19])([H:16])[H:17])([H:14])[H:15])[H:13])([H:11])[H:12]' when the explicit_hydrogens option is set to true.

This allows some of the Season 1 pharmaceutical benchmarking molecules (of which this is an example) to pass through the checks likely incorrectly. The correct exception is raised by OpenEye.

(cc @JoshHorton , @dotsdl , @j-wags )

To Reproduce

Molecule.from_mapped_smiles(
    '[NH+:1]([C@:2]([C:3](=[O:4])[O-:10])([C:5]([C:6]([C:7]([C:8]([NH+:9]([H:22])[H:23])([H:20])[H:21])([H:18])[H:19])([H:16])[H:17])([H:14])[H:15])[H:13])([H:11])[H:12]',
    toolkit_registry=RDKitToolkitWrapper()
)

Output The full error message (may be large, that's fine. Better to paste too much than too little.)

Computing environment (please complete the following information):

Additional context Add any other context about the problem here.

j-wags commented 3 years ago

Possibly related to #900 -- Looking into this