openforcefield / openff-toolkit

The Open Forcefield Toolkit provides implementations of the SMIRNOFF format, parameterization engine, and other tools. Documentation available at http://open-forcefield-toolkit.readthedocs.io
http://openforcefield.org
MIT License

Prevent OFFTK from making invalid CMILES (for example, containing `NH+`) #1697

Open j-wags opened 1 year ago


Describe the bug

cc #1696 cc https://github.com/openforcefield/qca-dataset-submission/pull/207 cc https://github.com/openforcefield/qca-dataset-submission/issues/327 cc https://github.com/openforcefield/openff-qcsubmit/pull/228

A few of the industry benchmarking molecules have CMILES with implicit hydrogens. These are hydrogens written as an H count inside a bracket atom (here `[nH+:21]` and `[NH+:31]`) instead of as separate mapped atoms like `[H:32]`. For example:

"[F:1][c:2]1[c:3]([H:32])[c:4]([H:33])[c:5]([H:34])[c:6]([F:7])[c:8]1[C:9]1=[N:12][N:13]2[C:14](=[C:15]([H:37])[N:16]=[C:17]2[N:18]([c:19]2[c:20]([H:39])[nH+:21][c:22]([H:40])[c:23]([H:41])[c:24]2[N:25]2[C:26]([H:42])([H:43])[C@:30]([NH+:31]([H:51])[H:52])([H:50])[C:29]([H:48])([H:49])[C:28]([H:46])([H:47])[C:27]2([H:44])[H:45])[H:38])[C:11]([H:36])=[C:10]1[H:35]"

This CMILES likely came from the OpenFF Toolkit. We should check the pathway that made it and ensure that it can't be made again.
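As a regression check while the pathway is tracked down, one option is to scan each mapped SMILES for non-hydrogen bracket atoms that still carry an H count. The sketch below is a minimal, standard-library-only illustration and is not an existing OpenFF Toolkit function. The helper name `atoms_with_implicit_h` and the regex are hypothetical, and the regex only covers common bracket-atom forms (it does not parse extended chirality such as `@TH1`).

```python
import re
from typing import List

# Hypothetical pattern for one SMILES bracket atom such as "[nH+:21]" or "[C@@H:5]".
# Named groups: isotope, element symbol, chirality (@ or @@ only),
# hydrogen count, charge, and atom-map index.
BRACKET_ATOM = re.compile(
    r"\[(?P<isotope>\d*)"
    r"(?P<element>[A-Z][a-z]?|[a-z][a-z]?|\*)"
    r"(?P<chiral>@{0,2})"
    r"(?P<hcount>H\d*)?"
    r"(?P<charge>[+-]+\d*)?"
    r"(?::(?P<map>\d+))?\]"
)


def atoms_with_implicit_h(smiles: str) -> List[str]:
    """Return bracket atoms (other than hydrogen itself) that carry an H count.

    In a CMILES string where every hydrogen is its own mapped atom (e.g. "[H:32]"),
    any match here indicates hydrogens that were folded into a heavy atom.
    """
    flagged = []
    for match in BRACKET_ATOM.finditer(smiles):
        # "[H:32]" parses as element "H" with no H count, so it is never flagged.
        if match.group("element") != "H" and match.group("hcount"):
            flagged.append(match.group(0))
    return flagged


if __name__ == "__main__":
    # Fragment of the CMILES from this issue: the pyridinium N keeps an implicit H.
    print(atoms_with_implicit_h("[c:20]([H:39])[nH+:21][c:22]([H:40])"))  # ['[nH+:21]']
    # A fully explicit methane CMILES has nothing to flag.
    print(atoms_with_implicit_h("[C:1]([H:2])([H:3])([H:4])[H:5]"))  # []
```

Run on the full CMILES above, this flags `[nH+:21]` and `[NH+:31]`. A regex check like this is cheap enough to run on every CMILES a dataset generates. A stricter check could instead parse the string with a cheminformatics toolkit and compare per-atom hydrogen counts.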