openforcefield / openff-toolkit

The Open Forcefield Toolkit provides implementations of the SMIRNOFF format, parameterization engine, and other tools. Documentation available at http://open-forcefield-toolkit.readthedocs.io
http://openforcefield.org
MIT License

N-methyl groups can be inappropriately perceived in molecules like N-methyl leucine #1454

Open Yoshanuikabundi opened 1 year ago

Yoshanuikabundi commented 1 year ago

Describe the bug

N-methyl "residues" can be incorrectly perceived at the N-terminus.

N-methyl residues usually cap the C-terminus, where they form a peptide bond with the previous residue. If the N-terminus of a peptide is directly methylated, that methylated amine can also be perceived as an N-methyl (NME) residue, though its chemistry is very different. This affects both perceive_residues() and parametrisation:

To Reproduce

Note that there is no peptide bond in this molecule:

>>> from openff.toolkit import Molecule
>>> n_meth_leu = Molecule.from_smiles("CN[C@H](CCC(C)C)C(=O)O")
>>> n_meth_leu.perceive_residues()
>>> n_meth_leu.residues
[HierarchyElement ('None', 'None', 'None', 'None') of iterator 'residues' containing 22 atom(s),
 HierarchyElement ('None', 1, ' ', 'NME') of iterator 'residues' containing 6 atom(s)]

When an NME cap is added, the cap is correctly parametrized by Amber, while the leucine's N-methyl group is incorrectly given identical parameters:

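from openff.toolkit import ForceField, Molecule

sage_ff14sb = ForceField('openff-2.0.0.offxml', 'ff14sb_off_impropers_0.0.3.offxml')
n_methyl_leucine_nme = Molecule.from_smiles("CN[C@H](CCC(C)C)C(=O)ONC")
# depict_charge_source is a helper that is not defined in this issue
depict_charge_source(n_methyl_leucine_nme, sage_ff14sb)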

[Image: nmeleu]

The leucine residue is correctly left uncharged (i.e., its charges must still be generated).

Additional context

Brought to our attention by Rebecca Alford on Slack: https://openforcefieldgroup.slack.com/archives/C011Z72DFH8/p1667567086898229?thread_ts=1667500852.980619&cid=C011Z72DFH8

Yoshanuikabundi commented 1 year ago

Whoops! That's not a peptide bond. Neither of those N-methyl groups should be matched (though both are). The SMILES in the second example should be "CN[C@H](CCC(C)C)C(=O)NC"

This issue still stands (my explanation was just off).
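
To make the distinction in this thread concrete, here is a minimal sketch of the check that separates a genuine NME cap (nitrogen bonded directly to a carbonyl carbon, as in "C(=O)NC") from a directly methylated N-terminus ("CN[C@H]...") or the O-N linkage in the mistyped SMILES ("C(=O)ONC"). It uses hand-built heavy-atom graphs and the Python standard library only. The Graph class and the helper functions are hypothetical names invented for illustration. They are not part of the OpenFF Toolkit and do not reflect how perceive_residues() or the ff14SB port match substructures.

```python
# Illustrative sketch only: hand-built graphs, NOT OpenFF Toolkit API.
from dataclasses import dataclass, field


@dataclass
class Graph:
    """Minimal heavy-atom graph: element symbols plus bonds with integer orders."""
    elements: list
    bonds: dict = field(default_factory=dict)  # {(i, j): order} with i < j

    def add_bond(self, i, j, order=1):
        self.bonds[(min(i, j), max(i, j))] = order

    def neighbors(self, i):
        """Yield (neighbor_index, bond_order) for every bond touching atom i."""
        for (a, b), order in self.bonds.items():
            if a == i:
                yield b, order
            elif b == i:
                yield a, order


def is_methyl(g, c):
    """A carbon with exactly one heavy-atom neighbor (hydrogens are implicit)."""
    return g.elements[c] == "C" and len(list(g.neighbors(c))) == 1


def is_carbonyl_carbon(g, c):
    """A carbon that is double-bonded to at least one oxygen."""
    return g.elements[c] == "C" and any(
        g.elements[n] == "O" and order == 2 for n, order in g.neighbors(c)
    )


def is_amide_n_methyl(g, n):
    """True if nitrogen n carries a methyl AND is bonded directly to a carbonyl carbon."""
    if g.elements[n] != "N":
        return False
    nbrs = [a for a, _ in g.neighbors(n)]
    has_methyl = any(is_methyl(g, a) for a in nbrs)
    has_carbonyl = any(is_carbonyl_carbon(g, a) for a in nbrs)
    return has_methyl and has_carbonyl


# Methylated N-terminus, heavy atoms of CH3-N-CA-C(=O)-O (side chain omitted).
n_terminus = Graph(["C", "N", "C", "C", "O", "O"])
for i, j, order in [(0, 1, 1), (1, 2, 1), (2, 3, 1), (3, 4, 2), (3, 5, 1)]:
    n_terminus.add_bond(i, j, order)

# Mistyped fragment C(=O)-O-N-CH3: the nitrogen hangs off an oxygen, not the carbonyl.
o_n_linkage = Graph(["C", "O", "O", "N", "C"])
for i, j, order in [(0, 1, 2), (0, 2, 1), (2, 3, 1), (3, 4, 1)]:
    o_n_linkage.add_bond(i, j, order)

# Corrected fragment C(=O)-N-CH3: a real amide (peptide-like) bond.
nme_cap = Graph(["C", "O", "N", "C"])
for i, j, order in [(0, 1, 2), (0, 2, 1), (2, 3, 1)]:
    nme_cap.add_bond(i, j, order)

results = {
    "n_terminus": is_amide_n_methyl(n_terminus, 1),
    "o_n_linkage": is_amide_n_methyl(o_n_linkage, 3),
    "nme_cap": is_amide_n_methyl(nme_cap, 2),
}
```

Under this check only the corrected "C(=O)NC" fragment qualifies as an NME cap. A substructure pattern that requires just a methyl on a nitrogen, without the adjacent carbonyl, would accept all three fragments, which is consistent with the over-matching reported above.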