Open wenyan4work opened 2 years ago
I found these phase angles come from this file: http://www.ccl.net/cca/data/parm_at_Frosst/parm_Frosst.frcmod and has been used in all versions of openff offxml files
Also, in 0.10.6 (not in 0.11.0), the two permutations of the propertorsion atom indices look weird. Is this a bug or a feature? https://github.com/openforcefield/openff-toolkit/blob/37d6806bbb9d0d8aacb7221891b5d86dcdee014b/openff/toolkit/topology/topology.py#L146
Should this be 3-2-1-0, instead of 3-1-2-0?
Should this be 3-2-1-0, instead of 3-1-2-0?
Just to comment on this -- that's right, it should be 3, 2, 1, 0 @wenyan4work. It was a minor bug, fixed in #1185. However, I'm not sure ValenceDict.index_of
is/was actually used anywhere in the code.
"[#16X2,#16X1-1,#16X3+1:1]-[#6X3:2]-[#6X4:3]-[#7X4,#7X3:4]"
Thank you, @lilyminium !
Could you also comment on the issue of the order of atoms in the result returned by 'label_molecule'? For example in the parameter t25, the order defined by the smirks pattern is S-C-C-N, but in our molecule since the N has an atom index smaller than S, the parsed result becomes N-C-C-S.
This is a really great question @wenyan4work.
I could be wrong, but I don't really see a physical justification why general dihedral parameters should have a "handedness" - That is, I'd think that all of our dihedral parameters should be "even". Chemically, if we intended parameters to have a handedness, I'd also expect to see stereochemistry in the SMIRKS (or for the SMIRKS to be constructed such that they guarantee some symmetry, so that the term is applied many times across the central bond and the handedness cancels out)
In addition to phase
, each ProperTorsion term also has a periodicity
(basically a frequency). I was thinking that maybe these were constructed with a periodicity that interacts with the phase to remove the handedness - For the torsions you've identified they are:
Plotting some of these as cos((periodicity*x) - phase)
, some of them look even, but others are indeed odd.
The other weird thing about these parameters is that they have multiple terms with the same periodicities - t25 and 26 have periodicities of 4,3,2,2,1
and 4,3,2,2,1,1
respectively. I don't understand the purpose of the repeated periodicities.
I'm looking through other information sources and will compile information here - This isn't the first time this has come up but I'm not sure that the question was resolved before. Most conversations see that the 90 and 270 values were in parm@Frosst, and everyone assumes they must have a good reason, and the discussion ends there. But maybe it was a mistake all along.
Slack DM thread from March 2021:
Christopher Bayly 13:39 We are puzzled by torsions having more than one parameter of the same periodicity, e.g. t25 in 1.3.0 which has two periodicity 2 torsions (they have different phase angles):
It was unexpected by us so we are fixing our related bug but it seems really really odd to me (physics-wise). Without using too much of your time can you tell me when we started doing this?
Jeffrey Wagner 14:20 Looking quickly, it looks like the doubly-occupied periodicity of t25 started in S99F-1.0.7 [1]. But even before then (in version 1.0.6 and earlier), t26 and t27 has the same thing going on [1] https://github.com/openforcefield/smirnoff99Frosst/blob/master/smirnoff99frosst/offxml/smirnoff99Frosst-1.0.7.offxml#L167 [2] https://github.com/openforcefield/smirnoff99Frosst/blob/master/smirnoff99frosst/offxml/smirnoff99Frosst-1.0.6.offxml#L169-L171
Christopher Bayly 14:22 Huh... did I do that, way back in the day? It doesn't look like the kind of thing I would do... 14:22 Thanks for looking into it.
Thank you @j-wags !
The 90/270 phase angles are indeed really unique. I checked gaff 2.11 and all their phase angles are either 0 or 180.
Aside from this peculiarity, the 'always sorted atom indices' is a feature and is guaranteed to be sorted for all bond/angle/torsion/vdw terms, correct?
Let me check that. I think the desired behavior for both parameter application and label_molecules
would be to return the matches exactly according to the SMIRKS atom labeling, without doing any sorting. So I'll look into it and write back here.
Thank you. We labeled many molecules and observed that for all returned bond/angle/torsion indices: bond: atomidx0 < atomidx1 angle: atomidx0 < atomidx2 propertorsion: atomidx0 < atomidx3
for example, if we randomly generate a molecule with smiles "NCOCl", we get the following
(0, 1) OrderedDict([('smirks', '[#6:1]-[#7:2]'), ('id', 'b7'), ('length', Quantity(value=1.464762957261, unit=angstrom)), ('k', Quantity(value=732.6809445917, unit=kilocalorie/(angstrom**2*mole)))])
(0, 4) OrderedDict([('smirks', '[#7:1]-[#1:2]'), ('id', 'b87'), ('length', Quantity(value=1.019481865027, unit=angstrom)), ('k', Quantity(value=1010.288992386, unit=kilocalorie/(angstrom**2*mole)))])
(0, 5) OrderedDict([('smirks', '[#7:1]-[#1:2]'), ('id', 'b87'), ('length', Quantity(value=1.019481865027, unit=angstrom)), ('k', Quantity(value=1010.288992386, unit=kilocalorie/(angstrom**2*mole)))])
(1, 2) OrderedDict([('smirks', '[#6X4:1]-[#8X2H0:2]'), ('id', 'b16'), ('length', Quantity(value=1.425895053732, unit=angstrom)), ('k', Quantity(value=733.4817683494, unit=kilocalorie/(angstrom**2*mole)))])
(1, 6) OrderedDict([('smirks', '[#6X4:1]-[#1:2]'), ('id', 'b84'), ('length', Quantity(value=1.093899492634, unit=angstrom)), ('k', Quantity(value=740.0934137725, unit=kilocalorie/(angstrom**2*mole)))])
(1, 7) OrderedDict([('smirks', '[#6X4:1]-[#1:2]'), ('id', 'b84'), ('length', Quantity(value=1.093899492634, unit=angstrom)), ('k', Quantity(value=740.0934137725, unit=kilocalorie/(angstrom**2*mole)))])
(0, 1, 2) OrderedDict([('smirks', '[*:1]~[#6X4:2]-[*:3]'), ('angle', Quantity(value=116.5475862634, unit=degree)), ('k', Quantity(value=106.4106325309, unit=kilocalorie/(mole*radian**2))), ('id', 'a1')])
(0, 1, 6) OrderedDict([('smirks', '[*:1]~[#6X4:2]-[*:3]'), ('angle', Quantity(value=116.5475862634, unit=degree)), ('k', Quantity(value=106.4106325309, unit=kilocalorie/(mole*radian**2))), ('id', 'a1')])
(0, 1, 7) OrderedDict([('smirks', '[*:1]~[#6X4:2]-[*:3]'), ('angle', Quantity(value=116.5475862634, unit=degree)), ('k', Quantity(value=106.4106325309, unit=kilocalorie/(mole*radian**2))), ('id', 'a1')])
(1, 0, 4) OrderedDict([('smirks', '[#1:1]-[#7X4,#7X3,#7X2-1:2]-[*:3]'), ('angle', Quantity(value=110.92021963, unit=degree)), ('k', Quantity(value=189.0602485815, unit=kilocalorie/(mole*radian**2))), ('id', 'a19')])
(1, 0, 5) OrderedDict([('smirks', '[#1:1]-[#7X4,#7X3,#7X2-1:2]-[*:3]'), ('angle', Quantity(value=110.92021963, unit=degree)), ('k', Quantity(value=189.0602485815, unit=kilocalorie/(mole*radian**2))), ('id', 'a19')])
(1, 2, 3) OrderedDict([('smirks', '[*:1]-[#8:2]-[*:3]'), ('angle', Quantity(value=110.3538806181, unit=degree)), ('k', Quantity(value=130.181232192, unit=kilocalorie/(mole*radian**2))), ('id', 'a28')])
(2, 1, 6) OrderedDict([('smirks', '[*:1]~[#6X4:2]-[*:3]'), ('angle', Quantity(value=116.5475862634, unit=degree)), ('k', Quantity(value=106.4106325309, unit=kilocalorie/(mole*radian**2))), ('id', 'a1')])
(2, 1, 7) OrderedDict([('smirks', '[*:1]~[#6X4:2]-[*:3]'), ('angle', Quantity(value=116.5475862634, unit=degree)), ('k', Quantity(value=106.4106325309, unit=kilocalorie/(mole*radian**2))), ('id', 'a1')])
(4, 0, 5) OrderedDict([('smirks', '[#1:1]-[#7X4,#7X3,#7X2-1:2]-[*:3]'), ('angle', Quantity(value=110.92021963, unit=degree)), ('k', Quantity(value=189.0602485815, unit=kilocalorie/(mole*radian**2))), ('id', 'a19')])
(6, 1, 7) OrderedDict([('smirks', '[#1:1]-[#6X4:2]-[#1:3]'), ('angle', Quantity(value=115.6030999533, unit=degree)), ('k', Quantity(value=97.55298529519, unit=kilocalorie/(mole*radian**2))), ('id', 'a2')])
(0, 1, 2, 3) OrderedDict([('smirks', '[*:1]-[#6X4:2]-[#8X2H0:3]-[*:4]'), ('periodicity1', 3), ('phase1', Quantity(value=0.0, unit=degree)), ('id', 't95'), ('k1', Quantity(value=0.1421275145631, unit=kilocalorie/mole)), ('idivf1', 3.0)])
(2, 1, 0, 4) OrderedDict([('smirks', '[*:1]-[#6X4:2]-[#7X3:3]-[*:4]'), ('periodicity1', 3), ('phase1', Quantity(value=0.0, unit=degree)), ('id', 't51'), ('k1', Quantity(value=0.2036670163956, unit=kilocalorie/mole)), ('idivf1', 1.0)])
(2, 1, 0, 5) OrderedDict([('smirks', '[*:1]-[#6X4:2]-[#7X3:3]-[*:4]'), ('periodicity1', 3), ('phase1', Quantity(value=0.0, unit=degree)), ('id', 't51'), ('k1', Quantity(value=0.2036670163956, unit=kilocalorie/mole)), ('idivf1', 1.0)])
(3, 2, 1, 6) OrderedDict([('smirks', '[*:1]-[#6X4:2]-[#8X2H0:3]-[*:4]'), ('periodicity1', 3), ('phase1', Quantity(value=0.0, unit=degree)), ('id', 't95'), ('k1', Quantity(value=0.1421275145631, unit=kilocalorie/mole)), ('idivf1', 3.0)])
(3, 2, 1, 7) OrderedDict([('smirks', '[*:1]-[#6X4:2]-[#8X2H0:3]-[*:4]'), ('periodicity1', 3), ('phase1', Quantity(value=0.0, unit=degree)), ('id', 't95'), ('k1', Quantity(value=0.1421275145631, unit=kilocalorie/mole)), ('idivf1', 3.0)])
(4, 0, 1, 6) OrderedDict([('smirks', '[*:1]-[#6X4:2]-[#7X3:3]-[*:4]'), ('periodicity1', 3), ('phase1', Quantity(value=0.0, unit=degree)), ('id', 't51'), ('k1', Quantity(value=0.2036670163956, unit=kilocalorie/mole)), ('idivf1', 1.0)])
(4, 0, 1, 7) OrderedDict([('smirks', '[*:1]-[#6X4:2]-[#7X3:3]-[*:4]'), ('periodicity1', 3), ('phase1', Quantity(value=0.0, unit=degree)), ('id', 't51'), ('k1', Quantity(value=0.2036670163956, unit=kilocalorie/mole)), ('idivf1', 1.0)])
(5, 0, 1, 6) OrderedDict([('smirks', '[*:1]-[#6X4:2]-[#7X3:3]-[*:4]'), ('periodicity1', 3), ('phase1', Quantity(value=0.0, unit=degree)), ('id', 't51'), ('k1', Quantity(value=0.2036670163956, unit=kilocalorie/mole)), ('idivf1', 1.0)])
(5, 0, 1, 7) OrderedDict([('smirks', '[*:1]-[#6X4:2]-[#7X3:3]-[*:4]'), ('periodicity1', 3), ('phase1', Quantity(value=0.0, unit=degree)), ('id', 't51'), ('k1', Quantity(value=0.2036670163956, unit=kilocalorie/mole)), ('idivf1', 1.0)])
(0,) OrderedDict([('smirks', '[#7:1]'), ('epsilon', Quantity(value=0.1676915150424, unit=kilocalorie/mole)), ('id', 'n20'), ('rmin_half', Quantity(value=1.799798315098, unit=angstrom))])
(1,) OrderedDict([('smirks', '[#6X4:1]'), ('epsilon', Quantity(value=0.1088406109251, unit=kilocalorie/mole)), ('id', 'n16'), ('rmin_half', Quantity(value=1.896698071741, unit=angstrom))])
(2,) OrderedDict([('smirks', '[#8X2H0+0:1]'), ('epsilon', Quantity(value=0.1684651402602, unit=kilocalorie/mole)), ('id', 'n18'), ('rmin_half', Quantity(value=1.697783613804, unit=angstrom))])
(3,) OrderedDict([('smirks', '[#17:1]'), ('epsilon', Quantity(value=0.2656001046527, unit=kilocalorie/mole)), ('id', 'n24'), ('rmin_half', Quantity(value=1.85628721824, unit=angstrom))])
(4,) OrderedDict([('smirks', '[#1:1]-[#7]'), ('epsilon', Quantity(value=0.01409081474669, unit=kilocalorie/mole)), ('id', 'n11'), ('rmin_half', Quantity(value=0.6192778454102, unit=angstrom))])
(5,) OrderedDict([('smirks', '[#1:1]-[#7]'), ('epsilon', Quantity(value=0.01409081474669, unit=kilocalorie/mole)), ('id', 'n11'), ('rmin_half', Quantity(value=0.6192778454102, unit=angstrom))])
(6,) OrderedDict([('smirks', '[#1:1]-[#6X4](-[#7,#8,#9,#16,#17,#35])-[#7,#8,#9,#16,#17,#35]'), ('epsilon', Quantity(value=0.0157, unit=kilocalorie/mole)), ('id', 'n4'), ('rmin_half', Quantity(value=1.287, unit=angstrom))])
(7,) OrderedDict([('smirks', '[#1:1]-[#6X4](-[#7,#8,#9,#16,#17,#35])-[#7,#8,#9,#16,#17,#35]'), ('epsilon', Quantity(value=0.0157, unit=kilocalorie/mole)), ('id', 'n4'), ('rmin_half', Quantity(value=1.287, unit=angstrom))])
It is clear that atom 0 is 'N'. The first bond is between atom 0 and atom 1 and the returned indices are (0,1), in the reversed order compared to the smirks definition (C-N, which should give (1,0) in the returned labels).
exactly according to the SMIRKS atom labeling, without doing any sorting
IIRC Molecule/Topology.chemical_environment_matches
does this, but the parameter handlers used in creating openmm systems and labelling molecules then store the indices in ValenceDict
for bonds, angles, proper torsions. (Improper torsions have an ImproperDict and LibraryCharges just use a normal (ordered) dict). When a ValenceDict sets or gets a key, it automatically transforms to sort the indices so the lowest index is first -- so the resulting parameter is always applied in a sorted/symmetric way.
That was an overly technical way to say that the implementation always sorts them, but I'm not sure if it's supposed to be a guaranteed feature in the spec.
Yeah, it looks like the current behavior of both label_molecules
and parameter assignment, for both 0.10.6 and 0.11.0, is to sort everything.
For label_molecules
I don't like that behavior but it's not clearly wrong.
For parameter assignment, I'm pretty sure that's wrong. I don't think this would have caused much trouble, but given that some of our parameters are (for reasons I don't fully understand) left/right "handed", this would have been sometimes assigning the wrong handedness to them.
To reproduce, I run with C=N, once with one atom ordering, and then again with the reversed atom ordering. If this was behaving correctly, I'd expect the atom tuples in the output to reverse, but they don't.
Repro code (for 0.11, switch the units package for 0.10):
from openff.toolkit.typing.engines.smirnoff import ForceField
from openff.toolkit.topology import Molecule
from openff.units import unit
mol = Molecule.from_mapped_smiles('[H:1][C:2](=[N:3][H:4])[H:5]')
#print(mol.to_smiles(mapped=True))
ff = ForceField('openff-2.0.0.offxml')
pth = ff['ProperTorsions']
pth.add_parameter({'smirks':'[H:1][N:2]=[C:3][H:4]',
'k1':0*unit.kilocalorie_per_mole,
'phase1': 0*unit.degree,
'periodicity1': 1,
'idivf1': 1})
print(dict(ff.label_molecules(mol.to_topology())[0]['ProperTorsions']))
sys = ff.create_openmm_system(mol.to_topology())
tf = [f for f in sys.getForces() if 'orsion' in str(f)][0]
#print(dir(tf))
for i in range(tf.getNumTorsions()):
print(tf.getTorsionParameters(i))
yields (ignore the bottom 3, they're impropers)
{(0, 1, 2, 3): <ProperTorsionType with smirks: [H:1][N:2]=[C:3][H:4] periodicity1: 1 phase1: 0 degree k1: 0 kilocalorie_per_mole idivf1: 1.0 >, (3, 2, 1, 4): <ProperTorsionType with smirks: [H:1][N:2]=[C:3][H:4] periodicity1: 1 phase1: 0 degree k1: 0 kilocalorie_per_mole idivf1: 1.0 >}
[0, 1, 2, 3, 1, Quantity(value=0.0, unit=radian), Quantity(value=0.0, unit=kilojoule/mole)]
[3, 2, 1, 4, 1, Quantity(value=0.0, unit=radian), Quantity(value=0.0, unit=kilojoule/mole)]
[1, 0, 2, 4, 2, Quantity(value=3.141592653589793, unit=radian), Quantity(value=1.5341333333333333, unit=kilojoule/mole)]
[1, 2, 4, 0, 2, Quantity(value=3.141592653589793, unit=radian), Quantity(value=1.5341333333333333, unit=kilojoule/mole)]
[1, 4, 0, 2, 2, Quantity(value=3.141592653589793, unit=radian), Quantity(value=1.5341333333333333, unit=kilojoule/mole)]
Then, reversing the order of the input molecule:
mol = Molecule.from_mapped_smiles('[H:5][C:4](=[N:3][H:2])[H:1]')
print(dict(ff.label_molecules(mol.to_topology())[0]['ProperTorsions']))
sys = ff.create_openmm_system(mol.to_topology())
tf = [f for f in sys.getForces() if 'orsion' in str(f)][0]
#print(dir(tf))
for i in range(tf.getNumTorsions()):
print(tf.getTorsionParameters(i))
yields (ignore the bottom 3, they're impropers)
{(0, 3, 2, 1): <ProperTorsionType with smirks: [H:1][N:2]=[C:3][H:4] periodicity1: 1 phase1: 0 degree k1: 0 kilocalorie_per_mole idivf1: 1.0 >, (1, 2, 3, 4): <ProperTorsionType with smirks: [H:1][N:2]=[C:3][H:4] periodicity1: 1 phase1: 0 degree k1: 0 kilocalorie_per_mole idivf1: 1.0 >}
[0, 3, 2, 1, 1, Quantity(value=0.0, unit=radian), Quantity(value=0.0, unit=kilojoule/mole)]
[1, 2, 3, 4, 1, Quantity(value=0.0, unit=radian), Quantity(value=0.0, unit=kilojoule/mole)]
[3, 0, 2, 4, 2, Quantity(value=3.141592653589793, unit=radian), Quantity(value=1.5341333333333333, unit=kilojoule/mole)]
[3, 2, 4, 0, 2, Quantity(value=3.141592653589793, unit=radian), Quantity(value=1.5341333333333333, unit=kilojoule/mole)]
[3, 4, 0, 2, 2, Quantity(value=3.141592653589793, unit=radian), Quantity(value=1.5341333333333333, unit=kilojoule/mole)]
@lilyminium I think you're right - this is caused by using the ValenceDict key as the atom tuple for parameter assignment. We got burned by this in vsites so we explicitly use the atom tuple from the Match
item (though the vsite code is really complex so I can't easily point out how we do that). So the fix here may be straightforward, just need to find the time to implement+add tests.
For parameter assignment, I'm pretty sure that's wrong. I don't think this would have caused much trouble, but given that some of our parameters are (for reasons I don't fully understand) left/right "handed", this would have been sometimes assigning the wrong handedness to them.
So the fix here may be straightforward, just need to find the time to implement+add tests.
I agree that symmetric application of asymmetric parameters is wrong, but I'm not sure switching to asymmetric application is the answer here, or that a fix is actually needed to the toolkit. The odd dihedrals really stand out to me. Intuitively I would expect parameters to be applied symmetrically -- IMO a [C:1]=[N:2] bond should have the same parameter as [N:1]=[C:2]. Without a ValenceDict to ensure uniqueness you might actually see two bond parameters get applied for [C:1]=[N:2]
and [N:1]=[C:2]
. Wrt torsion energies, it also wouldn't surprise me if energies were usually calculated as the smaller (<180) angle between two planes -- if that's the case, it shouldn't matter which order the atoms are given, so long as they correctly define the planes. @wenyan4work do you notice differences in energies when you swap the atoms around? What are you using to calculate energies?
Repro code (for 0.11, switch the units package for 0.10):
For the lazy / constantly switching, bridging code can just import from toolkit:
from openff.toolkit.topology.molecule import Molecule, unit
@wenyan4work do you notice differences in energies when you swap the atoms around? What are you using to calculate energies?
@lilyminium We compared the result from openmm (using the template_generator) and a naively implemented torsion energy calculator using the result parsed by 'label_molecules()' function. We see discrepancies, and we are not sure which is correct, due to the strange behavior of those functions (atom indices being sorted).
Also, the more popular way to compute the torsion energy, I believe, is to use the vector representation given in https://onlinelibrary.wiley.com/doi/10.1002/jcc.540130508 to avoid singluar angle values, not just as the smaller (<180) angle between two planes.
Wrt torsion energies, it also wouldn't surprise me if energies were usually calculated as the smaller (<180) angle between two planes -- if that's the case, it shouldn't matter which order the atoms are given, so long as they correctly define the planes.
Ahhh, this is really interesting. But I think, for an odd function running from 0 to 180, this would lead to a discontinuous force at angle=0, right? I think a lot of FF stuff wants continuous forces, but maybe this has never had a significant enough magnitude to cause a crash.
Are we content with our understanding of why a few torsions have this seeming handedness? I am not, based on my understanding of the problem, not seeing a clear answer from others involved (past and present), and how counter-intuitive it seems based on the physics of force fields.
I'd like to open a stub issues in openff-forcefields
to track the question of "why are some of our torsions handed?" separate from "how should the toolkit process handed parameters?" - does that sound reasonable?
@mattwthompson thank you, I think it's a good idea to track this issue in the openff-forcefields repo.
In short, I wish to understand several separate issues:
label_molecules
. Should I check the smirks pattern again and then flip the order of returned atom indices when necessary to make them match?U = \sum_{i=1}^N k_i * (1 + cos(periodicity_i * phi - phase_i))
. Should phi be limited to 0-180, or in the whole range of 0-360? The latter seems to be a more widely used definition of phi.As the choice of these parameters does not fall under the scope of the toolkit's SMIRNOFF implementation, I have raised an issue elsewhere in hopes we can document these choices: https://github.com/openforcefield/openff-forcefields/issues/53
There's a lot going on here and I think it makes sense to move this into a Discussion. This is a better home for some of the questions here and doesn't prevent us from raising more targeted feature requests and bug reports back to the toolkit's issue tracker.
Dear OpenFF developers,
When we are using this wonderful package, we found there are three propertorsion parameters in the sage offxml file somewhat mysterious:
More specifically, they are:
These phase values transform the cos series from an even function to an odd function. Therefore, when a dihedral angle is calculated from the atom indices (ordered) as 1-2-3-4 or 4-3-2-1, we get different results.
The
label_molecule
function in openff toolkit seems to always return atom indices according to their indices in the Molecule data, therefore, the order of the returned 4 atom indices does not always match the order in the defined smirks pattern. For example, for t25, the smirks pattern defined the atom sequence as S-C-C-N, but in a molecule the N atom may appear before the S atom, so in the returned (sorted) atom indices, this group of torsion may be listed as N-C-C-S, and cause some discrepancies for the torsion energy calculations.Did I misunderstand the 'label_molecule' function or the role of the 'phase' parameter?
Thank you,