Closed callumjd closed 3 years ago
Thanks for writing in, @callumjd. I'm able to reproduce this on my computer, and will start looking into the source of the difference.
Still investigating, but one lead is that the complex.prmtop
file has far more SCEE
and SCNB_SCALE_FACTOR
entries, many of which are 1.00E+10
. IIRC, these values are the denominator for scaling 1-4 interactions, so having a humongous number there would explain why the magnitude of the 1-4 interactions is so reduced in the energies for that system.
If this is the case, then the next question is why these 1E+10s are being filled in there.
Whoa. That is very much not good.
We'll have to see whether this is a ParmEd, OpenMM, or openff-toolkit issue, how far this extends back, and whether it impacts any of the ongoing benchmark results.
Also, this seems like fairly important area for test coverage to be extended, with hindsight being 20/20.
That would also help track down where this happened if we need to use git-bisect
down the road.
The ligand that gets openff parameters has 25 atoms, which, since we treat each as a unique "atom type" for compatibility with atom-type-based formats, should add 25 atom types. So I'd expect the prmtop sections describing interactions between atom types (with a size proportional to n_atom_types^2
) to be quadratically larger.
The only places where ParmEd touches the value 1e10 is in this method, which handles cases where it may need to fill in 1-3 and 1-4 scaling values. 1e10 would be a reasonable value for a 1-3 scaling factor. So maybe the ParmEd-outputted system (complex.prmtop
) has added 1-3 scaling values?
So, there should be a difference of 25 atom types somewhere, and potentially also the addition of 1-3 scaling factors. In the files I showed in the previous post, there are 196 entries for the SCNB section in complex.prmtop
, and 5620 in system.prmtop
. I'm playing with some math to confirm that there's a difference of 25 atom types.
(I've also been tracking down a bug in Topology.from_openmm
where the atom order/positions may be scrambled in some cases (#1018). But I don't think that's the root cause here, since mis-ordered positions would show super high bond energies)
Need to run for a meeting, I'll try to get back to this today.
I worked with @mattwthompson on this for a while today, and we'll meet back up tomorrow. Here's a summary of what we've found:
The minimum reproducing example is:
import requests
repo_url = 'https://raw.githubusercontent.com/MobleyLab/benchmarksets/master/input_files/'
sources = {
'system.prmtop' : repo_url + 'BRD4/prmtop-coords/BRD4-1.prmtop',
'system.crd' : repo_url + 'BRD4/prmtop-coords/BRD4-1.crds',
'ligand.sdf' : repo_url + 'BRD4/sdf/ligand-1.sdf',
'ligand.pdb' : repo_url + 'BRD4/pdb/ligand-1.pdb'
}
for (filename, url) in sources.items():
r = requests.get(url)
open(filename, 'w').write(r.text)
#Read AMBER to ParmEd Structure object
import parmed
in_prmtop = 'system.prmtop'
in_crd = 'system.crd'
orig_structure = parmed.amber.AmberParm(in_prmtop, in_crd)
pieces = orig_structure.split()
from openff.toolkit.topology import Molecule, Topology
from simtk.openmm.app import PDBFile
ligand_off_molecule = Molecule('ligand.sdf')
ligand_pdbfile = PDBFile('ligand.pdb')
ligand_off_topology = Topology.from_openmm(ligand_pdbfile.topology,
unique_molecules=[ligand_off_molecule])
# Load the SMIRNOFF-format Parsley force field
from openff.toolkit.typing.engines.smirnoff import ForceField
force_field = ForceField('openff_unconstrained-1.0.0.offxml')
ligand_system = force_field.create_openmm_system(ligand_off_topology)
new_ligand_structure = parmed.openmm.load_topology(ligand_off_topology.to_openmm(),
ligand_system,
xyz=pieces[1][0].positions,
)
# Create a new, empty system
complex_structure = parmed.Structure()
# Add the protein
complex_structure += pieces[0][0]
# Add the ligand
complex_structure += new_ligand_structure
complex_structure.coordinates = orig_structure.coordinates[:len(complex_structure.atoms)]
complex_structure.box_vectors = orig_structure.box_vectors
complex_structure.save('protein_ligand.prmtop', overwrite=True)
complex_structure.save('protein_ligand.inpcrd', overwrite=True)
echo "parm protein_ligand.prmtop" > parmed.in
echo "loadCoordinates protein_ligand.inpcrd" >> parmed.in
echo "energy" >> parmed.in
echo "quit" >> parmed.in
parmed -i parmed.in
condense_atom_types=False
to the parmed.openmm.load_topology
call doesn't fix it.complex_structure.update_dihedral_exclusions()
before writing the system doesn't fix it.
# Create a new, empty system
complex_structure = parmed.Structure()
complex_structure += pieces[0][0]
complex_structure += new_ligand_structure
we run
```python
# Create a new, empty system
# complex_structure = parmed.Structure()
# Add the protein
complex_structure = pieces[0][0]
# Add the ligand
complex_structure += new_ligand_structure
then some characteristics of the original AMBER structure (like 1-4 scaling factors, I suspect) get inherited, and the final energies look a lot better:
Computing a single-point energy for complex.prmtop
Bond = 92.6749096 Angle = 1801.8589508
Dihedral = 1551.1422956 1-4 vdW = 683015.2642038
1-4 Elec = 5126.4418463 vdWaals = 11318.5631490
Elec. = -116419.1903971
TOTAL = 586486.7549580
This gets most terms reasonably close, except for a big discrepancy in 1-4 vdW. This is what I'll start working on tomorrow.
Ok, so two changes are making these numbers look more believable, though I'd like to do more rigorous validation next week:
parmed.Structure
, but instead start with a copy of pieces[0][0]
(the protein from the original input), which is type AmberParm
. This method seems to preserve the original 1-4 scalings instead of setting them to 1e10.complex_structure.update_dihedral_exclusions()
right before writing the prmtop
and inpcrd
filesSo, my current reproducing example, which now includes the ions and water, and gets pretty close to the intended energies, is:
import requests
repo_url = 'https://raw.githubusercontent.com/MobleyLab/benchmarksets/master/input_files/'
sources = {
'system.prmtop' : repo_url + 'BRD4/prmtop-coords/BRD4-1.prmtop',
'system.crd' : repo_url + 'BRD4/prmtop-coords/BRD4-1.crds',
'ligand.sdf' : repo_url + 'BRD4/sdf/ligand-1.sdf',
'ligand.pdb' : repo_url + 'BRD4/pdb/ligand-1.pdb'
}
for (filename, url) in sources.items():
r = requests.get(url)
open(filename, 'w').write(r.text)
#Read AMBER to ParmEd Structure object
import parmed
in_prmtop = 'system.prmtop'
in_crd = 'system.crd'
orig_structure = parmed.amber.AmberParm(in_prmtop, in_crd)
pieces = orig_structure.split()
from openff.toolkit.topology import Molecule, Topology
from simtk.openmm.app import PDBFile
ligand_off_molecule = Molecule('ligand.sdf')
ligand_pdbfile = PDBFile('ligand.pdb')
ligand_off_topology = Topology.from_openmm(ligand_pdbfile.topology,
unique_molecules=[ligand_off_molecule])
# Load the SMIRNOFF-format Parsley force field
from openff.toolkit.typing.engines.smirnoff import ForceField
force_field = ForceField('openff_unconstrained-1.0.0.offxml')
ligand_system = force_field.create_openmm_system(ligand_off_topology)
new_ligand_structure = parmed.openmm.load_topology(ligand_off_topology.to_openmm(),
ligand_system,
xyz=pieces[1][0].positions,
condense_atom_types=False
)
# Create a new, empty system
#complex_structure = parmed.Structure()
import copy
# Add the protein
complex_structure = copy.deepcopy(pieces[0][0])
# Add the ligand
complex_structure += new_ligand_structure
# Add ions
just_ion1_structure = parmed.Structure()
just_ion1_structure += pieces[2][0]
just_ion1_structure *= len(pieces[2][1])
just_ion2_structure = parmed.Structure()
just_ion2_structure += pieces[3][0]
just_ion2_structure *= len(pieces[3][1])
complex_structure += just_ion1_structure
complex_structure += just_ion2_structure
# Add waters
just_water_structure = parmed.Structure()
just_water_structure += pieces[4][0]
just_water_structure *= len(pieces[4][1])
complex_structure += just_water_structure
complex_structure.coordinates = orig_structure.coordinates
complex_structure.box_vectors = orig_structure.box_vectors
complex_structure.update_dihedral_exclusions()
complex_structure.save('whole_shebang.prmtop', overwrite=True)
complex_structure.save('whole_shebang.inpcrd', overwrite=True)
Checking with ParmEd looks pretty close to the original system energies too:
echo "parm whole_shebang.prmtop" > parmed.in
echo "loadCoordinates whole_shebang.inpcrd" >> parmed.in
echo "energy" >> parmed.in
echo "quit" >> parmed.in
parmed -i parmed.in
Computing a single-point energy for whole_shebang.prmtop
Bond = 92.6749096 Angle = 1801.8589508
Dihedral = 1551.1422956 1-4 vdW = 1222.8319144
1-4 Elec = 5575.5902675 vdWaals = 11318.5631490
Elec. = -116419.1903971
TOTAL = -94856.5289103
So, the things to pick up on next week are:
It's looking like this may come down to the implementation choices made in the ParmEd Structure +=
operator, which copies a bunch of terms, but has to make decisions about preserving 1-4 scale factors. Definitely worth raising an issue there so that there can be a more detailed discussion of what the behavior should be and whether there should be warnings issued in cases where it would do unexpected things.
Thanks for digging into this. For what it's worth, if starting from an empty Structure, you can convert either piece to type AmberParm:
complex_structure = parmed.Structure()
complex_structure += parmed.amber.AmberParm.from_structure(pieces[0][0])
By doing this I am getting correct agreement of energies of the saved prmtop.
Hm, @callumjd I'm getting "bad" values when I use the code in your last post. Could you double check?
Here's my reproducing code with your suggested change:
import requests
repo_url = 'https://raw.githubusercontent.com/MobleyLab/benchmarksets/master/input_files/'
sources = {
'system.prmtop' : repo_url + 'BRD4/prmtop-coords/BRD4-1.prmtop',
'system.crd' : repo_url + 'BRD4/prmtop-coords/BRD4-1.crds',
'ligand.sdf' : repo_url + 'BRD4/sdf/ligand-1.sdf',
'ligand.pdb' : repo_url + 'BRD4/pdb/ligand-1.pdb'
}
for (filename, url) in sources.items():
r = requests.get(url)
open(filename, 'w').write(r.text)
#Read AMBER to ParmEd Structure object
import parmed
in_prmtop = 'system.prmtop'
in_crd = 'system.crd'
orig_structure = parmed.amber.AmberParm(in_prmtop, in_crd)
pieces = orig_structure.split()
from openff.toolkit.topology import Molecule, Topology
from simtk.openmm.app import PDBFile
ligand_off_molecule = Molecule('ligand.sdf')
ligand_pdbfile = PDBFile('ligand.pdb')
ligand_off_topology = Topology.from_openmm(ligand_pdbfile.topology,
unique_molecules=[ligand_off_molecule])
# Load the SMIRNOFF-format Parsley force field
from openff.toolkit.typing.engines.smirnoff import ForceField
force_field = ForceField('openff_unconstrained-1.0.0.offxml')
ligand_system = force_field.create_openmm_system(ligand_off_topology)
new_ligand_structure = parmed.openmm.load_topology(ligand_off_topology.to_openmm(),
ligand_system,
xyz=pieces[1][0].positions,
condense_atom_types=False
)
# Create a new, empty system
complex_structure = parmed.Structure()
import copy
# Add the protein
complex_structure += parmed.amber.AmberParm.from_structure(pieces[0][0])
# Add the ligand
complex_structure += new_ligand_structure
# Add ions
just_ion1_structure = parmed.Structure()
just_ion1_structure += pieces[2][0]
just_ion1_structure *= len(pieces[2][1])
just_ion2_structure = parmed.Structure()
just_ion2_structure += pieces[3][0]
just_ion2_structure *= len(pieces[3][1])
complex_structure += just_ion1_structure
complex_structure += just_ion2_structure
# Add waters
just_water_structure = parmed.Structure()
just_water_structure += pieces[4][0]
just_water_structure *= len(pieces[4][1])
complex_structure += just_water_structure
complex_structure.coordinates = orig_structure.coordinates
complex_structure.box_vectors = orig_structure.box_vectors
complex_structure.update_dihedral_exclusions() # (This doesn't make a difference)
complex_structure.save('whole_shebang.prmtop', overwrite=True)
complex_structure.save('whole_shebang.inpcrd', overwrite=True)
echo "parm whole_shebang.prmtop" > parmed.in
echo "loadCoordinates whole_shebang.inpcrd" >> parmed.in
echo "energy" >> parmed.in
echo "quit" >> parmed.in
parmed -i parmed.in
outputs:
Bond = 92.6749096 Angle = 1801.8589508
Dihedral = 1551.1422956 1-4 vdW = 9.7745874
1-4 Elec = -69.1106003 vdWaals = 11318.5631490
Elec. = -116419.1903971
TOTAL = -101714.2871050
Hi @j-wags , apologies, I should've been more explicit. Using parmed to convert the Structure to AmberParm each time you append to the complex_structure works:
# Create a new, empty system
complex_structure = parmed.Structure()
import copy
# Add the protein
complex_structure += parmed.amber.AmberParm.from_structure(pieces[0][0])
# Add the ligand
complex_structure += parmed.amber.AmberParm.from_structure(new_ligand_structure)
# Add ions
just_ion1_structure = parmed.Structure()
just_ion1_structure += pieces[2][0]
just_ion1_structure *= len(pieces[2][1])
just_ion2_structure = parmed.Structure()
just_ion2_structure += pieces[3][0]
just_ion2_structure *= len(pieces[3][1])
complex_structure += parmed.amber.AmberParm.from_structure(just_ion1_structure)
complex_structure += parmed.amber.AmberParm.from_structure(just_ion2_structure)
# Add waters
just_water_structure = parmed.Structure()
just_water_structure += pieces[4][0]
just_water_structure *= len(pieces[4][1])
complex_structure += parmed.amber.AmberParm.from_structure(just_water_structure)
complex_structure.coordinates = orig_structure.coordinates
complex_structure.box_vectors = orig_structure.box_vectors
complex_structure.update_dihedral_exclusions() # (This doesn't make a difference)
complex_structure.save('whole_shebang.prmtop', overwrite=True)
complex_structure.save('whole_shebang.inpcrd', overwrite=True)
Gives:
Bond = 92.6749096 Angle = 1801.8589508
Dihedral = 1551.1422956 1-4 vdW = 1213.0573272
1-4 Elec = 5617.4905656 vdWaals = 11318.5631490
Elec. = -116434.4168318
TOTAL = -94839.6296341
Done!
Oh, right you are! Thanks for the clarification.
I've been working this over with @mattwthompson during my available time the last few weeks, and there are still some loose threads I'd to get to the end of, but I don't have time to do that before our next release. Since @callumjd's suggestion is above is a clear improvement (and very possibly totally correct), I'm going to go ahead and update the example to convert everything to AmberParm
s as they get added to complex_structure
I'm attaching my debugging notebook here, since it's the most complete breadcrumb trail for picking this search up again later.
@j-wags in the resulting "whole_shebang.prmtop" there is a mix of SCNB=1.0 and SCNB=2.0 scaling factors (also mix of SCEE=1.2 and SCEE=1.0). Should we be forcing everything to have the "amber" scaling factors like this? Or does OpenFF ligand require SCNB=1.0 and SCEE=1.0?
# FIXME: ParmEd does not assign scee and scnb to the dihedrals
# If this block is commented out, energies won't match between
# ParmEd and tleap -generated PRMTOPS! I guess this is a bug in
# ParmEd because the generated FRCMOD files are incorrect.
if with_fixes:
for dihedral in structure.dihedral_types:
dihedral.scee = 1.2
dihedral.scnb = 2.0
taken from here: https://github.com/openforcefield/openff-toolkit/issues/440
Yeah, that's why I'm leaving this issue open... I'm fairly certain that the changes in #1033 make the example "less wrong" so I'd like those to be packaged in our next release, but I'm not convinced that these numbers are "less wrong" enough to be "right".
So with the SCEE and SCNB scaling factors, I know the following:
So my thinking is that ParmEd reads an OpenMM system and has to pick apart which dihedrals are propers vs. impropers. But to be safe, it stores them all. Then, when it writes out to the prmtop, the impropers wind up next to the propers in the "dihedrals" list (which gives them entries in SCEE and SCNB as well). But the improper scaling factors should be special, since the first and last atoms in an improper aren't 1-4 to each other, they're 1-3.
BUT, if they were 1-3, then I'd expect their scaling factor to be set in such a way as to zero out the final energy, so the scaling factor should be 1e10
or something. However the "weird" scaling factors are printed as 1.0
. BUT, if the SCEE=1.0 scaling factors were really being used, I'd expect really large ligand electrostatics and vdW energies, which I don't see in the Just ligand as AmberParm
energy evaluation from the notebook. So my running hypothesis is that sander is ignoring the 1.0 scaling factors and zeroing out the 1-3 interaction energies, which would yield the "correct" behavior that we want.
The other possibility is that some of the ligand propers are, for some reason, getting 1-4 scaling factors of 1.0, and that's what I'd like to look into next.
Basically, what we do know is that
(protein+AmberParm(ligand) system's 1-4 energy components as evaluated by sander)
\=
(protein system's 1-4 energy components as evaluated by sander)
+
(AmberParm(ligand) system's 1-4 energy components as evaluated by sander)
.
This is reassuring, but it doesn't rule out that there's some mangling happening in the AmberParm(ligand)
construction or the as evaluated be sander
part.
I think the most concise way to check the remaining uncertainties is to see whether
(AmberParm(ligand) system's 1-4 energy components as evaluated by sander)
\=
(ligand system's 1-4 energy components as evaluated by OpenMM)
which I'll start looking into after the 0.10.0 release (since it's blocking a lot of work internally)
Alright, I've validated that the ligand's energy as evaluated by OpenMM matches the ligand's energy as evaluated by sander when I set the nonbonded methods to be PME (and that each individual component agrees). So I think we can say the the ligand's energy in the combined systems made by this workflow really are correct.
# Create a new, empty system
complex_structure = parmed.Structure()
# Add the ligand
complex_structure += parmed.amber.AmberParm.from_structure(new_ligand_structure)
# Add the protein
#complex_structure += pieces[0][0]
import numpy as np
n_protein_atoms = len(pieces[0][0].atoms)
n_ligand_atoms = len(new_ligand_structure.atoms)
complex_structure.coordinates = orig_structure.coordinates[n_protein_atoms:
n_protein_atoms + n_ligand_atoms]
complex_structure.box_vectors = orig_structure.box_vectors
complex_structure.save('ligand_as_amberparm.prmtop', overwrite=True)
complex_structure.save('ligand_as_amberparm.inpcrd', overwrite=True)
!echo "parm ligand_as_amberparm.prmtop" > parmed.in
!echo "loadCoordinates ligand_as_amberparm.inpcrd" >> parmed.in
!echo "energy Ewald nodisper" >> parmed.in
!echo "quit" >> parmed.in
!parmed -i parmed.in
yields
...
Computing a single-point energy for ligand_as_amberparm.prmtop
Bond = 10.0616164 Angle = 148.3914518
Dihedral = 4.2740128 1-4 vdW = 9.7745872
1-4 Elec = -41.0055802 vdWaals = -2.1825966
Elec. = 2.1506644
TOTAL = 131.4641557
Done!
import copy
# Turn off electrostatics 1-4s and print difference
modified_ligand_system = copy.deepcopy(ligand_system)
nbf = [force for force in modified_ligand_system.getForces() if type(force) is openmm.NonbondedForce][0]
for exc_idx in range(nbf.getNumExceptions()):
p1, p2, q, r, d = nbf.getExceptionParameters(exc_idx)
nbf.setExceptionParameters(exc_idx, p1, p2, 0, r, d)
results = compare_system_energies(ligand_system,
modified_ligand_system,
pieces[1][0].positions,
rtol=100, atol=100)
# print valence terms on the first go-through
print('bonds', results[0]['HarmonicBondForce'].in_units_of(unit.kilocalorie/unit.mole))
print('angles', results[0]['HarmonicAngleForce'].in_units_of(unit.kilocalorie/unit.mole))
print('torsions', results[0]['PeriodicTorsionForce'].in_units_of(unit.kilocalorie/unit.mole))
print('electrostatics 1-4', (results[0]['NonbondedForce']-results[1]['NonbondedForce']).in_units_of(unit.kilocalorie/unit.mole))
# Turn off vdw 1-4s and print difference
modified_ligand_system = copy.deepcopy(ligand_system)
nbf = [force for force in modified_ligand_system.getForces() if type(force) is openmm.NonbondedForce][0]
for exc_idx in range(nbf.getNumExceptions()):
p1, p2, q, r, d = nbf.getExceptionParameters(exc_idx)
nbf.setExceptionParameters(exc_idx, p1, p2, q, r, 0)
results = compare_system_energies(ligand_system,
modified_ligand_system,
pieces[1][0].positions,
rtol=100, atol=100)
print('vdw 1-4', (results[0]['NonbondedForce']-results[1]['NonbondedForce']).in_units_of(unit.kilocalorie/unit.mole))
# Turn off electrostatics and print difference
modified_ligand_system = copy.deepcopy(ligand_system)
#print(dir(modified_ligand_system))
nbf = [force for force in modified_ligand_system.getForces() if type(force) is openmm.NonbondedForce][0]
#print([i for i in dir(nbf) if 'xce' in i])
for p_idx in range(nbf.getNumParticles()):
q, r, d = nbf.getParticleParameters(p_idx)
nbf.setParticleParameters(p_idx, 0, r, d)
results = compare_system_energies(ligand_system,
modified_ligand_system,
pieces[1][0].positions,
rtol=100, atol=100)
print('electrostatics', (results[0]['NonbondedForce']-results[1]['NonbondedForce']).in_units_of(unit.kilocalorie/unit.mole))
# Turn off vdW and print difference
modified_ligand_system = copy.deepcopy(ligand_system)
nbf = [force for force in modified_ligand_system.getForces() if type(force) is openmm.NonbondedForce][0]
for p_idx in range(nbf.getNumParticles()):
q, r, d = nbf.getParticleParameters(p_idx)
nbf.setParticleParameters(p_idx, q, r, 0)
results = compare_system_energies(ligand_system,
modified_ligand_system,
pieces[1][0].positions,
rtol=100, atol=100)
print('vdw', (results[0]['NonbondedForce']-results[1]['NonbondedForce']).in_units_of(unit.kilocalorie/unit.mole))
yields
bonds 10.061340623786299 kcal/mol
angles 148.3913151757439 kcal/mol
torsions 4.274002680587039 kcal/mol
electrostatics 1-4 -41.00698084493896 kcal/mol
vdw 1-4 9.774545410623057 kcal/mol
electrostatics 2.1530100310053926 kcal/mol
vdw -2.183389025482119 kcal/mol
So, I think that the new pathway does successfully send the ligand parameters through ParmEd, and the previous posts show that the protein energy is preserved, so the combined system is likely well-behaved. The small differences in nonbonded components are probably due to differences in PME internals. If I had more patience, I could
but the above comparison shows that there is no significant difference in the outcome.
My now-much-longer scratch notebook for this issue, which have useful breadcrumbs for future ParmEd validation. (Woe be unto ye if you actually need to use this!) swap_existing_ligand_parameters_scratch.ipynb.gz
Any objections to closing this issue?
No objections from me - I think the energy differences observed here (0.003 kcal/mol energy differences resulting from PME and other errors decades smaller) are well within an acceptable error tolerance.
Looks great, thanks for the comprehensive look @j-wags . My simulations in amber using openFF are no longer blowing up :)
I have followed the "Replacing ligand parameters in an already-parametrized system" procedure exactly, I am seeing a large difference between 1-4 elec and 1-4 vdw after applying openFF ligand parameters. Is this expected / known issue?
Running parmed:
Before parm BRD4-1.prmtop loadCoordinates BRD4-1.crds energy quit
gives: Bond = 93.2778504 Angle = 1678.5817639 Dihedral = 1550.2065868 1-4 vdW = 1213.0573270 1-4 Elec = 5617.5794490 vdWaals = 11318.5631489 Elec. = -116434.4232931 TOTAL = -94963.1571672
After parm complex.prmtop loadCoordinates complex.inpcrd energy quit
gives: Bond = 92.6749096 Angle = 1801.8589508 Dihedral = 1551.1422956 1-4 vdW = 9.7745874 1-4 Elec = -41.0324225 vdWaals = 11318.5631490 Elec. = -116434.4168318 TOTAL = -101701.4353620
https://github.com/openforcefield/openff-toolkit/blob/master/examples/swap_amber_parameters/swap_existing_ligand_parameters.ipynb