AdeleHardie closed this issue 3 years ago.
Thanks, I'll take a look at this tomorrow. I think the second part is an unrelated issue. With our MD drivers we copy coordinates from the restart/trajectory files back into the original system so that we preserve the original topology, e.g. the atom naming/numbering convention. It sounds like something has gone wrong at this step for your particular system. When you read the files directly from the working directory you are creating a brand new system, so there are no consistency checks. (What is in the files is what you get.) This is why you are able to continue.
I think the problem might relate to the recent changes to correctly handle the water topology naming conventions required for AMBER and GROMACS. Before running a simulation we convert to the expected format for the engine. I imagine that the atoms have been re-ordered somehow, such that the topology from the restart/trajectory files no longer matches that of the original system.
At a quick glance there are several differences between the POINTERS flags in the two topology files:
system_bash.prm7
%FLAG POINTERS
%FORMAT(10I8)
35945 18 33545 2361 5225 3181 10531 10049 0 0
67175 10793 2361 3181 10049 70 162 216 38 1
0 0 0 0 0 0 0 1 24 0
0
system_BSS.prm7
%FORMAT(10I8)
35945 17 33545 2361 5225 3181 9906 8439 0 0
67175 10793 2361 3181 8439 37 44 228 38 0
0 0 0 0 0 0 0 1 24 0
0 0 0
I'll try to figure out the reason for the difference and if this is what's causing the problem.
I've just checked that this issue hasn't been caused by the recent updates to the Sire.IO.AmberPrm parser to correctly handle the NATYP record (see here). Rolling back to that version gives:
%FLAG POINTERS
%FORMAT(10I8)
35945 17 33545 2361 5225 3181 9906 8439 0 0
67175 10793 2361 3181 8439 37 44 228 0 0
0 0 0 0 0 0 0 1 24 0
0 0 0
(The only difference is the NATYP record.)
Okay, I had more time than I thought and think I've fixed the second issue. The problem was that, due to a limitation in Sire, I need to remove then re-add water molecules when swapping the topology between AMBER and GROMACS format. This means that, after the old molecules were deleted, the new ones were added after any remaining molecules in the system, rather than in the same position as the old ones. (I thought I'd already handled this, but obviously not.) This isn't an issue when you have protein/ligand and then waters, but it is when you have protein/ligand then waters and ions, since the new waters are re-added after the ions rather than before. I've now fixed this by preserving the molecular ordering.
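The ordering problem described above can be illustrated with a minimal sketch in plain Python. The molecule labels and the `to_gromacs` rename are hypothetical stand-ins, not Sire's API or how BioSimSpace actually performs the swap: deleting the waters and appending the converted copies pushes them after the ions, whereas replacing each water in place keeps the original order.

```python
def swap_naive(molecules, convert):
    """Delete every water, then append the converted copies at the end."""
    waters = [m for m in molecules if m.startswith("WAT")]
    others = [m for m in molecules if not m.startswith("WAT")]
    return others + [convert(w) for w in waters]


def swap_preserving_order(molecules, convert):
    """Replace each water with its converted copy at the same position."""
    return [convert(m) if m.startswith("WAT") else m for m in molecules]


def to_gromacs(water):
    # Hypothetical rename standing in for the AMBER -> GROMACS water swap.
    return water.replace("WAT", "SOL")


system = ["PROTEIN", "LIGAND", "WAT1", "WAT2", "Na+", "Cl-"]
print(swap_naive(system, to_gromacs))
# ['PROTEIN', 'LIGAND', 'Na+', 'Cl-', 'SOL1', 'SOL2']
print(swap_preserving_order(system, to_gromacs))
# ['PROTEIN', 'LIGAND', 'SOL1', 'SOL2', 'Na+', 'Cl-']
```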
The pointers that are different are:
NTYPES : total number of distinct atom types (18 vs 17)
NPHIH : number of dihedrals containing hydrogen (10531 vs 9906)
MPHIA : number of dihedrals not containing hydrogen (10049 vs 8439)
NPHIA : MPHIA + number of constraint dihedrals (10049 vs 8439)
NUMBND : number of unique bond types (70 vs 37)
NUMANG : number of unique angle types (162 vs 44)
NPTRA : number of unique dihedral types (216 vs 228)
NPHB : number of distinct 10-12 hydrogen bond pair types (1 vs 0)
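A comparison like the one above can be automated by naming each POINTERS entry and diffing the two files. This is a minimal sketch, assuming the pointer order from the AMBER prmtop format specification and fixed-width `10I8` fields; the helper names are hypothetical. The demo rebuilds the POINTERS blocks from the values quoted above and reports exactly the eight pointers listed.

```python
# Pointer names in the order listed in the AMBER prmtop format specification.
POINTER_NAMES = [
    "NATOM", "NTYPES", "NBONH", "MBONA", "NTHETH", "MTHETA", "NPHIH", "MPHIA",
    "NHPARM", "NPARM", "NNB", "NRES", "NBONA", "NTHETA", "NPHIA", "NUMBND",
    "NUMANG", "NPTRA", "NATYP", "NPHB", "IFPERT", "NBPER", "NGPER", "NDPER",
    "MBPER", "MGPER", "MDPER", "IFBOX", "NMXRS", "IFCAP", "NUMEXTRA", "NCOPY",
]


def read_flag_ints(text, flag, width=8):
    """Return the fixed-width integers stored under %FLAG <flag>."""
    values, inside = [], False
    for line in text.splitlines():
        if line.startswith("%FLAG"):
            if inside:
                break  # reached the next section
            inside = line.split()[1] == flag
        elif inside and not line.startswith("%"):  # skip %FORMAT / %COMMENT
            # Slice fixed-width fields: wide integers may not be space-separated.
            for i in range(0, len(line), width):
                field = line[i:i + width].strip()
                if field:
                    values.append(int(field))
    return values


def read_pointers(text):
    """Map pointer names to values; missing trailing pointers are omitted."""
    return dict(zip(POINTER_NAMES, read_flag_ints(text, "POINTERS")))


def diff_pointers(a, b):
    """Return {name: (a_value, b_value)} for pointers present in both that differ."""
    return {k: (a[k], b[k]) for k in POINTER_NAMES
            if k in a and k in b and a[k] != b[k]}


# Demo using the POINTERS values quoted above. With real files you would use
# e.g. read_pointers(open("system_bash.prm7").read()).
def _as_prm7(rows):
    body = "".join("".join(f"{v:8d}" for v in row) + "\n" for row in rows)
    return "%FLAG POINTERS\n%FORMAT(10I8)\n" + body + "%FLAG ATOM_NAME\n"


bash_ptrs = read_pointers(_as_prm7([
    [35945, 18, 33545, 2361, 5225, 3181, 10531, 10049, 0, 0],
    [67175, 10793, 2361, 3181, 10049, 70, 162, 216, 38, 1],
    [0, 0, 0, 0, 0, 0, 0, 1, 24, 0], [0]]))
bss_ptrs = read_pointers(_as_prm7([
    [35945, 17, 33545, 2361, 5225, 3181, 9906, 8439, 0, 0],
    [67175, 10793, 2361, 3181, 8439, 37, 44, 228, 38, 0],
    [0, 0, 0, 0, 0, 0, 0, 1, 24, 0], [0, 0, 0]]))

for name, (old, new) in diff_pointers(bash_ptrs, bss_ptrs).items():
    print(f"{name:8s} {old:>6} {new:>6}")
```

Slicing fixed-width fields rather than splitting on whitespace is a deliberate choice: in a `10I8` record an eight-digit value fills its field with no separating space.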
I'll transfer this issue over to the Sire repository as it's clearly an issue with the Sire.IO.AmberPrm parser, rather than BioSimSpace itself.
For reference, ParmEd preserves the original topology on write:
import parmed as pmd
# Load the original topology and coordinates, then write the topology back out.
ps = pmd.load_file("system_bash.prm7", xyz="system_bash.rst7")
ps.save("system_parmed.prmtop")
system_parmed.prmtop:
%FLAG POINTERS
%FORMAT(10I8)
35945 18 33545 2361 5225 3181 10531 10049 0 0
67175 10793 2361 3181 10049 70 162 216 38 1
0 0 0 0 0 0 0 1 24 0
0
I'm pretty sure that the issue is occurring on read, since the parameters property of each molecule loaded from system_bash.* and system_BSS.* appears to be identical. A basic single-point energy comparison of the two systems (within Sire) also gives identical results.
I have a small lead for the mismatched NPHB pointer. On read this is correctly flagged as 1, but on write it is 0.
import BioSimSpace as BSS
# I've added a print statement into the parser to print the pointer.
s = BSS.IO.readMolecules(BSS.IO.glob("system_bash.*"))
# Printed on read: NPHB = 1
BSS.IO.saveMolecules("test", s, "prm7")
# Printed on write: NPHB = 0
Looking at the conditional here, I'm not even sure that we support 10-12 hydrogen bond parameters. Perhaps this is the problem, i.e. we are not catching this properly, so we are then making some incorrect assumptions for other records?
...
//amber stores the A and B coefficients as the product of all
//possible combinations. We need to find the values from the
// LJ_i * LJ_i values
int idx = nb_parm_index[ ntypes * i + i ];
if (idx < 0)
{
//this is a 10-12 parameter
throw SireError::unsupported( QObject::tr(
"Sire does not yet support Amber Parm files that "
"use 10-12 HBond parameters."), CODELOC );
}
...
@chryswoods: Do you have any thoughts on this?
I'm not yet sure about the issue with the other pointers, but at least they so far seem to be consistent on read/write for the ones that I've tested.
Yes, this looks like the issue. Here is the NONBONDED_PARM_INDEX from the system_bash.prm7 file:
%FLAG NONBONDED_PARM_INDEX
%FORMAT(10I8)
1 2 4 7 11 16 22 29 37 46
56 67 79 92 106 121 137 154 2 3
5 8 12 17 23 30 38 47 57 68
80 93 107 122 138 155 4 5 6 9
13 18 24 31 39 48 58 69 81 94
108 123 139 156 7 8 9 10 14 19
25 32 40 49 59 70 82 95 109 124
140 157 11 12 13 14 15 20 26 33
41 50 60 71 83 96 110 125 141 158
16 17 18 19 20 21 27 34 42 51
61 72 84 97 111 126 142 159 22 23
24 25 26 27 28 35 43 52 62 73
85 98 112 127 143 160 29 30 31 32
33 34 35 36 44 53 63 74 86 99
113 128 144 161 37 38 39 40 41 42
43 44 45 54 64 75 87 100 114 129
145 162 46 47 48 49 50 51 52 53
54 55 65 76 88 101 115 130 146 163
56 57 58 59 60 61 62 63 64 65
66 77 89 102 116 131 147 164 67 68
69 70 71 72 73 74 75 76 77 78
90 103 117 132 148 165 79 80 81 82
83 84 85 86 87 88 89 90 91 104
118 133 149 166 92 93 94 95 96 97
98 99 100 101 102 103 104 105 119 134
150 167 106 107 108 109 110 111 112 113
114 115 116 117 118 119 120 135 151 168
121 122 123 124 125 126 127 128 129 130
131 132 133 134 135 136 152 169 137 138
139 140 141 142 143 144 145 146 147 148
149 150 151 152 153 -1 154 155 156 157
158 159 160 161 162 163 164 165 166 167
168 169 -1 171
Note that two entries (including the second to last) are -1, hence the conditional above should have been triggered and an exception thrown. I'll try to figure out why this isn't the case.
These are not being read because of the incorrect NTYPES pointer, i.e. 17 rather than 18. The loop is over NTYPES x NTYPES terms, which should be 18 x 18 = 324. However, the code is actually looping over 17 x 17 = 289 records, so it misses both of the -1 entries, which are in the last 19 places.
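How much of the array a loop visits depends entirely on the NTYPES value it uses. A short Python sketch makes this concrete; it assumes the row-major layout used by the Sire loop (pair (i, j) at flat position ntypes * i + j), and uses a synthetic array that reproduces only the positions of the two -1 entries from the file above (flat positions 305 and 322). A full 18 x 18 scan finds both; a diagonal-only scan, or a scan with NTYPES = 17, finds neither.

```python
def negative_entries(nb_parm_index, ntypes, diagonal_only=False):
    """Return the (i, j) type pairs whose NONBONDED_PARM_INDEX entry is negative.

    Pair (i, j) (0-based) is stored at flat position ntypes * i + j, so the
    value of ntypes decides which part of the array is visited at all.
    """
    pairs = []
    for i in range(ntypes):
        columns = [i] if diagonal_only else range(ntypes)
        for j in columns:
            if nb_parm_index[ntypes * i + j] < 0:
                pairs.append((i, j))
    return pairs


# Synthetic 18 x 18 index array: only the positions of the two -1 entries
# are taken from the file above; the positive values are placeholders.
index = list(range(1, 18 * 18 + 1))
index[305] = index[322] = -1

print(negative_entries(index, 18))                      # [(16, 17), (17, 16)]
print(negative_entries(index, 18, diagonal_only=True))  # []
print(negative_entries(index, 17))                      # []
```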
Now to work out why NTYPES is incorrect.
This probably explains why the other pointers/flags are incorrect too, since they are also built by looping over the incorrect number of terms. The NTYPES record is actually built by looping over the parameters that were read and working out the total number of distinct LJ parameters, i.e.:
for (int i=0; i<params.count(); ++i)
{
const auto info = params.constData()[i].info();
const auto ljs = params.constData()[i].ljs();
QVector<qint64> mol_atom_types(info.nAtoms());
for (int j=0; j<info.nAtoms(); ++j)
{
const LJParameter lj = ljs[ info.cgAtomIdx(AtomIdx(j)) ];
int idx = ljparams.indexOf(lj);
if (idx == -1)
{
ljparams.append(lj);
mol_atom_types[j] = ljparams.count();
}
else
{
mol_atom_types[j] = idx + 1;
}
}
atom_types[i] = mol_atom_types;
}
// We now have all of the atom types - create the acoeff and bcoeff arrays.
const int ntypes = ljparams.count();
I assume one term (possibly for those involving 10-12 hydrogen bonds) is being processed incorrectly, which is leading to the miscount.
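The de-duplication performed by the loop above can be mirrored in a few lines of Python. This is a minimal sketch, assuming each molecule is given as a plain list of (sigma, epsilon) tuples (a hypothetical input format, not Sire's LJParameter API). It illustrates one general way an inferred NTYPES can come out lower than the value in the original file: types are keyed on LJ parameter values only, so two differently named AMBER atom types with identical LJ parameters collapse into one. Whether that accounts for 18 vs 17 here is an open question.

```python
def assign_atom_types(molecules):
    """Assign 1-based atom types by de-duplicating (sigma, epsilon) pairs.

    Mirrors the C++ loop above: an LJ parameter seen for the first time gets
    a new type, otherwise the existing type is reused. The atom type names
    stored in the original file play no part in this.
    """
    ljparams = []    # unique LJ parameters, in order of first appearance
    atom_types = []  # per-molecule lists of 1-based type indices
    for mol in molecules:
        mol_types = []
        for lj in mol:
            if lj in ljparams:
                mol_types.append(ljparams.index(lj) + 1)
            else:
                ljparams.append(lj)
                mol_types.append(len(ljparams))
        atom_types.append(mol_types)
    return len(ljparams), atom_types


# Illustrative (sigma, epsilon) values: the first two atoms stand for two
# differently named AMBER types that happen to share LJ parameters.
protein = [(3.39967, 0.1094), (3.39967, 0.1094), (2.47135, 0.0157)]
water = [(3.15061, 0.1521), (0.0, 0.0), (0.0, 0.0)]

ntypes, types = assign_atom_types([protein, water])
print(ntypes, types)  # 4 [[1, 1, 2], [3, 4, 4]]
```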
For simplicity, I guess we could just raise an exception whenever NPHB is non-zero. We should probably also add checks that the number of records for certain FLAG entries matches the expected NTYPES x NTYPES. If not, then we know that we haven't correctly identified the unique types in the system.
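Such length checks can be sketched as a standalone Python function. The function name and input dictionaries are hypothetical. The expected sizes follow the prmtop format: NTYPES x NTYPES entries in NONBONDED_PARM_INDEX, NTYPES * (NTYPES + 1) / 2 in each Lennard-Jones coefficient array, and NPHB in each HBOND array. The optional `reject_hbond` flag implements the simpler "raise whenever NPHB is non-zero" policy.

```python
def check_nonbonded_counts(pointers, sections, reject_hbond=False):
    """Raise if nonbonded section lengths disagree with NTYPES / NPHB.

    `pointers` maps pointer names to values and `sections` maps %FLAG names
    to lists of parsed values (both hypothetical inputs for this sketch).
    """
    ntypes, nphb = pointers["NTYPES"], pointers["NPHB"]
    if reject_hbond and nphb != 0:
        raise NotImplementedError(f"NPHB = {nphb}: 10-12 HBond terms are unsupported")
    n_lj = ntypes * (ntypes + 1) // 2  # one coefficient per unique type pair
    expected = {
        "NONBONDED_PARM_INDEX": ntypes * ntypes,
        "LENNARD_JONES_ACOEF": n_lj,
        "LENNARD_JONES_BCOEF": n_lj,
        "HBOND_ACOEF": nphb,
        "HBOND_BCOEF": nphb,
    }
    for flag, n in expected.items():
        found = len(sections.get(flag, []))
        if found != n:
            raise ValueError(f"{flag}: expected {n} entries "
                             f"(NTYPES={ntypes}, NPHB={nphb}), found {found}")


# Placeholder arrays with sizes consistent with NTYPES = 18 and NPHB = 1.
sections = {
    "NONBONDED_PARM_INDEX": [0] * 324,
    "LENNARD_JONES_ACOEF": [0.0] * 171,
    "LENNARD_JONES_BCOEF": [0.0] * 171,
    "HBOND_ACOEF": [0.0],
    "HBOND_BCOEF": [0.0],
}
check_nonbonded_counts({"NTYPES": 18, "NPHB": 1}, sections)  # passes silently
```

Passing `{"NTYPES": 17, "NPHB": 1}` with the same arrays raises a `ValueError` for NONBONDED_PARM_INDEX (289 expected, 324 found).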
Yes, that sounds right. Sire does not support 10-12 terms as these (I thought) were a relic of older force fields and no longer widely used. They would require a lot of work to support (a whole duplicate set of 10-12 versions of the LJ terms and forcefields), so I didn't think it was worth it.
I am surprised that the exception was not raised on read. It would be worth finding out why 10-12 terms are in this file, particularly as they would break some of our MD drivers (e.g. somd) and are not supported by our gromacs writer either (I think).
@jmichel80 do you think 10-12 terms are needed or are making a comeback? Is it worth putting in the work to support them?
Yes, I thought that they were only associated with some of the older force fields too. Thanks for the info regarding the lack of support for the other MD drivers.
I'll try to work out why the NTYPES pointer is being inferred incorrectly. (We read it, but then reconstruct everything from the molecular topology generated from the other records.)
If we don't want to support 10-12 terms, it would be easy enough to raise an exception whenever NPHB (pointers[19] in the code) is non-zero. This is read before any of the flags are processed, so it would be caught early.
I think it would also be good to add some self-consistency checks to the code, since these would help us catch situations where we are misreading (or reconstructing) the data in the topology file.
Actually, there are some self-consistency checks, e.g. here:
const int ntypes = pointers[1]; //number of distinct atom types
if (ntypes <= 0)
return;
const int nphb = pointers[19]; //number of distinct 10-12 hydrogen bond pair types
lj_data = QVector<LJParameter>(ntypes);
auto lj_data_array = lj_data.data();
auto acoeffs = float_data.value("LENNARD_JONES_ACOEF");
auto bcoeffs = float_data.value("LENNARD_JONES_BCOEF");
auto hbond_acoeffs = float_data.value("HBOND_ACOEF");
auto hbond_bcoeffs = float_data.value("HBOND_BCOEF");
auto nb_parm_index = int_data.value("NONBONDED_PARM_INDEX");
qDebug() << "NONBONDED_PARM_INDEX:" << nb_parm_index.count();
qDebug() << "NTYPES:" << ntypes;
if (acoeffs.count() != bcoeffs.count() or
acoeffs.count() != (ntypes*(ntypes+1))/2)
{
throw SireIO::parse_error( QObject::tr(
"Incorrect number of LJ coefficients for the number of specified "
"atom types! Should be %1 for %2 types, but actually have "
"%3 LJ A-coefficients, and %4 LJ B-coefficients")
.arg((ntypes*(ntypes+1))/2)
.arg(ntypes)
.arg(acoeffs.count())
.arg(bcoeffs.count()), CODELOC );
}
if (nb_parm_index.count() != ntypes*ntypes)
{
throw SireIO::parse_error( QObject::tr(
"Incorrect number of non-bonded parameter indices. There should "
"be %1 indices for %2 types, but actually have %3.")
.arg(ntypes*ntypes)
.arg(ntypes)
.arg(nb_parm_index.count()), CODELOC );
}
if (hbond_acoeffs.count() != nphb or
hbond_bcoeffs.count() != nphb)
{
throw SireIO::parse_error( QObject::tr(
"Incorrect number of HBond parameters. There should be "
"%1 such parameters, but the number of HBond A coefficients is "
"%2, and the number of B coefficients is %3.")
.arg(nphb)
.arg(hbond_acoeffs.count())
.arg(hbond_bcoeffs.count()), CODELOC );
}
On read, this prints:
NONBONDED_PARM_INDEX: 324
NTYPES: 18
This means that it is looping over the correct number of entries, i.e. it is using the pointers value rather than that inferred from the unique LJ parameters. I'll figure out why it isn't seeing the -1 entries (presumably the loop is wrong) since this would trigger the exception.
Looking again at the code, the loop is only over the diagonal elements of the matrix, so we're missing the -1 terms. (As such, I'm not sure we'd ever detect the presence of 10-12 terms.) This code from ParmEd has routines for looping over all interactions, detecting non-zero 10-12 terms, etc.
If we don't want to support the off-diagonal 10-12 terms, then we could just add an additional loop inside build_lj that loops over them and raises the exception if any index is negative.
Adding something like the following as a check:
// The build_lj function above only considers diagonal elements of the
// NONBONDED_PARM_INDEX matrix. Here we loop over the off-diagonal elements
// to check for 10-12 parameters, which are currently unsupported.
// The matrix is symmetric, so perform a triangular loop over off-diagonal
// elements.
for (int i=0; i<ntypes; ++i)
{
for (int j=i+1; j<ntypes; ++j)
{
int idx = nb_parm_index[ ntypes * i + j ];
if (idx < 0)
{
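// A negative entry -n refers to the n-th (1-based) HBOND coefficient,
// so convert it to a 0-based index before looking up the values.
const int hb_idx = -idx - 1;
const auto a = hbond_acoeffs[hb_idx];
const auto b = hbond_bcoeffs[hb_idx];
// Either coefficient being non-zero means a genuine 10-12 interaction
// (std::abs from <cmath>).
if ((std::abs(a) > 1e-6) or (std::abs(b) > 1e-6))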
{
//this is a 10-12 parameter
throw SireError::unsupported( QObject::tr(
"Sire does not yet support Amber Parm files that "
"use 10-12 HBond parameters."), CODELOC );
}
}
}
}
However, looking at the system_bash.prm7 file, the values of the coefficients are:
%FLAG HBOND_ACOEF
%FORMAT(5E16.8)
0.00000000E+00
%FLAG HBOND_BCOEF
%FORMAT(5E16.8)
0.00000000E+00
Since both coefficients are zero, there aren't any non-zero 10-12 terms for this system, so we should be okay. (SireUnitTests contains several top files with -1 entries in the NONBONDED_PARM_INDEX flag, but also with zero terms for the coefficients.)
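Combining the two conditions discussed above (a negative index and a non-zero coefficient) gives the following Python sketch. The function name and inputs are hypothetical. Per the prmtop format, it assumes that a negative entry -n refers to the n-th (1-based) HBOND_ACOEF/HBOND_BCOEF value, and it treats either coefficient being non-zero as a real 10-12 term. Using a synthetic index array with -1 at the same positions as in system_bash.prm7, the zero coefficients from that file produce no hits.

```python
def nonzero_hbond_pairs(nb_parm_index, ntypes, hbond_acoef, hbond_bcoef, tol=1e-6):
    """Return (i, j) pairs (i <= j) whose 10-12 coefficients are non-zero.

    A negative NONBONDED_PARM_INDEX entry -n points (1-based) at entry n of
    the HBOND arrays, so the 0-based position is n - 1.
    """
    pairs = []
    for i in range(ntypes):
        for j in range(i, ntypes):  # symmetric matrix: diagonal + upper triangle
            idx = nb_parm_index[ntypes * i + j]
            if idx < 0:
                k = -idx - 1
                if abs(hbond_acoef[k]) > tol or abs(hbond_bcoef[k]) > tol:
                    pairs.append((i, j))
    return pairs


# Synthetic index with -1 at the same flat positions as in system_bash.prm7.
index = list(range(1, 18 * 18 + 1))
index[305] = index[322] = -1

print(nonzero_hbond_pairs(index, 18, [0.0], [0.0]))    # [] (zero coefficients)
print(nonzero_hbond_pairs(index, 18, [1.0e3], [0.0]))  # [(16, 17)]
```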
Will try to work out why the other pointers are incorrect.
Hmm, I've realised that writing out with ParmEd was simply saving the original file back to disk. Going via an OpenMM system loses the NPHB pointer for this system, so it's clearly redundant. (For this system, at least.) Sorry about that.
import parmed as pmd
# Load the original topology and coordinates.
ps0 = pmd.load_file("system_bash.prm7", xyz="system_bash.rst7")
# Round-trip through an OpenMM System so the topology is rebuilt from the parameters.
omm_system = ps0.createSystem()
ps1 = pmd.openmm.load_topology(ps0.topology, omm_system)
ps1.save("system_parmed.prmtop")
Here are the pointers for the original system and the BSS and ParmEd conversions:
system_bash.prm7
%FLAG POINTERS
%FORMAT(10I8)
35945 18 33545 2361 5225 3181 10531 10049 0 0
67175 10793 2361 3181 10049 70 162 216 38 1
0 0 0 0 0 0 0 1 24 0
0
system_BSS.prm7
%FORMAT(10I8)
35945 17 33545 2361 5225 3181 9906 8439 0 0
67175 10793 2361 3181 8439 37 44 228 38 0
0 0 0 0 0 0 0 1 24 0
0 0 0
system_parmed.prmtop:
%FLAG POINTERS
%FORMAT(10I8)
35945 17 33545 2361 5225 3181 10531 10049 0 0
67175 10793 2361 3181 10049 37 44 12269 1 0
0 0 0 0 0 0 0 0 24 0
0
Note that the reconstructed ParmEd topology disagrees in the NATYP pointer (it is 1, when it should be 38). It also claims that there are 12269 unique dihedral types, rather than 216. However, it now agrees with Sire that there are 17 rather than 18 unique atom types (NTYPES), 37 rather than 70 unique bond types (NUMBND), and 44 rather than 162 unique angle types (NUMANG).
It would be interesting to know whether the ParmEd file blows up in simulation too, which might point to there being something funky with the original system. (At least in the sense of it being a hard system to parse correctly.) The fact that, apart from the dihedrals, it agrees more closely with Sire after conversion makes me think that something is wrong, or that there is redundant information in the original topology.
I'll keep digging on the other terms.
On read, the dihedral terms are correct, in the sense that counting the total number of dihedrals plus impropers with and without hydrogen gives the correct values of 10531 and 10049 respectively, i.e. 10027 + 504 and 9620 + 429. However, the values that are produced on write, 9906 and 8439, aren't simply the numbers of proper dihedrals with and without hydrogen (i.e. the totals with the impropers left out), which would be 10027 and 9620.
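Counts like these can be reproduced directly from the DIHEDRALS_INC_HYDROGEN and DIHEDRALS_WITHOUT_HYDROGEN arrays. This is a minimal sketch, assuming the prmtop convention of five integers per entry (four atom pointers plus a parameter index), with a negative fourth pointer marking an improper and a negative third pointer meaning the 1-4 interaction is skipped. It also counts distinct atom quartets, since each term of a multi-term dihedral is a separate entry; that is one place where a writer that merges terms could legitimately report fewer records. The example array is hand-made, not taken from the files.

```python
def count_dihedrals(dihedral_ints):
    """Summarise a DIHEDRALS_INC_HYDROGEN or DIHEDRALS_WITHOUT_HYDROGEN array.

    Each entry is five integers: four atom pointers and a parameter index.
    A negative fourth pointer marks an improper; a negative third pointer
    means the 1-4 interaction is skipped (e.g. extra terms of a multi-term
    dihedral). Every term is a separate entry, so distinct atom quartets can
    be fewer than entries.
    """
    if len(dihedral_ints) % 5:
        raise ValueError("array length must be a multiple of 5")
    total = impropers = 0
    quartets = set()
    for n in range(0, len(dihedral_ints), 5):
        a, b, c, d, _ = dihedral_ints[n:n + 5]
        total += 1
        if d < 0:
            impropers += 1
        quartets.add((abs(a), abs(b), abs(c), abs(d)))
    return {"total": total, "proper": total - impropers,
            "improper": impropers, "unique_quartets": len(quartets)}


# Tiny hand-made example: a two-term proper dihedral plus one improper.
dihedrals = [0, 3, 6, 9, 1,
             0, 3, -6, 9, 2,
             12, 15, -18, -21, 3]
print(count_dihedrals(dihedrals))
# {'total': 3, 'proper': 2, 'improper': 1, 'unique_quartets': 2}
```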
Something that I should have tried earlier is a single-point energy comparison between the three topology files using sander, rather than just doing this internally within Sire. Here I'm performing a default, single-step minimisation using BioSimSpace and looking at the energy log file amber.nrg. I've just replaced amber.prm7 in the working directory with the appropriate topology file and re-run the minimisation. (Importantly, this is copied across by hand, not written by BioSimSpace.)
Here are the results:
sander -O -i amber.cfg -p amber.prm7 -c amber.rst7 -o stdout -r amber.crd -inf amber.nrg
system_bash.prm7:
NSTEP ENERGY RMS GMAX NAME NUMBER
1 -1.1623E+05 1.4600E+01 1.0911E+02 C 3537
BOND = 867.4181 ANGLE = 2384.8699 DIHED = 3746.3088
VDWAALS = 12410.6216 EEL = -148008.4340 HBOND = 0.0000
1-4 VDW = 1063.6145 1-4 EEL = 11302.3846 RESTRAINT = 0.0000
system_BSS.prm7:
NSTEP ENERGY RMS GMAX NAME NUMBER
1 -1.1623E+05 1.4600E+01 1.0911E+02 C 3537
BOND = 867.4180 ANGLE = 2384.8698 DIHED = 3746.3089
VDWAALS = 12410.6217 EEL = -148008.4341 HBOND = 0.0000
1-4 VDW = 1063.6145 1-4 EEL = 11302.3846 RESTRAINT = 0.0000
system_parmed.prm7:
NSTEP ENERGY RMS GMAX NAME NUMBER
1 -1.1623E+05 1.4600E+01 1.0911E+02 C 3537
BOND = 867.4180 ANGLE = 2384.8698 DIHED = 3746.3089
VDWAALS = 12410.6217 EEL = -148008.4341 HBOND = 0.0000
1-4 VDW = 1063.6145 1-4 EEL = 11302.3846 RESTRAINT = 0.0000
As you can see, despite the differences in the topology files, the energies are in near-perfect agreement. BioSimSpace and ParmEd agree exactly, whereas the original system_bash.prm7 file differs only in the fourth decimal place for terms in the first two lines.
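Comparisons like this are easier to repeat across many frames with a tolerance-aware check than by eye. This is a minimal sketch, assuming the `NAME = value` layout shown in the energy summaries above; the regular expression and helper names are this example's own. Run on the system_bash.prm7 and system_BSS.prm7 summaries, it finds no differences at a 1e-3 tolerance, and five differing terms at 1e-5.

```python
import re

# "NAME = value" pairs as printed in sander energy summaries, e.g.
# "BOND = 867.4181" or "1-4 EEL = 11302.3846".
TERM = re.compile(r"([A-Z0-9][A-Z0-9 -]*?)\s+=\s+(-?\d+\.\d*(?:E[+-]?\d+)?)")


def parse_energies(text):
    """Return {term name: value} for every NAME = value pair in `text`."""
    return {name: float(value) for name, value in TERM.findall(text)}


def mismatched_terms(a, b, tol=1e-3):
    """Return the sorted names of terms present in both that differ by > tol."""
    return sorted(k for k in a.keys() & b.keys() if abs(a[k] - b[k]) > tol)


bash = parse_energies("""
 BOND = 867.4181 ANGLE = 2384.8699 DIHED = 3746.3088
 VDWAALS = 12410.6216 EEL = -148008.4340 HBOND = 0.0000
 1-4 VDW = 1063.6145 1-4 EEL = 11302.3846 RESTRAINT = 0.0000
""")
bss = parse_energies("""
 BOND = 867.4180 ANGLE = 2384.8698 DIHED = 3746.3089
 VDWAALS = 12410.6217 EEL = -148008.4341 HBOND = 0.0000
 1-4 VDW = 1063.6145 1-4 EEL = 11302.3846 RESTRAINT = 0.0000
""")

print(mismatched_terms(bash, bss))            # []
print(mismatched_terms(bash, bss, tol=1e-5))  # ['ANGLE', 'BOND', 'DIHED', 'EEL', 'VDWAALS']
```

A tolerance is used instead of an exact text diff because, as noted further down, the remaining fourth-decimal differences turned out to come from the coordinate file rather than the topology.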
Differences in topology file records are hard to interpret without looking at single-point energy tests such as those above, since Sire performs a lot of clever de-duplication of the records, such as for dihedral terms. This would explain why the pointers count for those records is lower, yet the energy is the same. (There can be a lot of redundancy in a topology file.)
Given the above, I'm at a loss to explain why you are seeing such markedly different results using the BioSimSpace topology. Is the behaviour repeatable across multiple runs? What happens if you use the ParmEd topology file? If only the BioSimSpace topology reproducibly gives weird results, then the only thing that I can think of is that there are other records in the topology file that are being lost on write that are causing incompatibility with some of the default options that we use for our other protocols, e.g. BioSimSpace.Protocol.Production.
I just realised that I had copied across the binary system_bash.rst7 file to use as the coordinates file amber.rst7. If you use the Sire.IO.AmberRst7 text file written by BioSimSpace (which you said hasn't caused issues), then the energies computed using all three topologies are identical, i.e. the differences in the fourth decimal place disappear.
I have tried a lot of different bash+copy/BSS combinations before arriving at the conclusion that it's something to do with the topology, so this issue has happened across multiple runs.
I also used BSS for production runs for older systems I had setup with ff19SB. I know now BSS doesn't support ff19SB, but loading the topology into AMBER's cpptraj and saving it again seemed to remove the CMAP parameters and the system could then be loaded into BSS (which isn't entirely right but it worked for the time being). Those runs were alright for the 100-200 ns timescale using all the files created by BSS in terms of protein stability. However, the system did show some behaviour that was different (mainly some strange loop dynamics). This doesn't happen now that I copy over files.
All this makes me wonder if there is just something about the system itself that is really sensitive to small changes, since I assume no one else has complained about protein unfolding so far...
I can do some more tests with ParmEd once one of my GPUs frees up tomorrow.
If the CMAP terms were omitted, surely the dihedrals would be completely wrong, leading to erratic behavior, especially on longer MD timescales?
Thanks for the update. I'd certainly be interested in the ParmEd comparison if you get a chance.
Just to check, regarding your comment above about the older ff19SB systems whose CMAP parameters were stripped by cpptraj before loading them into BSS:
Is the comparison here between the original ff19SB files with CMAP records and those generated by BioSimSpace after loading and saving the cpptraj files that stripped those records, or between the cpptraj files themselves and those written by BioSimSpace? If it was the former, then I imagine that the CMAP terms in the original files are stabilising the backbone fluctuations that you are seeing with BioSimSpace. If it's the latter, then it suggests that something is going wrong with the BioSimSpace-generated topology. (It might be good to check single-point energies for these files too.)
Yes, I do think removing the CMAP parameters must have messed with the system somehow. However, that happened every time I loaded the ff19SB system into cpptraj for processing: it would just say those terms were ignored, and loading then immediately saving the topology without doing anything else to it removed them. The simulations ran (kind of?) okay, so it didn't alarm me as much as it should have.
The comparison is between (1) the original ff19SB system run by hand, (2) ff19SB with CMAP automatically removed by cpptraj and run in BSS, and now also (3) ff14SB run in BSS but with the original files copied over. The behaviour I see in (1) and (3) is the same, while (2) is different, but as Julien said, that might be due to the whole system being a mess without CMAP. I can go back to those ff19SB systems and run a short test simulation by hand to see what happens with the cpptraj-generated topologies without CMAP.
Okay, that explains the difference then. So it's only the system_bash.prm7 file that doesn't include any CMAP records?
Looking at the cpptraj GitHub page, it looks like support for CMAP records was only added in September of last year, so it might not be in the most recent AmberTools release. I know that ParmEd supports the records, although I'm not sure if this is just for read/write to the same format, or on conversion too.
Yes, so the process of getting an ff19SB system into BSS was:
.rst7 files
The system_bash.prm7 file (and all files I've uploaded here) is different, however. It has been prepared with ff14SB to stay consistent with what BSS supports, since I made all those files while trying to identify what was going wrong.
Thanks for the clarification. As @jmichel80 says, I would certainly expect the systems without CMAP terms (for a forcefield that uses them) to be unstable on long time scales if those terms were removed. As for the ff14SB non-CMAP system, given that the single-point energies agree, I can't think what could have gone wrong.
I had another look at the files and noticed that there is a BOX_DIMENSIONS record in the prm7 files. This differs between system_bash.prm7 and system_BSS.prm7:
system_bash.prm7:
%FLAG BOX_DIMENSIONS
%FORMAT(5E16.8)
9.00000000E+01 9.22133330E+01 7.39995300E+01 6.71360020E+01
system_BSS.prm7:
%FLAG BOX_DIMENSIONS
%FORMAT(5E16.8)
9.00000000E+01 8.49602111E+01 6.81790310E+01 6.18553599E+01
However, there must also be box information in the *.rst7 files, since when I load them the space property of the two systems is identical:
import BioSimSpace as BSS
s_bash = BSS.IO.readMolecules(BSS.IO.glob("system_bash.*"))
print(s_bash._sire_object.property("space"))
# PeriodicBox( ( 84.9602, 68.179, 61.8554 ) )
s_bss = BSS.IO.readMolecules(BSS.IO.glob("system_BSS.*"))
print(s_bss._sire_object.property("space"))
# PeriodicBox( ( 84.9602, 68.179, 61.8554 ) )
I think that this record in the topology file is redundant, and that the information in the RST7 file should take precedence. (This is the case with the Sire parsers when constructing a system.) However, perhaps this isn't the case when running AMBER? It might be worth trying a simulation where you copy the dimensions from the system_bash.prm7 file into the system_BSS.prm7 file. That said, the IFBOX pointer is 0, so the record should be ignored according to the AMBER docs.
(I note that ParmEd strips the BOX_DIMENSIONS flag from the topology file on write and the values that it writes to a RST7 file are consistent with Sire.)
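A quick check of which BOX_DIMENSIONS record agrees with the coordinate file can be scripted. This is a minimal sketch with hypothetical helper names, assuming the prmtop layout `[beta, a, b, c]` for BOX_DIMENSIONS and an ASCII rst7 whose last line holds the box as six fixed-width `F12.7` fields (it does not handle NetCDF restarts). The rst7 box in the demo is assumed, built from the PeriodicBox values reported above with 90 degree angles.

```python
def prm7_box(prm7_text):
    """Return (beta, a, b, c) from %FLAG BOX_DIMENSIONS in prm7 text."""
    values, inside = [], False
    for line in prm7_text.splitlines():
        if line.startswith("%FLAG"):
            if inside:
                break
            inside = line.split()[1] == "BOX_DIMENSIONS"
        elif inside and not line.startswith("%"):
            values.extend(float(v) for v in line.split())
    return tuple(values)


def rst7_box(rst7_text, width=12):
    """Return (a, b, c, alpha, beta, gamma) from the last line of an ASCII rst7.

    Assumes the box line is written as fixed-width 6F12.7 fields.
    """
    line = rst7_text.rstrip().splitlines()[-1]
    return tuple(float(line[i:i + width]) for i in range(0, len(line), width)
                 if line[i:i + width].strip())


def box_consistent(prm7_text, rst7_text, tol=1e-3):
    """True if the prm7 BOX_DIMENSIONS record matches the rst7 box."""
    beta, *lengths = prm7_box(prm7_text)
    a, b, c, _alpha, rst_beta, _gamma = rst7_box(rst7_text)
    return (abs(beta - rst_beta) < tol and
            all(abs(x - y) < tol for x, y in zip(lengths, (a, b, c))))


header = "%FLAG BOX_DIMENSIONS\n%FORMAT(5E16.8)\n"
bash_prm7 = header + "  9.00000000E+01  9.22133330E+01  7.39995300E+01  6.71360020E+01\n"
bss_prm7 = header + "  9.00000000E+01  8.49602111E+01  6.81790310E+01  6.18553599E+01\n"
# Assumed box line, built from the PeriodicBox reported above.
rst7 = "title\n" + "".join(f"{v:12.7f}" for v in (84.9602111, 68.1790310, 61.8553599, 90.0, 90.0, 90.0))

print(box_consistent(bash_prm7, rst7))  # False
print(box_consistent(bss_prm7, rst7))   # True
```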
I ran a simulation test with the box dimensions copied over from system_BSS.prm7, as follows:
import BioSimSpace as BSS
from shutil import copyfile
system = BSS.IO.readMolecules(['system_bash.prm7', 'system_bash.rst7'])
protocol = BSS.Protocol.Production(runtime=5*BSS.Units.Time.nanosecond)
process = BSS.Process.Amber(system, protocol, exe='/home/adele/software/amber20/bin/pmemd.cuda')
copyfile('system_bash.prm7', f'{process.workDir()}/amber.prm7')
#Replace %FLAG BOX_DIMENSIONS with the one from system_BSS.prm7
process.start()
The system was stable after 5 ns.
I have also tried running the same simulation using the ParmEd topology generated via OpenMM, as you did before:
import BioSimSpace as BSS
import parmed as pmd
from shutil import copyfile
top0 = pmd.load_file('system_bash.prm7', xyz='system_bash.rst7')
parmed_system = top0.createSystem()
top1 = pmd.openmm.load_topology(top0.topology, parmed_system)
top1.save('system_parmed.prm7', format='amber')
system = BSS.IO.readMolecules(['system_bash.prm7', 'system_bash.rst7'])
protocol = BSS.Protocol.Production(runtime=5*BSS.Units.Time.nanosecond)
process = BSS.Process.Amber(system, protocol, exe='/home/adele/software/amber20/bin/pmemd.cuda')
copyfile('system_parmed.prm7', f'{process.workDir()}/amber.prm7')
process.start()
However, I get the following error:
| ERROR: the combination ntb != 0, ntp != 0, ifbox == 0 is not supported! Input errors occurred. Terminating execution.
In section 1. RESOURCE USE of the output file it gives:
getting new box info from bottom of inpcrd
NATOM = 35945 NTYPES = 17 NBONH = 33545 MBONA = 2361
NTHETH = 5225 MTHETA = 3181 NPHIH = 10531 MPHIA = 10049
NHPARM = 0 NPARM = 0 NNB = 67175 NRES = 10793
NBONA = 2361 NTHETA = 3181 NPHIA = 10049 NUMBND = 37
NUMANG = 44 NPTRA = 12269 NATYP = 1 NPHB = 0
IFBOX = 0 NMXRS = 24 IFCAP = 0 NEXTRA = 0
NCOPY = 0
However, the inpcrd file was not changed. And if I run the above process omitting the copyfile line, the output gives IFBOX = 1. I'm including the topology, coordinate and output files here. The coordinate files are exactly the same.
The ParmEd I used was the default that came with my Amber20 installation; could that be an issue? If you managed to run the energy tests with sander using the topology you produced with ParmEd and could upload the topology here, I can try to repeat it with that file.
ETA: changing the IFBOX pointer in system_parmed.prm7 from 0 to 1 gets the process further. However, that produced some other errors about missing flags, namely ATOMS_PER_MOLECULE and SOLVENT_POINTERS. Copying those flags from system_BSS.prm7 allowed the simulation to start. I will update with results.
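The manual IFBOX edit can also be scripted, so it is repeatable. This is a minimal sketch with hypothetical helper names, assuming the POINTERS section is written as `%FORMAT(10I8)` and that IFBOX is the 28th pointer (0-based position 27) as in the prmtop specification. It rewrites only that one 8-character field. As reported above, this alone is not enough: the ATOMS_PER_MOLECULE and SOLVENT_POINTERS sections still have to be supplied.

```python
POINTER_WIDTH = 8       # POINTERS is written as %FORMAT(10I8)
POINTERS_PER_LINE = 10
IFBOX_POS = 27          # 0-based position of IFBOX in the POINTERS list


def set_pointer(prm7_text, position, value):
    """Return prm7 text with one POINTERS field overwritten in place.

    Only the targeted 8-character field changes, so the 10I8 layout of the
    section (and the rest of the file) is left untouched.
    """
    lines = prm7_text.splitlines(keepends=True)
    row = next(i for i, line in enumerate(lines)
               if line.startswith("%FLAG POINTERS")) + 1
    while lines[row].startswith("%"):  # skip %FORMAT / %COMMENT lines
        row += 1
    row += position // POINTERS_PER_LINE
    col = (position % POINTERS_PER_LINE) * POINTER_WIDTH
    line = lines[row]
    lines[row] = line[:col] + f"{value:{POINTER_WIDTH}d}" + line[col + POINTER_WIDTH:]
    return "".join(lines)


# Usage with a real file (path as in the thread):
#   text = open("system_parmed.prm7").read()
#   open("system_parmed.prm7", "w").write(set_pointer(text, IFBOX_POS, 1))
```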
Thanks for the info. Good to know that changing the box dimensions doesn't have an effect. (It shouldn't.) Here is the ParmEd topology that was generated via conversion to / from an OpenMM system:
I've compared the two ParmEd system files and they are identical. I have also finished the 5 ns test with the working ParmEd topology (system_parmed_working.prm7), and the system is stable.
Here are the files. I've also included the final frames of simulations using the default BSS topology (system_BSS_final.pdb) and the modified ParmEd topology (system_parmed_final.pdb) for reference.
I have pulled 10 frames from a steering trajectory as discussed here. The folder also includes the system topology.
This trajectory is quite long (150 ns) and I was seeing unfolding within a few ns in my previous tests. Let me know if you want frames within the first 5-10 ns and I can get those for you as well.
Just to check: the system.prm7 file included in the directory looks different from the system_bash.prm7 from the post, e.g. a different number of atoms, etc.
system.prm7
%FLAG POINTERS
%FORMAT(10I8)
40270 21 37769 2458 5397 3312 10879 10405 0 0
73621 12193 2458 3312 10405 73 170 217 41 1
0 0 0 0 0 0 0 1 24 0
system_bash.prm7
%FLAG POINTERS
%FORMAT(10I8)
35945 18 33545 2361 5225 3181 10531 10049 0 0
67175 10793 2361 3181 10049 70 162 216 38 1
0 0 0 0 0 0 0 1 24 0
0
Is this just the same system with, e.g., a different number of water molecules? Also, is this the original topology, i.e. from your setup pipeline, or the one written by BioSimSpace? Since this differs from system_bash.prm7, I'll need the equivalent file to be able to make comparisons, i.e. the one from tLEaP, not BioSimSpace.
Sorry, I forgot which system I had provided originally. This is PTP1B + a peptide substrate. I can get snapshots from just PTP1B, as well as snapshots from an unstable trajectory.
For reference, this topology was written by leap.
That's okay. As long as it's the tLEaP files and the system is unstable, that's all we need.
Single-point energies agree along the entire trajectory. This archive contains some scripts to perform the checks (data already included). To run the single-point calculations, update the path to sander in single_point.sh and run it with something like:
./single_point.sh leap
./single_point.sh bss
./single_point.sh parmed
Once the energy files are generated, we can compare them using diff3 (assuming it is installed). The check_energies.sh script will do this for all frames; if the energies disagree, it raises an error and tells you which frame mismatched.
(Edited to include the missing min/amber.cfg file.)
Here are 10 frames from an unstable 5 ns trajectory. I loaded the same system as yesterday and ran a default 5 ns Amber production process. I also included the topology BSS produced, just in case. Maybe this will help us understand why this is happening...
I will also run short setups and production from other PDBs and will let you know how that goes.
Sorry, I realised I forgot to upload the amber.cfg file, which should be placed in the min directory. Energies also agree along the unstable trajectory.
Hi @AdeleLip, this might be a stupid question, but are the frames not imaged? Are you sure the simulation is done in a periodic box? It does look like a water droplet evaporating, leaving a protein in vacuum to unfold.
I have been using the config files generated by BSS, so I assumed periodic conditions were applied. The file contents are:
Production.
&cntrl
ig=-1,
ntx=1,
ntxo=1,
ntpr=2500,
ntwr=2500,
ntwx=2500,
irest=0,
dt=0.002,
nstlim=2500000,
ntc=2,
ntf=2,
ntt=3,
gamma_ln=2,
cut=8.0,
tempi=300.00,
temp0=300.00,
ntp=1,
pres0=1.01325,
/
From the Amber20 manual:
ntb This variable controls whether or not periodic boundaries are imposed on the system ... = 2 constant pressure (default when ntp > 0) If NTB is nonzero then there must be a periodic boundary in the topology file.
I'm not 100% sure where in the topology file periodic boundaries are indicated, but if it's DIHEDRAL_PERIODICITY then I'm seeing a lot of differences between the original and the BSS topology. Could that be it?
The topology file has a %FLAG BOX_DIMENSIONS record, although this is now deprecated. Instead, box dimensions are read from the coordinate file.
I'll check the AMBER configs. This was one of the first things I set up, so I might have missed some options, or they might not be set correctly when other options are turned on/off. I know that I set ntb=0 if there is no box.
It looks like I don't explicitly set ntb if constant pressure is turned on, since it's set to 2 by default if ntp>0 (as in the docs that you posted). However, I don't seem to set it if it's not constant pressure. I think it's ntb=1 by default, but will check in the manual.
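One way to avoid relying on implicit defaults is for the config writer to always emit ntb explicitly. This is a minimal sketch with hypothetical function names (it is not BioSimSpace's config code), encoding the AMBER manual's meanings: ntb=0 for no periodicity, ntb=1 for constant volume, and ntb=2 for constant pressure, which is what ntp > 0 requires.

```python
def periodic_options(has_box, ntp=0):
    """Return explicit &cntrl periodicity settings (hypothetical helper).

    ntb=0: no periodicity; ntb=1: constant volume; ntb=2: constant pressure,
    the value implied whenever ntp > 0.
    """
    if not has_box:
        if ntp:
            raise ValueError("constant pressure (ntp > 0) needs a periodic box")
        return {"ntb": 0}
    if ntp:
        return {"ntb": 2, "ntp": ntp}
    return {"ntb": 1}


def cntrl_lines(options):
    """Format options as lines for an AMBER &cntrl namelist."""
    return [f"{key}={value}," for key, value in options.items()]


print(cntrl_lines(periodic_options(has_box=True, ntp=1)))  # ['ntb=2,', 'ntp=1,']
print(cntrl_lines(periodic_options(has_box=True)))         # ['ntb=1,']
```

Writing the value out, even when it matches the default, makes the generated amber.cfg self-describing when it is inspected later, as happened in this thread.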
This might be a red herring, but you need iwrap=1 (default 0) to wrap coordinates into the minimum image; see p. 342 of https://ambermd.org/doc12/Amber20.pdf
@AdeleLip, what happens if you run an MD simulation on the BSS prm7/rst7 inputs using a default config file from a relevant Amber20 tutorial?
The difference in the DIHEDRAL_PERIODICITY could be due to the ordering of the dihedral terms. I would have thought that this would surely lead to a difference in the single-point energies if it was important, though.
I built the configs using some best practice from a few tutorials, but it would be good to test against some sane defaults from elsewhere to see if anything is inconsistent.
Is it preferable to set iwrap=1? I would have thought that this could be done in post-processing, and some people might want the actual coordinates to compute displacements, etc. (If image numbers aren't available.)
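For reference, wrapping in post-processing comes down to shifting each whole molecule by integer multiples of the box vectors. Tools such as cpptraj's `autoimage` handle this in practice. This is a minimal sketch of the idea rather than AMBER's implementation, assuming an orthorhombic box and unwrapped input coordinates in which every molecule is still intact.

```python
import math
from collections import defaultdict


def wrap_molecules(coords, molecule_ids, box):
    """Shift whole molecules by box vectors so each centre lies in [0, L).

    coords: list of (x, y, z); molecule_ids: one molecule label per atom;
    box: orthorhombic box lengths (a, b, c). Molecules are moved as units,
    so an intact molecule is never split across the periodic boundary.
    """
    atoms_by_mol = defaultdict(list)
    for atom, mol in enumerate(molecule_ids):
        atoms_by_mol[mol].append(atom)

    wrapped = [list(xyz) for xyz in coords]
    for atoms in atoms_by_mol.values():
        for axis, length in enumerate(box):
            centre = sum(coords[a][axis] for a in atoms) / len(atoms)
            shift = length * math.floor(centre / length)
            for a in atoms:
                wrapped[a][axis] -= shift
    return wrapped


coords = [(105.0, 5.0, 5.0), (106.0, 5.0, 5.0), (-3.0, 5.0, 5.0)]
print(wrap_molecules(coords, [0, 0, 1], (100.0, 100.0, 100.0)))
# [[5.0, 5.0, 5.0], [6.0, 5.0, 5.0], [97.0, 5.0, 5.0]]
```

Because the shifts are whole box lengths, the unwrapped trajectory can always be recovered if the per-molecule shifts are stored, which is the concern about displacements raised above.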
I just ran a test with BSS topology and the following cfg from this tutorial:
Production.
&cntrl
imin=0, irest=1, ntx=5,
ntpr=2500, ntwx=2500, ntwr=2500, nstlim=2500000,
dt=0.002, ntt=3, tempi=300,
temp0=300, gamma_ln=1.0, ig=-1,
ntp=1, ntc=2, ntf=2, cut=9,
ntb=2, iwrap=1, ioutfm=1,
/
After 1.5 ns I see unfolding.
I guess you're using pmemd here. Could you run the single-point calculations using that? I wonder if there are some rounding issues that are only apparent on the GPU?
You're on to something here. I get small differences in energies for all frames (pmemd_energies.zip). I used pmemd.cuda. From a quick inspection, it looks like the differences are in the 1-4 VDW and EEL energies.
It could be useful to run an OpenMM MD simulation on the same inputs to check numerical stability.
Do you mean single-point energies or full simulations? I was using this same system for OpenMM sMD testing and had no issues using Amber topologies.
I am running some simple production MD, and am seeing protein unfolding where there shouldn't be any. This is the code I run:
This results in my protein starting to unfold (particularly near the N terminus, highlighted in blue). I have also previously seen complete unfolding after 100 ns (which was what led me to investigate):
However, when I run the following:
I get a stable protein, as expected and as I've seen for MD runs outside of BSS. Is it possible that the reading/writing of the system is resulting in a topology different enough to destabilise protein on such a short timescale?
I'm including relevant files: file-diff.zip
(I also tried this with only copying over the original topology file and using the BSS generated rst, which worked fine; this is pointing me towards the topology being an issue).
I am using the dev version of BSS downloaded yesterday (02/02), but the release version downloaded today (03/02) gave the same results.
On import I get:
/home/adele/anaconda3/envs/BSS-env/lib/python3.7/importlib/_bootstrap.py:219: RuntimeWarning: numpy.ndarray size changed, may indicate binary incompatibility. Expected 80 from C header, got 88 from PyObject return f(*args, **kwds)
but michellab/BioSimSpace#37 makes me think it shouldn't matter?
Might be unrelated (or a separate issue), but when I run:
I get the following Sire issue:
The property of type SireMol::AtomCoords is incompatible with the layout with UID {d01e83b8-690a-45b5-9e27-73ebb2651b85}
IncompatibleError: Unable to update 'coordinates' for molecule index '1'
When I load the molecule fresh from the workDir I can continue. Additionally, putting a system parameterised/solvated in leap outside BSS through the same process does not raise an error. Let me know if you think this is related, or if this is a separate issue or a non-issue.