Sire::IO::AmberPrm topology file giving strange MD results

AdeleHardie commented 3 years ago

I am running some simple production MD, and am seeing protein unfolding where there shouldn't be any. This is the code I run:

import BioSimSpace as BSS
from shutil import copyfile

system = BSS.IO.readMolecules(['system_bash.prm7', 'system_bash.rst7'])
protocol = BSS.Protocol.Production(runtime=5*BSS.Units.Time.nanosecond, restart_interval=2500)
process = BSS.Process.Amber(system, protocol, exe='/home/adele/software/amber20_20.04/amber20/bin/pmemd.cuda')

process.start()
process.wait()
copyfile(f'{process.workDir()}/amber.nc', 'production_BSS.nc')

This results in my protein starting to unfold (particularly near the N-terminus, highlighted in blue). I have also seen complete unfolding after 100 ns previously, which is what led me to investigate. (Image: Outlook-tvjmo5nz)

However, when I run the following:

import BioSimSpace as BSS
from shutil import copyfile

system = BSS.IO.readMolecules(['system_bash.prm7', 'system_bash.rst7'])
protocol = BSS.Protocol.Production(runtime=5*BSS.Units.Time.nanosecond, restart_interval=2500)
process = BSS.Process.Amber(system, protocol, exe='/home/adele/software/amber20_20.04/amber20/bin/pmemd.cuda')

copyfile('system_bash.prm7', f'{process.workDir()}/amber.prm7')
copyfile('system_bash.rst7', f'{process.workDir()}/amber.rst7')

process.start()
process.wait()
copyfile(f'{process.workDir()}/amber.nc', 'production_BSS.nc')

I get a stable protein, as expected and as I've seen for MD runs outside of BSS. Is it possible that the reading/writing of the system is resulting in a topology different enough to destabilise the protein on such a short timescale?

I'm including relevant files: file-diff.zip

(I also tried this with only copying over the original topology file and using the BSS generated rst, which worked fine; this is pointing me towards the topology being an issue).

I am using the dev version of BSS downloaded yesterday (02/02), but the release version downloaded today (03/02) gave the same results.

On import I get:

/home/adele/anaconda3/envs/BSS-env/lib/python3.7/importlib/_bootstrap.py:219: RuntimeWarning: numpy.ndarray size changed, may indicate binary incompatibility. Expected 80 from C header, got 88 from PyObject
  return f(*args, **kwds)

but michellab/BioSimSpace#37 makes me think it shouldn't matter?

Might be unrelated (or a separate issue), but when I run:

protein = BSS.IO.readMolecules('open.pdb')
protein_parm_process = BSS.Parameters.ff14SB(protein.getMolecule(0))
protein_parm = protein_parm_process.getMolecule()
solvated = BSS.Solvent.tip3p(protein_parm.toSystem(), shell=10*BSS.Units.Length.angstrom, ion_conc=0.15, is_neutral=True)
minimisation_protocol = BSS.Protocol.Minimisation(steps=2000)
minimisation_process = BSS.Process.Amber(solvated, minimisation_protocol)
minimisation_process.start()
minimisation_process.wait()
minimised = minimisation_process.getSystem()

I get the following Sire issue:

The property of type SireMol::AtomCoords is incompatible with the layout with UID {d01e83b8-690a-45b5-9e27-73ebb2651b85}
IncompatibleError: Unable to update 'coordinates' for molecule index '1'

When I load the molecule fresh from the workDir I can continue. Additionally, putting a system parameterised/solvated in leap outside BSS through the same process does not raise an error. Let me know if you think this is related, or if this is a separate issue or a non-issue.

lohedges commented 3 years ago

Thanks, I'll take a look at this tomorrow. I think the second part is an unrelated issue. With our MD drivers we copy coordinates from the restart trajectory files back into the original system so that we preserve the original topology, e.g. the atom naming / numbering convention. It sounds like something has gone wrong at this step for your particular system. When you directly read the files from the working directory you are creating a brand new system, so there are no consistency checks. (What is in the files is what you get.) This is why you are able to continue.

I think the problem might relate to the recent changes to correctly handle the water topology naming conventions required for AMBER and GROMACS. Before running a simulation we convert to the expected format for the engine. I imagine that the atoms have been re-ordered somehow, such that the topology from the restart/trajectory files no longer matches that of the original system.

lohedges commented 3 years ago

At a quick glance there are several differences between the POINTERS flags in the two topology files:

system_bash.prm7

%FLAG POINTERS
%FORMAT(10I8)
   35945      18   33545    2361    5225    3181   10531   10049       0       0
   67175   10793    2361    3181   10049      70     162     216      38       1
       0       0       0       0       0       0       0       1      24       0
       0

system_BSS.prm7

%FORMAT(10I8)
   35945      17   33545    2361    5225    3181    9906    8439       0       0
   67175   10793    2361    3181    8439      37      44     228      38       0
       0       0       0       0       0       0       0       1      24       0
       0       0       0

I'll try to figure out the reason for the difference and if this is what's causing the problem.

lohedges commented 3 years ago

I've just checked that this issue hasn't been caused by the recent updates to the Sire.IO.AmberPrm parser to correctly handle the NATYP record (see here.) Rolling back to that version gives:

%FLAG POINTERS
%FORMAT(10I8)
   35945      17   33545    2361    5225    3181    9906    8439       0       0
   67175   10793    2361    3181    8439      37      44     228       0       0
       0       0       0       0       0       0       0       1      24       0
       0       0       0

(The only difference is the NATYP record.)

lohedges commented 3 years ago

Okay, I had more time than I thought and think I've fixed the second issue. The problem was that, due to a limitation in Sire, I need to remove then re-add water molecules when swapping the topology between AMBER and GROMACS format. This means that, after the old molecules were deleted, the new ones were added after any remaining molecules in the system, rather than in the same position as the old ones. (I thought I'd already handled this, but obviously not.) This isn't an issue when you have protein / ligand and then waters, but is when you have protein / ligand then waters and ions, since the new waters are re-added after the ions rather than before. I've now fixed this by preserving the molecular ordering.
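The ordering problem described above can be illustrated with plain Python lists. This is only a sketch: the molecule labels and both helper functions are hypothetical stand-ins and do not use the Sire or BioSimSpace API.

```python
def swap_remove_then_append(molecules, is_water, convert):
    """Naive swap: drop the waters, then append the converted ones at the end."""
    kept = [m for m in molecules if not is_water(m)]
    converted = [convert(m) for m in molecules if is_water(m)]
    return kept + converted


def swap_in_place(molecules, is_water, convert):
    """Order-preserving swap: each converted water keeps its original slot."""
    return [convert(m) if is_water(m) else m for m in molecules]


def is_water(name):
    return name.startswith("WAT")


def to_other_format(name):
    # Stand-in for the real AMBER <-> GROMACS water topology conversion.
    return name.replace("WAT", "SOL")


system = ["protein", "ligand", "WAT1", "WAT2", "Na+", "Cl-"]

naive = swap_remove_then_append(system, is_water, to_other_format)
preserved = swap_in_place(system, is_water, to_other_format)

print(naive)      # waters now come after the ions
print(preserved)  # waters stay between the ligand and the ions
```

With no ions after the waters both functions return the same list, which matches the observation that protein / ligand / water systems were unaffected.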

lohedges commented 3 years ago

The pointers that are different are:

NTYPES   : total number of distinct atom types  (18 vs 17)
NPHIH    : number of dihedrals containing hydrogen (10531 vs 9906)
MPHIA    : number of dihedrals not containing hydrogen (10049 vs 8439)
NPHIA    : MPHIA + number of constraint dihedrals (10049 vs 8439)
NUMBND   : number of unique bond types (70 vs 37)
NUMANG   : number of unique angle types (162 vs 44)
NPTRA    : number of unique dihedral types (216 vs 228)
NPHB     : number of distinct 10-12 hydrogen bond pair types (1 vs 0)

I'll transfer this issue over to the Sire repository as it's clearly an issue with the Sire.IO.AmberPrm parser, rather than BioSimSpace itself.
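For anyone wanting to repeat the comparison, here is a minimal sketch that names the POINTERS entries and reports the ones that differ. The two blocks are copied from the listings earlier in this thread; the helper names are hypothetical, and the pointer names follow the order documented for the AMBER prmtop format.

```python
# Names of the first 31 POINTERS entries, in file order.
POINTER_NAMES = [
    "NATOM", "NTYPES", "NBONH", "MBONA", "NTHETH", "MTHETA", "NPHIH", "MPHIA",
    "NHPARM", "NPARM", "NNB", "NRES", "NBONA", "NTHETA", "NPHIA", "NUMBND",
    "NUMANG", "NPTRA", "NATYP", "NPHB", "IFPERT", "NBPER", "NGPER", "NDPER",
    "MBPER", "MGPER", "MDPER", "IFBOX", "NMXRS", "IFCAP", "NUMEXTRA",
]

SYSTEM_BASH = """
   35945      18   33545    2361    5225    3181   10531   10049       0       0
   67175   10793    2361    3181   10049      70     162     216      38       1
       0       0       0       0       0       0       0       1      24       0
       0
"""

SYSTEM_BSS = """
   35945      17   33545    2361    5225    3181    9906    8439       0       0
   67175   10793    2361    3181    8439      37      44     228      38       0
       0       0       0       0       0       0       0       1      24       0
       0       0       0
"""


def parse_pointers(block):
    """Convert the whitespace-separated integers of a POINTERS block to a list."""
    return [int(value) for value in block.split()]


def diff_pointers(a, b):
    """Return {name: (a_value, b_value)} for every named pointer that differs."""
    return {
        name: (x, y)
        for name, x, y in zip(POINTER_NAMES, a, b)
        if x != y
    }


differences = diff_pointers(parse_pointers(SYSTEM_BASH), parse_pointers(SYSTEM_BSS))
for name, (bash_value, bss_value) in differences.items():
    print(f"{name:8s}: {bash_value} vs {bss_value}")
```

Only the named entries are compared, so the extra trailing values in the BSS block are ignored.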

lohedges commented 3 years ago

For reference, ParmEd preserves the original topology on write:

import parmed as pd

ps =  pd.load_file("system_bash.prm7", xyz="system_bash.rst7")
ps.save("system_parmed.prmtop")

system_parmed.prmtop:

%FLAG POINTERS
%FORMAT(10I8)
   35945      18   33545    2361    5225    3181   10531   10049       0       0
   67175   10793    2361    3181   10049      70     162     216      38       1
       0       0       0       0       0       0       0       1      24       0
       0

lohedges commented 3 years ago

I'm pretty sure that the issue is occurring on read, since the parameters properties for each molecule loaded from system_bash.* and system_BSS.* appear to be identical. A basic single-point energy comparison of the two systems (within Sire) also gives identical results.

lohedges commented 3 years ago

I have a small lead for the mismatched NPHB pointer. On read this is correctly flagged as 1 but on write it is 0.

import BioSimSpace as BSS

# I've added a print statement into the parser to print the pointer.
s = BSS.IO.readMolecules(BSS.IO.glob("system_bash.*"))
# Output: NPHB = 1

BSS.IO.saveMolecules("test", s, "prm7")
# Output: NPHB = 0

Looking at the conditional here, I'm not even sure if we support 10-12 hydrogen bond parameters? Perhaps this is the problem, i.e. we are not catching this properly so we are then making some incorrect assumptions for other records?

        ...

        //amber stores the A and B coefficients as the product of all
        //possible combinations. We need to find the values from the
        // LJ_i * LJ_i values
        int idx = nb_parm_index[ ntypes * i + i  ];

        if (idx < 0)
        {
            //this is a 10-12 parameter
            throw SireError::unsupported( QObject::tr(
                    "Sire does not yet support Amber Parm files that "
                    "use 10-12 HBond parameters."), CODELOC );
        }
        ...

@chryswoods: Do you have any thoughts on this?

I'm not yet sure about the issue with the other pointers, but at least they so far seem to be consistent on read/write for the ones that I've tested.

lohedges commented 3 years ago

Yes, this looks like the issue. Here is the NONBONDED_PARM_INDEX from the system_bash.prm7 file:

%FLAG NONBONDED_PARM_INDEX
%FORMAT(10I8)
       1       2       4       7      11      16      22      29      37      46
      56      67      79      92     106     121     137     154       2       3
       5       8      12      17      23      30      38      47      57      68
      80      93     107     122     138     155       4       5       6       9
      13      18      24      31      39      48      58      69      81      94
     108     123     139     156       7       8       9      10      14      19
      25      32      40      49      59      70      82      95     109     124
     140     157      11      12      13      14      15      20      26      33
      41      50      60      71      83      96     110     125     141     158
      16      17      18      19      20      21      27      34      42      51
      61      72      84      97     111     126     142     159      22      23
      24      25      26      27      28      35      43      52      62      73
      85      98     112     127     143     160      29      30      31      32
      33      34      35      36      44      53      63      74      86      99
     113     128     144     161      37      38      39      40      41      42
      43      44      45      54      64      75      87     100     114     129
     145     162      46      47      48      49      50      51      52      53
      54      55      65      76      88     101     115     130     146     163
      56      57      58      59      60      61      62      63      64      65
      66      77      89     102     116     131     147     164      67      68
      69      70      71      72      73      74      75      76      77      78
      90     103     117     132     148     165      79      80      81      82
      83      84      85      86      87      88      89      90      91     104
     118     133     149     166      92      93      94      95      96      97
      98      99     100     101     102     103     104     105     119     134
     150     167     106     107     108     109     110     111     112     113
     114     115     116     117     118     119     120     135     151     168
     121     122     123     124     125     126     127     128     129     130
     131     132     133     134     135     136     152     169     137     138
     139     140     141     142     143     144     145     146     147     148
     149     150     151     152     153      -1     154     155     156     157
     158     159     160     161     162     163     164     165     166     167
     168     169      -1     171

Note that two entries (including the second to last) are -1, hence the conditional above should have been triggered and an exception thrown. I'll try to figure out why this isn't the case.

lohedges commented 3 years ago

These are not being read because of the incorrect NTYPES pointer, i.e. 17 rather than 18. The loop is over NTYPES x NTYPES terms, i.e. 18 x 18 = 324. However, the code is actually looping over 17 x 17 = 289 records, so is missing both of the -1 entries, which are in the last 19 places.

Now to work out why NTYPES is incorrect.

lohedges commented 3 years ago

This probably explains why the other pointers / flags are incorrect too, since they are also built from looping over the incorrect number of terms. The NTYPES record is actually built from looping over the parameters that were read and working out the total number of distinct LJ parameters, i.e.:

        for (int i=0; i<params.count(); ++i)
        {
            const auto info = params.constData()[i].info();
            const auto ljs = params.constData()[i].ljs();

            QVector<qint64> mol_atom_types(info.nAtoms());

            for (int j=0; j<info.nAtoms(); ++j)
            {
                const LJParameter lj = ljs[ info.cgAtomIdx(AtomIdx(j)) ];

                int idx = ljparams.indexOf(lj);

                if (idx == -1)
                {
                    ljparams.append(lj);
                    mol_atom_types[j] = ljparams.count();
                }
                else
                {
                    mol_atom_types[j] = idx + 1;
                }
            }

            atom_types[i] = mol_atom_types;
        }

        // We now have all of the atom types - create the acoeff and bcoeff arrays.
        const int ntypes = ljparams.count();

I assume one term (possibly for those involving 10-12 hydrogen bonds) is being processed incorrectly, which is leading to the miscount.
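As a rough Python rendering of the C++ loop above, the sketch below assigns type indices by de-duplicating LJ parameters. The type labels and parameter values are hypothetical. It shows one way the inferred count can end up lower than the number of named types: two types that share identical LJ parameters collapse into one. Whether that is what happens for this system is not established here.

```python
def infer_atom_types(lj_per_atom):
    """Assign 1-based type indices by de-duplicating (sigma, epsilon) pairs."""
    unique = []
    types = []
    for lj in lj_per_atom:
        if lj in unique:
            # Seen before: reuse the existing type index.
            types.append(unique.index(lj) + 1)
        else:
            # New parameter pair: it becomes the next type.
            unique.append(lj)
            types.append(len(unique))
    return types, len(unique)


# Hypothetical force field with three named types, two of which share
# exactly the same LJ parameters.
named_types = {
    "A": (3.40, 0.10),
    "B": (3.40, 0.10),
    "C": (2.50, 0.02),
}
atoms = ["A", "C", "B", "A"]

types, ntypes = infer_atom_types([named_types[name] for name in atoms])
print(types)   # per-atom type indices
print(ntypes)  # inferred number of distinct types
```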

For simplicity I guess we could just raise an exception whenever NPHB is non-zero. We should also probably add in checks that the number of records related to certain FLAG entries matches the expected NTYPES x NTYPES. If not, then we know that we haven't correctly identified the unique types in the system.

chryswoods commented 3 years ago

Yes, that sounds right. Sire does not support 10-12 terms as these (I thought) were an old relic of older force fields and no longer widely used. They would require a lot of work to support (a whole duplicate of 10-12 versions of the LJ terms and forcefields) and so I didn't think it was worth it.

I am surprised that the exception was not raised on read. It would be worth finding out why 10-12 terms are in this file, particularly as they would break some of our MD drivers (e.g. somd) and are not supported by our gromacs writer either (I think).

@jmichel80 do you think 10-12 terms are needed or are making a comeback? Is it worth putting in the work to support them?

lohedges commented 3 years ago

Yes, I thought that they were only associated with some of the older force fields too. Thanks for the info regarding the lack of support for the other MD drivers.

I'll try to work out why the NTYPES pointer is being inferred incorrectly. (We read it, but then re-construct everything from the molecular topology generated from the other records.)

If we don't want to support 10-12 terms it would be easy enough to raise an exception whenever NPHB (pointers[19] in the code) is non-zero. This is read before any of the flags are processed so would be caught early.

I think it would also be good to add some self-consistency checks to the code too, since this would help us catch situations where we are misreading (or re-constructing) the data in the topology file.

lohedges commented 3 years ago

Actually, there are some self-consistency checks, e.g. here:


    const int ntypes = pointers[1];  //number of distinct atom types

    if (ntypes <= 0)
        return;

    const int nphb = pointers[19];   //number of distinct 10-12 hydrogen bond pair types

    lj_data = QVector<LJParameter>(ntypes);
    auto lj_data_array = lj_data.data();

    auto acoeffs = float_data.value("LENNARD_JONES_ACOEF");
    auto bcoeffs = float_data.value("LENNARD_JONES_BCOEF");

    auto hbond_acoeffs = float_data.value("HBOND_ACOEF");
    auto hbond_bcoeffs = float_data.value("HBOND_BCOEF");

    auto nb_parm_index = int_data.value("NONBONDED_PARM_INDEX");

    qDebug() << "NONBONDED_PARM_INDEX:" << nb_parm_index.count();
    qDebug() << "NTYPES:" << ntypes;

    if (acoeffs.count() != bcoeffs.count() or
        acoeffs.count() != (ntypes*(ntypes+1))/2)
    {
        throw SireIO::parse_error( QObject::tr(
                "Incorrect number of LJ coefficients for the number of specified "
                "atom types! Should be %1 for %2 types, but actually have "
                "%3 LJ A-coefficients, and %4 LJ B-coefficients")
                    .arg((ntypes*(ntypes+1))/2)
                    .arg(ntypes)
                    .arg(acoeffs.count())
                    .arg(bcoeffs.count()), CODELOC );
    }

    if (nb_parm_index.count() != ntypes*ntypes)
    {
        throw SireIO::parse_error( QObject::tr(
                "Incorrect number of non-bonded parameter indicies. There should "
                "be %1 indicies for %2 types, but actually have %3.")
                    .arg(ntypes*ntypes)
                    .arg(ntypes)
                    .arg(nb_parm_index.count()), CODELOC );
    }

    if (hbond_acoeffs.count() != nphb or
        hbond_bcoeffs.count() != nphb)
    {
        throw SireIO::parse_error( QObject::tr(
                "Incorrect number of HBond parameters. There should be "
                "%1 such parameters, but the number of HBond A coefficients is "
                "%2, and the number of B coefficients is %3.")
                    .arg(nphb)
                    .arg(hbond_acoeffs.count())
                    .arg(hbond_bcoeffs.count()), CODELOC );
    }

On read, this prints:

NONBONDED_PARM_INDEX: 324
NTYPES: 18

This means that it is looping over the correct number of entries, i.e. it is using the pointers value rather than that inferred from the unique LJ parameters. I'll figure out why it isn't seeing the -1 entries (presumably the loop is wrong) since this would trigger the exception.
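The size checks in the C++ snippet above reduce to simple arithmetic on NTYPES and NPHB. Here is a small Python sketch of the same checks; the function name is hypothetical. The count of 324 index entries comes from the debug output above and the single HBOND coefficient from the file, while the LJ coefficient count of 171 is an assumption (18 * 19 / 2) that is not quoted in this thread.

```python
def count_errors(ntypes, nphb, counts):
    """Return a message for every record whose size disagrees with the pointers."""
    n_lj = ntypes * (ntypes + 1) // 2  # upper triangle including the diagonal
    expected = {
        "LENNARD_JONES_ACOEF": n_lj,
        "LENNARD_JONES_BCOEF": n_lj,
        "NONBONDED_PARM_INDEX": ntypes * ntypes,
        "HBOND_ACOEF": nphb,
        "HBOND_BCOEF": nphb,
    }
    return [
        f"{flag}: expected {size}, found {counts[flag]}"
        for flag, size in expected.items()
        if counts[flag] != size
    ]


counts = {
    "LENNARD_JONES_ACOEF": 171,   # assumed, see lead-in
    "LENNARD_JONES_BCOEF": 171,   # assumed, see lead-in
    "NONBONDED_PARM_INDEX": 324,  # from the debug output
    "HBOND_ACOEF": 1,
    "HBOND_BCOEF": 1,
}

# Pointers as stored in system_bash.prm7: everything is consistent.
print(count_errors(18, 1, counts))

# Pointers as re-inferred on write: every record would be flagged.
for message in count_errors(17, 0, counts):
    print(message)
```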

lohedges commented 3 years ago

Looking again at the code the loop is only over diagonal elements of the matrix, so we're missing the -1 terms. (As such, I'm not sure if we'd ever detect the presence of 10-12 terms.) This code from ParmEd has routines for looping over all interactions, detecting non-zero 10-12 terms, etc.

If we don't care about the off-diagonal terms then we could just add an additional loop inside build_lj that loops over them and raises the exception if any index is -1.
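To see why a diagonal-only loop never reaches the -1 entries, the sketch below maps flat positions in NONBONDED_PARM_INDEX back to (i, j) type pairs. The 1-based positions 306 and 323 were read off the listing earlier in this thread; every other entry is filled with a placeholder, and the helper name is hypothetical.

```python
def negative_entries(nb_parm_index, ntypes):
    """Return the 1-based (i, j) type pairs whose index entry is negative."""
    if len(nb_parm_index) != ntypes * ntypes:
        raise ValueError("expected NTYPES x NTYPES entries")
    pairs = []
    for flat, value in enumerate(nb_parm_index):
        if value < 0:
            # Row-major layout: flat = ntypes * i + j with zero-based i, j.
            i, j = divmod(flat, ntypes)
            pairs.append((i + 1, j + 1))
    return pairs


ntypes = 18

# Placeholder matrix: only the positions of the two -1 entries matter here.
nb_parm_index = [1] * (ntypes * ntypes)
nb_parm_index[306 - 1] = -1
nb_parm_index[323 - 1] = -1

pairs = negative_entries(nb_parm_index, ntypes)
on_diagonal = [pair for pair in pairs if pair[0] == pair[1]]

print(pairs)        # the symmetric pair of off-diagonal entries
print(on_diagonal)  # empty: a loop over [ntypes * i + i] never sees them
```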

lohedges commented 3 years ago

Adding something like the following as a check:

    // The build_lj function above only considers diagonal elements of the
    // NONBONDED_PARM_INDEX matrix. Here we loop over the off-diagonal elements
    // to check for 10-12 parameters, which are currently unsupported.

    // The matrix is symmetric, so perform a triangular loop over off-diagonal
    // elements.
    for (int i=0; i<ntypes; ++i)
    {
        for (int j=i+1; j<ntypes; ++j)
        {
            int idx = nb_parm_index[ ntypes * i + j  ];

            if (idx < 0)
            {
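                // A negative entry -k refers to the k-th (1-based) 10-12
                // coefficient, so convert it to a zero-based array index.
                auto a = hbond_acoeffs[-idx - 1];
                auto b = hbond_bcoeffs[-idx - 1];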

                if ((a > 1e-6) and (b > 1e-6))
                {
                    //this is a 10-12 parameter
                    throw SireError::unsupported( QObject::tr(
                            "Sire does not yet support Amber Parm files that "
                            "use 10-12 HBond parameters."), CODELOC );
                }
            }
        }
    }

However, looking at the system_bash.prm7 file, the values of the coefficients are:

%FLAG HBOND_ACOEF
%FORMAT(5E16.8)
  0.00000000E+00
%FLAG HBOND_BCOEF
%FORMAT(5E16.8)
  0.00000000E+00

Since they are zero, there aren't any non-zero 10-12 terms for this system, so we should be okay. (SireUnitTests contains several top files with -1 entries in the NONBONDED_PARM_INDEX flags, but also with zero terms for the coefficients.)

Will try to work out why the other pointers are incorrect.

lohedges commented 3 years ago

Hmm, I've realised the writing out with ParmEd was simply saving the original file back to disk. Going via an OpenMM system loses the NPHB pointer for this system, so it's clearly redundant. (For this system, at least.) Sorry about that.

import parmed as pd

ps0 =  pd.load_file("system_bash.prm7", xyz="system_bash.rst7")
omm_system = ps0.createSystem()
ps1 = pd.openmm.load_topology(ps0.topology, omm_system)
ps1.save("system_parmed.prmtop")

Here are the pointers for the original system and the BSS and ParmEd conversions:

system_bash.prm7

%FLAG POINTERS
%FORMAT(10I8)
   35945      18   33545    2361    5225    3181   10531   10049       0       0
   67175   10793    2361    3181   10049      70     162     216      38       1
       0       0       0       0       0       0       0       1      24       0
       0

system_BSS.prm7

%FORMAT(10I8)
   35945      17   33545    2361    5225    3181    9906    8439       0       0
   67175   10793    2361    3181    8439      37      44     228      38       0
       0       0       0       0       0       0       0       1      24       0
       0       0       0

system_parmed.prmtop:

%FLAG POINTERS
%FORMAT(10I8)
   35945      17   33545    2361    5225    3181   10531   10049       0       0
   67175   10793    2361    3181   10049      37      44   12269       1       0
       0       0       0       0       0       0       0       0      24       0
       0

Note that the reconstructed ParmEd topology disagrees in the NATYP pointer (it is 1, when it should be 38). It also claims that there are 12269 unique dihedral types, rather than 216. However, it now agrees with Sire that there are 17 rather than 18 unique types (NTYPES), 37 rather than 70 unique bonds (NUMBND), and 44 rather than 162 unique angles (NUMANG).

It would be interesting to know whether the ParmEd file blows up in simulation too, which might point to there being something funky with the original system. (At least in the sense of it being a hard system to parse correctly.) The fact that, apart from the dihedrals, it agrees more closely with Sire after conversion makes me think that something is wrong, or that there is redundant information in the original topology.

I'll keep digging on the other terms.

lohedges commented 3 years ago

On read the dihedral terms are correct, in the sense that counting the total number of dihedrals plus impropers with and without hydrogen gives the correct values of 10531 and 10049 respectively, i.e. 10027 + 504 and 9620 + 429. However, the values that are produced on write, 9906 and 8439, aren't simply the value of just the dihedral terms with and without hydrogen, i.e. missing the impropers, which would be 10027 and 9620.

lohedges commented 3 years ago

Something that I should have tried earlier is a single-point energy comparison between the three topology files using sander, rather than just doing this internally within Sire. Here I'm performing a default, single-step minimisation using BioSimSpace and looking at the energy log file amber.nrg. I've just replaced amber.prm7 in the working directory with the appropriate topology file, and re-run the minimisation. (Importantly this is copied across by hand, not written by BioSimSpace.)

Here are the results:

sander -O -i amber.cfg -p amber.prm7 -c amber.rst7 -o stdout -r amber.crd -inf amber.nrg

system_bash.prm7:

   NSTEP       ENERGY          RMS            GMAX         NAME    NUMBER
      1      -1.1623E+05     1.4600E+01     1.0911E+02     C        3537

 BOND    =      867.4181  ANGLE   =     2384.8699  DIHED      =     3746.3088
 VDWAALS =    12410.6216  EEL     =  -148008.4340  HBOND      =        0.0000
 1-4 VDW =     1063.6145  1-4 EEL =    11302.3846  RESTRAINT  =        0.0000

system_BSS.prm7:

   NSTEP       ENERGY          RMS            GMAX         NAME    NUMBER
      1      -1.1623E+05     1.4600E+01     1.0911E+02     C        3537

 BOND    =      867.4180  ANGLE   =     2384.8698  DIHED      =     3746.3089
 VDWAALS =    12410.6217  EEL     =  -148008.4341  HBOND      =        0.0000
 1-4 VDW =     1063.6145  1-4 EEL =    11302.3846  RESTRAINT  =        0.0000

system_parmed.prm7:

   NSTEP       ENERGY          RMS            GMAX         NAME    NUMBER
      1      -1.1623E+05     1.4600E+01     1.0911E+02     C        3537

 BOND    =      867.4180  ANGLE   =     2384.8698  DIHED      =     3746.3089
 VDWAALS =    12410.6217  EEL     =  -148008.4341  HBOND      =        0.0000
 1-4 VDW =     1063.6145  1-4 EEL =    11302.3846  RESTRAINT  =        0.0000

As you can see, despite the differences in the topology files, the energies are in near perfect agreement. BioSimSpace and ParmEd agree exactly, whereas the original system_bash.prm7 file differs only in the fourth decimal place for terms in the first two lines.

Differences in topology file records are hard to interpret without looking at single-point energy tests such as those above, since Sire performs a lot of clever de-duplication of the records, such as for dihedral terms. This would explain why the pointers count for those records is lower, yet the energy is the same. (There can be a lot of redundancy in a topology file.)
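The comparison above can also be done programmatically. The following sketch parses the energy blocks quoted in this comment and reports the largest absolute difference between any two topologies; the parsing helper is hypothetical and assumes the "NAME = value" layout shown in the sander output.

```python
import re

ENERGY_LOGS = {
    "system_bash.prm7": """
 BOND    =      867.4181  ANGLE   =     2384.8699  DIHED      =     3746.3088
 VDWAALS =    12410.6216  EEL     =  -148008.4340  HBOND      =        0.0000
 1-4 VDW =     1063.6145  1-4 EEL =    11302.3846  RESTRAINT  =        0.0000
""",
    "system_BSS.prm7": """
 BOND    =      867.4180  ANGLE   =     2384.8698  DIHED      =     3746.3089
 VDWAALS =    12410.6217  EEL     =  -148008.4341  HBOND      =        0.0000
 1-4 VDW =     1063.6145  1-4 EEL =    11302.3846  RESTRAINT  =        0.0000
""",
    "system_parmed.prm7": """
 BOND    =      867.4180  ANGLE   =     2384.8698  DIHED      =     3746.3089
 VDWAALS =    12410.6217  EEL     =  -148008.4341  HBOND      =        0.0000
 1-4 VDW =     1063.6145  1-4 EEL =    11302.3846  RESTRAINT  =        0.0000
""",
}

# A term name (letters, digits, spaces, hyphens), then "=", then a decimal number.
TERM = re.compile(r"([A-Za-z0-9][A-Za-z0-9 \-]*?)\s*=\s*(-?\d+\.\d+)")


def parse_energies(text):
    """Return {term name: energy} for one block of sander energy output."""
    return {name.strip(): float(value) for name, value in TERM.findall(text)}


def max_abs_difference(a, b):
    """Largest absolute difference over the terms of two parsed blocks."""
    return max(abs(a[name] - b[name]) for name in a)


energies = {name: parse_energies(text) for name, text in ENERGY_LOGS.items()}

reference = energies["system_bash.prm7"]
for name, terms in energies.items():
    print(f"{name}: max |dE| vs system_bash.prm7 = {max_abs_difference(reference, terms):.4f}")
```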

Given the above, I'm at a loss to explain why you are seeing such markedly different results using the BioSimSpace topology. Is the behaviour repeatable across multiple runs? What happens if you use the ParmEd topology file? If only the BioSimSpace topology reproducibly gives weird results, then the only thing that I can think of is that there are other records in the topology file that are being lost on write that are causing incompatibility with some of the default options that we use for our other protocols, e.g. BioSimSpace.Protocol.Production.

lohedges commented 3 years ago

I just realised that I had copied across the binary system_bash.rst7 file to use as the coordinates file amber.rst7. If you use the Sire.IO.AmberRst7 text file written by BioSimSpace (which you said hasn't caused issues), then the energies computed using all three topologies are identical, i.e. the differences in the fourth decimal place disappear.

AdeleHardie commented 3 years ago

I have tried a lot of different bash+copy/BSS combinations to arrive at the conclusion it's something to do with topology, so this issue has happened for multiple runs.

I also used BSS for production runs for older systems I had setup with ff19SB. I know now BSS doesn't support ff19SB, but loading the topology into AMBER's cpptraj and saving it again seemed to remove the CMAP parameters and the system could then be loaded into BSS (which isn't entirely right but it worked for the time being). Those runs were alright for the 100-200 ns timescale using all the files created by BSS in terms of protein stability. However, the system did show some behaviour that was different (mainly some strange loop dynamics). This doesn't happen now that I copy over files.

All this makes me wonder if there is something about the system itself that is really sensitive to some small change, since I assume no one else has complained about protein unfolding so far...

I can do some more tests with ParmEd once one of my GPUs frees up tomorrow.

jmichel80 commented 3 years ago

If the CMAP terms were omitted surely the dihedrals would be completely wrong, leading to erratic behavior, especially on a longer MD timescale?

lohedges commented 3 years ago

Thanks for the update. I'd certainly be interested in the ParmEd comparison if you get a chance.

Just to check:

I also used BSS for production runs for older systems I had setup with ff19SB. I know now BSS doesn't support ff19SB, but loading the topology into AMBER's cpptraj and saving it again seemed to remove the CMAP parameters and the system could then be loaded into BSS (which isn't entirely right but it worked for the time being). Those runs were alright for the 100-200 ns timescale using all the files created by BSS in terms of protein stability. However, the system did show some behaviour that was different (mainly some strange loop dynamics). This doesn't happen now that I copy over files.

Is the comparison here between the original ff19SB files with CMAP records and those generated by BioSimSpace after loading and saving the CPPTRAJ files that stripped those records, or between the CPPTRAJ files themselves and those written by BioSimSpace? If it was the former, then I imagine that the CMAP terms in the original files are stabilising backbone fluctuations that you are seeing with BioSimSpace. If it's the latter, then it suggests that something is going wrong with the BioSimSpace generated topology. (It might be good to check single-point energies for these files too.)

AdeleHardie commented 3 years ago

Yes, I do think removing the CMAP parameters must've messed with the system somehow. However, that happened every time I loaded the ff19SB system into cpptraj for processing: it would just say those are ignored, and loading then immediately saving the topology without doing anything else to it removed them. The simulations ran (kind of?) okay, so it didn't alarm me as much as it should have.

The comparison is between (1) original ff19SB run by hand, (2) ff19SB with CMAP automatically removed by cpptraj and run in BSS, and now also (3) ff14SB run in BSS but while copying files. The behaviour I see in (1) and (3) is the same, while (2) is different, but as Julien said it might be due to the whole system being a mess without CMAP. I can go back to those systems with ff19SB and run a short test simulation by hand to see what happens with cpptraj-generated topologies without CMAP.

lohedges commented 3 years ago

Okay, that explains the difference then. Only the system_bash.prm7 file doesn't include any CMAP records?

Looking at the CPPTRAJ GitHub page it looks like support for CMAP records was only added in September of last year, so it might not be in the most recent AmberTools. I know that ParmEd supports the records, although I'm not sure if this is just read/write to the same format, or on conversion too.

AdeleHardie commented 3 years ago

Yes, so the process of getting an ff19SB system into BSS was:

load topology into cpptraj
save the topology (no changes made)
load the new topology and unchanged .rst7 files
proceed with production

The system_bash.prm7 file (and all files I've uploaded here) is different, however. It has been prepared with ff14SB to stay consistent with what BSS supports, since I made all those files while trying to identify what was going wrong.

lohedges commented 3 years ago

Thanks for the clarification. As @jmichel80 says, I would certainly expect the systems without CMAP terms (for a forcefield that uses them) to be unstable on long time scales if those terms were removed. As for the ff14SB non-CMAP system, given that the single-point energies agree, I can't think what could have gone wrong.

I had another look at the files and noticed that there is a BOX_DIMENSIONS record in the prm7 files. This differs between system_bash.prm7 and system_BSS.prm7:

system_bash.prm7:

%FLAG BOX_DIMENSIONS
%FORMAT(5E16.8)
  9.00000000E+01  9.22133330E+01  7.39995300E+01  6.71360020E+01

system_BSS.prm7:

%FLAG BOX_DIMENSIONS
%FORMAT(5E16.8)
  9.00000000E+01  8.49602111E+01  6.81790310E+01  6.18553599E+01

However, there must also be box information in the *.rst7 files, since when I load them the space property of the two systems is identical:

import BioSimSpace as BSS

s_bash = BSS.IO.readMolecules(BSS.IO.glob("system_bash.*"))
s_bash._sire_object.property("space")
PeriodicBox( ( 84.9602, 68.179, 61.8554 ) )

s_bss = BSS.IO.readMolecules(BSS.IO.glob("system_BSS.*"))
s_bss._sire_object.property("space")
PeriodicBox( ( 84.9602, 68.179, 61.8554 ) )

I think that this record in the topology file is redundant, and that the information in the RST7 file should take precedence. (This is the case with the Sire parsers when constructing a system.) However, perhaps this isn't the case when running AMBER? It might be worth trying a simulation where you copy the dimensions from the system_bash.prm7 file into the system_BSS.prm7 file. That said, the IFBOX pointer is 0 so the record should be ignored according to the AMBER docs.

(I note that ParmEd strips the BOX_DIMENSIONS flag from the topology file on write and the values that it writes to a RST7 file are consistent with Sire.)
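To see the mismatch without loading anything into Sire, the two sources of box information can be read directly from the files. This is a rough sketch under stated assumptions: the rst7 is in ASCII format with the box on its final line in 12-character fields, and `BOX_DIMENSIONS` stores the angle followed by the three box lengths (as in the records quoted above). The function names `prm7_box` and `rst7_box` are hypothetical.

```python
def prm7_box(prm7_text):
    """Return (angle, a, b, c) from %FLAG BOX_DIMENSIONS, or None if absent."""
    lines = prm7_text.splitlines()
    for i, line in enumerate(lines):
        if line.startswith("%FLAG") and line.split()[1:2] == ["BOX_DIMENSIONS"]:
            values = []
            for data in lines[i + 1:]:
                if data.startswith("%FLAG"):
                    break  # next section starts
                if data.startswith("%"):
                    continue  # skip %FORMAT / %COMMENT lines
                values.extend(float(x) for x in data.split())
            return tuple(values)
    return None


def rst7_box(rst7_text):
    """Return (a, b, c) from the last line of an ASCII rst7.

    Assumption: the file has box information, so the last non-empty
    line holds the box lengths and angles in 12-character fields.
    """
    last = [l for l in rst7_text.splitlines() if l.strip()][-1].rstrip()
    fields = [last[i:i + 12] for i in range(0, len(last), 12)]
    return tuple(float(f) for f in fields[:3])
```

Comparing `prm7_box(...)[1:]` with `rst7_box(...)` for each prm7/rst7 pair shows which topology disagrees with its own coordinate file.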

AdeleHardie commented 3 years ago

I ran a simulation test with the box dimensions copied over from system_BSS.prm7 as follows:

import BioSimSpace as BSS
from shutil import copyfile
system = BSS.IO.readMolecules(['system_bash.prm7', 'system_bash.rst7'])
protocol = BSS.Protocol.Production(runtime=5*BSS.Units.Time.nanosecond)
process = BSS.Process.Amber(system, protocol, exe='/home/adele/software/amber20/bin/pmemd.cuda')
copyfile('system_bash.prm7', f'{process.workDir()}/amber.prm7')
#Replace %FLAG BOX_DIMENSIONS with the one from system_BSS.prm7
process.start()

the system was stable after 5 ns.

I have also tried running the same simulation using ParmEd topology via OpenMM as you've done before:

import BioSimSpace as BSS
import parmed as pmd
from shutil import copyfile
top0 = pmd.load_file('system_bash.prm7', xyz='system_bash.rst7')
parmed_system = top0.createSystem()
top1 = pmd.openmm.load_topology(top0.topology, parmed_system)
top1.save('system_parmed.prm7', format='amber')
system = BSS.IO.readMolecules(['system_bash.prm7', 'system_bash.rst7'])
protocol = BSS.Protocol.Production(runtime=5*BSS.Units.Time.nanosecond)
process = BSS.Process.Amber(system, protocol, exe='/home/adele/software/amber20/bin/pmemd.cuda')
copyfile('system_parmed.prm7', f'{process.workDir()}/amber.prm7')
process.start()

However, I get the following error:

| ERROR: the combination ntb != 0, ntp != 0, ifbox == 0 is not supported!
Input errors occurred. Terminating execution.

In section 1. RESOURCE USE of the output file it gives:

 getting new box info from bottom of inpcrd
 NATOM  =   35945 NTYPES =      17 NBONH =   33545 MBONA  =    2361
 NTHETH =    5225 MTHETA =    3181 NPHIH =   10531 MPHIA  =   10049
 NHPARM =       0 NPARM  =       0 NNB   =   67175 NRES   =   10793
 NBONA  =    2361 NTHETA =    3181 NPHIA =   10049 NUMBND =      37
 NUMANG =      44 NPTRA  =   12269 NATYP =       1 NPHB   =       0
 IFBOX  =       0 NMXRS  =      24 IFCAP =       0 NEXTRA =       0
 NCOPY  =       0

however, the inpcrd file was not changed. And if I run the above process omitting the copyfile line, the output gives IFBOX = 1. I'm including the topology, coordinate and output files here. The coordinate files are exactly the same.

The ParmEd I used was the default that came with my Amber20 installation, could that be an issue? If you managed to run the energy tests with sander using the topology you produced with ParmEd and could upload the topology here, I can try to repeat it with that file.

ETA: changing the IFBOX pointer in system_parmed.prm7 from 0 to 1 lets the process get further. However, that produced other errors about missing flags, namely ATOMS_PER_MOLECULE and SOLVENT_POINTERS. Copying those flags from system_BSS.prm7 allowed the simulation to start. I will update with results.

lohedges commented 3 years ago

Thanks for the info. Good to know that changing the box dimensions doesn't have an effect. (It shouldn't.) Here is the ParmEd topology that was generated via conversion to / from an OpenMM system:

system_parmed.prm7.txt

AdeleHardie commented 3 years ago

I've compared the two ParmEd system files and they are identical. I have also finished the 5 ns test with the working ParmEd topology (system_parmed_working.prm7), and the system is stable. Here are the files. I've also included the final frames of a simulation using the default BSS topology (system_BSS_final.pdb) and the modified ParmEd topology (system_parmed_final.pdb) for reference.

AdeleHardie commented 3 years ago

I have pulled 10 frames from a steering trajectory as discussed here. The folder also includes the system topology.

This trajectory is quite long (150 ns) and I was seeing unfolding within a few ns in my previous tests. Let me know if you want frames within the first 5-10 ns and I can get those for you as well.

lohedges commented 3 years ago

Just to check: The system.prm7 file included in the directory looks different to the system_bash.prm7 from the post, e.g. different number of atoms, etc.

system.prm7

%FLAG POINTERS
%FORMAT(10I8)
   40270      21   37769    2458    5397    3312   10879   10405       0       0
   73621   12193    2458    3312   10405      73     170     217      41       1
       0       0       0       0       0       0       0       1      24       0

system_bash.prm7

%FLAG POINTERS
%FORMAT(10I8)
   35945      18   33545    2361    5225    3181   10531   10049       0       0
   67175   10793    2361    3181   10049      70     162     216      38       1
       0       0       0       0       0       0       0       1      24       0
       0

Is this just the same system with, e.g., a different number of water molecules? Also, is this the original topology, i.e. from your setup pipeline, or the one written by BioSimSpace? Since this differs from the system_bash.prm7 I'll need the equivalent file to be able to make comparisons, i.e. the one from tLEaP, not BioSimSpace.
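Reading the POINTERS block by eye is error-prone, so a named comparison can help. This is a sketch, assuming the standard prm7 pointer order (IFBOX is the 28th entry) and 8-character integer fields as given by `%FORMAT(10I8)`. `read_pointers` and `diff_pointers` are illustrative names, not part of Sire or BioSimSpace.

```python
POINTER_NAMES = [
    "NATOM", "NTYPES", "NBONH", "MBONA", "NTHETH", "MTHETA", "NPHIH", "MPHIA",
    "NHPARM", "NPARM", "NNB", "NRES", "NBONA", "NTHETA", "NPHIA", "NUMBND",
    "NUMANG", "NPTRA", "NATYP", "NPHB", "IFPERT", "NBPER", "NGPER", "NDPER",
    "MBPER", "MGPER", "MDPER", "IFBOX", "NMXRS", "IFCAP", "NUMEXTRA", "NCOPY",
]


def read_pointers(prm7_text, width=8):
    """Map pointer names to values from the %FLAG POINTERS section."""
    values, in_section = [], False
    for line in prm7_text.splitlines():
        if line.startswith("%FLAG"):
            if in_section:
                break  # reached the section after POINTERS
            in_section = line.split()[1:2] == ["POINTERS"]
        elif in_section and not line.startswith("%"):
            # Fixed-width integer fields: slice rather than split on spaces.
            chunks = (line[i:i + width] for i in range(0, len(line), width))
            values.extend(int(c) for c in chunks if c.strip())
    # zip() stops at the shorter list, so missing trailing pointers are skipped.
    return dict(zip(POINTER_NAMES, values))


def diff_pointers(a, b):
    """Return {name: (value_a, value_b)} for shared pointers that differ."""
    return {k: (a[k], b[k]) for k in a if k in b and a[k] != b[k]}
```

Calling `diff_pointers(read_pointers(text_a), read_pointers(text_b))` on the two files lists only the counts that changed, which makes it easier to tell a different solvent count from a different solute.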

AdeleHardie commented 3 years ago

Sorry, I forgot which system I had provided originally. This is PTP1B + a peptide substrate. I can get snapshots from just PTP1B, as well as snapshots from an unstable trajectory.

For reference this topology is written by leap.

lohedges commented 3 years ago

That's okay. As long as it's the tLEaP files and the system is unstable, that's all we need.

lohedges commented 3 years ago

Single point energies agree along the entire trajectory. This archive contains some scripts to perform the checks (data already included). To run the single-point calculations, update the path to sander in single_point.sh and run it with something like:

./single_point.sh leap
./single_point.sh bss
./single_point.sh parmed

Once energy files are generated we can compare them using diff3 (assuming it is installed). The check_energies.sh script will do this for all frames and raise an error if the energies disagree and tell you what frame mismatched.

(Edited to include the missing min/amber.cfg file.)

AdeleHardie commented 3 years ago

Here are 10 frames from an unstable 5 ns snapshot. I loaded the same system as from yesterday and ran a default 5 ns Amber production process. I also included the topology BSS produced just in case. Maybe this will help to understand why this is happening...

I will also run short setups and production from other PDBs and will let you know how that goes.

lohedges commented 3 years ago

Sorry, realised I forgot to upload the amber.cfg file, which should be placed in the min directory. Energies also agree along the unstable trajectory.

jmichel80 commented 3 years ago

Hi @AdeleLip, this might be a stupid question, but the frames are not imaged? Are you sure the simulation is done in a periodic box? It does look like a water droplet evaporating, leaving a protein in vacuum to unfold.

AdeleHardie commented 3 years ago

I have been using the config files generated by BSS, so I assumed periodic conditions were applied. The file contents are:

Production.
 &cntrl
  ig=-1,
  ntx=1,
  ntxo=1,
  ntpr=2500,
  ntwr=2500,
  ntwx=2500,
  irest=0,
  dt=0.002,
  nstlim=2500000,
  ntc=2,
  ntf=2,
  ntt=3,
  gamma_ln=2,
  cut=8.0,
  tempi=300.00,
  temp0=300.00,
  ntp=1,
  pres0=1.01325,
 /

from the Amber20 manual:

ntb This variable controls whether or not periodic boundaries are imposed on the system ... = 2 constant pressure (default when ntp > 0) If NTB is nonzero then there must be a periodic boundary in the topology file.

I'm not 100% sure where in the topology file periodic boundaries are indicated, but if it's DIHEDRAL_PERIODICITY then I'm seeing a lot of differences between the original and the BSS topology. Could that be it?

lohedges commented 3 years ago

The topology file has a %FLAG BOX_DIMENSIONS record, although this is now deprecated. Instead, box dimensions are read from the coordinate file.

I'll check the AMBER configs. This was one of the first things I set up, so I might have missed some options, or they might not be set correctly when other options are turned on / off. I know that I set ntb=0 if there is no box.

lohedges commented 3 years ago

It looks like I don't explicitly set ntb if constant pressure is turned on, since it's set to 2 by default if ntp>0. (As in the docs that you posted.) However, I don't seem to set it if it's not constant pressure. I think it's ntb=1 by default, but will check in the manual.
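One way to avoid depending on defaults is to always write ntb explicitly. This is a hypothetical sketch, not the BioSimSpace implementation. It assumes ntb=1 is the right value for a constant-volume run with a box (the default still to be confirmed against the manual), and it rejects the ntp > 0, no-box combination that pmemd reported as unsupported earlier in this thread.

```python
def choose_ntb(has_box, ntp):
    """Pick an explicit ntb value instead of relying on AMBER's defaults.

    Hypothetical helper for illustration only.
    """
    if not has_box:
        if ntp > 0:
            # Mirrors the "ntb != 0, ntp != 0, ifbox == 0" input error.
            raise ValueError("constant pressure (ntp > 0) needs a periodic box")
        return 0  # no periodic boundaries
    # 2: constant pressure; 1: constant volume (assumed default).
    return 2 if ntp > 0 else 1


def cntrl_lines(has_box, ntp):
    """Format the periodic-boundary entries of an &cntrl namelist."""
    return [f"  ntb={choose_ntb(has_box, ntp)},", f"  ntp={ntp},"]
```

Writing both values into the config makes the intended ensemble visible in the input file, so a missing box fails at setup rather than partway into a run.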

jmichel80 commented 3 years ago

Might be a red herring, you need iwrap=1 (default 0) to wrap coordinates in the minimum image, see p 342 of https://ambermd.org/doc12/Amber20.pdf

@AdeleLip what happens if you run an MD simulation on the BSS prm7/rst7 inputs using a default config file from a relevant Amber20 tutorial ?

lohedges commented 3 years ago

The difference in the DIHEDRAL_PERIODICITY could be due to the ordering of the dihedral terms. I would have thought that this would surely lead to a difference in the single point energies if it was important, though.

lohedges commented 3 years ago

I built the configs using some best practice from a few tutorials, but it would be good to test against some sane defaults from elsewhere to see if anything is inconsistent.

lohedges commented 3 years ago

Is it preferable to set iwrap=1? I would have thought that it would be possible in post-processing and some people might want actual coordinates to compute displacements, etc. (If image numbers aren't available.)
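For reference, wrapping in post-processing is simple for an orthorhombic box. The sketch below wraps each point independently into [0, L) along every axis. That is an assumption made for brevity: it can split a molecule across a boundary, whereas dedicated imaging tools normally wrap whole molecules. The function name `wrap_into_box` is made up.

```python
import math


def wrap_into_box(coords, box):
    """Wrap (x, y, z) points into [0, L) along each axis of an orthorhombic box."""
    wrapped = []
    for point in coords:
        # Subtract the whole number of box lengths that precede each coordinate.
        wrapped.append(
            tuple(x - math.floor(x / length) * length for x, length in zip(point, box))
        )
    return wrapped
```

Keeping the trajectory unwrapped and applying something like this only for visualisation preserves the real displacements for later analysis.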

AdeleHardie commented 3 years ago

I just ran a test with BSS topology and the following cfg from this tutorial:

Production.
 &cntrl
   imin=0, irest=1, ntx=5, 
   ntpr=2500, ntwx=2500, ntwr=2500, nstlim=2500000, 
   dt=0.002, ntt=3, tempi=300, 
   temp0=300, gamma_ln=1.0, ig=-1, 
   ntp=1, ntc=2, ntf=2, cut=9, 
   ntb=2, iwrap=1, ioutfm=1,
 /

after 1.5 ns I see unfolding.

lohedges commented 3 years ago

I guess you're using pmemd here. Could you run the single-point calculations using that? I wonder if there are some rounding issues that are only apparent on the GPU?

AdeleHardie commented 3 years ago

You're on to something here. I get small differences in energies for all frames (pmemd_energies.zip). I used pmemd.cuda. From a quick inspection it looks like the differences are in the 1-4 VDW and EEL energies.
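Spotting which terms differ can be automated rather than done by eye. This is a rough sketch, assuming the energies appear in the output as `NAME = value` pairs (for example `1-4 VDW = ...`) and that each file holds a single-point result, so later values for the same name simply overwrite earlier ones. The function names and the tolerance are illustrative choices.

```python
import re

# Matches "NAME = value" pairs; NAME may contain spaces (e.g. "1-4 VDW").
_TERM = re.compile(r"(\S(?:[^=\n]*\S)?)\s*=\s*([-+]?\d+\.?\d*(?:[Ee][-+]?\d+)?)")


def energy_terms(text):
    """Return {term name: value} for every 'NAME = value' pair in the text."""
    return {name: float(value) for name, value in _TERM.findall(text)}


def differing_terms(text_a, text_b, tol=1e-4):
    """Return {name: (a, b)} for shared terms that differ by more than tol."""
    a, b = energy_terms(text_a), energy_terms(text_b)
    return {k: (a[k], b[k]) for k in a if k in b and abs(a[k] - b[k]) > tol}
```

Because settings echoed in the output are also written as `name = value`, they get parsed too, but they only show up in the result if they differ between the two runs.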

jmichel80 commented 3 years ago

It could be useful to run an OpenMM MD simulation on the same inputs to check numerical stability.

AdeleHardie commented 3 years ago

Do you mean single point energies or full simulations? I was using this same system for OpenMM sMD testing and had no issues using Amber topologies.

michellab / Sire

Sire::IO::AmberPrm topology file giving strange MD results #338