michellab / Sire

Sire Molecular Simulations Framework
http://siremol.org
GNU General Public License v3.0
95 stars 26 forks source link

Scale factors lost converting from AMBER to GROMACS #337

Open jthorton opened 3 years ago

jthorton commented 3 years ago

Expected Behavior

When converting an AMBER system with non-default scale factors to GROMACS these are reset back to the amber defaults when I think they should be kept. The output grotop file has this default section but I think based on the input below which uses opls style scaling both fudge factors should be 0.5.

; nbfunc      comb-rule       gen-pairs      fudgeLJ     fudgeQQ
 1             2               yes            0.5         0.833333

Steps to Reproduce

import BioSimSpace as BSS

system = BSS.IO.readMolecules(["MOL.rst7", "MOL.prm7"])
BSS.IO.saveMolecules("MOL", system, ["gro87", "grotop"])

Context

Please provide any relevant information about your setup. This is important in case the issue is not reproducible except for under certain conditions.

lohedges commented 3 years ago

Hi there,

I'll be able to take a better look at this tomorrow. Just to check, here are the forcefled properties for the two molecules:

import BioSimSpace as BSS

# Load the original system.
system = BSS.IO.readMolecules(["MOL.rst7", "MOL.prm7"])
BSS.IO.saveMolecules("MOL", system, ["gro87", "grotop"])
system[0]._sire_object.property("forcefield")
MM ForceField{ amber::ff,
               combining_rules = arithmetic,
               1-4 scaling = 0.833333, 0.5,
               nonbonded = coulomb, lj,
               bond = harmonic, angle = harmonic,
               dihedral = cosine }

# Convert to GROMACS format.
BSS.IO.saveMolecules("MOL", system, ["gro87", "grotop"])

# Load system back in.
system2 = BSS.IO.readMolecules(["MOL.grotop", "MOL.gro87"])
system2[0]._sire_object.property("forcefield")
MM ForceField{ amber::ff,
               combining_rules = arithmetic,
               1-4 scaling = 0.833333, 0.5,
               nonbonded = coulomb, lj,
               bond = harmonic, angle = harmonic,
               dihedral = cosine }

As you can see, the two forcefield objects are identical. It doesn't look like information is being lost during conversion. Are the scale factors inferred from the initial AMBER files incorrect, i.e. perhaps it is always setting the same value for anything loaded from AMBER format, irrespective of whether this actually was a GROMACS system converted to AMBER.

Apologies if I'm misunderstanding things. Just wanted to be clear before debugging further.

lohedges commented 3 years ago

Yes, I see the issue. As I mentioned, the scale factors are being set incorrectly when going to AMBER format. (Not yet sure if it's on read or write.) Here's a minimal example using some example files from the SireUnitTests repository:

import BioSimSpace as BSS

# Load a GROMACS system with OPLS parameters.
s0 = BSS.IO.readMolecules(["cage_quin1.gro", "cage_quin1.top"], {"GROMACS_PATH" : "./cage_quin1"})
s0[0]._sire_object.property("forcefield")
MM ForceField{ unknown,
               combining_rules = geometric,
               1-4 scaling = 0.5, 0.5,
               nonbonded = coulomb, lj,
               bond = harmonic, angle = harmonic,
               dihedral = gromacs }

# Convert to AMBER format and read back in.
BSS.IO.saveMolecules("test", s, ["prm7", "rst7"])
s1 = BSS.IO.readMolecules(["test.rst7", "test.prm7"])
s1[0]._sire_object.property("forcefield")
MM ForceField{ amber::ff,
               combining_rules = arithmetic,
               1-4 scaling = 0.833333, 0.5,
               nonbonded = coulomb, lj,
               bond = harmonic, angle = harmonic,
               dihedral = cosine }

(This assumes you are in the unittests/io/ directory.)

I'll transfer the issue to Sire and see if I can come up with a fix.

lohedges commented 3 years ago

If I analyse the AmberNB14 terms from the parameters object of the second system (the initial AMBER conversion) I see that the values are correct, even though those listed in the forcefield property are wrong, i.e.:

# Just extract the value of the first NB14 parameter, since they are all the same.
list(s1[0]._sire_object.property("parameters").nb14s().values())[0]
AmberNB14( cscl = 0.5, ljscl = 0.5 )

However, if we then convert to GROMACS format, read in, convert back to AMBER format, read in...

# Convert to GROMACS and read back.
BSS.IO.saveMolecules("test", s, ["grotop", "gro87"])
s2 = BSS.IO.readMolecules(["test.gro87", "test.grotop"])

# Convert to AMBER and read back.
BSS.IO.saveMolecules("test2", s2, ["prm7", "rst7"])
s3 = BSS.IO.readMolecules(["test2.prm7", "test2.rst7"])

# Now print the NB14 values.
list(s3[0]._sire_object.property("parameters").nb14s().values())[0]
AmberNB14( cscl = 0.833333, ljscl = 0.5 )

Here is the appropriate section from test.grotop:

[ defaults ]
; nbfunc      comb-rule       gen-pairs      fudgeLJ     fudgeQQ
  1           2               yes            0.5         0.833333

It looks like the issue is occurring on the second conversions, which is likely due to the inconsistency in the forcefield property from the first conversion.

lohedges commented 3 years ago

The following code in amberprm.cpp sets the forcefield property on read. Essentially, unless the property already exists in the user map, it generates a default based on assumptions about the AMBER format.

    if (map["forcefield"].hasValue())
    {
        ffield = map["forcefield"].value().asA<MMDetail>();

        if (not ffield.isAmberStyle())
            throw SireError::incompatible_error( QObject::tr(
                "This AmberPrm reader can only parse Amber parm files that hold molecules "
                "that are parameterised using an Amber-style forcefield. It cannot read "
                "molecules using the forcefield\n%1").arg(ffield.toString()), CODELOC );
    }
    else
    {
        //now guess the forcefield based on what we know about the potential
        ffield = MMDetail::guessFrom("arithmetic", "coulomb", "lj",
                                     1.0/1.2, 0.5, "harmonic", "harmonic",
                                     "cosine");
    }

It doesn't try to infer the scale factors from any of the AmberNB14 objects that it has constructed, which do contain the correct values. It also stores the incorrect combining rule (originally geometric, but set as arithmetic.) I'm not sure how the combining rule is implemented in the AMBER topology format, or if we can even handle this on conversion. (GROMACS explicitly sets it in the defaults section.)

I think I need some input from @chryswoods to decide how best to proceed. He's busy in meetings for the next few days so I'll revisit this on Friday, or early next week.

chryswoods commented 3 years ago

The issue is that the amber topology (prm) format does not specify the combining rules or the default 1-4 scale factors. The file format supports having different dihedrals having different 1-4 scaling factors, and the Amber default scale factors are used for dihedrals that don't specify their 1-4 terms, e.g. https://github.com/michellab/Sire/blob/369a7788365fe08b05367b5676fc941a7edaae46/corelib/src/libs/SireIO/amberprm.cpp#L4045

Because the combining rules and a "default" 1-4 scaling factor are not in the topology file, it is not possible for Sire to do anything other than guess at what those values should be for the "forcefield" object, hence why it defaults to guessing the Amber values and arithmetic combining rules (when OPLS should be geometric combining rules). In theory the code could scan all of the AmberNB14 objects created after it has read the topology file to find a consensus view of what the "default" 1-4 scaling factors should be. As different dihedrals "could" use different 1-4 terms (although not in any existing forcefield) it would be easiest just to use the 1-4 values from the AmberNB14 that is used most in a System (the defaults apply to the whole System, not just one molecule in the System). That way the guess could be informed by the majority-used AmberNB14 nb14 values.

This wouldn't solve the combining rules issue, as the combining rules cannot be written to the topology file, and are thus not available to be read back in. The only solution here is for the user to specify the combining rules via a user-mapping object on read (e.g. as they can now specify the forcefield manually via the map["forcefield"] parameter).

I should note that I don't believe that the Amber programs support running simulations using geometric combining rules, as I think they only support arithmetic rules. As such, they won't be using the "correct" OPLS forcefield. Last time I checked was a few years back, so things may have changed.

jmichel80 commented 3 years ago

Could this be solved by allowing the user to provide a keyword argument to BSS.IO.readMolecules() that specifies non-default scaling factors to be used ?

lohedges commented 3 years ago

Thanks, @chryswoods, that makes sense. I had written code to do what you suggested regarding scanning the AmberNB14 values after they had been created, then using that the determine the forcefield property, but had stumbled on the combining rules issue. I'm not sure what the best approach is here. Perhaps we can do the conversion GROMACS to AMBER to warn the user that the process isn't reversible when the GROMACS force field uses the geometric rule. Thankfully there are no issues with water models so we can happily swap between them.

lohedges commented 3 years ago

@jmichel80 Yes, that could be a good option.

chryswoods commented 3 years ago

I don't like the idea of printing warnings to the screen, but in this case I think it would be justified. Technically, we should not write a System to AmberPrm if it says it uses geometric combining rules, as these are not supported by the file format. I suggest that we add a warning to AmberPrm that writes a warning to the screen saying something like Amber topology files do not support forcefields that use geometric combining rules, as this cannot be specified in the file. When this file is re-read, then arithmetic combining rules will be assumed.

A very dangerous solution would be for us to add our own %FLAG% to the AmberPrm format that specifies the combining rules to be used. AmberPrm allows for customs flags, e.g. unrecognised flags are ignored, so you can add your own if needed (https://ambermd.org/FileFormats.php#topo.cntrl). It would be safe(ish) to call the flag "%FLAG BSS_COMBINING_RULES" and use "arithmetic" or "geometric" as strings. It does "cross the rubicon" of us modifying file formats, and, ultimately, the best solution would be for the Amber developers to add this flag themselves.

chryswoods commented 3 years ago

@jmichel80 - yes, that is effectively what Sire supports via the "map" keyword. This could be exposed in BioSimSpace and made easier to use (e.g. specifying "combining_rules" and "default_14_params" directly). However, the more we add, the more options would then need to be exported by the workflow components that read these files. So we would need to be careful. Ideally, we should use the information that exists in the file, and this does give the 14-scaling parameters, so we should use those. It may be necessary in the case of combining rules though...

lohedges commented 3 years ago

Thanks, those are good suggestions. I totally agree about not liking warning messages, which is why I have pretty much suppressed everything from third-party libraries, or caught the warning and expressed it in a more informative way.

lohedges commented 3 years ago

The map keyword is already fully exposed throughout BioSimSpace so we can use that if needed. I agree with @chryswoods that information would ideally be in the topology file, since that is the natural object that is passed between workflow components. It would be nice if we could just add our own %FLAG% until a better solution is found, although I worry that it might break external tools, e.g. if they simply fail when encountering an unknown flag.

jmichel80 commented 3 years ago

I suggest posting an example for @jthorton benefit and we will make sure this is well documented when we tidy up later this year. Best to avoid turning BSS into a tool for producing subtly flawed inputs that will run nonetheless...

lohedges commented 3 years ago

I've added the suggested warning to the BioSimSpace writer. I think everything would work if we updated Sire::MM::MMDetail to detect OPLS style force fields. These are essentially AMBER-like, other than the combining rules and scaling factors. At present, we have the code that I posted earlier:

    if (map["forcefield"].hasValue())
    {
        ffield = map["forcefield"].value().asA<MMDetail>();

        if (not ffield.isAmberStyle())
            throw SireError::incompatible_error( QObject::tr(
                "This AmberPrm reader can only parse Amber parm files that hold molecules "
                "that are parameterised using an Amber-style forcefield. It cannot read "
                "molecules using the forcefield\n%1").arg(ffield.toString()), CODELOC );
    }
    else
    {
        //now guess the forcefield based on what we know about the potential
        ffield = MMDetail::guessFrom("arithmetic", "coulomb", "lj",
                                     1.0/1.2, 0.5, "harmonic", "harmonic",
                                     "cosine");
    }

Which would fail if we passed through the forcefield property from the original GROMACS system. If we allowed for something like if (not ffield.isAmberStyle() and not ffield.isOPLS()) then it would be okay. The user could store the original property on read, then use this throughout.

Currently MMDetail.forcefields() returns:

['amber::gaff',
 'amber::ff14',
 'amber::ff99',
 'amber::ff12',
 'amber::ff03',
 'amber::ff']

On loading an OPLS system we get:

MM ForceField{ unknown,
               combining_rules = geometric,
               1-4 scaling = 0.5, 0.5,
               nonbonded = coulomb, lj,
               bond = harmonic, angle = harmonic,
               dihedral = gromacs }

However, the following code in mmdetail.cpp looks like it would do what we want:

/** Function used to guess the forcefield from the passed set of conditions.
    This returns a null MMDetail object if we can't guess */
MMDetail MMDetail::guessFrom(QString combrules, QString elecstyle,
                             QString vdwstyle, double elec14, double vdw14,
                             QString bondstyle, QString angstyle, QString dihstyle)
{
    //start with the internals
    if ( _isHarmonic(bondstyle) and _isHarmonic(angstyle) and _isCosine(dihstyle) )
    {
        if ( _isCoulomb(elecstyle) and _isLJ(vdwstyle) )
        {
            //looking like an amber or opls style forcefield...
            if ( _isArithmetic(combrules) )
            {
                if (elec14 == (1.0/1.2) and vdw14 == 0.5)
                {
                    //this is a standard amber::ff forcefield
                    return MMDetail("amber::ff", combrules, elec14, vdw14,
                                    elecstyle, vdwstyle, bondstyle, angstyle, dihstyle);
                }
                else
                {
                    //this is a weird amber::ff forcefield with strange 1-4 terms...
                    return MMDetail( QString("amber::ff[%1,%2]").arg(elec14).arg(vdw14),
                                     combrules, elec14, vdw14, elecstyle, vdwstyle,
                                     bondstyle, angstyle, dihstyle );
                }
            }
            else if ( _isGeometric(combrules) )
            {
                if (elec14 == 0.5 and vdw14 == 0.5)
                {
                    //this is a standard opls::ff forcefield
                    return MMDetail("opls::ff", combrules, elec14, vdw14,
                                    elecstyle, vdwstyle, bondstyle, angstyle, dihstyle);
                }
                else
                {
                    //this is a weird opls::ff forcefield with strange 1-4 terms...
                    return MMDetail( QString("opls::ff[%1,%2]").arg(elec14).arg(vdw14),
                                     combrules, elec14, vdw14, elecstyle, vdwstyle,
                                     bondstyle, angstyle, dihstyle );
                }
            }
        }
    }

    return MMDetail("unknown", combrules, elec14, vdw14, elecstyle, vdwstyle,
                    bondstyle, angstyle, dihstyle);
}

If looks like we are getting unknown because of the gromacs style dihedral terms. However, now that we can convert Ryckaert-Bellemans dihedrals to AMBER format this should be okay. (See here. for details)

lohedges commented 3 years ago

Okay, I've solved this by flagging Ryckaert-Bellemans dihedrals as a cosine dihedral in gromacsparams.cpp (after which the original force field is detected as opls::ff.) Following this I added an isOPLS method to MMDetail, and use this to allow reading from and AMBER PRM7 file if the user passes in an opls:ff MMDetail object for the forcefield property in the map.

With these changes in place you can do the following:

# This assumes that you are in SireUnitTests/unittests/io

import BioSimSpace as BSS

# Load an OPLS GROMACS system.
s0 = BSS.IO.readMolecules(["cage_quin1.gro", "cage_quin1.top"],
                          {"GROMACS_PATH" : "./cage_quin1"})

# Store the force field property for the only molecule in the system.
ff0 = s0[0]._sire_object.property("forcefield")

# Write to AMBER format. We don't need to pass the force field
# in the map since we can write OPLS without issue, although we
# are now warned that this process is potentially irreversible.
BSS.IO.saveMolecules("test", s0, ["prm7", "rst7"])

# UserWarning: AMBER topology files do not support force fields that
# use geometric combining rules, as this cannot be specified in the file.
# When this file is re-read, then arithmetic combining rules will be assumed."

# Read the files back in and specify the format of the original force field.
s1 = BSS.IO.readMolecules(["test.prm7", "test.rst7"], {"forcefield" : ff0})

# Store the new force field.
ff1 = s1[0]._sire_object.property("forcefield")

# Assert the force fields are the same.
ff0 == ff1
True

# Now write back to GROMACS formats to make sure that the scale factors and
# combining rules are preserved.
BSS.IO.saveMolecules("test", s1, ["grotop", "gro87"])

Check the test.grotop file:

[ defaults ]
; nbfunc      comb-rule       gen-pairs      fudgeLJ     fudgeQQ
  1           1               yes            0.5         0.5

Great, everything has worked as expected!

If needed, I could add a thin wrapper around MMDetail so that the user doesn't have to use the underlying Sire objects. We could also simplify things by allowing them to pass in something like forcefield="opls", which would get converted into the correct MMDetail object behind the scenes and passed through. Let me know what you think. This issue can be closed when we are happy with the solution.