Closed msuruzhon closed 2 years ago
Thanks for this. Ironically I've just stumbled across this issue myself. In addition, there appear to be a few other wrinkles when using OpenFF, e.g. molecules being returned with no ligand name when using SMILES, which would break writing of GROMACS topology files. (See here for details.
As you say, we'll need some tricks to uniquify the atom types to ensure that all atom definitions appear in the topology. I'll look into this when I get the chance. It is probably easiest to add this as a hard requirement of the merging code itself, rather than tweaking things when topology files are written. I guess we might need to be careful if specify atom names have special meanings for an engine, but this is generally been okay in practice. (Apart from naming of water molecules and dummy atoms, in my experience.)
Cheers.
Having looked at the Sire.IO.GroTop
parser I think there actually might be a bug in the way that the [ atomtypes ]
record is created in that it only considers the types from the lambda=0 state. (This is irrespective of types with the same name having different parameters.) This won't be a problem when types in the lambda = 1 state also exist at lambda = 0, but would if when the lambda = 1 state contains unique types. I'll check this in more detail and update when I've figured things out. (We probably need to combine atomtypes in both end states to write the record.)
No, I was wrong, we do inspect both end states. This is just an issue of dealing with atom types with the same name, but different parameters, which should be possible to solve in the merge
function.
I have a strategy to uniquify the atomtype
properties, which should hopefully work for GROMACS. Just to check, does this cause issues with the AMBER FEP setup too? If so, I might need to think about uniquifying the ambertype
property too.
I think that ambertype should be unique, as there is more agreement what those types mean (they are defined in the gaff or FFXX forcefields, and have unique names). Gromacs treats atom types differently, if I remember correctly.
So far I can't reproduce this issue using only AMBER so we might be good for now. I will let you know if I find anything interesting.
Sorry, I know that ambertype
will be unique if the molecule is parameterised using one of the AMBER force fields. The issue is that this won't be the case if using OpenFF, so you'll get duplicate types with different parameters during a merge. This isn't an issue for SOMD (at least, I don't think it is) due to the way the perturbation is set up, but causes problems for GROMACS since the [ atomtypes ]
record is generated from the atomtype
property of the two end states, with no reference to whether the parameters, e.g. LJ
, actually differ. I've fixed this by uniquifying the atomtypes
on merge, but I'm not sure if I also need to do the same for the ambertypes
, in case this causes issues in the way the AMBER FEP topology is set up.
Great, thanks for confirming. I'll push a fix once I've had a chance to test a little further. At the moment it just adds an x
to the atomtype
property if a duplicate is found. I only check for duplicates twice, since (presumably) there would only be one duplicate of a given type, i.e. because we are merging two molecules. (It shouldn't be possible to have a duplicate within an end state.)
(It shouldn't be possible to have a duplicate within an end state.)
Ah, but that's another potential issue - OpenFF atom type names clash in general with protein atom type names so if both of them are combined, this will be the case. Or are these automatically renamed when two systems are combined?
Oh dear, that's a whole other can of worms. If this is the case, then it isn't a problem specific to merging, rather combining OpenFF parametrised molecules in any way and writing to file. As you suggest, unique types will either need to be enforced in the system combining functions, or when writing to file within the topology parsers. If this is a GROMACS specific issue, then it is probably best resolved in the Sire.IO.GroTop
parser itself.
(It would be even nicer if OpenFF could just generate unique names in the first place :-) )
I agree it is a can of worms. The AmberPrm
parser assumes that the names refer to unique atom types (%FLAG AMBER_ATOM_TYPE
). However, the amber types are not actually used for anything in a prmtop file, so this doesn't matter (they would only be used by, e.g. antechamber, to assign LJ parameters and bond/angle/dihedral parameters). The actual LJ parameters are used to get unique LJ types (%FLAG ATOM_TYPE_INDEX
) which index into the arithmetically combined parameters in %FLAG NONBONDED_PARM_INDEX
. Similarly, the actual bond/angle/dihedral parameters are used to set the values in the prmtop file.
So, for AmberPrm
it is an annoyance (different amber atom types will have different LJ parameters, which may be unexpected) but it is not breaking. For GroTop
, it is an issue because this format requires that all atom types are consistent (have the same mass, vdw parameters, element etc). I think you are right that this needs fixing in the GroTop
parser, which should detect if an atom type is not consistent, and then rename it as needed.
This will actually need to be done at the System
level, since the types need to be self consistent between the Gro87
and GroTop
files and it would be a pain to implement the same logic in both. When I get a chance I'll work on writing a C++ wrapper to do this, which can then be called before writing to Gro87
or GroTop
format.
What a pain!
(It was actually fairly straightforward to fix the [ atomtypes ]
part, I just then realised that it was only one part of the problem.)
Painful. That will be really slow to do at the System
level, because it is O(n^2) - every atom type in the added molecules will need checking against every atom type in the system. Adding molecules one by one, and performing the check each time, will be expensive.
I wrote a similar function that fixes that in ParmEd and as long as you use sets, it is pretty fast even on protein-ligand systems, so I guess this shouldn't be that bad for Sire either? Usually there are not that many atom types anyway.
Yes, I agree it will be quick if we use sets, and only do this operation when we want to write the molecule to a file. Thanks :-)
Actually, I think we might be okay with my original solution of simply fixing the [ atomtypes ]
record in the topology file. I forgot that the Gro87
file just contains a list of named atoms by index, which are matched against the [ atoms ]
section from the GroTop
file in order. The entries in the Gro87
file give the atom names, not types, and it appears that it's fine to have duplicate names for different types. (At least grommp
doesn't complain and the simulation runs without error.)
I'll try to think of a simple test to check that things are working as expected.
I've just pushed a potential fix that renames types if the name matches an existing type but the parameters differ. This is done in the topology writer only, so the properties in the original system will remain unchanged. It resolves the issue for the minimal example that you've provided, but I'm not sure if there's anything else that I might have overlooked. If you could test it out when you get a chance, that would be great. (Perhaps there are other naming issues with OpenFF that we need to be careful with.)
To test, you should only need to update the Sire development package once the build has finished:
conda update -c conda-forge -c michellab/label/dev sire
(BioSimSpace remains unchanged, hence why I transferred the issue here.)
Cheers.
Many thanks for that @lohedges , I will let you know if I find any other issues.
I've just run a quick test combining two OpenFF molecules and it's not quite working as expected. (More atom types are generated than necessary.) Will try to figure out what I've done wrong and will push a fix.
Okay, I think this is working now. It was more complicated than I thought since it wasn't just a case of creating a new type if there was an existing type with the same name and different parameters, i.e. an H1
in one molecule having different LJ parameters to an H1
in another. It turns out that an atom type in one molecule might actually match an existing type in the other, so there is no need to create a new type, just re-use the one that it maps to, e.g. just updating the type names. For instance, an H1
in one molecule might actually have the same parameters as, say, an H2
in the other.
I've also added a test in BioSimSpace that creates two OpenFF parameterised molecules, translates one of them so that they are sufficiently far apart so they don't interact, then creates a combined system. I then check that the sum of the single point energies for the two molecules in isolation is the same as that of the composite system. This fails prior to my fix (one molecule has incorrect parameters for one of the atom types), but works following it.
Hi,
I have discovered an issue, where if one tries to merge two molecules which have common atomtype names but different associated parameters, the parameters of the second molecule effectively get lost during writing out.
For example, let the first atomtype be as follows (as taken from a GROMACS top file):
and the second atomtype be as follows:
The resulting merged topology only has the following definitions:
Files to reproduce this behaviour can be found here.
This is a problem that shouldn't occur when one uses GAFF, because of the strict atom type names. However, OpenFF doesn't have canonical atom type names, so an
H1
in one molecule will not in general be the same as anH1
in another molecule. I think we need to figure out how to uniquify the atom types of the two molecules if they have different parameters but the same name.