Closed kexul closed 2 years ago
It looks like something funny has happened with the water topology detection code, since it is stating that all molecules in the system are waters, even though they aren't. This is then crashing when converting to AMBER format, where the water topology is swapped.
mol = BSS.IO.readMolecules(['ligand.gro', 'ligand.top'])
print(mol, mol[0])
<BioSimSpace.System: nMolecules=653> <BioSimSpace.Molecule: nAtoms=27, nResidues=1>
waters = mol.getWaterMolecules()
print(waters, waters[0].isWater())
<BioSimSpace.Molecules: nMolecules=653> False
I'll try to figure out what's going wrong.
It looks like the topology file is wrong and all of the elements in the system are being loaded as dummy atoms for some reason. This is somehow breaking our water detection code. (I should fix this part, at least.)
mol = BSS.IO.readMolecules(['ligand.gro', 'ligand.top'])
print(mol[0]._sire_object.property("element"))
AtomProperty<SireMol::Element>( [ dummy (Xx, 0),dummy (Xx, 0),dummy (Xx, 0),dummy (Xx, 0),dummy (Xx, 0),dummy (Xx, 0),dummy (Xx, 0),dummy (Xx, 0),dummy (Xx, 0),dummy (Xx, 0),dummy (Xx, 0),dummy (Xx, 0),dummy (Xx, 0),dummy (Xx, 0),dummy (Xx, 0),dummy (Xx, 0),dummy (Xx, 0),dummy (Xx, 0),dummy (Xx, 0),dummy (Xx, 0),dummy (Xx, 0),dummy (Xx, 0),dummy (Xx, 0),dummy (Xx, 0),dummy (Xx, 0),dummy (Xx, 0),dummy (Xx, 0) ] )
print(mol[1]._sire_object.property("element"))
AtomProperty<SireMol::Element>( [ dummy (Xx, 0),dummy (Xx, 0),dummy (Xx, 0) ] )
Thanks, the files were taken from a Gromacs simulation tutorial, but I could not find its link right now 😥
Okay, I've pushed a fix to Sire that correctly accounts for the presence of dummy atoms when detecting water molecules, i.e. so a molecule that is entirely made of dummies isn't erroneously marked as a water. This means that you can now convert your file to AMBER format. However, whether this is a valid file is another matter.
Let me know if you manage to track down the source of the GROMACS topology. If it is valid and we should be able to read it correctly, i.e. with fully element information, let me know and I'll try to work out what's gone wrong.
Cheers.
Thanks for the quick fix. I've checked the files' header and it seems to be generated by acpype.
I just tested some other molecules generate by acpype
in gromacs format,all of them failed to convert.
I'm sure they are valid, we've run simulations with that files without any problem.
I think that this is likely the issue:
[ atomtypes ]
;name bond_type mass charge ptype sigma epsilon Amb
c3 c3 0.00000 0.00000 A 3.39967e-01 4.57730e-01 ; 1.91 0.1094
c c 0.00000 0.00000 A 3.39967e-01 3.59824e-01 ; 1.91 0.0860
o o 0.00000 0.00000 A 2.95992e-01 8.78640e-01 ; 1.66 0.2100
hc hc 0.00000 0.00000 A 2.64953e-01 6.56888e-02 ; 1.49 0.0157
OW OW 0.00000 0.00000 A 3.15075e-01 6.35968e-01 ; 1.77 0.1520
HW HW 0.00000 0.00000 A 0.00000e+00 0.00000e+00 ; 0.00 0.0000
[ moleculetype ]
;name nrexcl
ligand 3
[ atoms ]
; nr type resi res atom cgnr charge mass ; qtot bond_type
¦1 c3 1 HOS C1 1 0.111106 12.01000 ; qtot 0.111
¦2 c 1 HOS C2 2 0.609212 12.01000 ; qtot 0.720
¦3 c3 1 HOS C3 3 -0.480430 12.01000 ; qtot 0.240
¦4 c3 1 HOS C4 4 0.089598 12.01000 ; qtot 0.329
¦5 c3 1 HOS C5 5 -0.176806 12.01000 ; qtot 0.153
¦6 c3 1 HOS C6 6 -0.108588 12.01000 ; qtot 0.044
¦7 c3 1 HOS C7 7 0.441836 12.01000 ; qtot 0.486
¦8 c3 1 HOS C8 8 -0.496402 12.01000 ; qtot -0.010
¦9 c3 1 HOS C9 9 -0.496402 12.01000 ; qtot -0.507
10 c3 1 HOS C10 10 -0.470172 12.01000 ; qtot -0.977
11 o 1 HOS O1 11 -0.607595 16.00000 ; qtot -1.585
12 hc 1 HOS H1 12 0.148486 1.00800 ; qtot -1.436
13 hc 1 HOS H2 13 0.148486 1.00800 ; qtot -1.288
14 hc 1 HOS H3 14 0.009848 1.00800 ; qtot -1.278
15 hc 1 HOS H4 15 0.056431 1.00800 ; qtot -1.221
16 hc 1 HOS H5 16 0.056431 1.00800 ; qtot -1.165
17 hc 1 HOS H6 17 0.061127 1.00800 ; qtot -1.104
18 hc 1 HOS H7 18 0.061127 1.00800 ; qtot -1.043
19 hc 1 HOS H8 19 0.115452 1.00800 ; qtot -0.927
20 hc 1 HOS H9 20 0.115452 1.00800 ; qtot -0.812
21 hc 1 HOS H10 21 0.115452 1.00800 ; qtot -0.696
22 hc 1 HOS H11 22 0.116665 1.00800 ; qtot -0.580
23 hc 1 HOS H12 23 0.116665 1.00800 ; qtot -0.463
24 hc 1 HOS H13 24 0.116665 1.00800 ; qtot -0.346
25 hc 1 HOS H14 25 0.115452 1.00800 ; qtot -0.231
26 hc 1 HOS H15 26 0.115452 1.00800 ; qtot -0.115
27 hc 1 HOS H16 27 0.115452 1.00800 ; qtot -0.000
The mass and charge for all of the entries in the atomstypes
record are zero. I imagine that Sire might be using this mass to infer the element, rather than using the value in the atoms
section. I'll check when I get a chance.
Nope, that's not it. I replace the values in the atomtypes
section and it still loads everything as dummy atoms.
This is where things are going wrong:
if (ok_elem)
{
typ = GromacsAtomType(words[0], mass*g_per_mol, chg*mod_electron,
ptyp, ::toLJParameter(v,w,combining_rule),
Element(nprotons));
}
else
{
//some gromacs files don't use 'nprotons', but instead give
//a "bond_type" ambertype
typ = GromacsAtomType(words[0], words[1], mass*g_per_mol, chg*mod_electron,
ptyp, ::toLJParameter(v,w,combining_rule));
}
For the acpype
files we're falling into the else
block, and all types are being flagged as dummies.
Since the acpype
files use bond_type
, it looks like it is using the mass and charge from the atomtypes
section to infer the element. I think I may have ran into Sire's file caching when I tried modifying it locally.
It's not passing an element through to the constructor, so is using the default of dummy, i.e. SireMol::Element(0)
:
GromacsAtomType(QString atom_type, QString bond_type,
SireUnits::Dimension::MolarMass mass,
SireUnits::Dimension::Charge charge,
PARTICLE_TYPE particle_type,
const LJParameter &ljparam,
const SireMol::Element &element = SireMol::Element(0));
It looks like any GROMACS topology in this format will have the same issue. This might be fine if you only intend to use the file for a GROMACS simulation, but might cause issues if converting to a different format that expects element information to be correct.
This might be fine if you only intend to use the file for a GROMACS simulation, but might cause issues if converting to a different format that expects element information to be correct.
Thanks for the detailed information, I'll keep that in mind.
I just tried converting the format by parmed
, and it works!
mol = BSS.IO.readMolecules(['ligand.gro', 'ligand.top'])
print(mol[0]._sire_object.property("element"))
mol_parmed = parmed.load_file('ligand.top', xyz='ligand.gro')
mol_parmed.save('amb.parm7')
mol_parmed.save('amb.rst7')
mol_amber = BSS.IO.readMolecules(['amb.parm7', 'amb.rst7'])
print(mol_amber[0]._sire_object.property("element"))
Output:
AtomProperty<SireMol::Element>( [ dummy (Xx, 0),dummy (Xx, 0),dummy (Xx, 0),dummy (Xx, 0),dummy (Xx, 0),dummy (Xx, 0),dummy (Xx, 0),dummy (Xx, 0),dummy (Xx, 0),dummy (Xx, 0),dummy (Xx, 0),dummy (Xx, 0),dummy (Xx, 0),dummy (Xx, 0),dummy (Xx, 0),dummy (Xx, 0),dummy (Xx, 0),dummy (Xx, 0),dummy (Xx, 0),dummy (Xx, 0),dummy (Xx, 0),dummy (Xx, 0),dummy (Xx, 0),dummy (Xx, 0),dummy (Xx, 0),dummy (Xx, 0),dummy (Xx, 0) ] )
AtomProperty<SireMol::Element>( [ Carbon (C, 6),Carbon (C, 6),Carbon (C, 6),Carbon (C, 6),Carbon (C, 6),Carbon (C, 6),Carbon (C, 6),Carbon (C, 6),Carbon (C, 6),Carbon (C, 6),Oxygen (O, 8),Hydrogen (H, 1),Hydrogen (H, 1),Hydrogen (H, 1),Hydrogen (H, 1),Hydrogen (H, 1),Hydrogen (H, 1),Hydrogen (H, 1),Hydrogen (H, 1),Hydrogen (H, 1),Hydrogen (H, 1),Hydrogen (H, 1),Hydrogen (H, 1),Hydrogen (H, 1),Hydrogen (H, 1),Hydrogen (H, 1),Hydrogen (H, 1) ] )
I'll take a look at how Parmed is inferring the element information. It might be one of those things that works if the atoms have standard GROMACS naming, but wouldn't work in general.
Yes, ParmEd just naively guesses the element from the atom name. See here for details. In Sire we make no assumptions about the atom naming since it is irrelevant for the molecular potential. It also doesn't generalise since the file might have been converted from a different format (with its own naming convention) or customised by the user.
I'll have a think about how to warn the user if they read a GROMACS topology in this format. We might also be able to use a similar element_by_name
function, but only in an optional way, i.e. if the user requests it. (We don't want to make mistakes by guessing things.)
Actually, it looks like this might be a fallback after using the mass, which would be more robust. Could try that instead when I get a chance.
Sire already has this functionality with Element::elementWithMass
, but this wouldn't work for the acpype
examples since all of the masses in the atomtypes
record are zero.
Yes, ParmEd just naively guesses the element from the atom name. See here for details.
Hi @lohedges , I just tested anther OPLS
type of gromacs topology file generated by acpype
. Sire failed to read it.
Parmed correctly inferred the atom type, but there is no atomtype
field in the topology file, maybe it's inferring from the atom
field?
test.zip
I've pushed a workaround to Sire. This will infer the element from the mass if the topology file uses a bond type atom type, rather than providing the number of protons. If the mass in the atom
record section doesn't match that of the atomtype
, then it is re-inferred using this mass, which would give the correct elements for your acpype
files.
Great!I'll test the updated version when it's available from conda.
Yes, everything worked like a charm! Many thanks for your work! Appreciated it!
Hi @lohedges , There is another edge case where BioSimSpace failed to identify the element type: ejm46.zip
Code to reproduce:
import BioSimSpace as BSS
import parmed
mol = BSS.IO.readMolecules(['ejm46_AC.prmtop', 'ejm46_AC.inpcrd'])
print(mol[0]._sire_object.property("element"))
mol_parmed = parmed.load_file('ejm46_AC.prmtop', xyz='ejm46_AC.inpcrd')
elements = [atom.element_name for atom in mol_parmed]
print(elements)
Output:
AtomProperty<SireMol::Element>( [ dummy (Xx, 0),dummy (Xx, 0),dummy (Xx, 0),dummy (Xx, 0),Carbon (C, 6),Carbon (C, 6),Nitrogen (N, 7),dummy (Xx, 0),dummy (Xx, 0),dummy (Xx, 0),dummy (Xx, 0),dummy (Xx, 0),dummy (Xx, 0),dummy (Xx, 0),dummy (Xx, 0),dummy (Xx, 0),dummy (Xx, 0),Chlorine (Cl, 17),Carbon (C, 6),Carbon (C, 6),Carbon (C, 6),dummy (Xx, 0),Chlorine (Cl, 17),Hydrogen (H, 1),Hydrogen (H, 1),Hydrogen (H, 1),Hydrogen (H, 1),Hydrogen (H, 1),Hydrogen (H, 1),Hydrogen (H, 1),Hydrogen (H, 1),Hydrogen (H, 1),Hydrogen (H, 1),Hydrogen (H, 1),Hydrogen (H, 1),Hydrogen (H, 1) ] )
['O', 'C', 'N', 'C', 'C', 'C', 'N', 'C', 'N', 'C', 'O', 'C', 'C', 'C', 'C', 'C', 'C', 'Cl', 'C', 'C', 'C', 'C', 'Cl', 'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H']
I'm not sure whether it's a valid Amber parameter file, it's the result files I got by this package: https://github.com/Acellera/parameterize, they seem to use some customized atom type.
There is no %FLAG ATOMIC_NUMBER
section in the topology file, so it looks like Sire is inferring the element (incorrectly) in some other way. I imagine that we could play the same trick as with the GROMACS topology and get the element using the %FLAG_MASS
record instead. I'll try this when I get a chance.
There is no
%FLAG ATOMIC_NUMBER
section in the topology file
Hi @lohedges , there is a ATOMIC_NUMBER section, but with some -1
in it.
%FLAG ATOMIC_NUMBER
%FORMAT(10I8)
-1 -1 -1 -1 6 6 7 -1 -1 -1
-1 -1 -1 -1 -1 -1 -1 17 6 6
6 -1 17 1 1 1 1 1 1 1
1 1 1 1 1 1
I imagine that we could play the same trick as with the GROMACS topology and get the element using the
%FLAG_MASS
record instead. I'll try this when I get a chance.
Thanks, I just find that I could use parmed
to read and save it back, then the atomic_number
section is corrected, and BioSimSpace could infer the element type correctly. 😄
Ah, sorry, I must have grep'ed the wrong thing! Yes, if that record is present and the information is incorrect, then BioSImSpace will set things as a dummy. I can just catch all atomic numbers equal or than zero and use the mass to infer the element instead. (Was trying with zero, but that obviously doesn't work here.)
There we go, all sorted using the same approach as used for the GROMACS issue. Keep your janky topology files coming ;-)
Wow, ultrafast!!! I'll try it soon and report. 😋
The build failed since there are unit tests that compare a parsed AMBER system against the new and old topology parser. Looks like I need to fix this in the old parser too, which might not be as easy. Will let you know when it's sorted.
Okay, I've updated it in the old parser too (it was equally easy). I've checked that tests pass locally so it should build this time around. (I always forget that there are two AMBER parsers, which isn't desirable from a maintenance perspective.)
Actually, we might not be able to do this since it will break our ability to load perturbable systems written to file by BioSimSpace. (Elements writen as dummies will be read as something else.) I'm now wondering if the GROMACS fix might cause issues too, e.g. with the way in which you read split GROMACS files. I think this will need more testing since I don't want to break something fundamental to BioSimSpace for the purposes of handling some atypical files.
For now I think I'll change the code to only apply the fix when the atomic number is -1. This will catch these weird files, but leave massed dummy atoms required by FEP simulations unchanged.
Actually, we might not be able to do this since it will break our ability to load perturbable systems written to file by BioSimSpace.
That sounds horrible, thankfully the unit test catches the case.
Keep your janky topology files coming ;-)
Hi @lohedges , here it is: test_files.zip which was generated by this package https://github.com/selimsami/qforce .
It fails because the angle parameter cannot be parsed correctly:
Cannot convert a Urey-Bradley Gromacs angle into an expression of only the angle size if the UB bond force constant is non-zero - GromacsAngle( functionType() = Urey-Bradley, parameters() = [ 123.521, 518, 0.24792, 37938.8 ] )
Loading by parmed
is fine, but the dihedral parameters are not parsing correctly. After loading and saving by parmed, Sire complains about the dihedral parameter:
Cannot convert the 'improper angle' type Gromacs dihedral 'GromacsDihedral( functionType() = improper dihedral, parameters() = [ 180, 203.756 ] )' to an expression using only the torsion angle 'phi'
Here is the code I've used:
import BioSimSpace as BSS
import parmed
mol = BSS.IO.readMolecules(['gas.gro', 'gas.top'])
print(mol[0]._sire_object.property("element"))
mol_parmed = parmed.load_file('gas.top', xyz='gas.gro')
elements = [atom.element_name for atom in mol_parmed]
print(elements)
print(mol_parmed.dihedrals)
mol_parmed.save('new.top', overwrite=True)
mol_new = BSS.IO.readMolecules(['new.top', 'gas.gro'])
print(mol_new[0]._sire_object.property("element"))
I've run short energy minimization and MD simulation with the following .mdp
file by Gromacs, the output seems to be reasonable, so the topology file should be valid.
integrator = steep
nsteps = 500
integrator = md
nsteps = 10000
Looking at the Sire code it looks like we don't support this dihedral type. We can parse the file, but can't convert this to an computer algebra expression internally:
/** Return this function converted to a SireCAS::Expression using the passed symbol
to represent the torsion size */
SireCAS::Expression GromacsDihedral::toExpression(const SireCAS::Symbol &phi) const
{
const double kj_per_mol = kJ_per_mol.value();
const double deg = degree.value();
if (func_type == 1 or func_type == 4 or func_type == 9)
{
//standard periodic / non-periodic proper / improper dihedral
// k ( 1 + cos( n phi - phi_s ))
const double k0 = k[1] * kj_per_mol;
const double phi_s = k[0] * deg;
const double n = k[2];
return k0 * ( 1.0 + Cos( (n*phi) - phi_s ) );
}
else if (func_type == 2)
{
throw SireError::incompatible_error( QObject::tr(
"Cannot convert the 'improper angle' type Gromacs dihedral '%1' to an expression "
"using only the torsion angle 'phi'")
.arg(this->toString()), CODELOC );
}
...
I'll take a look at the GROMACS manual to read up on function type 2. I know we only have full support for AMBER style force fields, but can convert formats where possible, e.g. Ryckaert-Bellemans dhedrals.
The build failed since there are unit tests that compare a parsed AMBER system against the new and old topology parser.
Hi @lohedges , I just checked the files in conda: https://anaconda.org/michellab/sire/files,
The latest update seems to be 5 days ago, are the recent updates of file parsing (for example https://github.com/michellab/Sire/commit/0122d3fdbf8da61a5967722ac7bf83f259abb1a9) available through conda installation?
Not yet, I'm afraid. We have a problem with the Sire Azure build following an update to the macOS 10.15 image (10.14 is now deprecated). The Linux build should be unaffected so I can run that in isolation later today if I can't fix it otherwise.
The latest GROMACS issue looks like a bug in the GROMACS topology parser. It is always trying to resolve the potential expressions in the same way, irrespective of the function type. There is specific code to deal with the different formats, but it isn't being invoked. I'll try to switch it to calling a different method depending on the function type to see what happens.
I can fix the GROMACS dihedral issue, but it looks like we don't support Urey-Bradley type Angle functions if the UB force constant is non-zero:
else if (func_type == 5)
{
//Urey-Bradley : 0.5 k_t (theta - theta0)^2 + 0.5 k_b ( r - r0 )^2
const double k0 = k[1] * kj_per_mol_per_rad2;
const double theta0 = k[0] * deg;
const double kb = k[3] * kj_per_mol_per_nm2;
//const double r0 = k[2] * nm;
if (kb != 0)
throw SireError::incompatible_error( QObject::tr(
"Cannot convert a Urey-Bradley Gromacs angle into an expression of only "
"the angle size if the UB bond force constant is non-zero - %1")
.arg(this->toString()), CODELOC );
return 0.5 * k0 * SireMaths::pow_2(theta - theta0);
}
I think it's not mathematically possible to do an exact UB to harmonic potential conversion in general. Parmed probably silently fails to convert correctly.
Yes, I agree. That said, I didn't think Parmed stored things internally as an expression, so there wouldn't be an issue if you didn't try to convert it to a different format, i.e. UB would be okay if you just read and write to GROMACS format, but would fail if you tried to convert to AMBER. I'll look at the Parmed output file to see if has indeed incorrectly converted the terms.
Just to note that the errors (angle and dihedral terms) are both present for the original and ParmEd files. The reasons that you sometimes see one or the other is that the topology is parsed in parallel, so sometimes the angles trigger an exception before the dihedrals, and vice versa. As expected, it looks like Parmed is simply preserving the topology since it is being written back to the same format (no conversion going on). The only thing it seems to do is zero certain UB angle terms, depending on the values of the force constant, e.g.
# Original (ejm46_minimised_qforce.itp)
6 5 25 5 121.784 281.620 0.21799 0.000
# Parmed (new.top)
6 5 25 5 121.7840000 281.620000 0.0000000 0.000000
Thanks for the detailed information, very clear!
Hi there, this is an edge case that could be read by BioSimSpace but could not be converted to amber format:
gmx-amb.zip