michellab / BioSimSpace

Code and resources for the EPSRC BioSimSpace project.
https://biosimspace.org
GNU General Public License v3.0
77 stars 19 forks source link

Are atomtypes meant to be updated when using repartitionHydrogenMass? #418

Open noahharrison64 opened 1 year ago

noahharrison64 commented 1 year ago

Hi,

I'm attempting to use HMR with FEP calculations for longer sampling times. I noticed when using the repartitionHydrogenMass method that the hydrogen masses in the [atomtypes] directive of the top file are not updated to reflect the scaled hydrogen masses (i.e., they are labelled as ~1Da rather than 4Da), whereas in the [atoms] directive they are labelled as ~4Da.

Example:

; Gromacs Topology File written by Sire
; File written 02/06/23  16:36:15
[ defaults ]
; nbfunc      comb-rule       gen-pairs      fudgeLJ     fudgeQQ
  1           2               yes            0.5         0.833333

[ atomtypes ]
; name      at.num        mass      charge   ptype       sigma     epsilon
     c           6   12.010700    0.000000       A    0.331521    0.413379
    c3           6   12.010700    0.000000       A    0.339771    0.451035
    ca           6   12.010700    0.000000       A    0.331521    0.413379
    cl          17   35.453000    0.000000       A    0.346595    1.103739
    h4           1    1.007940    0.000000       A    0.253639    0.067362
    ha           1    1.007940    0.000000       A    0.262548    0.067362
    hc           1    1.007940    0.000000       A    0.260018    0.087027
    hn           1    1.007940    0.000000       A    0.110650    0.041840
     n           7   14.006700    0.000000       A    0.318086    0.684502
    nb           7   14.006700    0.000000       A    0.338417    0.393714
     o           8   15.999400    0.000000       A    0.304812    0.612119

[ moleculetype ]
; name  nrexcl
MOL  3

[ atoms ]
;   nr   type  resnr residue  atom   cgnr     charge         mass
     1     ca      1     MOL     C      1  -0.143679    12.010000
     2     ca      1     MOL     C      2   0.052321    12.010000
     3     ca      1     MOL     C      3  -0.129079     8.986000
     4     ca      1     MOL     C      4   0.127521    12.010000
     5      c      1     MOL     C      5   0.691621    12.010000
     6     ca      1     MOL     C      6  -0.303379     8.986000
     7     ca      1     MOL     C      7   0.444121     8.986000
     8     ca      1     MOL     C      8   0.052321    12.010000
     9     ca      1     MOL     C      9  -0.096079     8.986000
    10     ca      1     MOL     C     10  -0.129079     8.986000
    11     cl      1     MOL    Cl     11  -0.059979    35.450000
    12      o      1     MOL     O     12  -0.550179    16.000000
    13      n      1     MOL     N     13  -0.460179    10.986000
    14     nb      1     MOL     N     14  -0.737079    14.010000
    15     ca      1     MOL     C     15   0.550121    12.010000
    16     ca      1     MOL     C     16  -0.320379     8.986000
    17      n      1     MOL     N     17  -0.551479    10.986000
    18      c      1     MOL     C     18   0.669021    12.010000
    19      o      1     MOL     O     19  -0.589179    16.000000
    20     cl      1     MOL    Cl     20  -0.059979    35.450000
    21     ha      1     MOL     H     21   0.156921     4.032000
    22     ha      1     MOL     H     22   0.182921     4.032000
    23     h4      1     MOL     H     23   0.025021     4.032000
    24     ha      1     MOL     H     24   0.147921     4.032000
    25     ha      1     MOL     H     25   0.156921     4.032000
    26     hn      1     MOL     H     26   0.330421     4.032000
    27     ha      1     MOL     H     27   0.177921     4.032000
    28     hn      1     MOL     H     28   0.330421     4.032000
    29     c3      1     MOL     C     29  -0.132779     8.986000
    30     c3      1     MOL     C     30  -0.091179     2.938000
    31     c3      1     MOL     C     31  -0.091179     2.938000
    32     hc      1     MOL     H     32   0.066621     4.032000
    33     hc      1     MOL     H     33   0.047121     4.032000
    34     hc      1     MOL     H     34   0.047121     4.032000
    35     hc      1     MOL     H     35   0.047121     4.032000
    36     hc      1     MOL     H     36   0.047121     4.032000
    37     hc      1     MOL     H     37   0.047121     4.032000
    38     hc      1     MOL     H     38   0.047121     4.032000

It's not clear to me whether this is intended, or whether this is a problem, hence why I'm raising this issue.

Thanks, Noah

lohedges commented 1 year ago

Thanks, @noahharrison64. I'll take a look. The code just adjusts the mass property, so I'll see how this is used by the AMBER topology parser. It may be pre-filling some info from the atomic number, e.g. just using the internal element information. If the entries in the [atoms] section take precedence, then all should be okay, I'm just not 100% certain if this is the case in all circumstances.

lohedges commented 1 year ago

(Sorry, meant to say the GROMACS topology parser.)

lohedges commented 1 year ago

Yes, the mass is coming from the mass of the element, which isn't modified. I think this is okay, though, since the mass in the [atoms] section takes precedence. For example, this would be the case for an FEP simulation, since the mass in the [atoms] section gives the modified mass in the given end state, whereas the [atomtypes] section gives you the mass of the unmodified atom type. I'll see if I can think of something simple that will check that this is the case.

noahharrison64 commented 1 year ago

Hi Lester,

Thanks for getting back to me and checking this. Reassuring that [atoms] directive takes precedence over [atomtypes] and this shouldn't make a difference to FEP calcs.

Just to run something else by you - I was having instability issues with running a certain transformation (Methyl -> Propyl R group transformation) and this only evolved when I was using HMR and long timestep (i.e. was fine with 2fs and no HMR). This happens during initial equilibration, following energy minimisation. Do you think I could solve this by running a EM and short EQ with no HMR / short timestep, followed by repartitioning hydrogens and then a longer equilibration?

lohedges commented 1 year ago

Hi there, It's hard to say, but it certainly wouldn't hurt to try. I could certainly imagine that HMR might cause issues if the system was poorly equilibrated to begin with.

jmichel80 commented 1 year ago

@annamherz may be able to comment further. I would suggest a gentle equilibration protocol that raises temperature gradually with restraints on ligand and protein atoms to begin with, followed by gradual release. The following node may be useful https://github.com/michellab/BioSimSpace/blob/devel/nodes/playground/BSSEqBoundNode.ipynb

noahharrison64 commented 1 year ago

I would suggest a gentle equilibration protocol that raises temperature gradually with restraints on ligand and protein atoms to begin with, followed by gradual release. The following node may be useful https://github.com/michellab/BioSimSpace/blob/devel/nodes/playground/BSSEqBoundNode.ipynb

Hi Julien, thanks for weighing in.

Are you suggesting gentle equilibration with HMR and 4fs timestep? Or gentle equilibration followed by repartitioning and further equilibration?

Thanks, Noah

jmichel80 commented 1 year ago

Equilibration with 4 fs HMR

annamherz commented 1 year ago

I noticed when using the repartitionHydrogenMass method that the hydrogen masses in the [atomtypes] directive of the top file are not updated to reflect the scaled hydrogen masses (i.e., they are labelled as ~1Da rather than 4Da), whereas in the [atoms] directive they are labelled as ~4Da.

I think this is okay, though, since the mass in the [atoms] section takes precedence.

I think this is also true for the other atom types as well, eg c3 - the mass at the top cannot be the mass used as the mass used changes depending on how many H are attached and how it is repartitioned.

I was having instability issues with running a certain transformation (Methyl -> Propyl R group transformation) and this only evolved when I was using HMR and long timestep (i.e. was fine with 2fs and no HMR)

I have also been having instability issues for this type of perturbation as well, and it is able to proceed at a lower timestep with no HMR. I'm still not sure entirely what is causing the instabilities, but am currently investigating!

noahharrison64 commented 1 year ago

Hi @annamherz

I have also been having instability issues for this type of perturbation as well, and it is able to proceed at a lower timestep with no HMR. I'm still not sure entirely what is causing the instabilities, but am currently investigating!

Yep same here - no HMR and 2fs timestep is no problem. I've tried using gentle equilibration as suggested by Julien - restraint on the ligand and protein, followed by 500ps slow warming but I get crashes within the first 30ps (at 0K...) of simulation. I'm going to try with a HMR scaling factor of 3 and see if I can get anywhere.

annamherz commented 1 year ago

Sounds good! I'm currently trying 2fs with HMR to see if it is the HMR itself that is the issue.

lohedges commented 1 year ago

What do the trajectories look like when they crash? Are there any weird fluctuations or steric clashes?

fjclark commented 1 year ago

I noticed that the default hydrogen mass with perses has been changed from 4 to 3 amu fairly recently due to stability issues. The example given in the issue looks very similar - a CH3 to CH2 perturbation resulting in both carbon and Hs of ~ 4 amu.

noahharrison64 commented 1 year ago

Okay good news - switching from HMR scaling factor of 4 to 3 prevents the solvent leg of the simulation breaking, while retaining the 4fs timestep! Feeling optimistic that this will be the same for the complex side as it was the ligand that was causing the issues - not the protein.

What do the trajectories look like when they crash? Are there any weird fluctuations or steric clashes?

My trajectories were crashing so quickly I actually wasn't producing any frames. The starting structures looked reasonable other than the fact it's a hybrid topology and so strange bond connectivity.

annamherz commented 1 year ago

Okay good news - switching from HMR scaling factor of 4 to 3 prevents the solvent leg of the simulation breaking, while retaining the 4fs timestep! Feeling optimistic that this will be the same for the complex side as it was the ligand that was causing the issues - not the protein.

Hello! Doing this also meant my solvent leg was no longer failing, however my bound leg is still failing for the CH3 -> CH2-cyclopropyl with a factor of 3. I was wondering if you had any more luck?

My trajectories were crashing so quickly I actually wasn't producing any frames. The starting structures looked reasonable other than the fact it's a hybrid topology and so strange bond connectivity.

This is the same for me.

noahharrison64 commented 1 year ago

Hello! Doing this also meant my solvent leg was no longer failing, however my bound leg is still failing for the CH3 -> CH2-cyclopropyl with a factor of 3. I was wondering if you had any more luck?

Hey, the complex side was fine when using a re-scaling factor of 3. Odd that it's failing for you when the problematic part is the ligand, not the protein (at least this was the case for me). I am using simulated annealing in NVT to initially equilibrate my systems, followed by NPT at constant temperature.