TI Gives Non-Zero Free Energy Change with Dummy Perturbation

fjclark commented 2 years ago

Hello,

When calculating the free energy change for a perturbation involving only dummy atoms to dummy atoms, MBAR gives exactly zero kcal/ mol, as expected, whereas TI gives positive free energies which vary dramatically with delta_lambda. My version of Sire is only two commits behind devel and the files to reproduce the issue are here.

With the default delta_lambda = 0.001, and two lambda windows at 0 and 1, I get a free energy change of 0.8 kcal / mol. This drops to < 0.1 kcal / mol if I increase delta_lambda to 0.01, and increases substantially when I drop delta_lambda to 0.0005. If I add a lambda window at 0.001, the free energy increases to > 10^6 kcal / mol due to a gradient of > 10^7 kcal / mol for this window. Checking the reduced perturbed energies shows that the energies at the end points are identical, although there is a difference compared to the energies at lambda = 0.001.

By modifying this function I'm able to see that the non-zero gradients are exclusively due to changes in "lamhd". However, because all final and initial parameters are identical, it looks like the lamhd-dependent terms here should be symmetrical about lamdh = 0.5, and therefore I would expect equal and opposite gradients giving an overall free energy change of 0 when lambda windows are symmetrical about 0.5.

I've come across another situation where the gradient seems unreliable, although this could be due to my changes to sire. When turning on Boresch restraints between a protein and ligand (again using a dummy pert file) using this version of Sire and this input, I see very infrequent jumps to extremely negative gradients, while the reduced perturbed energies for neighbouring lambda values remain very similar. I only see this at lambda = 0.000, and I see it for each of five runs. The spikes are at the same values of total simulation time for each run (x-axis should show simulation time from 0 to 5 ns):

The MBAR estimate for the free energy change is as expected.

Any help with this would be greatly appreciated.

Thanks, Finlay

jmichel80 commented 2 years ago

Hi @fjclark

Could you check if the issue is replicated across platforms (OpenCL, CPU) ?
Check also if the issue occurs if you set the initial and final types to a non dummy type (e.g. 'c3') . The 'du' type is used to tell the code to apply softcore potentials, meaning interactions are scaled using lam(t/f)d flags. It is possible there is a bug meaning we do not cover this edge case where an atom in a dummy state initially is converted into a dummy atom in a final state.
Can you also print the single point energies (system.energies() ) from Sire (in OpenMMMD.py) at the beginning of each simulation?). This will help understand which forcefield terms are lambda dependent.

fjclark commented 2 years ago

Hi @jmichel80

When using OpenCL, the free energy estimate halves to ~ 0.4 kcal mol-1. The gradient at lambda = 1 stays very similar, so this is due to a reduction in the gradient at lambda = 0. With CPU, this seems to be running extremely slowly and I still don't have any results.
I get the same issue when running with initial and final atoms both have non-dummy types, with exactly the same gradient. Also, the figure above showing spikes in the gradient was obtained for a perturbation between non-dummy atoms with the same charges/ LJ parameters. I notice that there is no non-dummy to non-dummy group in OpenMMMD.py, and that when using dummy atoms, I see

Num bonds 1-4 From Dummy = 73 Num bonds 1-4 From Dummy To Dummy = 0

But this doesn't seem to be the cause of the gradient issue.

Checking the energies at the start of the simulation, on gpu, the energies at lambda = 0 and 1 seem to be identical, as expected. However, the energies at lambda = 0.001 also seem to be identical, but I would have expected a difference because the reduced perturbed energies are slightly different at each evaluation in simfile.dat. The initial energies also seem to be identical between gpu and opencl at lambda = 0 (see outputs here).

jmichel80 commented 2 years ago

CPU will be very slow for a protein-ligand simulation, but running just a few timesteps should be enough to find out if the gradients are null.
Can you also check if the same behaviour is observed in a vacuum or free simulation ?

fjclark commented 2 years ago

Thanks.

Yes, I am just waiting to see the first gradient evaluation, but even this hasn't happened.
For a free simulation, the TI free energy change is very small, but still non-zero (-0.001461 kcal mol-1)

fjclark commented 2 years ago

The issue is very likely due to custom_intra_14_clj (https://github.com/michellab/Sire/blob/2ed5eb41558403ee0fae920f778ca9a6bf823b5c/corelib/src/libs/SireMove/openmmfrenergyst.cpp#L1094) used between atoms which are hard at the start and end. This is because:

It is the only force affected by lamhd, and when I modify the code so that lamhd is not modified when the gradient is calculated, the gradient is zero in all cases.
The free energy change scales with the number of 1-4 interactions between hard atoms. The change is greatest in the protein-ligand system, where the entire error is likely due to interactions in the protein only (no hard atoms in ligand), is much smaller in free calculations where the only 1-4 interactions are due to the ligand (when the .pert file maps fully-interacting to fully-interacting), and is zero in free calculations when I set all ligand atoms to dummies.

michellab / Sire

TI Gives Non-Zero Free Energy Change with Dummy Perturbation #383