rinikerlab / reeds

This pipeline is executing a RE-EDS run for relative free energy calculations. It can be used for example for calculation of hydration free energies or ligand binding free energies.
https://rinikerlab.github.io/reeds/
MIT License
30 stars 8 forks source link

lowest s-value in step b is 1 for vacuum simulations #11

Closed SalomeRonja closed 3 years ago

SalomeRonja commented 3 years ago

The new version of choosing the lowest s-value leads to the s-values being cut at 1 in my vacuum simulations

sampling_undersample_matrix

The s-values are already cut at 1. This leads to other problems in the pipeline, because no intermediate s-values can be added. I think we should add a criterion to say that the cut cannot be made higher than e.g. 0.01. @SchroederB & @candidechamp what do you think?

SalomeRonja commented 3 years ago

One suggestion would be to change the number of states from

undersampling_limit = sampling_analysis_results[
                              "undersamlingThreshold"] + 2

to

undersampling_limit = max(10, sampling_analysis_results[
                              "undersamlingThreshold"] + 2)

so we always have at least 10 s-values other than 1.

SalomeRonja commented 3 years ago

comment from @candidechamp :

This is an interesting issue. Your solution sounds like a good fix, but we should always be careful to test (on multiple systems ideally) whenever we want to add a change like this to see what happens. (i.e. would this change the results of the simulation in water, or would the result stay identical?) I also think that the criterion to determine whether a state is sampled (threshold energy and fraction of time-steps in which the state is sampled) are at the root of small problems like this. For ex. if your fraction threshold was 0.95, we wouldn't see a single s-value but ~ 5(although having just a few s-values might still be problematic in vacuum?).

SalomeRonja commented 3 years ago

Thanks for your input! Yes I agree that we should be careful with thia change, and I'm not adding anything to the master branch for now. I'll have a closer look at the influence of the fraction threshold as well.

SalomeRonja commented 3 years ago

comment from @SchroederB :

Ach the vacuum again :) Hmmmm... This is a very interesting problem. I agree with @candidechamp that the root of this issue is the sampling criterium. Thoughts:

SalomeRonja commented 3 years ago

To the first point: yes, even my simple benzene system in water still has 10 replicas with the new method of detecting undersampling To the second point: there is a slight improvement in the sampling and number of roundtrips after the 2nd s-optimization, but it's very slight for pnmt: pnmt_gaff_vacsopt_sopt_analysis for the benzene set: hyd_set_vacsopt_sopt_analysis If we decide that we don't need s-optimization for systems which are sampled "well enough" at s=1, we should make sure that the pipeline can handle going directly from step c to step e. I haven't tested this yet

SalomeRonja commented 3 years ago

comment by @SchroederB :

wow, that are beautiful plots! Sorry another question:

Thoughts: The plots look really good, like the systems don't necessarily need round trip optimization (RTO). From an Pipeline perspective, it still could make sense to run 1-3 steps, to check that there are round trips (RT), I think current criterium of convergence would be met already here in the first step. Convergence criterium will be available with !46 (merged) . (note by @SalomeRonja: this is an old merge from gitlab)

SalomeRonja commented 3 years ago

As an example, here are the energy offsets for the pnmt system in vacuum with the gaff topology:

after step c:

1.     0.0000   +-    0.0000
2.   -72.4001   +-    3.2641
3.     5.1894   +-    2.4373
4.    36.2632   +-    1.8295
5.    29.7072   +-    3.4684
6.  -665.2771   +-   14.5640
7.  -668.0656   +-   12.9445
8.   -56.4456   +-    3.1442
9.   141.4565   +-    4.5455

for sopt1:

1.     0.0000   +-    0.0000
2.   -70.9350   +-    0.6696
3.     1.4199   +-    0.4952
4.    34.2385   +-    0.6103
5.    26.2542   +-    0.7622
6.  -663.9284   +-    3.7298
7.  -663.1638   +-    6.1608
8.   -58.1913   +-    0.3749
9.   138.2061   +-    1.2140

for sopt2:

1.     0.0000   +-    0.0000
2.   -76.8620   +-    0.1396
3.     1.5848   +-    0.2840
4.    34.9445   +-    0.1947
5.    24.2528   +-    0.7482
6.  -665.3406   +-    4.0420
7.  -668.3811   +-    3.5151
8.   -55.2547   +-    0.6318
9.   134.9456   +-    1.0489

The energy offsets change by up to ~5kJ/mol between the different steps

This is the plot of the energy offsets for the different s-values for the same system in step c:

RE-EDS_eoffs_vs_s

Here, it can also nicely be seen that the average energy offset changes considerably depending on the lowest s-value that is used. I think this is a point that @cchampion also mentioned in our last meeting. So maybe we can skip the s-optimization if we have such great sampling already, but it still might make sense to use lower s-values to estimate more robust energy offsets.

SalomeRonja commented 3 years ago

comment by @candidechamp :

Very interesting data Salome. As you mention I noticed that depending on the replicas (number of s-values, and the values they take) used in the determination of energy offsets, we could get quite different results. (In water, both CHK1 and PNMT show deviations that can be more around ~ 10kJ/mol), and comparing energy offsets calculated for sopt1/sopt2 data, we can get even larger deviations. One other thing, is that I ran simulations with different sets of energy offsets, and found that even a "minor" difference of ~ 8 kJ/mol could result in very large differences in sampling (% occurrence of a state, where the state is considered to be sampled if it has the lowest pot. energy). I interpret this as follows:

RiesBen commented 3 years ago

I think this will be resolved with #22