Closed SalomeRonja closed 3 years ago
One suggestion would be to change the number of states from
undersampling_limit = sampling_analysis_results[
"undersamlingThreshold"] + 2
to
undersampling_limit = max(10, sampling_analysis_results[
"undersamlingThreshold"] + 2)
so we always have at least 10 s-values other than 1.
comment from @candidechamp :
This is an interesting issue. Your solution sounds like a good fix, but we should always be careful to test (on multiple systems ideally) whenever we want to add a change like this to see what happens. (i.e. would this change the results of the simulation in water, or would the result stay identical?) I also think that the criterion to determine whether a state is sampled (threshold energy and fraction of time-steps in which the state is sampled) are at the root of small problems like this. For ex. if your fraction threshold was 0.95, we wouldn't see a single s-value but ~ 5(although having just a few s-values might still be problematic in vacuum?).
Thanks for your input! Yes I agree that we should be careful with thia change, and I'm not adding anything to the master branch for now. I'll have a closer look at the influence of the fraction threshold as well.
comment from @SchroederB :
Ach the vacuum again :) Hmmmm... This is a very interesting problem. I agree with @candidechamp that the root of this issue is the sampling criterium. Thoughts:
To the first point: yes, even my simple benzene system in water still has 10 replicas with the new method of detecting undersampling To the second point: there is a slight improvement in the sampling and number of roundtrips after the 2nd s-optimization, but it's very slight for pnmt: for the benzene set: If we decide that we don't need s-optimization for systems which are sampled "well enough" at s=1, we should make sure that the pipeline can handle going directly from step c to step e. I haven't tested this yet
comment by @SchroederB :
wow, that are beautiful plots! Sorry another question:
Thoughts: The plots look really good, like the systems don't necessarily need round trip optimization (RTO). From an Pipeline perspective, it still could make sense to run 1-3 steps, to check that there are round trips (RT), I think current criterium of convergence would be met already here in the first step. Convergence criterium will be available with !46 (merged) . (note by @SalomeRonja: this is an old merge from gitlab)
As an example, here are the energy offsets for the pnmt system in vacuum with the gaff topology:
after step c:
1. 0.0000 +- 0.0000
2. -72.4001 +- 3.2641
3. 5.1894 +- 2.4373
4. 36.2632 +- 1.8295
5. 29.7072 +- 3.4684
6. -665.2771 +- 14.5640
7. -668.0656 +- 12.9445
8. -56.4456 +- 3.1442
9. 141.4565 +- 4.5455
for sopt1:
1. 0.0000 +- 0.0000
2. -70.9350 +- 0.6696
3. 1.4199 +- 0.4952
4. 34.2385 +- 0.6103
5. 26.2542 +- 0.7622
6. -663.9284 +- 3.7298
7. -663.1638 +- 6.1608
8. -58.1913 +- 0.3749
9. 138.2061 +- 1.2140
for sopt2:
1. 0.0000 +- 0.0000
2. -76.8620 +- 0.1396
3. 1.5848 +- 0.2840
4. 34.9445 +- 0.1947
5. 24.2528 +- 0.7482
6. -665.3406 +- 4.0420
7. -668.3811 +- 3.5151
8. -55.2547 +- 0.6318
9. 134.9456 +- 1.0489
The energy offsets change by up to ~5kJ/mol between the different steps
This is the plot of the energy offsets for the different s-values for the same system in step c:
Here, it can also nicely be seen that the average energy offset changes considerably depending on the lowest s-value that is used. I think this is a point that @cchampion also mentioned in our last meeting. So maybe we can skip the s-optimization if we have such great sampling already, but it still might make sense to use lower s-values to estimate more robust energy offsets.
comment by @candidechamp :
Very interesting data Salome. As you mention I noticed that depending on the replicas (number of s-values, and the values they take) used in the determination of energy offsets, we could get quite different results. (In water, both CHK1 and PNMT show deviations that can be more around ~ 10kJ/mol), and comparing energy offsets calculated for sopt1/sopt2 data, we can get even larger deviations. One other thing, is that I ran simulations with different sets of energy offsets, and found that even a "minor" difference of ~ 8 kJ/mol could result in very large differences in sampling (% occurrence of a state, where the state is considered to be sampled if it has the lowest pot. energy). I interpret this as follows:
I think this will be resolved with #22
The new version of choosing the lowest s-value leads to the s-values being cut at 1 in my vacuum simulations
The s-values are already cut at 1. This leads to other problems in the pipeline, because no intermediate s-values can be added. I think we should add a criterion to say that the cut cannot be made higher than e.g. 0.01. @SchroederB & @candidechamp what do you think?