Multiple References - Githubissues

ramess101 commented 7 years ago

@mrshirts I am trying to modify the code so that it can handle multiple references. In other words, say we start with TraPPE and get a new optimal with lambda = 14. Then we do direct simulation of this new optimal. What we have currently been doing is optimizing the parameters by using reruns/MBAR with just the new optimal (lambda = 14) as the reference. Previously, we discussed that the first few iterations we might not need additional references but later on we probably would want multiple references to help reduce the MBAR uncertainties. So what I am currently working on is just making the code ALWAYS use each of the previous references with MBAR.

The python code "optimization_Mie_ITIC_multiple_refs" is where I have been working on this. I have it now that it will submit the Gromacs scripts to generate the energy_press_refrr.xvg files and I have the directory hierarchy to find those files appropriately. My question is regarding the shape and order of the u_kn and N_k arrays that are passed to MBAR.

For example, say we have two reference systems and a single new system. We have 1000 snapshots from each of the two reference systems and 0 from the new system. Should u_kn be a 3x2000 matrix? The three is from the three force fields and the 2000 is the 2000 snapshots (1000 from system 0 and 1000 from system 1). Then the first column of u_kn would be the u values evaluated with reference force field 0, the second column would be with reference force field 1 and the third column would be with the new force field. The first 1000 rows of each column would correspond to the configurations from reference 0 while the second 1000 rows would be from reference 1, correct? Then N_k should just be a three element array with [1001, 1001, 0], right?

mrshirts commented 7 years ago

Yes, entirely correct. u_kn should be 3x2000. three forcefields = the one the first one was simulated at, the 2nd was the 2nd one was simulated at, and 3 the one you are interested in. 2000 for 2000 sampled states. Actually, it can be (2+n) x 2000, since you can determine the free energies at any of n new states simultaneously (the iteration is only over sampled states). I think N_k should be [1000,1000,0],though.

ramess101 commented 7 years ago

Great.

Actually, it can be (2+n) x 2000, since you can determine the free energies at any of n new states simultaneously

True, for our purpose of optimization we do it one at a time though because we don't know the n new states ahead of time. They are only known after the previous step in the iteration is completed.

I think N_k should be [1000,1000,0],though.

Yes, that was a typo. Fortunately, the number of snapshots are not hard coded as they are read from the energy file.

ramess101 commented 7 years ago

Preliminary results suggest that we might need to have a single reference point at each value of lambda to get better MBAR results. Specifically, using the lambda of 12 (TraPPE) and lambda of 16 (Potoff) as the reference systems did not provide significantly better interpolation for lambdas of 13-15. The results are essentially the same as when lambda of 16 was the only reference, and the Neff are relatively unchanged. That being said, it is possible that we could select sigma appropriately so that a lambda of 12 or lambda of 16 has better overlap. So we should not think of sigma and lambda as totally independent choices.

Here is a plot of the objective function for lambda of 13-16 (ignore 17 as this crashed before it converged. Not sure why lambda of 17 crashed, might have been a resources issue on calc1).

The main point is that, upon comparison with the plots pressented in Issue #4, the optimal and SSE values are the same as when only Potoff was used as a reference.

ramess101 / MBAR_ITIC

Multiple References #5