ramess101 / MBAR_LJ_two_states

0 stars 1 forks source link

Argon questions #2

Closed ramess101 closed 7 years ago

ramess101 commented 7 years ago

Michael,

I have invited you to collaborate on two github repositories, "MBAR_LJ_two_states" and "MBAR_sampling_LJ_methane".

MBAR _LJ_two_states is a very simple python code that helps demonstrate what I was trying to explain yesterday. Essentially, if I only use configurations from state 0 I get a very poor prediction of internal energy and density for state 1. You were correct in your assumption that the reason why this is happening is because state 1 only has a single configuration that has a non-negligible weight (sample 986). The sigma and epsilon values are different by about 1% and 3%, respectively, for states 0 and 1. I guess I am surprised that this small of a deviation in sigma and epsilon would result in such poor overlap.

MBAR_sampling_LJ_methane is a modified version of the code that you provided me. It requires a few of the additional files that you provided me in lj_bayesian (outlined in the README file). I believe that this file demonstrates that the previous conclusion described for MBAR_LJ_two_states is a general one. In other words, the expectation values for all 91 other states are not accurate when they are obtained using configurations from only state 0. Again, it looks like this is a weighting/overlap issue. (Note that I have changed line 465 so that the data are only imported from state 0 when load_ukln is set to true.)

I guess you already explained why this is the case, my only questions are:

  1. Am I implementing MBAR correctly?
  2. Is this a typical result when only a single state is used for generating configurations?

Let me know if the code or plots are unclear.

Thanks

Rich

ramess101 commented 7 years ago

Hi, Rich-

Sorry to take so long to respond. Hope graduation went well! Looking like an icy drive for us to Utah tomorrow.

Ø MBAR _LJ_two_states is a very simple python code that helps demonstrate what I was trying to explain yesterday. Essentially, if I only use configurations from state 0 I get a very poor prediction of internal energy and density for state 1. You were correct in your assumption that the reason why this is happening is because state 1 only has a single configuration that has a non-negligible weight (sample 986). The sigma and epsilon values are different by about 1% and 3%, respectively, for states 0 and 1. I guess I am surprised that this small of a deviation in sigma and epsilon would result in such poor overlap.

Yep, that’s what I’m getting as well. They key statistic to look at here is the distribution of the difference in dimensionless energies, in this case, u_kn[1,:]-u_kn[0,:] has a standard deviation of 5-6 kT, which is a little too big. In particular, there’s one at -67 (which is the one that has the highest weight of 0.66), where the average is about -80, so 13 kT away; the next nearest is at -70, so it’s ~exp(-3) lower in weight – and the rest, even lower.

So, from the distribution of energy differences, this is what one would expect. So the questions is, is that what one would expect from the changes in sigma and epsilon?

Partly it has to do with the number of particles involved. The larger the system, the larger a change of energies will be. So as the system becomes bigger, any change in parameters will result in fluctuations that will be arbitrarily large, and result in poor estimates.

That’s the overall theory. On a practical side, the question is whether one can ask the questions of interest with a system of a small enough size.

Ø MBAR_sampling_LJ_methane is a modified version of the code that you provided me. It requires a few of the additional files that you provided me in lj_bayesian (outlined in the README file). I believe that this file demonstrates that the previous conclusion described for MBAR_LJ_two_states is a general one. In other words, the expectation values for all 91 other states are not accurate when they are obtained using configurations from only state 0. Again, it looks like this is a weighting/overlap issue. (Note that I have changed line 465 so that the data are only imported from state 0 when load_ukln is set to true.)

That’s probably true. In order to span a large range of states, we needed to sample from 91 states. Levi’s 2016 paper showed how could pick new states to span a phase space volume in a relatively straightforward manner. But if you start with just one state, then your radius of prediction is kind of low.

ramess101 commented 7 years ago

Michael,

Thanks, graduation was great! The weather for our drive back to Colorado on Sunday was actually really nice. Hopefully the weather cooperated for your drive to Utah as well.

Thank you for looking over these files for me. That is an interesting and very useful insight that the larger the system, the larger the change in energies. This makes sense as the total energy is an extensive property. I guess I am so used to utilizing larger system sizes for GEMC so that the fluctuations in VLE densities are smaller. Is there no way for MBAR to account for changes in energy on a per molecule basis? If I understand you correctly, this means we should try to use the smallest system size possible without introducing significant finite-size effects, right? Maybe a system of 100-200 molecules?

ramess101 commented 7 years ago

Hi, Rich-

Looks like I drafted this email, but never sent it . . . let me know if you have questions on this or what I sent yesterday.


Yep, we drove back on Sunday as well, so great weather (I70 to US40).

Ø That is an interesting and very useful insight that the larger the system, the larger the change in energies. This makes sense as the total energy is an extensive property. I guess I am so used to utilizing larger system sizes for GEMC so that the fluctuations in VLE densities are smaller.

Right, for larger system systes, the fluctuations in extensive quantities are larger, but the fluctuation in intensive quantities are smaller ( N^1/2 vs N^1/2 / N = N^-1/2)

Ø Is there no way for MBAR to account for changes in energy on a per molecule basis?

Really good question. I’ve played around with this a little, and haven’t ended up getting anything that I would consider useful. You basically have to treat correlations in a mean field way, i.e. the system can be understood as N copies of a smaller system. So then it’s not clear whether one should just run smaller systems. Something to talk about later.

Ø If I understand you correctly, this means we should try to use the smallest system size possible without introducing significant finite-size effects, right? Maybe a system of 100-200 molecules?

Yep, that’s what we’ve been doing. And precisely, it’s not so much the size of the system as the size of the perturbation. You can do free energy calculations just fine with 100K atom systems if you are only perturbing 100 of them, or if you only perturb one atom type out of 1000.