Reviewer 1 Comment 4 - Githubissues

ramess101 commented 5 years ago

@mrshirts @jpotoff @msoroush

On page 5 in the third paragraph the authors admit that GCMC-MBAR is very similar to HS-GCMC in nature. They are both powerful tools to optimize force field parameters and the authors list several advantages of GCMC-MBAR over HS-GCMC in their opinion. I think this is a very important comparison that the authors need to address more. Does HS-GCMC possess the same capability as GCMC-MBAR and does it suffer from the same problem that GCMC-MBAR has when predicting the liquid phase saturation properties at theta_rr ≉ theta_ref? It will be very helpful to see some simulation result comparison in the paper between these two methods.

I agree that the comparison with HS-GCMC is important, but I certainly do not intend to perform simulations with the HS-GCMC method. From my understanding, HS-GCMC is limited to values of theta that are chosen a priori. So the question of whether it works for theta_rr ≉ theta_ref is somewhat misguided, i.e., HS-GCMC only works for theta_ref, which is why it uses a set of theta_ref values. @jpotoff, is there anything you think we need to add to our comparison between GCMC-MBAR and HS-GCMC?

jpotoff commented 5 years ago

@ramess101 That is correct. You have to specify the theta values ahead of time.

Even if we wanted to do HS-GCMC, we can't code it up and run all the simulations to do a meaningful comparison in 14 days. We could do some hand-waving, but that isn't fair to either method. The ultimate method might even be a combination of HS-GCMC and MBAR.

In light of this, perhaps we make a comment to the reviewer that this is a good idea, but due to the time we have to address the reviewer comments, and lack of available open-source codes that have this feature, we are unable to do this now, but intend to study it in the future.

ramess101 commented 5 years ago

@jpotoff

I agree, it is not feasible to run simulations with HS-GCMC considering we don't have a code for it. So a rigorous comparison is not practical. I will try to satisfy the reviewer and consider adding a sentence to the existing discussion.

mrshirts commented 5 years ago

What is HS-GCMC? If I knew what it was, I could comment a bit more about the relationships.

ramess101 commented 5 years ago

@mrshirts

It is the Hamiltonian scaling GCMC approach that Jeff Errington and Jeff Potoff used, back when they were with Panagiotopoulos, to develop the Exp-6 models. Basically, you perform a single simulation with multiple Hamiltonians (force field parameters) and reweight post-simulation to extract the VLE estimates for a given Hamiltonian. It is not nearly as straightforward as MBAR in that you have to develop some algorithm for determining which Hamiltonian to sample from and then switch mid-simulation.

mrshirts commented 5 years ago

This is basically just a variant of a basis function approach, right? You have basis functions

U = lam_1 F_1(x) + lam_2 F_2(x) + . . .

And you can run at any value of the vector (lam_1, lam_2, ...) and reweight to any value of the lambda vector that has good overlap. For our basis function approach, we simulated at (lam_1, lam_2, ...) = (1, 1, ...) and tweaked around (1 +/- \delta, 1 +/- \delta). In Hamiltonian scaling, it sounds like you run at (1, 0, ...) and then change to different lambdas. The MBAR approach says you can do this, or just rerun the simulation at any other function you care to afterwards. Basis functions just speed things up because the energies at new lambdas are cheaper to compute in post-processing.

I may not have gotten the exact details right, but I think you see the point.
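To make the basis function idea concrete, here is a minimal sketch. It assumes a Lennard-Jones potential (an illustrative choice, not something stated in this thread), where U = lam_1 F_1(x) + lam_2 F_2(x) with F_1 = sum r^-12 and F_2 = sum r^-6. The function names and the random "configurations" are hypothetical. The point is only that F_k is stored once per configuration, so the energy at any new (epsilon, sigma) is a dot product rather than a new pair loop.

```python
import numpy as np

def lj_coefficients(epsilon, sigma):
    """Coefficients (lam_1, lam_2) multiplying the r^-12 and r^-6 basis functions."""
    return np.array([4.0 * epsilon * sigma**12, -4.0 * epsilon * sigma**6])

def basis_functions(pair_distances):
    """Parameter-independent sums F_1 = sum r^-12 and F_2 = sum r^-6."""
    r = np.asarray(pair_distances, dtype=float)
    return np.array([np.sum(r**-12), np.sum(r**-6)])

def direct_lj_energy(pair_distances, epsilon, sigma):
    """Reference: recompute the LJ energy from scratch for comparison."""
    r = np.asarray(pair_distances, dtype=float)
    sr6 = (sigma / r) ** 6
    return np.sum(4.0 * epsilon * (sr6**2 - sr6))

# Hypothetical data: 5 "configurations", each reduced to 10 pair distances.
rng = np.random.default_rng(0)
configs = rng.uniform(0.9, 2.5, size=(5, 10))

# Store the basis functions once (shape: n_configs x 2).
F = np.array([basis_functions(c) for c in configs])

# Energies at a new parameter set are a cheap matrix-vector product.
new_energies = F @ lj_coefficients(epsilon=1.1, sigma=0.98)
direct_energies = np.array([direct_lj_energy(c, 1.1, 0.98) for c in configs])
```

This only works for parameters that enter the energy linearly through the coefficients; a parameter that changes an exponent would need a different treatment.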

ramess101 commented 5 years ago

@mrshirts

This is the comparison that helps me the most. Say we are interested in predicting VLE properties for 3 different force field parameters (theta1, theta2, and theta3).

For GCMC-MBAR we have a few different options. For simplicity, I will discuss just two of them. 1) We could run a single simulation with theta1 and reweight N1 configurations to predict properties for theta1, theta2, and theta3. 2) We could run 3 separate simulations with theta1, theta2, and theta3 and reweight N1+N2+N3 configurations (where typically N1=N2=N3) to predict properties for theta1, theta2, and theta3.

For HS-GCMC you really only have one option. You run a single simulation where you sample from all three force fields sequentially at different stages. The code to determine when you sample configurations from theta1 or theta2 or theta3 is the key. Afterwards, you will reweight N1+N2+N3 configurations (where N1, N2, and N3 might be different depending on the sampling routine) to predict properties for theta1, theta2, and theta3.

The obvious disadvantages of HS-GCMC are: 1) You have to determine the thetas of interest a priori 2) The additional complexity to determine which theta to sample from 3) The unknown speed-up compared to just running independent simulations for each theta in parallel

The obvious advantages of GCMC-MBAR are 1) You don't need to perform new simulations for a theta that you did not consider pre-simulation 2) Simulation scheme does not change 3) Running simulations in parallel

Does that make sense?
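The two GCMC-MBAR options above can be sketched on a toy problem. This is not GCMC: it assumes a 1D harmonic "force field" u(x) = 0.5 k x^2 in reduced units, where theta is just the spring constant k and exact Gaussian samples stand in for simulation output. All function names are hypothetical, and the solver is a plain self-consistent iteration of the MBAR equations rather than any particular library's implementation.

```python
import numpy as np

def log_sum_exp(a, axis):
    """Numerically stable log(sum(exp(a))) along one axis."""
    a_max = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(a_max, axis=axis) + np.log(np.sum(np.exp(a - a_max), axis=axis))

def log_mixture(u_kn, N_k, f_k):
    """log sum_k N_k exp(f_k - u_k(x_n)) for every pooled sample n."""
    return log_sum_exp(np.log(N_k)[:, None] + f_k[:, None] - u_kn, axis=0)

def solve_free_energies(u_kn, N_k, n_iter=2000, tol=1e-10):
    """Self-consistent MBAR iteration; f_k is reported relative to state 0."""
    f_k = np.zeros(len(N_k))
    for _ in range(n_iter):
        f_new = -log_sum_exp(-u_kn - log_mixture(u_kn, N_k, f_k)[None, :], axis=1)
        f_new -= f_new[0]
        if np.max(np.abs(f_new - f_k)) < tol:
            return f_new
        f_k = f_new
    return f_k

def reweight(u_kn, N_k, f_k, u_target_n, observable_n):
    """Free energy and <observable> at a target theta (sampled or not)."""
    log_w = -u_target_n - log_mixture(u_kn, N_k, f_k)
    f_target = -log_sum_exp(log_w, axis=0)
    return f_target, np.sum(np.exp(log_w + f_target) * observable_n)

def harmonic_u(x, k):
    return 0.5 * k * x**2

rng = np.random.default_rng(1)
k_sampled = np.array([1.0, 1.5, 2.0])   # stand-ins for theta1, theta2, theta3
N_k = np.array([4000, 4000, 4000])
samples = [rng.normal(0.0, 1.0 / np.sqrt(k), n) for k, n in zip(k_sampled, N_k)]
k_new = 1.25                            # a theta not chosen before "simulating"

# Option 1: one simulation at theta1, reweight its N1 configurations.
x1 = samples[0]
u1 = harmonic_u(x1, k_sampled[0])[None, :]
f_opt1, x2_opt1 = reweight(u1, N_k[:1], np.zeros(1), harmonic_u(x1, k_new), x1**2)

# Option 2: three simulations, reweight all N1+N2+N3 configurations.
x_all = np.concatenate(samples)
u_kn = harmonic_u(x_all[None, :], k_sampled[:, None])
f_k = solve_free_energies(u_kn, N_k)
f_opt2, x2_opt2 = reweight(u_kn, N_k, f_k, harmonic_u(x_all, k_new), x_all**2)
```

For this toy model the exact answers are <x^2> = 1/k and f(k) - f(k_1) = 0.5 ln(k/k_1), so both options can be checked directly. Note that k_new was never simulated, which mirrors advantage 1) in the list above.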

mrshirts commented 5 years ago

Just want to make sure I have things clear in my mind.

Are theta_1, theta_2, and theta_3 three separate variables (epsilon, sigma, lambda) or three specific values of variables (epsilon_1, epsilon_2, epsilon_3)?

"You run a single simulation where you sample from all three force fields sequentially at different stages."

I mean, this is really the same thing as three simulations carried out in sequence, right?

"The code to determine when you sample configurations from theta1 or theta2 or theta3 is the key."

Not sure I understand this.

ramess101 commented 5 years ago

@mrshirts

OK, sorry, let me clarify. theta represents a set of parameters. For example, theta1 would be (epsilon1, sigma1, lambda1) and theta2 would be (epsilon2, sigma2, lambda2), etc. So each theta represents a different force field or Hamiltonian (hence the name).

The HS-GCMC approach works something like this:

1) Randomly choose a starting force field (let's say it is theta1)
2) Accept standard MC moves using probabilities based on energies computed with theta1
3) After some time (t_switch) propose changing the Hamiltonian, e.g., theta2 becomes your force field that you sample from
4) Depending on which theta is chosen, change the chemical potential (mu) and temperature (T) accordingly
5) Repeat steps 2-4 for entirety of simulation
6) Reweight configurations

The algorithm for step 3, i.e., how to determine which force field you are sampling from, is the most complicated aspect, and it impacts how you need to reweight the configurations at the end of the simulation. Determining what mu and T to use is also a non-trivial (but important) step.

Did that help?
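The steps listed above can be sketched on a toy system. This is not the published HS-GCMC algorithm: it assumes a 1D harmonic potential u(x) = 0.5 k x^2 in place of a GCMC system, skips step 4 (there is no mu or T to adjust), and uses an expanded-ensemble style Metropolis test for the switch in step 3, which is an assumption made here for illustration. The function names and the weights g_i are hypothetical.

```python
import numpy as np

def reduced_energy(x, k):
    return 0.5 * k * x**2

def hamiltonian_switching_run(k_values, g_values, n_steps=20000,
                              t_switch=10, step_size=1.0, seed=0):
    """Toy sampler: ordinary MC moves plus periodic Hamiltonian switch proposals."""
    rng = np.random.default_rng(seed)
    n_states = len(k_values)
    state = rng.integers(n_states)          # step 1: random starting force field
    x = 0.0
    xs = np.empty(n_steps)
    states = np.empty(n_steps, dtype=int)
    for step in range(n_steps):
        # Step 2: standard Metropolis move using the current Hamiltonian.
        x_trial = x + rng.uniform(-step_size, step_size)
        du = reduced_energy(x_trial, k_values[state]) - reduced_energy(x, k_values[state])
        if np.log(rng.random()) < -du:
            x = x_trial
        # Step 3: every t_switch steps, propose a different Hamiltonian.
        if (step + 1) % t_switch == 0:
            trial = (state + rng.integers(1, n_states)) % n_states
            log_acc = ((g_values[trial] - reduced_energy(x, k_values[trial]))
                       - (g_values[state] - reduced_energy(x, k_values[state])))
            if np.log(rng.random()) < log_acc:
                state = trial
        # Record the configuration and which Hamiltonian generated it (for step 6).
        xs[step] = x
        states[step] = state
    return xs, states

k_values = np.array([1.0, 1.5, 2.0])        # stand-ins for theta1, theta2, theta3
g_values = 0.5 * np.log(k_values)           # toy weights: -ln Z_i up to a constant
xs, states = hamiltonian_switching_run(k_values, g_values)
occupancy = np.array([np.mean(states == i) for i in range(len(k_values))])
```

In this toy model the partition functions are known, so the weights can be set exactly; in a real application choosing them (and the switching schedule) is the hard part described above. The recorded (xs, states) pairs are what step 6 would reweight afterwards.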

mrshirts commented 5 years ago

Ah, it's Hamiltonian switching, not scaling? I was assuming scaling meant scaling the hamiltonian and determining energies at the other states.

So all this is, is an expanded ensemble in Hamiltonian space, which can improve mixing vs. running individual simulations, but doesn't otherwise help improve efficiency. It also needs specialized code (MBAR doesn't).

So MBAR can always be done on top of this (I mean, if you are proposing jumps, you can use that, too). However you collect the data, you can go back and postprocess with additional simulations. So MBAR is necessarily more efficient, as it can take the simulations of HS-GCMC and work with those, too.

ramess101 commented 5 years ago

@mrshirts

The name is, in fact, Hamiltonian-scaling. Perhaps it is a misnomer? Or it could be referring to how the chemical potential (mu) and temperature (T) are also modified (scaled?) whenever you change Hamiltonians (I forgot this important detail in the algorithm I described above).

Here is the description we provided in the manuscript:

Do you see any reason why this description is inaccurate or insufficient?

mrshirts commented 5 years ago

Can you link the paper to me? I'd like to make sure I get the details right.

jpotoff commented 5 years ago

It's actually a bit of both. There is switching of Hamiltonians during the simulation, but all of the data are being combined to predict the properties of each Hamiltonian. i.e. there is also reweighting of the Hamiltonians.

The paper is here: https://aip.scitation.org/doi/10.1063/1.476652

ramess101 commented 5 years ago

@jpotoff

Good point. The term "scaling" probably refers to the reweighting step.

mrshirts commented 5 years ago

Sorry for the delay on this. So, it looks like HS basically says you are sampling from a mixture distribution (since \Pi = \sum W_i q_i(x)). MBAR reweights from the mixture distribution (regardless of the method used to collect the data). So you could perform HS sampling, and then use MBAR to analyze. They aren't really competing strategies, but strategies you could combine. If you want even sampling at each Hamiltonian, then the weight of state i should just be proportional to 1/(partition function_i). (I would need to double check that.) There's some MBAR-like reweighting going on in the determination of the W_i, but it's all sort of messy.
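The "even sampling" remark can be checked with a short derivation. It assumes that q_i(x) is the unnormalized Boltzmann weight of Hamiltonian i with partition function Z_i, and that "sampling at Hamiltonian i" means the share P(i) of the mixture's probability mass contributed by component i. The symbols Z_i, P(i), K, and A are introduced here and are not from the thread.

```latex
\begin{align}
\Pi(x) &= \sum_i W_i \, q_i(x), \qquad Z_i = \int q_i(x)\, dx \\
P(i) &= \frac{\int W_i \, q_i(x)\, dx}{\int \Pi(x)\, dx}
      = \frac{W_i Z_i}{\sum_k W_k Z_k} \\
W_i \propto \frac{1}{Z_i} \;&\Rightarrow\; P(i) = \frac{1}{K}
      \quad \text{for } K \text{ Hamiltonians} \\
\langle A \rangle_j &\approx
      \frac{\sum_n A(x_n)\, q_j(x_n) / \Pi(x_n)}{\sum_n q_j(x_n) / \Pi(x_n)},
      \qquad x_n \sim \Pi
\end{align}
```

Under these assumptions the 1/(partition function_i) weighting does give equal occupancy. The last line is ordinary importance reweighting from the mixture; the MBAR estimator has the same form, with the mixture weights W_i = N_i / Z_i set by the number of samples N_i drawn at each state.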

ramess101 commented 5 years ago

@mrshirts

Not a problem.

I agree with your assessment. I will try to make this more clear in the manuscript and in our response to the reviewer. Should have something for you to read over by tomorrow morning.

ramess101 commented 5 years ago

@mrshirts @jpotoff @msoroush

Again, sorry for the delay. Here is my response:

mrshirts commented 5 years ago

I think it's more straightforward than that. HS-GCMC is essentially a method to sample multiple Hamiltonians at the same time by simulating a mixture model. MBAR is a method for optimal analysis of the data that you gather. There is no real need to compare, because MBAR-style analysis could be added right on top of HS-GCMC, and be better than the combination. Best to think about sampling and analysis (assuming it is not adaptive) as two separate parts that can be mixed and matched, rather than a unitary approach.

mrshirts commented 5 years ago

Though since HS-GCMC reweights from the sampled mixture model, it has some of the advantages of MBAR. But not all - the fact that it's using histograms in energy causes some bias, and you can't easily add additional simulations if you don't get the distribution right the first time.

ramess101 commented 5 years ago

@mrshirts

I see your point, and I tried to make it clear that MBAR could be applied on top of HS. How do you think I could make it more clear (in both the manuscript revision and the response to the reviewer) that HS is all about sampling while MBAR is all about analysis?

ramess101 commented 5 years ago

@jpotoff @msoroush

After speaking with @mrshirts , we decided that there is no need to compare the efficiency of HS-GCMC and GCMC-MBAR because they are not really competing methods. HS is all about sampling while MBAR is all about data analysis. Therefore, it is certain that one could combine HS-GCMC with GCMC-MBAR by performing an HS-GCMC simulation but applying the MBAR post-simulation analysis.

I am still working on modifying the response to the reviewer, but here is how I modified the paragraph in the discussion section. I hope this makes it clearer that HS and MBAR are addressing the same problem from two totally different angles.

Let me know what you think. (Note I am thinking about removing the last sentence to make this a bit more succinct.)

ramess101 commented 5 years ago

@mrshirts @jpotoff @msoroush

Here is the response that I have right now:

ramess101 commented 5 years ago

@mrshirts

Could you look over my response for HS-GCMC and GCMC-MBAR?

ramess101 / JCED_FOMMS_Manuscript

Reviewer 1 Comment 4 #22