ramess101 / MBAR_GCMC


Scoring function when varying lambda #14

Open ramess101 opened 5 years ago

ramess101 commented 5 years ago

@mrshirts @msoroush @jpotoff

We need to decide what we would recommend when predicting the scoring function for different lambda. Currently, the end of Section 3.3 suggests decreasing the weight on the liquid-phase properties. However, the results in Section 3.4 are actually obtained without modifying the scoring function at all. If I remove the liquid-phase properties altogether (set their weights to zero), the optimum is poorly defined, as it depends only on vapor density and vapor pressure. This typically leads to much lower values of sigma. Even though the liquid-phase properties are poorly estimated when lam_rr \neq lam_ref, including the liquid phase keeps the optimal sigma near TraPPE (and experience suggests sigma should not change much).
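For concreteness, here is a minimal sketch of the kind of weighted scoring function under discussion; the property names, equal weights, and squared-percent-deviation form are illustrative assumptions, not necessarily the exact function in the manuscript:

```python
import numpy as np

def scoring_function(percent_dev, weights):
    """Weighted mean of squared percent deviations from experiment.

    percent_dev : dict mapping property name -> array of percent
                  deviations over the temperature range
    weights     : dict mapping property name -> weight; renormalized
                  here so the weights add to 1
    """
    total = sum(weights.values())
    return sum(w / total * np.mean(percent_dev[prop] ** 2)
               for prop, w in weights.items())

# Including all four properties vs. dropping the liquid phase entirely.
# With only rhovap and Psat, whose contours run nearly parallel, the
# optimum in (epsilon, sigma) is poorly constrained along the valley.
weights_all = {"rholiq": 1.0, "rhovap": 1.0, "Psat": 1.0, "DeltaHv": 1.0}
weights_vap = {"rholiq": 0.0, "rhovap": 1.0, "Psat": 1.0}
```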

For example, here are the results where I include both liquid and vapor properties:

image

And here is just rhovap and Psat (note that I renormalized the scoring function such that the weights add to 1):

image

Clearly, the optimal sigma with just vapor density and vapor pressure is not going to provide an accurate liquid density. Even if you then simulated at this new eps/sig for each lambda, you would be starting farther from the likely optimal region.

For comparison, here is just the liquid density (again I renormalized the weights):

image

And here is just the enthalpy of vaporization (renormalized):

image

Because the rhovap/Psat contours run essentially parallel to the DeltaHv contours while the rholiq contours do not, we really need to include rholiq in the scoring function.

For example, here is rhovap, Psat, and enthalpy of vaporization (renormalized), i.e., only excluding rholiq:

image

We could still try ways to penalize parameter sets with low number of effective samples in the liquid phase, but I don't think we should throw them out altogether.

In brief, I think we should modify our recommendation at the end of Section 3.3 to be consistent with Section 3.4.

mrshirts commented 5 years ago

We could still try ways to penalize parameter sets with low number of effective samples in the liquid phase, but I don't think we should throw them out altogether.

If one is trying to come up with good parameters, then we should be doing extra simulations to make sure there is a high number of effective samples everywhere - the cost is not that high if one is trying to find parameters that people can use from that time forward!

I.e., don't bother trying to optimally use a low effective number of samples; make sure there is a large enough number of effective samples.

ramess101 commented 5 years ago

@mrshirts

I agree that we would want to perform additional simulations to obtain refined optimal parameters. However, the goal here is to determine WHERE we should perform those additional simulations (i.e., we want them to be close to the true optimum). By including rholiq (even with low Neff) we get an estimate for the optimum that is much closer to the true optimum and, therefore, simulating at this new point should provide very accurate estimates of the relevant region.

mrshirts commented 5 years ago

The strategy that Levi and I used in (https://pubs.acs.org/doi/abs/10.1021/acs.jctc.5b00869) was to sample at the edges where N_eff is low. I.e., first be certain, then optimize. In this case, one could imagine just looking at the local gradient (to see which direction the optimum lies in) and then carrying out an extra simulation at the edge there.
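A rough sketch of that strategy, assuming hypothetical callables `score` and `neff` that evaluate the MBAR-estimated scoring function and number of effective samples at a given (epsilon, sigma); all names here are illustrative:

```python
import numpy as np

def next_simulation_point(score, neff, theta0, step, neff_min=50.0, max_iter=100):
    """Walk downhill on the MBAR-estimated score surface from the reference
    parameters theta0 = (epsilon, sigma), stopping at the edge where the
    number of effective samples falls below neff_min. The stopping point is
    the candidate location for the next direct simulation."""
    theta = np.asarray(theta0, dtype=float)
    step = np.asarray(step, dtype=float)  # finite-difference increments
    for _ in range(max_iter):
        # Central-difference estimate of the local score gradient
        grad = np.array([(score(theta + d) - score(theta - d)) / (2.0 * h)
                         for d, h in zip(np.diag(step), step)])
        trial = theta - step * grad / np.linalg.norm(grad)
        if neff(trial) < neff_min or score(trial) >= score(theta):
            break  # reached the low-overlap edge (or a local minimum)
        theta = trial
    return theta
```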

ramess101 commented 5 years ago

@mrshirts

That does sound like a good approach. In fact, in this case that is sort of what we get when we include all four properties. Here are the plots with Neff, where the star is the optimal parameter set for each lambda (score computed with all properties). Note that I replaced lam = 12 with lam = 20 in the top-left plot.

image

The optimal parameter set (star) is pretty close to the Neff = 50 (log10(50) = 1.7, cyan) contour. The optimal parameter set has fewer and fewer effective samples with increasing lam, but the scoring function appears to appropriately penalize those state points where Neff is much too low.

I think this could be good to discuss in the manuscript. Specifically, the poor estimates of rholiq for low Neff appropriately penalize parameter sets with poor overlap and, thus, by including rholiq in the scoring function we obtain the best parameter set that also has a reasonable value of Neff. And that makes sense, right?

mrshirts commented 5 years ago

Specifically, the poor estimates of rholiq for low Neff appropriately penalize parameter sets with poor overlap and, thus, by including rholiq in the scoring function we obtain the best parameter set that also has a reasonable value of Neff.

But I don't know that we should be recommending any parameters that depend on properties with low N_eff. They might be "the best parameters to run next", but I don't think we want a parameterization process that doesn't end with high values of N_eff near the minimum.

poor estimates of rholiq for low Neff appropriately penalize parameter sets with poor overlap

One issue could be that low values of N_eff result in property estimates that push the parameters in the wrong direction, though.

The optimal parameter set (star) is pretty close to the Neff = 50 (log10(50) = 1.7, cyan) contour.

So it seems like the next phase in the process should be to carry out a new simulation there, and recalculate everything with that extra simulation?

ramess101 commented 5 years ago

But I don't know that we should be recommending any parameters that depend on properties with low N_eff. They might be "the best parameters to run next", but I don't think we want a parameterization process that doesn't end with high values of N_eff near the minimum.

I completely agree. We should never report a parameter set that we have not actually simulated directly or that doesn't at least have a high value of N_eff.

One issue could be that low values of N_eff result in property estimates that push the parameters in the wrong direction, though.

What I am trying to say is that, for GCMC-MBAR, low N_eff is always going to increase the scoring function error because it predicts erroneous rholiq. Thus, the optimizer will try to find the lowest scoring function value where N_eff is reasonably high. So the next parameter set to simulate should lie along the valley of optimal scoring function values, at the point where N_eff becomes too low for a reasonably accurate estimate.
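For reference, the number of effective samples can be computed from the MBAR reweighting weights with the Kish effective-sample-size formula; a minimal sketch (the log-weight input is schematic):

```python
import numpy as np

def n_eff(log_w):
    """Kish effective sample size from unnormalized log reweighting weights,
    log_w[n] ~ -beta * (U_new(x_n) - U_ref(x_n)). With poor overlap between
    the proposed (eps, sig, lam) and the sampled reference ensemble, a few
    configurations dominate the weights and N_eff collapses toward 1."""
    w = np.exp(log_w - np.max(log_w))  # shift for numerical stability
    w /= w.sum()
    return 1.0 / np.sum(w**2)
```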

So it seems like the next phase in the process should be to carry out a new simulation there, and recalculate everything with that extra simulation?

Yes. @msoroush is working on this right now and we should be able to have this ready by the deadline.

jpotoff commented 5 years ago

@ramess101 @mrshirts @msoroush Hard to keep up with you two. Agree with the last statement on running at the current "optimum" and using those data to finish the optimization.

msoroush commented 5 years ago

Here is the preliminary phase diagram for lambdas 14-20. I need to fix some of the chemical potentials. I should have better data by tomorrow.

Lambda = 14: eps = 61.5 K, sig = 0.393 nm
Lambda = 16: eps = 70 K, sig = 0.389 nm
Lambda = 18: eps = 77 K, sig = 0.389 nm
Lambda = 20: eps = 84 K, sig = 0.388 nm

density_liq density_vap pressure

ramess101 commented 5 years ago

@msoroush

Looks great. I'd say it is pretty impressive that we obtained such a well-tuned Mie 16-6 potential just from simulating the TraPPE potential.

ramess101 commented 5 years ago

@msoroush

Could you plot DeltaHv too? Since that was also included in the scoring function.

ramess101 commented 5 years ago

@mrshirts @jpotoff @msoroush

OK, so Mohammad simulated the 14-6, 16-6, 18-6, and 20-6 "optimal" parameter sets. Clearly the 16-6 has the best Psat and rhovap. The 16-6 and 18-6 are pretty similar for rholiq.

This is our plan:

  1. Simulate the "optimal" parameter sets for each lambda (simulations are running and will be done by Monday)
  2. Re-optimize for constant lambda (similar to what we did for lam = 12 when we simulated TraPPE)
  3. Plot the final direct simulation results for the optimal of each lambda (like Mohammad did for the pseudo-optimal parameter set).

One question, @jpotoff: how precisely do we want to report the optimal sigma and epsilon? I could get extremely precise values (probably more precise than the uncertainties merit), or we could report them to the same precision as your other publications. What is your preference?

jpotoff commented 5 years ago

@ramess101 We should stick with the same level of precision as in prior Mie papers. While not as rigorous as what you've done in the past, we have been able to pull a reasonable estimate of the uncertainty from looking at the heat maps. We could do the same here.

ramess101 commented 5 years ago

@jpotoff

OK, that is reasonable. Could you elaborate a little more on how you use the heat maps to determine the uncertainty? For example, if the optimal epsilon is 50.3461 but the scoring function is less than 1 (or some other value) for epsilon between 50.2 and 50.5, would you report epsilon as 50.3?

jpotoff commented 5 years ago

@ramess101 Pretty much, that is what we do, although I might be tempted to report that as 50.35. If you look at the darkest red sections of your heat maps, everything in that region is going to produce very similar results for the VLE. Pick the center as the "optimum", and the uncertainties in sigma and epsilon are the distances to the border of the dark red region. Technically this is not an "uncertainty", but a range of parameters where the simulation will produce essentially the same results.
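A minimal sketch of that heat-map rule, assuming the score has been tabulated on an (epsilon, sigma) grid (the function name and cutoff are hypothetical):

```python
import numpy as np

def parameter_range(eps_grid, sig_grid, score, cutoff=1.0):
    """Center and half-widths of the 'dark red' region of the heat map:
    all (epsilon, sigma) whose score lies within cutoff of the minimum.
    score has shape (len(eps_grid), len(sig_grid))."""
    i, j = np.where(score <= score.min() + cutoff)
    eps_lo, eps_hi = eps_grid[i].min(), eps_grid[i].max()
    sig_lo, sig_hi = sig_grid[j].min(), sig_grid[j].max()
    return ((0.5 * (eps_lo + eps_hi), 0.5 * (eps_hi - eps_lo)),
            (0.5 * (sig_lo + sig_hi), 0.5 * (sig_hi - sig_lo)))
```

For the epsilon example above (flat between 50.2 and 50.5), this would report epsilon = 50.35 ± 0.15.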

ramess101 commented 5 years ago

@jpotoff

OK, that makes sense. I would say that compared to what is typically done with reporting uncertainties in force field parameters (i.e., nothing at all) this is an acceptable practice that we can follow in this manuscript.

ramess101 commented 5 years ago

@jpotoff

By the way, did you get to read the manuscript yet?

jpotoff commented 5 years ago

@ramess101 Indeed. Continuing to work on it this morning.

msoroush commented 5 years ago

Here are the updated results:

Lambda = 14: eps = 61.5 K, sig = 0.393 nm
Lambda = 16: eps = 70 K, sig = 0.389 nm
Lambda = 18: eps = 77 K, sig = 0.389 nm
Lambda = 20: eps = 84 K, sig = 0.388 nm

density_liq density_vap heatofvap pressure

and the VLE data: phaseData.tar.gz

ramess101 commented 5 years ago

@msoroush @jpotoff

JCED wants percent deviation plots. So I am going to include this figure (after including the MiPPE results) in the text:

image

And this one in the SI:

image

ramess101 commented 5 years ago

@msoroush @jpotoff

A couple of interesting observations. The Keasler (Siepmann, GEMC) values have slightly different vapor properties. The Yiannourakou values have some erroneous points.

The pseudo-optimal lambda values are already quite a bit better than TraPPE.

I will have the MiPPE parameters by tomorrow morning. And we will want to simulate MiPPE directly as soon as we can tomorrow.

ramess101 commented 5 years ago

@msoroush

OK, I used a grid search with 0.5 K and 0.0005 nm precision, and the optimal parameter set was 69.5 K, 0.391 nm, and lam = 16. Could you locate the mu/T state points and start simulating 20 replicates?

This is what the final heat map looks like:

image

Again, I could go to higher precision, but since so many parameter sets have similar scores in this valley, the noise can impact the optimal parameter set.
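For reference, a minimal sketch of that grid search; the grid bounds and the score_fn interface are illustrative assumptions:

```python
import numpy as np

# Resolution quoted above: 0.5 K in epsilon, 0.0005 nm in sigma
eps_grid = np.arange(60.0, 90.0 + 0.25, 0.5)          # K (illustrative bounds)
sig_grid = np.arange(0.380, 0.400 + 0.00025, 0.0005)  # nm (illustrative bounds)

def grid_search(score_fn, lam):
    """Exhaustive search over (epsilon, sigma) at fixed lambda;
    score_fn(eps, sig, lam) is the MBAR-based scoring function."""
    best = (np.inf, None, None)
    for eps in eps_grid:
        for sig in sig_grid:
            s = score_fn(eps, sig, lam)
            if s < best[0]:
                best = (s, eps, sig)
    return best  # (score, eps_opt, sig_opt)
```

Because the valley is nearly flat, neighboring grid points differ by less than the MBAR noise, which is why a finer grid does not yield a more trustworthy optimum.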

Here is what the percent deviation plots are expected to look like (the reference is 70 K and 0.389 nm):

image

@jpotoff Do the green (optimal) percent deviations look like MiPPE-level accuracy? The DeltaHv and near-critical rhol are my main concerns. We could try playing around with the scoring function if you want.

ramess101 commented 5 years ago

@msoroush @jpotoff

And here is a comparison of TraPPE, the pseudo-optimals for each lambda and the "final" MiPPE parameters:

image

ramess101 commented 5 years ago

@jpotoff @msoroush

Clearly the MiPPE force field is much more accurate than TraPPE for Psat, rhov, and DeltaHv; rhol is similar.

msoroush commented 5 years ago

@ramess101 The difference between the pseudo-optimal and the optimal is 0.5 K in epsilon and 0.002 nm in sigma. I think we can use the same T and mu. I started the simulations.

Runs 1-2 are vapor, run 3 is the bridge, and runs 4-7 are liquid:

Run   T (K)   Chem. Pot.
 1     450     -4367
 2     500     -4367
 3     550     -4367
 4     500     -4150
 5     460     -4025
 6     410     -3890
 7     360     -3790

ramess101 commented 5 years ago

@msoroush

OK, sounds good. Also, I think we should test the second-best parameter set (since it had a very similar score and is probably within the uncertainty). Do you think we could use the same state points for 69 K, 0.3925 nm, lam = 16?

msoroush commented 5 years ago

Probably. You can run it using the same chemical potential and temperature. After patching, we can fix them. Do you still have the patching programs from the GOMC workshop? I don't have any free CPUs to calculate it for you.

ramess101 commented 5 years ago

@jpotoff @msoroush Here is what the second-best parameter set looks like:

image

This has a better DeltaHv but a worse rhol. Since MBAR has some inherent uncertainty, I think the only way to really say which is better is to simulate both directly.

ramess101 commented 5 years ago

@msoroush

After patching, we can fix them. Do you still have the patching programs from the GOMC workshop? I don't have any free CPUs to calculate it for you.

Not sure I know what you are referring to as the "patching" program. I have not attended a GOMC workshop before, but I do have the tutorial files.

msoroush commented 5 years ago

@ramess101 The example for the GCMC-HR method from the GOMC workshop. I can walk you through it. Let's try to set it up and run the simulation. Then I will help you fix the chemical potential and temperature.

ramess101 commented 5 years ago

@msoroush

OK, I will get the simulations running at the same mu and T. Do these need to be long simulations or can I run a shorter simulation prior to the patching step?

msoroush commented 5 years ago

A short simulation with 4 million production steps and 1 million equilibration steps should be sufficient.

msoroush commented 5 years ago

@jpotoff @msoroush Here is what the second-best parameter set looks like:

image

This has a better DeltaHv but a worse rhol. Since MBAR has some inherent uncertainty, I think the only way to really say which is better is to simulate both directly.

For calculating the deviations for lambda 14, 16, 18, and 20, which data are you using? The ones that I recently sent, or your own data? It seems like the liquid density for lambdas 18 and 20 at lower temperatures (350-380 K) is not correct.

ramess101 commented 5 years ago

@msoroush

These are your data...why do you think they are not correct?

ramess101 commented 5 years ago

@msoroush OK, I see that your plots did not have lambdas 18 and 20 going below the experimental line. It could be that our experimental data are different.

ramess101 commented 5 years ago

@msoroush

This is what I get for just lam = 20 using your data:

image

What is your experimental rhol value at 350 K?

msoroush commented 5 years ago

If you look at the figure that I sent, the liquid density does not cross the experimental data, and both lambdas 18 and 20 overpredict the liquid density. But in your deviation plots, I see negative numbers.

msoroush commented 5 years ago

I think you are using the first data set that I sent. Later I sent the updated one. You can find it here: phaseData.tar.gz

ramess101 commented 5 years ago

@msoroush

OK, I thought I used your updated data, but I will replot this.

msoroush commented 5 years ago

What is your experimental rhol value at 350 K?

I zipped all my data, including experimental data.

ramess101 commented 5 years ago

@msoroush

OK, I think these are correct now:

image

msoroush commented 5 years ago

@ramess101 it looks much better now. Thanks

ramess101 commented 5 years ago

@msoroush

So looking at the second- and third-best parameter sets again, they are clearly worse in rholiq (shifted by 0.5 to 1%). Maybe we don't need to simulate those directly. I am just wary of the looming deadline and trying to cram in too much at the very end. I don't think we would be able to run 20 replicates for each parameter set (most of my cores are being used right now).

I guess the question is whether @jpotoff is satisfied with the performance of our optimal parameter set.

msoroush commented 5 years ago

@ramess101 I agree. I think the optimal parameter set

sigma = 0.391 nm, epsilon = 69.5 K, lambda = 16

is the best one. Usually we don't want the error in liquid density to be greater than 0.5%.

ramess101 commented 5 years ago

@msoroush @jpotoff

Yeah. I think I could get slightly better performance if I used a more refined grid. For example, I could locate an optimal of something like 69.2 K and 0.3913 nm, but I think we agreed that we want to keep the precision similar to other studies, especially considering the uncertainty in the scoring function.

jpotoff commented 5 years ago

@ramess101 @msoroush We need to be careful not to overdo it with the precision of the fitting as we are approaching the limit of uncertainty of the simulations. If we had more time, I might spend a week agonizing about 0.3913 vs. 0.3910 nm, but I think in this case we need to go with these parameters and move forward.

One concern I have is about the data in the near-critical region, which are probably showing finite-size effects. We should consider only showing data for 0.95 Tc and below. Also, have we calculated the compressibility factor and verified it approaches Z=1 at low temperatures? We need to do that to further verify that there are no significant system-size effects in the gas phase at low temperatures.
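A minimal sketch of that low-temperature check, computing Z from the simulated saturation pressure and vapor density (the molar mass is cyclohexane's; the function name is hypothetical):

```python
R = 8.314462618   # J / (mol K)
M = 84.160e-3     # kg/mol, cyclohexane

def compressibility(P, rho_mass, T):
    """Z = P / (rho_molar * R * T) from the simulated vapor pressure (Pa),
    vapor mass density (kg/m^3), and temperature (K). Z should approach 1
    smoothly as T decreases if the gas phase is free of system-size effects."""
    return P * M / (rho_mass * R * T)
```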

ramess101 commented 5 years ago

@jpotoff

We need to be careful not to overdo it with the precision of the fitting as we are approaching the limit of uncertainty of the simulations. If we had more time, I might spend a week agonizing about 0.3913 vs. 0.3910 nm, but I think in this case we need to go with these parameters and move forward.

Agreed.

One concern I have is about the data in the near-critical region, which are probably showing finite-size effects. We should consider only showing data for 0.95 Tc and below.

For some reason I thought that cyclohexane's critical temperature was higher. I just realized it is around 554 K, so we should definitely not be including points up to 550 K. This was a serious oversight on my part. This could impact our optimal parameter set. I will reoptimize the parameters with data up to 525 K. Then we will only plot data below 0.95 Tc.

Also, have we calculated the compressibility factor and verified it approaches Z=1 at low temperatures? We need to do that to further verify that there are no significant system-size effects in the gas phase at low temperatures.

I think we did this, right @msoroush? But I will look into it again.

msoroush commented 5 years ago

We have a minor system-size effect:

image

msoroush commented 5 years ago

For some reason I thought that cyclohexane's critical temperature was higher. I just realized it is around 554 K, so we should definitely not be including points up to 550 K. This was a serious oversight on my part. This could impact our optimal parameter set. I will reoptimize the parameters with data up to 525 K. Then we will only plot data below 0.95 Tc.

I agree. We should exclude any data points above 520 K in the scoring function.

ramess101 commented 5 years ago

@msoroush

We have a minor system-size effect.

Are you referring to the odd shape near the critical temperature or the fact that Z is not 1 at 350 K? Because it looks to me like it is approaching 1 rather smoothly, but is the idea that it should already be 1?

jpotoff commented 5 years ago

@ramess101 @msoroush The issue is in the near-critical region. The simulation data are smoothly going to Z=1 at low temperatures. We also calculated Z using the Peng-Robinson EOS at 350 K and found Z=0.9692, compared to Z=0.9685 predicted by simulation.
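For reference, a minimal sketch of that Peng-Robinson check; the critical constants are approximate literature values for cyclohexane, and the 0.9 bar input is an illustrative saturation pressure near 350 K, not necessarily the value used above:

```python
import numpy as np

R = 8.314462618                        # J / (mol K)
Tc, Pc, omega = 553.8, 40.75e5, 0.212  # cyclohexane (approximate)

def pr_vapor_Z(T, P):
    """Vapor-phase compressibility factor from the Peng-Robinson EOS,
    i.e., the largest real root of
    Z^3 - (1 - B) Z^2 + (A - 2B - 3B^2) Z - (AB - B^2 - B^3) = 0."""
    kappa = 0.37464 + 1.54226 * omega - 0.26992 * omega**2
    alpha = (1.0 + kappa * (1.0 - np.sqrt(T / Tc)))**2
    a = 0.45724 * R**2 * Tc**2 * alpha / Pc
    b = 0.07780 * R * Tc / Pc
    A = a * P / (R * T)**2
    B = b * P / (R * T)
    roots = np.roots([1.0, B - 1.0, A - 2.0*B - 3.0*B**2, -(A*B - B**2 - B**3)])
    return max(r.real for r in roots if abs(r.imag) < 1e-10)

print(pr_vapor_Z(350.0, 0.9e5))  # ~0.97, consistent with the values quoted above
```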