ramess101 / MBAR_ITIC

Merging the MBAR and ITIC methodologies to optimize a TraPPE force switch model
0 stars 2 forks source link

ITIC #2

Open ramess101 opened 7 years ago

ramess101 commented 7 years ago

Currently the code only uses U and/or P at every state point along ITIC to optimize the parameters. I need to implement the ITIC approach into the python code so that we are actually optimizing to Psat and rhol (or Tsat). I have an ITIC code that I just need to debug some before it is ready.

ramess101 commented 7 years ago

ITIC code has been debugged and is ready for inclusion in MBAR_ITIC.

ramess101 commented 7 years ago

Richard,

So I have a few questions regarding how I am implementing ITIC. I believe there are three primary differences between my code, your code and/or Mostafa’s code:

  1. Instead of obtaining B2 from DIPPR or REFPROP along the IT, I simply take Adep from REFPROP up to the lowest density simulated along the IT and add this value to the Adep integration from simulation.
  2. Instead of using Trapezoid or Simpson for integrating, I fit the different properties to a model and then perform an “exact” integration of the model. For example, I fit the Z-1/rho vs rho curve to a third order polynomial, Z vs 1/T to a second order polynomial, and U/T vs 1/T to a straight line.
  3. Instead of using ZL = 0.001 to find Tsat from Z vs 1/T, I use an iterative self-consistent approach. In other words, I start with an initial guess for ZL and calculate Tsat, then rhov, then Psat and then I can calculate ZL_new from Psat/rhol/Rg/Tsat. I use this new ZL value and repeat the process until the ZL from the previous iteration is essentially the same as the new ZL.

The first two changes have a very small impact on the results. The first change is nice because COOLPROP has a bug where it won’t give you B2 at temperatures about TC. The second change is nice because you do not need to worry about how to integrate but instead can focus on choosing good fitting models (which I believe is easier to visualize and quantify). Plus, this approach is not limited to evenly spaced densities and inverse temperatures.

The third change can have a noticeable (though relatively small) impact, especially at higher Tsats. From what I can tell, in your VpITICjre17.f90 code you utilize this iterative approach by calculating a new Tsat from ZL, right? The excel sheet you gave me (which I believe is Mostafa’s code) only used ZL of 0.001. I also considered using ZL from REFPROP, but I think the self-consistent approach is more appropriate since it relies more on the force field itself.

Do you have any concerns with these modifications?

ramess101 commented 7 years ago
  1. That should give good accuracy, but we are in the process of switching to using the B2 from the force field. We believe it is more self-consistent to use the B2 from the force field.
  2. Hmm… we have tried various functional forms including polynomials, but never felt good about the accuracy, especially if you are using a single cubic over the entire range of density. One method we were somewhat content with was what we called the quick and dirty EOS, which was basically something like Z = 1 + (arho+brho^2+crho^3)/(1-drho)^3 –e*rho/T. But this was not as reliable as the simpson method when we tested it. you might want to read the paper by Barlow and Kofke in which they argue that nonclassical critical behavior is observable in the supercritical region.
  3. If you look closely at my VpITIC code, I think you will see that I am doing the same thing. I only take Z=0.001 as an initial guess. Maybe Mostafa wrote something different in his thesis. If he did, he made a boo-boo.
ramess101 commented 7 years ago
  1. OK, I agree that use the B2 from the force field is the right way to go.

  2. Since I am fitting a polynomial to just the IT or IC independently it fits pretty well. I did notice the Q&D fit but I think since I am not trying to fit all the state points (just along a single state path) the empirical models are sufficient. Z-1/rho vs rho has a nice cubic shape, at least for ethane. Z vs 1/T is pretty linear but with enough curvature to merit a second order polynomial. And U/T vs 1/T is also very linear. I am not familiar with what those plots look like for different types of compounds, but this figure from Mostafa’s thesis for trans-2-butene seems to suggest that Z-1/rho has a similar shape:

  3. Yes, I think you have coded the same thing, I just wanted to make sure I was interpreting that correctly.

ramess101 commented 7 years ago
  1. We only use Q&D to fit the isotherm. Maybe there is no T term. Simpson’s method for the isochore gives the exact same result as assuming a quadratic for the regression. That is inherent in the derivation of Simpson’s method.
ramess101 commented 7 years ago

OK, thanks for clarifying that part about the isotherm. I guess one reason I like fitting the isotherm to a model is that it helps smooth out any fluctuations in the simulation output. Also, would Simpson’s method still be the same if we had more than three points along the isochore?

ramess101 commented 7 years ago

When you have 3 points, it is better to use the Simpson’s 3/8 rule. It equates to a cubic polynomial. That’s eq 2.19 in Mostafa’s thesis. See also Wikipedia Simpson’s Rule.

ramess101 commented 7 years ago

These figures show the agreement between my implementation of ITIC and the reported values of TraPPE and Potoff in the literature.

TraPPE comparison:

image

Potoff comparison:

image

I would say that the ITIC rhoL and Psat look a little bit too good for Potoff, i.e. they are better than the author reported. TraPPE also has a slight anomaly in rhov at the highest temperature.

ramess101 commented 7 years ago

@jrelliottoh @mostafa-razavi How significant are finite-size effects in ITIC. Traditional VLE studies (GEMC, GCMC) can have significant finite-size effects, especially near the critical point. My gut feeling says ITIC should not suffer from these same finite-size effects because we are only concerned with U and P (or Z) for a single phase NVT. The fluctuations will obviously be larger for a smaller system. But do you know if the rhol and Psat values will change considerably? For example, we have been using 400 molecules for ethane. Would you suspect a large difference if we used just 50? Obviously, with 50 molecules the boxes will be too small to use a 1.4 nm cutoff, so maybe that would be the largest system size effect.

The reason I am asking is because MBAR is actually better at extrapolating with fewer molecules. This is primarily because the fluctuations are larger so more configurations are sampled that may be less favorable for the reference system but more favorable for the target system. Is that right @mrshirts? In addition, fewer molecules would obviously lead to faster simulations although we might want to run them longer to get better averages.

mrshirts commented 7 years ago

This is primarily because the fluctuations are larger so more configurations are sampled that may be less favorable for the reference system but more favorable for the target system.

Well, they are not less favorable for the reference system (they are perfectly good for the reference system of that size), but they ARE more favorable for the target system, because the natural fluctuations are larger.

mostafa-razavi commented 7 years ago

Richard, I'm running a series of LAMMPS simulations to figure out the effect of system size on vapor pressure and liquid densities. I'll post the results here as soon as they are done. I'm simulating TraPPE-UA ethane.

mostafa-razavi commented 7 years ago

I performed a series of ITIC simulations for ethane in LAMMPS using TraPPE-UA force field as shown in list below:

Note: ts = timestep in femtosecond mts = million timesteps run ns = nanosecond simulation time (ns=ts*mts) cpu = simulation run time in hours rc = cutoff distance in Angstrom

image

Rigid means SHAKE algorithm was used in LAMMPS Flexible means harmonic bond was used with kb=191.75 kcal/mol/A^2 taken from de Pablo paper (Nath, Escobedo et al. 1998)

Simulation 1) N=50, rigid, rc=14 itic4

Simulation 2) N=100, rigid, rc=14 itic4

Simulation 3) N=400, rigid, rc=14 itic4

Simulation 4) N=1600, rigid, rc=14 itic4

Simulation 5) N=50, flexible, rc=14 itic4

Simulation 6) N=100, flexible, rc=14 itic4

Simulation 7) N=400, flexible, rc=14 itic4

Simulation 8) N=1600, flexible, rc=14 itic4

Simulation 9) N=50, rigid, rc=8 itic4

Simulation 10) N=100, rigid, rc=10 itic4

Simulation 11) N=400, rigid, rc=10 itic4

Simulation 12) N=1600, rigid, rc=10 itic4

==================================================== Conclusions:

Conclusion 1- By comparing simulations 3 and 7, shake makes equilibration much faster. 1 ns is enough to reproduce TraPPE-UA results. When SHAKE is used equilibrium is reached at all densities and temperatures within first 1 ns, and it takes 3.7 hrs to perform all 19 ITIC points, whereas without using shake even 12 ns is not enough for some data points to reach true equilibrium, and it took 27.7 hrs to perform ITIC.

Below are plots of running averages vs time for:

Simulation 3) rigid eqplot

Simulation 7) flexible eqplot

Conclusion 2- When using SHAKE, N=50, and N=100 do not give accurate vapor pressure and liquid density. N=400 is sufficient.

Simulation 1) N=50, rigid image

Simulation 2) N=100, rigid image

Simulation 3) N=400, rigid image

Simulation 4) N=1600, rigid image

Conclusion 3- When ethane is flexible 12 ns simulation time using ts = 1 femtosecond is not enough to get accurate liquid density especially at lower temperatures. Figure below clearly shows that system has not reached equilibrium after 12 ns. 16 ns might be enough.

Simulation 7) N=400, flexible 9

(Black lines represent running averages)

Conclusion 4- Effect of system size on P, U, Psat, and rhoL Here is a comparison between Simulations 10, 11, and 12 (N=100/400/1600,rigid, rc=10). itic-rc10

Conclusion 5- Effect of cutoff distance on P, U, Psat, and rhoL N=50, rigid itic-rc8

N=400, rigid itic-n400

N=1600, rigid itic-n1600

ramess101 commented 7 years ago

@mostafa-razavi This is very useful and thorough work. Conclusions 1 and 3 seem to confirm what we have been doing with Gromacs, namely, to use rigid bond lengths. We too have observed very fast equilibration (< 1ns). Conclusion 2 is a bit disheartening, because MBAR works better with smaller system sizes. That being said, we could still use smaller system sizes for the first couple iterations (when the LJ/Mie parameters are changing significantly) and then use a larger system size (~400) for the refinement iterations. Of course, this system is still very fast computationally, even with 400 molecules, so it might not be worth the hassle of having a varying system size. I was wondering, for your smaller system simulations (~50, 100) some of your box lengths were likely less than 2.8 nm, correct? So using a cutoff of 1.4 nm would not be permitted since it is more than half the box length. What did you do in these situations?

On a side note, if I recall correctly, @jrelliottoh mentioned that there should be a publication on ITIC coming out soon. Is that correct? If so, I think these results for system size should go in there. Also, I would be very interested in giving a pre-review of the manuscript, both for my benefit and hopefully for yours as well.

mostafa-razavi commented 7 years ago

I was wondering, for your smaller system simulations (~50, 100) some of your box lengths were likely less than 2.8 nm, correct?

You're right. I overlooked the fact that LAMMPS does not give any error in these cases. I'm repeating SHAKE simulations with rcut=10A for N=100, 400, and 1600 and rcut=8 for N=50. As far as I remember, rCut=10 gives similar results to rcut=14, but I might be wrong.

On a side note, if I recall correctly, @jrelliottoh mentioned that there should be a publication on ITIC coming out soon. Is that correct?

I totally agree. System size discussion belong to ITIC paper. And I appreciate your offer. I'll be happy to use your review.

jrelliottoh commented 7 years ago

I haven't tried 50 molecules for much except maybe nC80. It would make me a little nervous as a basis for a final estimate, but it might be a reasonable basis for preliminary searches for optimal parameters.JRE

On Thursday, August 17, 2017, 12:40:38 PM EDT, mostafa-razavi notifications@github.com wrote:

I was wondering, for your smaller system simulations (~50, 100) some of your box lengths were likely less than 2.8 nm, correct?

You're right. I overlooked the fact that LAMMPS does not give any error in these cases. I'm repeating SHAKE simulations with rcut=10A for N=100, 400, and 1600 and rcut=8 for N=50. As far as I remember, rCut=10 gives similar results to rcut=14, but I might be wrong.

On a side note, if I recall correctly, @jrelliottoh mentioned that there should be a publication on ITIC coming out soon. Is that correct?

I totally agree. System size discussion belong to ITIC paper. And I appreciate your offer. I'll be happy to use your review.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or mute the thread.

ramess101 commented 7 years ago

You're right. I overlooked the fact that LAMMPS does not give any error in these cases. I'm repeating SHAKE simulations with rcut=10A for N=100, 400, and 1600 and rcut=8 for N=50. As far as I remember, rCut=10 gives similar results to rcut=14, but I might be wrong.

That is unfortunate that LAMMPS does not provide a warning in that scenario. I have observed small but clear deviations between rcut =10 and rcut=14 for GEMC. So it will be interesting to see how this affects ITIC. To quantify the effect of rcut, we can compare the rcut = 10 to rcut = 14 for 400 and 1600 (from before). Then, the comparison of N=100,400, and 1600 with rcut=10 will be a clear comparison of the impact of just changing N. The rcut = 8 and N = 50 run will be a wildcard, since it might show that if we scale rcut with N the results are maintained, or it might compound the two effects.

I totally agree. System size discussion belong to ITIC paper. And I appreciate your offer. I'll be happy to use your review.

Great. I look forward to it.

ramess101 commented 7 years ago

I haven't tried 50 molecules for much except maybe nC80. It would make me a little nervous as a basis for a final estimate, but it might be a reasonable basis for preliminary searches for optimal parameters.

Yes, that is what I am thinking. Then we get the benefit of both shorter simulations of initial parameters but better extrapolation with MBAR. Also, if finite-size effects end up being relatively small, this could be an added benefit of the ITIC approach. By contrast, if finite-size effects remain significant, it would be instructive to know if the internal energies or the pressures are most affected by system size. This would be useful knowledge for optimizing the Mie potential, since I have discovered that going from lambda=12 to lambda=16 in a single jump requires using internal energy as the initial target property because MBAR does not extrapolate very well when predicting pressure. So if smaller system sizes do not impact internal energy much, I could use smaller system sizes for the initial reparameterization and then larger system sizes when I include pressure (rhol, Psat, etc.).

So @mostafa-razavi when you plot up your new results for rcut = 10 and N=100, 400, and 1600, could you include plots that show Udep*T for the different system sizes on the same plot? And the same for Z-1/rho vs rho and Z vs 1000/T? If that is a hassle I can just eye-ball the comparison if you provide the same plots as in your previous comment. Or, perhaps just calculate the percent deviation in U and P (or Z) at the 19 ITIC state points for N = 100, 400 relative to N = 1600?

jrelliottoh commented 7 years ago

On second thought, I think clarification is required about which systems are applying the small system size. We have already established that large system sizes are required for B2,B3 by MD. I presume that B2 is being determined with N>799 in all of the above discussion. Is that correct? JRE

On Thursday, August 17, 2017, 1:25:00 PM EDT, Richard Messerly notifications@github.com wrote:

I haven't tried 50 molecules for much except maybe nC80. It would make me a little nervous as a basis for a final estimate, but it might be a reasonable basis for preliminary searches for optimal parameters.

Yes, that is what I am thinking. Then we get the benefit of both shorter simulations of initial parameters but better extrapolation with MBAR. Also, if finite-size effects end up being relatively small, this could be an added benefit of the ITIC approach. By contrast, if finite-size effects remain significant, it would be instructive to know if the internal energies or the pressures are most affected by system size. This would be useful knowledge for optimizing the Mie potential, since I have discovered that going from lambda=12 to lambda=16 in a single jump requires using internal energy as the initial target property because MBAR does not extrapolate very well when predicting pressure. So if smaller system sizes do not impact internal energy much, I could use smaller system sizes for the initial reparameterization and then larger system sizes when I include pressure (rhol, Psat, etc.).

So @mostafa-razavi when you plot up your new results for rcut = 10 and N=100, 400, and 1600, could you include plots that show Udep*T for the different system sizes on the same plot? And the same for Z-1/rho vs rho and Z vs 1000/T? If that is a hassle I can just eye-ball the comparison if you provide the same plots as in your previous comment. Or, perhaps just calculate the percent deviation in U and P (or Z) at the 19 ITIC state points for N = 100, 400 relative to N = 1600?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or mute the thread.

ramess101 commented 7 years ago

On second thought, I think clarification is required about which systems are applying the small system size. We have already established that large system sizes are required for B2,B3 by MD. I presume that B2 is being determined with N>799 in all of the above discussion. Is that correct?

Yes, I agree that for B2 and B3 we need to keep the system size large.

mostafa-razavi commented 7 years ago

@ramess101, I updated the simulation table, and added Simulation 8-12 and Conclusion 4 and 5 to my original comment

ramess101 commented 7 years ago

@mostafa-razavi These results are very insightful. Let me see if I interpret them in the same way you do.

So from what I can tell in Conclusion 4, system size of 400 molecules agrees very closely with 1600 molecules (using same cutoff). Specifically, they are nearly identical when comparing rhol, rhov, Psat, Z, although the Adep does appear to have some system-size effects. System size of 100 molecules has significant differences in rhol, rhov, Psat, Z, and Adep but Udep appears to be quite similar for all system sizes. If you agree with this interpretation, that could be beneficial for MBAR because we are considering using Udep as an initial objective to localize the optimal region before including additional target properties. In other words, when extrapolating from LJ to Mie we get better MBAR results using just Udep.

For conclusion 5, it appears that the cutoff of 10 vs 14 has a negligible impact on rhol, rhov, Psat, Z, and Udep but is noticeable for Adep. The cutoff of 8 appears to cause significant deviations in all properties for the 50 molecule system. However, I wonder if that is because the 14 Ang system was not being properly calculated on LAMMPS? Having said that, I find it surprising that the 50 molecule system with 14 Ang cutoff actually has an Adep that agrees closely with NIST (and the other larger system simulations). Still, I would suspect that a 400 or 1600 molecule system with 8 Ang cutoff would be more similar to the 400 and 1600 molecule systems with 14 Ang cutoffs, although it could still show some deviations. That is not too important but if we want a truly fair comparison of 50 molecules with 100, 400, and 1600 we might want to do all four at 8 Ang. Again, not sure if that merits our time but it could be useful.

So in summary, for Udep you can use a 100 molecule system with 10 Ang cutoff and be OK. But for the rest of the properties you would want to use 400 molecules with either 10 or 14 Ang cutoffs. Does that sound like a fair assessment?

mostafa-razavi commented 7 years ago

So from what I can tell in Conclusion 4, system size of 400 molecules agrees very closely with 1600 molecules (using same cutoff). Specifically, they are nearly identical when comparing rhol, rhov, Psat, Z, although the Adep does appear to have some system-size effects. System size of 100 molecules has significant differences in rhol, rhov, Psat, Z, and Adep but Udep appears to be quite similar for all system sizes. If you agree with this interpretation, that could be beneficial for MBAR because we are considering using Udep as an initial objective to localize the optimal region before including additional target properties. In other words, when extrapolating from LJ to Mie we get better MBAR results using just Udep.

I should also point out that the reason for aDep difference between 1600 and 400 is mainly because of difference in (Z-1)/rho at lowest density. According to my GOMC simulations which I will post in https://github.com/ramess101/MBAR_ITIC/issues/3 soon, the true value of TraPPE ethane B2 at 360 K is ~ -3.6 cc/g which is in a better agreement with N=1600. That's why we need more molecules for lower densities. Please note that Psat and Adep of these simulations are not "exact", because I used DIPPR B2 at 360 K. I will update the plot with TraPPE B2 value soon. But as far as other properties are concerned your conclusions are correct.

So in summary, for Udep you can use a 100 molecule system with 10 Ang cutoff and be OK. But for the rest of the properties you would want to use 400 molecules with either 10 or 14 Ang cutoffs. Does that sound like a fair assessment?

Yes, I think it does.

ramess101 commented 7 years ago

Please note that Psat and Adep of these simulations are not "exact", because I used DIPPR B2 at 360 K. I will update the plot with TraPPE B2 value soon.

How much of an impact do you expect this to make? I have assumed that the value of B2 makes a minor contribution as long as it is a reasonable estimate. Is that not the case?

jrelliottoh commented 7 years ago

B2 on the isotherm makes a fairly substantial difference. Mostafa is doing a more detailed study right now, but a 10% change in B2(isotherm) might change the Psat's by more than 10%. JRE

On Tuesday, August 22, 2017, 10:59:08 AM EDT, Richard Messerly notifications@github.com wrote:

Please note that Psat and Adep of these simulations are not "exact", because I used DIPPR B2 at 360 K. I will update the plot with TraPPE B2 value soon.

How much of an impact do you expect this to make? I have assumed that the value of B2 makes a minor contribution as long as it is a reasonable estimate. Is that not the case?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or mute the thread.

ramess101 commented 6 years ago

B2 on the isotherm makes a fairly substantial difference. Mostafa is doing a more detailed study right now, but a 10% change in B2(isotherm) might change the Psat's by more than 10%. JRE

@jrelliottoh @mostafa-razavi

I have come to the same conclusion as I have started to simulate longer n-alkanes. I think this can partially explain why my ITIC results for the Mie potential differ from those reported by Potoff et al. (obtained using GCMC). If you just make typical VLE or log(Psat) plots you might not be able to see the difference:

Propane:

image

image

image

Butane:

image

image

image

Also, since my results and Potoff's are at different temperatures you can't easily compare them directly. But if you plot the percent deviations compared to REFPROP you can see systematic differences.

Propane:

image

image

image

Butane:

image

image

image

It is also possible that if I used GOMC instead of GROMACS we would see improvement. In other words, perhaps the U and Z values obtained by GROMACS are slightly different than what GOMC would give. In this case the difference could be attributed to the simulation engine itself, and not the data analysis.

Also, as you might have noticed, I did not include error bars on these plots. It is possible, though perhaps unlikely, that the uncertainties in the simulation output could reconcile the differences observed between the two methods. Potoff et al. did not report any error bars, but in subsequent studies they have mentioned that the GCMC results might have statistical uncertainties around 0.2% for rhol and 1% for Psat. This alone is not large enough to overlap with my ITIC values but it is a start. My ITIC results don't have error bars mainly because that requires a bit more effort than your typical replicates/block averaging simulation methods.

For theses reasons, I think it would be a great exercise (one that I am willing to contribute to) to perform a sensitivity analysis or Monte Carlo uncertainty propagation for the ITIC approach. For a Monte Carlo approach we would just: 1. Assign some standard deviations to the U and Z values along IT and IC and to B2 and B3 2. Plot the distribution for the resulting rhol, rhov, and Psat values. A sensitivity analysis could also be performed by just varying a single U or Z and seeing how much it impacts rhol, rhov, and Psat. This type of information is crucial I think and could be included in the publication. Specifically, the results from this analysis could help determine if my ITIC values for rhol, rhov, and Psat are statistically different than those reported by Potoff et al.

Here is a demonstration of a simple sensitivity-like analysis that I performed out of necessity:

I wanted to verify that my ITIC analysis was working for pentane, so I used the REFPROP values for U and Z at the ITIC conditions and recalculated the rhov, rhol, and Psat. I had done this previously for ethane, propane, and butane and got very good agreement. However, there was a problem for pentane because the REFPROP equation-of-state can not be evaluated at the highest two densities along the IT. So I tried estimating U and Z. Turns out that my estimates needed to be very precise to get accurate results.

These figures show the percent deviation between my ITIC rhol, rhov, Psat values and those in REFPROP (they should be identical in the limit of infinite ITIC points).

These top figures are the original points that I got using poor estimates of U and Z at the highest two densities of the isotherm:

image

image

image

These bottom figures are after some trial and error of how to best extrapolate U and Z:

image

image

image

Clearly the rhov and Psat at the lowest two Tsats (that correspond to the highest two isochores) are affected strongly by the difference in U and Z. In this case the U and Z values differ by about 5% (which hopefully is larger than simulation uncertainties). But this gives you an idea as to why we need a rigorous UQ analysis.

ramess101 commented 6 years ago

I decided to compare my propane and butane results for the TraPPE model with the values reported by Siepmann, and the results are a bit more encouraging. That is, the ITIC results agree closely with the GEMC values Siepmann obtained.

Propane:

image

image

image

Butane:

image

image

image

These results are acceptable, especially when you consider the fairly large uncertainties assigned to Siepmann's values (1998 GEMC had uncertainties in liquid density around 1-2% and in vapor pressure of 5-10%).

mostafa-razavi commented 6 years ago

These are interesting observations. Thanks for sharing them with us.

I agree that the differences between ITIC and Potoff could be due to use of different simulators. Even Cassandra and GOMC do not always agree with each other (e.g. at high densities) even though they use the same methods. MD simulations especially at low densities suffer from high fluctuations which is why I tend to use MC for those points. One source of disagreements could be B2 and B3 values. I am assuming that you are using REFPROP for B2 and B3. Is that right? If yes, I would expect that to be a source of inconsistency. I use GOMC/Cassandra to get B2 values.

Regarding uncertainty analysis, I tried to generate new values of Z and U using normal distribution function and standard deviation of Z and U. The results were not useful at all because of the large standard deviation values. Our current approach is to calculating Psat and rhoL uncertainties by simply performing ITIC multiple times using different random number generator seeds.

ramess101 commented 6 years ago

@mostafa-razavi @jrelliottoh

One source of disagreements could be B2 and B3 values. I am assuming that you are using REFPROP for B2 and B3. Is that right? If yes, I would expect that to be a source of inconsistency. I use GOMC/Cassandra to get B2 values.

I am using REFPROP for the B2 and B3 values. I have also used MD simulations (Gromacs) to get B2 along the supercritical isotherm, but I did not see much improvement in the case of Potoff because the B2 values were fairly similar. Getting B2 (and B3) along the isotherm from simulation is straightforward, however, how do you get B2 from GOMC/Cassandra at the saturation temperatures? Don't you solve for Tsat iteratively? So you don't know the temperature at which you should perform the low density simulations a priori, right? Do you just use the REFPROP Tsat as a best guess for where you want B2? Also, B2 shouldn't matter much for low Tsat, but I see systematic deviations over the entire temperature range when compared to Potoff. So I don't think the B2 values at low Tsat or along the supercritical isotherm are to blame for the discrepancy with Potoff. By contrast, since the Siepmann results agree very well at lower Tsat, I think B2/B3 could be the reason the results start to diverge at higher Tsat.

Regarding uncertainty analysis, I tried to generate new values of Z and U using normal distribution function and standard deviation of Z and U. The results were not useful at all because of the large standard deviation values.

OK, glad to know you have already looked into this. So if the issue was that you assigned too large of standard deviation values, how did you determine the standard deviations? Based on the fluctuations of a single simulation or replicates at a single state point? The large fluctuations in the liquid-phase virial might cause a single simulation standard deviation for Z to be much larger than what the replicates standard deviation in Z would be.

Were you able to determine which property/state point was causing your analysis results to not be useful? For example, if you set the standard deviations for Z along the isochore to zero, perhaps the uncertainties in rhol and Psat would become reasonable, which would indicate that the standard deviations for Z along the IC were the cause of the poor results.

Also, did you try to account for the correlation between Z and U? Z and U are strongly correlated, so you probably want to have a multivariate distribution function to sample Z and U simultaneously. I think that will reduce the uncertainty somewhat. However, the uncertainty in Z along the isochore is probably more significant than the internal energy. So the real issue might be the large fluctuations in the liquid-phase virial.

Our current approach is to calculating Psat and rhoL uncertainties by simply performing ITIC multiple times using different random number generator seeds.

OK, so now you just perform replicate simulations at each ITIC condition and then perform the ITIC analysis for the first set, second set, etc, and compare the Psat and rhoL results, correct? Have you tried bootstrapping the uncertainties from the replicates, i.e. randomly selecting which replicates are included in the ITIC analysis and repeating this hundreds of times? Bootstrapping is helpful when you only have a handful of replicates at each rho/Temp. It also eliminates the human element of manually deciding which replicates are included in a specific ITIC analysis.