ramess101 / MBAR_LJ_two_states


Pressures in condensed states #5

Open ramess101 opened 7 years ago

ramess101 commented 7 years ago

Michael,

I have worked a lot on trying to predict pressures from the PCF-PSO approach. Unfortunately, we need these pressures to be for liquid and/or supercritical states where it becomes very difficult to obtain reliable estimates of P. Basically, even with a very accurate PCF you can get very large differences in P using the PCF-PSO approach.

Because of this shortcoming we (Richard Elliott and I) are trying to use MBAR to predict pressures for different values of epsilon from a single reference state epsilon (with the same sigma). The phase space overlap is fairly good (i.e. we have Neff > 10 and sometimes ~100 from a total of N=1000). We get good estimates of U (when compared to the direct U value for a given epsilon). However, the pressures we obtain do not appear to be very good estimates. Although the uncertainties in P from MBAR are larger than they are for U, they still do not compensate for the discrepancy between the direct simulation P and the MBAR P. I have three questions that might shed some light on the issue.

  1. When scaling the internal energy U by beta, should we be using the instantaneous T? Or the NVT ensemble value of T that is used in the thermostat? Previously I used the fixed value of T but in this example we actually used the snapshot values of T. The fluctuations in T were fairly large so I thought this might lead to poor weighting.

  2. Do the MBAR uncertainties adequately represent the very large scatter in the virial that you obtain from liquid simulations? The MBAR uncertainties just seem so much smaller than the scatter in the virial (pressure) would suggest.

  3. Is it important to make sure the snapshots are not highly correlated? Currently I am using snapshots every 1 ps, just so that we have 1001 snapshots from a 1 ns run. However, I noticed in your original lj_shirts MBAR code you only captured every 20 snapshots (i.e. every 20 ps). I assume this is to account for correlated data. Would correlated snapshots (i.e. sampled every 1 ps) lead MBAR to underestimate the error?

Thanks

Rich

mrshirts commented 7 years ago

When scaling the internal energy U by beta, should we be using the instantaneous T? Or the NVT ensemble value of T that is used in the thermostat? Previously I used the fixed value of T but in this example we actually used the snapshot values of T. The fluctuations in T were fairly large so I thought this might lead to poor weighting.

The instantaneous T is not really a temperature; it's a function of the kinetic energy, and it is only indirectly related to the thermodynamic T. The beta should always correspond to the reservoir T, which is a constant (hence constant N, V, and T).
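For concreteness, a minimal sketch of building the reduced potentials with the reservoir temperature (the file names and the 155 K set point are placeholders, not taken from the actual setup):

```python
import numpy as np

kB = 0.0083144621                      # Boltzmann constant, kJ/(mol K)
T_reservoir = 155.0                    # thermostat set point, K (placeholder)
beta = 1.0 / (kB * T_reservoir)        # use the reservoir T, not the kinetic "T"

# Hypothetical per-snapshot potential energies (kJ/mol) of the saved configurations,
# evaluated with the reference epsilon and re-evaluated with the perturbed epsilon.
U_ref    = np.loadtxt('U_ref.txt')
U_target = np.loadtxt('U_target.txt')

u_ref    = beta * U_ref                # reduced potentials passed to MBAR
u_target = beta * U_target
```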

Do the MBAR uncertainties adequately represent the very large scatter in the virial that you obtain from liquid simulations? The MBAR uncertainties just seem so much smaller than the scatter in the virial (pressure) would suggest.

They should. MBAR is pretty good about uncertainties in the large N limit (which should be satisfied with ~100 or more samples). What equations are you using? I would probably suggest reweighting the virial, and adding the ideal gas term in after (since that's exact -- though I guess it's a constant, so it's irrelevant). I would have to see some more data to know. The MBAR uncertainty should be bigger than the uncertainty in the virial in the straight pressure calculation, by (approximately) a factor of sqrt(N_simple_average/N_eff). So the scatter in the virial is not the important thing to compare to, but the scatter/sqrt(N_simple_average).
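To make the bookkeeping concrete, here is a minimal single-reference reweighting sketch (plain exponential reweighting rather than the full pymbar machinery; the file names, box volume, and site count are placeholders):

```python
import numpy as np

kB = 0.0083144621                      # kJ/(mol K)
T, V, N_sites = 155.0, 30.0, 400       # reservoir T (K), box volume (nm^3), sites -- placeholders
beta = 1.0 / (kB * T)
BAR_PER_KJ_MOL_NM3 = 16.6054           # unit conversion for kJ/(mol nm^3) -> bar

# Hypothetical per-snapshot data from the reference simulation:
U_ref    = np.loadtxt('U_ref.txt')     # potential energy at the reference epsilon (kJ/mol)
U_target = np.loadtxt('U_target.txt')  # same configurations, re-evaluated at the new epsilon
W_target = np.loadtxt('W_target.txt')  # virial (configurational) pressure at the new epsilon (bar)

# Importance weights for reweighting from the single reference state
dU = beta * (U_target - U_ref)
w  = np.exp(-(dU - dU.min()))          # shift by the minimum for numerical stability
w /= w.sum()
N_eff = 1.0 / np.sum(w**2)             # effective number of samples

# Reweight only the virial part; add the ideal-gas (kinetic) term analytically
P_virial = np.sum(w * W_target)
P_ideal  = N_sites * kB * T / V * BAR_PER_KJ_MOL_NM3
P_target = P_ideal + P_virial

# Rough uncertainty scaling: straight standard error inflated by sqrt(N / N_eff)
sem_straight = W_target.std(ddof=1) / np.sqrt(len(W_target))
sem_reweight = sem_straight * np.sqrt(len(W_target) / N_eff)
```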

Would correlated snapshots (i.e. sampled every 1 ps) lead MBAR to underestimate the error?

Yes, definitely. For the same reason that the uncertainties will be too high for a simple average if you used correlated data.

mrshirts commented 7 years ago

If you provide a bit more math, I can try to provide some more information. I assume you are just using MBAR to reweight the virials?

Also, this is where NPT might be more useful, and where rmin-scaling would work. But of course it depends on whether the method requires NPT vs. NVT simulations.

ramess101 commented 7 years ago

OK, Dr. Elliott and I are going to repeat the MBAR analysis this afternoon. Once we have those values I will get back to you with more precise numbers and uncertainties.

Hopefully using the reservoir T instead of the T_KE corrects the weighting of the different samples. It is a bit surprising though since the internal energies were quite accurate while the pressures were off. Maybe the pressures are more sensitive to the configurations.

mrshirts commented 7 years ago

It is a bit surprising though since the internal energies were quite accurate while the pressures were off. Maybe the pressures are more sensitive to the configurations.

The pressures may indeed be more sensitive. I would try to reweight just virials, and then correct analytically to pressures.

ramess101 commented 7 years ago

I wanted to clarify one point on the correlated data question. In my question I said that correlated snapshots would lead to too low of uncertainties while in your response you said that correlated data would lead to too high of uncertainties. Did you mean to say "too low"?

The problem might be that we are not being specific enough in our definition of "uncertainty." In my experience, by using larger block sizes (by decorrelating the data) you get smaller standard deviations but larger standard errors. This is because, even though the standard deviation decreases, you also have a smaller number of samples (N). So, in other words, I believe that correlated data leads to too high of standard deviations but too low of standard errors.
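As a concrete illustration of that distinction, here is a small block-averaging sketch on a synthetic correlated series (not real simulation data):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic AR(1) series standing in for a correlated instantaneous pressure trace
n, phi = 100_000, 0.99
x = np.empty(n)
x[0] = 0.0
for i in range(1, n):
    x[i] = phi * x[i - 1] + rng.normal()

for block in (1, 10, 100, 1_000, 10_000):
    nb = n // block
    block_means = x[:nb * block].reshape(nb, block).mean(axis=1)
    sd  = block_means.std(ddof=1)   # standard deviation of block means: shrinks with block size
    sem = sd / np.sqrt(nb)          # standard error of the mean: grows until blocks decorrelate
    print(f"block size {block:6d}: SD = {sd:7.3f}, SEM = {sem:7.4f}")
```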

This plot shows how standard deviation decreases with increasing block size (this was originally included in my first publication):

[figure: standard deviation vs. block size]

And this plot (from the well-known Flyvbjerg paper) shows the same trend, but they are plotting standard error (sigma) with respect to the number of block transformations (note that the previous figure has 4 block transformations when going from a block size of 5000 to a block size of 80000).

[figure: standard error vs. number of block transformations (Flyvbjerg)]

Flyvbjerg's paper is a bit dense and I have not read it in a few years. It is not entirely clear whether sigma is standard error or standard deviation, but this is how I interpret it. Equation 31 in the Flyvbjerg paper seems to clarify that sigma(m) is different from sigma(x), suggesting that sigma(m) is standard error and sigma(x) is standard deviation.

[figure: Equation 31 from the Flyvbjerg paper]

I admit that it is not normal to use sigma for standard error, but whenever I perform Flyvbjerg's analysis I get a result that is an order of magnitude different from the replicate approach. For example:

[figure: Flyvbjerg analysis vs. replicate uncertainty estimates]

This suggests to me that, despite the unusual notation, Flyvbjerg's sigma is actually the standard error.

At any rate, let me know if you have a different way of thinking about the uncertainties of correlated data.

ramess101 commented 7 years ago

Also, this is where NPT might be more useful, and where rmin-scaling would work. But of course it depends on whether the method requires NPT vs. NVT simulations.

Unfortunately, with the approach that we are using (ITIC) it does need to be NVT. ITIC integrates over isothermal and isochoric paths to obtain the saturated liquid density and vapor pressure. All that is needed for ITIC is the U and P at a given V,T. If we can get either PCF or MBAR to work with ITIC we could predict saturated state properties very quickly.

mrshirts commented 7 years ago

I wanted to clarify one point on the correlated data question. In my question I said that correlated snapshots would lead to too low of uncertainties while in your response you said that correlated data would lead to too high of uncertainties. Did you mean to say "too low"?

Yes, apologies. Using correlated data makes estimated uncertainties lower than if one ran the simulation N times.

mrshirts commented 7 years ago

suggesting that sigma(m) is standard error and sigma(x) is standard deviation.

My understanding is that "standard error of the mean" is strictly dependent on the number of samples (N^(-1/2)), and the standard deviation of an experiment is (roughly, with enough data) independent of the number of samples. Saying "standard error of the mean" makes things clearer. With MBAR, of course, it's not a "mean", though it is an expectation, and it is still dependent on N^(-1/2).

I haven't studied the extent to which using correlated data can make the standard deviation too high, unless it's just the small N sample limit -- if there are too few blocks, then the standard deviation is often an underestimate. But this goes away with 4-10 samples.

mrshirts commented 7 years ago

All that is needed for ITIC is the U and P at a given V,T. If we can get either PCF or MBAR to work with ITIC we could predict saturated state properties very quickly.

Got it. One could do NPT simulations and invert the P(V) curve, but I'm guessing that's more expensive than is worth it.

So reweighting the virial . . . one needs to collect samples that properly estimate the integral \int \sum r (dU/dr) exp(-beta U(r)) dr . . . I'm not sure if there are any clever coordinate scalings that would work. It would be interesting to see the correlation between the magnitude of the weights and the virial samples, compared to the correlation of weights and energies.
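A quick way to look at that, assuming the normalized weights, per-snapshot virials, and energies are already in hand (the file names are hypothetical):

```python
import numpy as np

w = np.loadtxt('weights.txt')   # normalized MBAR/reweighting weights per snapshot
W = np.loadtxt('virial.txt')    # configurational virial per snapshot
U = np.loadtxt('energy.txt')    # potential energy per snapshot

print('weight-virial correlation:', np.corrcoef(w, W)[0, 1])
print('weight-energy correlation:', np.corrcoef(w, U)[0, 1])
```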

How bad is PCF at calculating the virial using the virial equation + scaling? It's not that clear to me how bad it is versus, say, predicting the average potential energy. Maybe some sample code to play with would help.

mrshirts commented 7 years ago

Seems like we should be able to calculate the pressure through MBAR + http://www.sklogwiki.org/SklogWiki/index.php/Test_volume_method + including using scaling . . . I can think about this if the test volume method seems like a reasonable thing to be doing. Note that rescaling by rmin in PCF should have some parallels to test volume.

ramess101 commented 7 years ago

How bad is PCF at calculating the virial using the virial equation + scaling? It's not that clear to me how bad it is versus, say, predicting the average potential energy. Maybe some sample code to play with would help.

The error for the PCF approach is manageable when just varying epsilon and lambda, and the predictions at least have the right trends. However, the dependence of P on sigma is completely wrong, so you can get errors around 100%. For example, increasing sigma should increase P for a given NVT state, but it instead decreases the pressure. This is unfortunate because for ITIC the pressure with respect to T along a given isochore is how the saturated liquid density is obtained, and that is intimately connected to sigma.

I have noticed that the problem is only really observed when trying to predict liquid pressures. A vapor phase pressure is typically quite accurate. I think this is because at vapor densities the ideal portion is the primary contribution, so the error from histogramming the PCF is not as significant for the vapor phase (even though the PCF is much smoother for the liquid phase). Again, this is unfortunate because for ITIC we only care about the condensed phase pressures.

I do have some questions/concerns about the Gromacs output. The P calculated from the PCF can differ by around 100% even for the same force field that was used for determining the ensemble average. In other words, it appears that the fundamental equation for estimating P is incorrect. Some deviation between PCF and ensemble averages is expected, but for U the deviations are about 0.2%. I have been trying to figure out why this is the case. I have gone through my code numerous times and I think it is working properly, so I have now been focusing on how Gromacs obtains P, and I have two questions.

First, I looked at the kinetic energy, since KE is used for calculating P. I noticed that the KE in the Gromacs output does not appear to be the value I expected. Specifically, Gromacs gives a value around a factor of 2 greater than the KE_IG of 3/2 NkT. I looked in the Gromacs manual and found on page 25 that the value of "3N" in the KE is actually 3N - Ncons - Ncom, where Ncons is the number of constraints and Ncom is usually 3. This still does not agree with the KE from the Gromacs output (it actually makes things worse). Is there a reason the KE does not agree with the 3/2 NkT value?

Second, I was wondering if the virial that Gromacs calculates accounts for the constraint forces to keep the ethane bonds fixed (I am using LINCS). This could explain why the pressure is different, since I only account for nonbonded forces.

Basically, I am trying to figure out why my P calculated with the PCF is so different from the ensemble average. Again, this problem is primarily observed in the condensed phase.

I will try to get a file for you to look at soon. Right now Richard and I have been working on ITIC and MBAR so I haven't had much time to figure out how to get PCF to work better.

ramess101 commented 7 years ago

Seems like we should be able to calculate the pressure through MBAR + http://www.sklogwiki.org/SklogWiki/index.php/Test_volume_method + including using scaling . . . I can think about this if the test volume method seems like a reasonable thing to be doing. Note that rescaling by rmin in PCF should have some parallels to test volume.

This is an interesting idea that could work better than the virial approach. I will think more about this.

mrshirts commented 7 years ago

Second, I was wondering if the virial that Gromacs calculates accounts for the constraint forces to keep the ethane bonds fixed (I am using LINCS).

Yes, it does account for those forces, since those are forces involved in the integration. This has been checked pretty carefully; if you don't include those, all the predictions go haywire.

I noticed that the KE in the Gromacs output does not appear to be the value I expected.

For, say, 100 ethane molecules, I would expect the kinetic energy to be 1/2 kBT N_DOF, where N_DOF would be 200*3 - 3 - 100 = 497, since there are 100 bonds, and each rigid bond removes one DOF. I'd have to see numbers to see if this agreed.
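A two-line check of that expectation for the 100-molecule example (united-atom ethane, so 200 sites and 100 rigid bonds; the 155 K temperature is a placeholder):

```python
kB = 0.0083144621                        # kJ/(mol K)
T  = 155.0                               # K (placeholder)

n_mol = 100
n_atoms  = 2 * n_mol                     # united-atom ethane: 2 sites per molecule
n_constr = n_mol                         # one rigid C-C bond per molecule
n_dof    = 3 * n_atoms - n_constr - 3    # 600 - 100 - 3 = 497 (COM motion removed)

KE_expected = 0.5 * n_dof * kB * T       # kJ/mol
print(n_dof, KE_expected)
```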

As I said a couple of times, I think you want to try reweighting the virial, and then add the KE term analytically (removes noise from reweighting). Whenever I use MBAR (or any method), I always try to do kinetic energy stuff analytically (since it's exact), and just calculate corrections.

The fact that the virial includes constraints might make this a bit tougher -- I hadn't thought about that.

ramess101 commented 7 years ago

Yes, it does account for those forces, since those are forces involved in the integration. This has been checked pretty carefully; if you don't include those, all the predictions go haywire.

OK, so that probably explains why the pressures I get from Gromacs show such strong discrepancies from using the PCF pressure equation. By contrast, when I use Towhee (Monte Carlo) the pressures agree quite nicely. I think this is because in MC you do not need to use constraints so the forces are strictly from the nonbonded interactions. Also, MC makes the ideal gas contribution more straightforward as it does not depend on KE (since MC knows nothing about velocities).

For, say, 100 ethane molecules, I would expect the kinetic energy to be 1/2 kBT N_DOF, where N_DOF would be 200*3 - 3 - 100 = 497, since there are 100 bonds, and each rigid bond removes one DOF. I'd have to see numbers to see if this agreed.

OK, I see the mistake in my math. It is working now. In your example I was using 100*3 - 3 - 100 instead of 200*3 - 3 - 100. Although, I guess I am confused why this worked, since I thought kinetic theory says that for a monatomic, diatomic, or polyatomic molecule the KE is still just 3/2 RT. In statistical mechanics the translational energy contribution is always 3/2 NkT, right?

As I said a couple of times, I think you want to try reweighting the virial, and then add the KE term analytically (removes noise from reweighting). Whenever I use MBAR (or any method), I always try to do kinetic energy stuff analytically (since it's exact), and just calculate corrections.

Yes, I agree. Reweighting the virial and then adding the KE term is a better approach.

The fact that the virial includes constraints might make this a bit tougher -- I hadn't thought about that.

Yes, the constraints are a bit of a nuisance, especially for the PCF-PSO approach. I knew the value of the constraint force would depend on temperature but I was hoping that the constraint forces would be constant with respect to the nonbonded interactions. But that does not seem to be the case. For example, this plot shows the constraint force with respect to temperature for TraPPE and Potoff (both use a 0.154 nm fixed bond length):

[figure: constraint force contribution vs. temperature for TraPPE and Potoff]

Although these deviations are small relative to the total pressure of the condensed phase, they do lead to erroneous results. Namely, if you use the constraint force pressure from the TraPPE model with the Potoff model, you get a lower saturation temperature. This is because the deviations are strongest at low T, which is most important for ITIC. You can see that on this plot, where the intersection with Z=0 is an approximation for Tsat. The difference is only about 1.5 K, but this is a "best case" scenario.

[figure: Z vs. T, with the Z=0 intersection approximating Tsat]

So I guess my question is, is there any way to "predict" what the constraint force is going to be post simulation? I.e. if I know the constraint force for TraPPE, could I predict the constraint force for Potoff? I assume the difference is that for TraPPE the molecules prefer a slightly more compact configuration while for Potoff they spread out a little bit more. So maybe it scales with sigma or rmin as well. Although, at higher T the difference seems to disappear.

mrshirts commented 7 years ago

OK, so that probably explains why the pressures I get from Gromacs show such strong discrepancies from using the PCF pressure equation.

Hmm. The pressures for similar configurations should still be similar with and without constraints -- this can be seen because the density is pretty similar with constraints and with harmonic bond oscillators. The virial will obviously be different, since the kinetic energy is different -- the virial essentially must cancel out the different kinetic energy. So I think there's something weird going on here that we are missing. Do you have some numbers on the virial?

KE is still just 3/2 RT. In statistical mechanics the translational energy contribution is always 3/2 NkT, right?

The hard and fast rule is that the KE is 1/2 kBT times the number of DOF. If there are no constraints and N atoms, this is 3/2 N kBT (or (3N - 3)/2 kBT with periodic boundaries, if COM motion is removed). Adding harmonic bonds does not change the number of DOF; it just changes them from translational to vibrational.

For example, this plot shows the constraint force with respect to temperature.

By constraint force, you mean "contribution to the virial due to constraint forces?" I think that it should only be the total virial that matters, not the constraint part, but I need to think through this more (which means I need to be fiddling with an example ;))

Also, are harmonic bond constants defined in the topology? I wonder if that affects things. It certainly will affect the constraint forces, since the corrections will be smaller, and more of the virial will be shifted into the "regular virial". So what is in the constraint virial and force virial will change, even if the constraints are the same. This makes me think that really, the total virial is the one to worry about.

When comparing to PCF, probably you should make sure it all works for "standard averages" before looking at changes in parameters. Does it work in that "null change" case?

ramess101 commented 7 years ago

When comparing to PCF, probably you should make sure it all works for "standard averages" before looking at changes in parameters. Does it work in that "null change" case?

Let me clarify how I am making my comparisons and conclusions. I am performing a simulation of ethane with the TraPPE model using LINCS constraints. I am generating the RDF for the CH3-CH3 interactions. I am using the "pressure equation" that allows you to calculate P from the RDF, i.e. P = P_ideal - (2*pi*rho^2/3) \int g(r) (du/dr) r^3 dr, and I am comparing that P with the P you get from the ensemble average.

Now we already know that there is inevitably going to be some difference between the two Ps. We saw this for U, and we know that it has to do with the averaging in the RDF, etc. But for P the deviation between P_RDF and P_ens is greater (on a percent basis). For U it was always less than 0.5%, while for P (in the liquid phase) it is ~25%. I know P is very sensitive to the RDF, but this is still larger than I would have thought. That is why I believe the difference is that my pressure equation does not include a P_constraint term. So this figure was obtained by taking P_ens - P_RDF (which, strictly speaking, is not just the constraint contribution; it also includes the uncertainties in the RDF method):

[figure: P_ens - P_RDF vs. temperature]
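For reference, this is roughly how I evaluate the pressure equation numerically from the tabulated g(r); the file name, density, and LJ parameters below are placeholders rather than the actual values:

```python
import numpy as np

kB = 0.0083144621            # kJ/(mol K)
T, rho = 155.0, 10.0         # temperature (K) and site density (nm^-3) -- placeholders
eps, sig = 0.8, 0.375        # LJ epsilon (kJ/mol) and sigma (nm) -- placeholders

# gmx rdf output: two columns, r (nm) and g(r); '#' and '@' lines are headers
r, g = np.loadtxt('rdf.xvg', comments=('#', '@'), unpack=True)
mask = (r > 0) & (g > 0)     # skip the r = 0 bin and empty bins
r, g = r[mask], g[mask]

# du/dr for the Lennard-Jones potential
dudr = 24.0 * eps * (sig**6 / r**7 - 2.0 * sig**12 / r**13)

# P = rho kB T - (2 pi rho^2 / 3) int g(r) (du/dr) r^3 dr  (no tail correction beyond the last bin)
P_config = -(2.0 * np.pi * rho**2 / 3.0) * np.trapz(g * dudr * r**3, r)   # kJ/(mol nm^3)
P_ideal  = rho * kB * T                                                    # kJ/(mol nm^3)

print((P_ideal + P_config) * 16.6054)    # convert kJ/(mol nm^3) to bar
```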

Hmm. The pressures for similar configurations should still be similar with and without constraints -- this can be seen because the density is pretty similar with constraints and with harmonic bond oscillators. The virial will obviously be different, since the kinetic energy is different -- the virial essentially must cancel out the different kinetic energy. So I think there's something weird going on here that we are missing. Do you have some numbers on the virial?

Here are some numbers to help you get a feel for what I am seeing.

TraPPE at 155 K

From Gromacs (I have uploaded the log file as TraPPE_155.log):
KE: 1286.57 kJ/mol or 393.63 bar
Total virial (the average of the three main diagonal elements): 736.009 kJ/mol or -674.53 bar
Pressure: -280.893 bar

From PCF:
KE: 1286.57 kJ/mol or 393.63 bar (in exact agreement with Gromacs)
PCF integral: 812.89 kJ/mol or -746.12 bar
Pressure: -352.41 bar

The deviation between the pressures is about 69.8 bar (which is roughly 25% of P_ens). This deviation arises entirely from the difference between the total virial and the PCF integral.

If we instead use a different KE (where we do not account for constraints) for the PCF value we can get:

From PCF:
KE: 1544.56 kJ/mol or 472.56 bar (this is without subtracting the constraint degrees of freedom)
PCF integral: 812.89 kJ/mol or -746.12 bar (same as before)
Pressure: -273.56 bar

The deviation in this case is only 9 bar, which would be encouraging EXCEPT that the deviation gets much worse with increasing temperature:

[figure: deviation from the ensemble pressure vs. temperature, using the KE without the constraint DOF subtracted]

In other words, if we use the KE without subtracting the constraint DOF we get a much larger deviation from the ensemble pressure.

In conclusion, the deviation between P_ens and P_PCF is due to one of two sources:

  1. Not properly accounting for constraint forces
  2. The natural deviation that arises from using a PCF

Is there any way to improve the precision that gmx rdf provides? The output only has r and g(r) to three decimals. Maybe improved precision could improve the agreement? Or maybe more bins? The g(r) looks really smooth and worked well for U, but perhaps P is more sensitive, especially in the initial ascent.

ramess101 commented 7 years ago

Also, are harmonic bond constants defined in the topology? I wonder if that affects things. It certainly will affect the constraint forces, since the corrections will be smaller, and more of the virial will be shifted into the "regular virial". So what is in the constraint virial and force virial will change, even if the constraints are the same. This makes me think that really, the total virial is the one to worry about.

Harmonic bond constants are defined but I saw no effect on the pressures when I changed the bond constant. I assumed that LINCS didn't use the bond constant since the bonds are fixed. Is that not the case?

ramess101 commented 7 years ago

By constraint force, you mean "contribution to the virial due to constraint forces?" I think that it should only be the total virial that matters, not the constraint part, but I need to think through this more (which means I need to be fiddling with an example ;))

Hopefully my previous comment explained this. Constraint force was really P_ens - P_PCF. Is there a way in Gromacs to have it output the constraint virial separated from the nonbonded virial?

mrshirts commented 7 years ago

Is there a way in Gromacs to have it output the constraint virial separated from the nonbonded virial?

It would have to be hacked in -- not too difficult, but my point above is that the separation isn't that meaningful, since you can shuffle the components back and forth.

mrshirts commented 7 years ago

Hmm. Still thinking about the above.

Quick questions to try to rule out different issues:

1) What happens for methane? Do MBAR and the virial equation agree there?
2) What happens for harmonic bonds for ethane? Do MBAR and the virial equation agree there?

ramess101 commented 7 years ago

Quick question to try to rule out different issues

What happens for methane? Do MBAR and the virial equation agree there? What happens for harmonic bonds for ethane? Do MBAR and the virial equation agree there?

So I have actually not run MBAR with these, since I already observed that TraPPE and Potoff had N_eff of 1 (so maybe this issue should be in a different project since it has nothing to do with MBAR). But I guess the point of all of this is just trying to figure out why the P_ens and P_PCF are so different (without changing the force field).

Rerunning with methane is a logical way to address where the issue really lies. I will look into this.

Harmonic bonds do actually change the pressure quite a bit. This is something that Richard Elliott has done a lot of research on. He has concluded that the pressures obtained with fixed bond lengths and harmonic bonds are systematically different. For example, if I use the harmonic force constant that de Pablo recommended for ethane, I get these results at the same conditions as described before (155 K TraPPE):

Ensemble averages:
KE: 1544.02 kJ/mol (note that now KE agrees with what I calculated before without constraints)
Virial: 983 kJ/mol or -902 bar
Pressure: -430.3 bar

So the deviation looks slightly worse now using a harmonic bond (around 80 bar). We figured that using the fixed bond length was the best approach. But I guess I just need to understand exactly how the bond constraints are impacting the virial, i.e. how I could predict the virial from the constraints for a different force field (if it should change at all).

ramess101 commented 7 years ago

Rerunning with methane is a logical way to address where the issue really lies. I will look into this.

OK, so when I performed this analysis with argon (I used argon since it serves the same purpose as UA-CH4 and I already had the argon files) I still see a significant error.

Ensemble: Pressure: -5.345 bar

RDF: Pressure: -7.115 bar

Although the difference is only 1.77 bar, it is about 33%. Maybe this is as good as we can do because the fluctuations in pressure are so large, since this is a liquid simulation at a temperature near the saturation temperature. So I think this suggests that it is not the constraints that are to blame; it is just that the RDF approach for estimating pressure deviates strongly from the ensemble approach.

I guess I need to look into ways to improve the precision of the RDF that I get from Gromacs with the goal being to reduce this deviation in P.

mrshirts commented 7 years ago

Ah, great. I suspected constraints weren't the source of the problem. When you say the difference is 1.77 bar -- I do think that it's probably better to look at the differences in the virial, since the kinetic energy difference should be independent -- although the difference should still be 1.77 bar, the percent error would be a bit different. It's more a matter of using the right reference to understand how big an approximation is.

Hmm. If the PCF pressure is too negative, that would indicate that the integral is too large in magnitude, which would suggest that it's not a problem with, say, omitting a tail correction, which should have a positive contribution (-(-6r^-6) g(r)r^2) to the virial. Seems like this is a problem people must have noticed in the past for Lennard-Jones. Or possibly you are omitting the r^-12 term?

mrshirts commented 7 years ago

So I have actually not run MBAR with these, since I already observed that TraPPE and Potoff had N_eff of 1 (so maybe this issue should be in a different project since it has nothing to do with MBAR).

Yes, sorry, I was misspeaking - I just meant comparing the pressure with application of PCF and direct calculation of the virial from simulations.

mrshirts commented 7 years ago

Somewhat off this topic (in trying to find PCF papers), but still relevant. https://spiral.imperial.ac.uk/bitstream/10044/1/262/1/edm2006jcp.pdf

mrshirts commented 7 years ago

The short version of the above: the pressure calculated from the PCF and the pressure from the direct virial computation carried out by GROMACS (or other codes) are the same up to bookkeeping, so if they don't agree, we're missing something.

ramess101 commented 7 years ago

Ah, great. I suspected constraints weren't the source of the problem. When you say the difference is 1.77 bar -- I do think that it's probably better to look at the differences in the virial, since the kinetic energy difference should be independent -- although the difference should still be 1.77 bar, the percent error would be a bit different. It's more a matter of using the right reference to understand how big an approximation is.

That is a good point about how to look at the error. I should have been taking the percent error relative to the virial only. In that case, since the virial is about 284 bar for argon, the error is less than 1% for argon. By contrast, if we take the ~80 bar error for ethane relative to the virial, which is around 800 bar, we see an error of about 10%. So the error does look a lot bigger (an order of magnitude) for ethane than for argon. So maybe the constraint virial is actually significant... Or, the PCF pressure equation is not exact for diatomics, i.e. there are some approximations that we are neglecting. I believe the derivation is still true for diatomics but that might be something to look into. I did also run the argon simulations for 6 times as long and with twice as many molecules. So maybe the PCF just needs to be extremely accurate. I will try running ethane for longer and with more molecules.

mrshirts commented 7 years ago

I would suggest doing some bootstrapping error analysis for the PCF and for the virial. If you have decorrelated samples then you can

  1. Select with replacement from the frames ~200 times
  2. Compute ~200 PCFs from these samples
  3. Calculate the ~200 pressures from the PCFs.

The std of the 200 pressures should give a good sense of the error.
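A sketch of that procedure, assuming per-frame g(r) histograms on a common r grid have already been extracted, and re-implementing the pressure-equation evaluation from the earlier sketch (file names, density, and LJ parameters are placeholders):

```python
import numpy as np

kB = 0.0083144621
T, rho = 155.0, 10.0          # placeholders, as in the earlier sketch
eps, sig = 0.8, 0.375

def pressure_from_rdf(r, g):
    """Pressure (bar) from the pressure equation for an LJ fluid."""
    mask = (r > 0) & (g > 0)
    r, g = r[mask], g[mask]
    dudr = 24.0 * eps * (sig**6 / r**7 - 2.0 * sig**12 / r**13)
    P_config = -(2.0 * np.pi * rho**2 / 3.0) * np.trapz(g * dudr * r**3, r)
    return (rho * kB * T + P_config) * 16.6054

# Hypothetical pre-processed input: per-frame g(r) histograms, shape (n_frames, n_bins)
g_frames = np.load('g_frames.npy')
r_grid   = np.load('r_grid.npy')

rng = np.random.default_rng(0)
n_frames, n_boot = g_frames.shape[0], 200

P_boot = np.empty(n_boot)
for b in range(n_boot):
    pick = rng.integers(0, n_frames, size=n_frames)                 # resample frames with replacement
    P_boot[b] = pressure_from_rdf(r_grid, g_frames[pick].mean(axis=0))

print(P_boot.mean(), P_boot.std(ddof=1))   # bootstrap mean and its standard-error estimate
```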

In general, good error estimates make it easier to see whether things are statistically converged or not.

mrshirts commented 7 years ago

I believe the derivation is still true for diatomics but that might be something to look into

Yes, I think it is, too, from what I recall.

mrshirts commented 7 years ago

It shouldn't be totally surprising that the pressure has a relatively large error, since fluids are fairly incompressible.

ramess101 commented 7 years ago

I would suggest doing some bootstrapping error analysis for the PCF and for the virial. If you have decorrelated samples then you can

Select with replacement from the frames ~200 times. Compute ~200 PCFs from these samples. Calculate the ~200 pressures from the PCFs. The std of the 200 pressures should give a good sense of the error.

In general, good error estimates make it easier to see whether things are statistically converged or not.

Rather than performing the bootstrap analysis, I decided to just run 10 independent simulations (so I wouldn't need to worry about correlated data). The result was that all 10 simulations had approximately the same deviation.

For example, at 155 K the deviation between P_ens and P_PCF was 65-70 bar for all 10 replicates. P_ens varied between -280 and -282 bar, while P_PCF varied between -349 and -352 bar.

I think this demonstrates that there is a systematic difference between P_ens and P_PCF. I do not know what to attribute this to. The KE contribution and the tail corrections are identical in the PCF and Gromacs calculations. So the only difference is in the virial.

ramess101 commented 7 years ago

It shouldn't be totally surprising that the pressure has a relatively large error, since fluids are fairly incompressible.

Based on the replicate simulations I performed, the P_ens uncertainty is fairly small. Although the fluctuations are quite large, the averages from the independent runs are within 1 bar of each other.