ramess101 / MBAR_GCMC


Uncertainties #11

Closed ramess101 closed 5 years ago

ramess101 commented 5 years ago

@mrshirts @msoroush @jpotoff

I am comparing the uncertainties obtained from bootstrap resampling combined with MBAR and those reported for histogram reweighting (by Mick et al.). This figure shows that the percent uncertainties agree fairly well for saturated liquid density and heat of vaporization. But the MBAR bootstrapped uncertainties for vapor density and vapor pressure are much larger than those reported for histogram reweighting.

[figure: percent uncertainties of bootstrapped MBAR vs. histogram reweighting]

@msoroush and @jpotoff, can you provide some details regarding how the histogram reweighting uncertainties were obtained? I would be very surprised if HR was more precise than MBAR, so I think there must be another explanation for this.

Here is how I compute the MBAR uncertainties:

  1. Choose a random sample (with replacement) of configurations from the total array of snapshots
  2. Perform MBAR analysis
  3. Repeat steps 1-2 approximately 30 times
  4. Compute the 95% confidence interval from the distribution of VLE properties
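The steps above can be sketched roughly as follows. `estimate_vle_property` is a hypothetical stand-in for the full MBAR analysis (step 2 really solves the MBAR equations on the resampled configurations); only the resampling and percentile machinery is meant literally:

```python
# Sketch of the bootstrap procedure; estimate_vle_property is a placeholder
# for the real MBAR analysis, NOT the actual analysis code.
import numpy as np

def estimate_vle_property(snapshots):
    # Placeholder "analysis": just the mean of the sampled values.
    return np.mean(snapshots)

def bootstrap_ci(snapshots, n_boots=30, ci=95.0, seed=0):
    rng = np.random.default_rng(seed)
    n = len(snapshots)
    estimates = []
    for _ in range(n_boots):
        # Step 1: resample configurations with replacement
        resample = rng.choice(snapshots, size=n, replace=True)
        # Step 2: run the (stand-in) analysis on the resample
        estimates.append(estimate_vle_property(resample))
    # Step 4: confidence interval from percentiles of the bootstrap distribution
    tail = (100.0 - ci) / 2.0
    return np.percentile(estimates, [tail, 100.0 - tail])

# Synthetic "snapshots" standing in for the ~1e6 simulation frames
snapshots = np.random.default_rng(1).normal(500.0, 25.0, size=5000)
lo, hi = bootstrap_ci(snapshots)
```

Note that with only 30 bootstrap samples the 2.5/97.5 percentiles sit essentially at the extremes of the resampled distribution, which is part of why more samples are needed for stable interval tails.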
ramess101 commented 5 years ago

@jpotoff

Mick et al. state that they ran 5 independent sets of simulations and computed the 90% confidence interval from the standard deviation and a t-statistic. @msoroush, were the histfiles you provided me from a single set of simulations, or from all five combined?

[figure: excerpt from Mick et al. describing their uncertainty procedure]
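As a reference point, the procedure described above (90% confidence interval from 5 replicates via the t-statistic) amounts to a few lines; the replicate values here are invented purely for illustration:

```python
# t-based confidence interval from a small number of independent replicates.
import numpy as np

def t_confidence_interval(replicates, t_crit):
    x = np.asarray(replicates, dtype=float)
    half_width = t_crit * x.std(ddof=1) / np.sqrt(len(x))
    return x.mean() - half_width, x.mean() + half_width

# Hypothetical liquid-density replicates (kg/m^3) from 5 independent runs
replicates = [501.2, 498.7, 500.4, 499.9, 500.8]
# t_{0.95, df=4} ~= 2.132 gives a two-sided 90% interval for n = 5
lo, hi = t_confidence_interval(replicates, t_crit=2.132)
```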

msoroush commented 5 years ago

@ramess101 The histograms that I uploaded to GitHub were from only a single set of simulations.

ramess101 commented 5 years ago

@msoroush

OK, well then I guess it makes some sense that the MBAR uncertainties are not as small (since they are based on 1/5th of the data).

mrshirts commented 5 years ago

I would say in general that 1) you should probably use at least 50 bootstrap samples to get decent standard deviations (I usually run about 200), and significantly more (1000?) to get quantiles, and 2) computing a standard deviation from 5 simulations is not very accurate; there's a very high error rate there.
Note that it should be quite fast to do the bootstrap with histograms. I would very highly recommend making the error analysis consistent between the methods.

ramess101 commented 5 years ago

@mrshirts

you should probably use at least 50 bootstrap samples to get decent standard deviations (I usually run about 200), and significantly more (1000?) to get quantiles.

I agree that 30 bootstraps is not ideal. The reason I don't use more is that solving the MBAR equations is rather slow, since each compound has over 1,000,000 frames; it typically takes around a minute to converge, and I wanted to process as many compounds as possible. I am running these computations on my local machine, so I have to run each compound in sequence. Also, the results I plotted are the average uncertainties of the 10-20 compounds that I have processed. So any poorly represented uncertainties should be averaged out. I could probably increase the bootstraps to 50, and/or move my process to our cluster to run these in parallel. That would just take up cores that I was using for simulations.
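For what it's worth, fanning the per-compound bootstrap loops out over cluster cores is straightforward with the standard library. This is a hypothetical sketch: `run_bootstrap_for_compound` is a stand-in for one compound's MBAR bootstrap loop, not code from this repo.

```python
# Hypothetical sketch: one worker process per compound via concurrent.futures.
from concurrent.futures import ProcessPoolExecutor

import numpy as np

def run_bootstrap_for_compound(seed, n_boots=50, n_frames=10_000):
    """Stand-in for one compound's analysis: bootstrap the mean of
    synthetic 'frames' and return the spread of the resampled means."""
    rng = np.random.default_rng(seed)
    frames = rng.normal(loc=1.0, scale=0.2, size=n_frames)
    means = [rng.choice(frames, size=n_frames, replace=True).mean()
             for _ in range(n_boots)]
    return float(np.std(means, ddof=1))

if __name__ == "__main__":
    compound_seeds = range(10)  # one entry per compound
    with ProcessPoolExecutor(max_workers=4) as pool:
        spreads = list(pool.map(run_bootstrap_for_compound, compound_seeds))
    print([round(s, 4) for s in spreads])
```

Since each compound's bootstrap is independent, this parallelizes with no communication between workers.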

computing a standard deviation from 5 simulations is not very accurate

I agree, but since we are comparing the 90/95% confidence intervals, using the t-statistic (with only 4 degrees of freedom) should help account for the large uncertainty in the standard deviation to some extent.

I would very highly recommend making the error analysis consistent between methods.

I agree. However, I'm not sure how feasible it is to repeat the histogram reweighting analysis. This would require @msoroush running HR for just the single set of simulations he provided me and including the bootstrap feature. I also don't know if we really need to include such a rigorous comparison of uncertainties for the manuscript, or if we should just report the MBAR uncertainties to demonstrate that MBAR and HR agree to within the combined uncertainty. The validation of MBAR and HR is just the first third of the paper. I don't want to lose sight of the final two thirds, namely, applying MBAR to scale epsilon and applying MBAR when modifying all non-bonded parameters.

mrshirts commented 5 years ago

So any poorly represented uncertainties should be averaged out. I could probably increase the bootstraps to 50, and/or move my process to our cluster to run these in parallel. That would just take up cores that I was using for simulations.

I think for testing, it's fine to do whatever. For a paper, you save time in the long run by collecting enough data and making things consistent. That said, if you are averaging over a lot of molecules, then the errors in the standard deviations will cancel out a bit in the average (especially if the standard deviations are similar in magnitude).

I also don't know if we really need to include such a rigorous comparison of uncertainties for the manuscript, or if we should just report the MBAR uncertainties to demonstrate that MBAR and HR agree to within the combined uncertainty.

I think that if the comparison isn't consistent, it will end up confusing people and giving them the wrong impression (MBAR uncertainties are larger than histogram uncertainties), which is not something you want to do when introducing new methodologies and ideas.

I agree, but since we are comparing the 90/95% confidence intervals, using the t-statistic (with only 4 degrees of freedom) should help account for the large uncertainty in the standard deviation to some extent.

Using the t-statistic will correct the bias of using such a small number of samples to calculate statistics, but not the uncertainty.

ramess101 commented 5 years ago

@mrshirts

I think for testing, it's fine to do whatever. For a paper, you save time in the long run by collecting enough data and making things consistent. That said, if you are averaging over a lot of molecules, then the errors in the standard deviations will cancel out a bit in the average (especially if the standard deviations are similar in magnitude).

I will average over many molecules, but I can repeat the analysis with 50 bootstraps as well. I could also compare the bootstrap uncertainties obtained with 30/50 samples against those obtained with 200 for a single molecule.
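That comparison is easy to prototype on synthetic data (standing in for the real MBAR output): repeat the whole bootstrap several times at each count and look at how much the standard-deviation estimate itself scatters.

```python
# How stable is the bootstrap uncertainty estimate at 30/50/200 samples?
import numpy as np

def bootstrap_std(data, n_boots, rng):
    n = len(data)
    means = [rng.choice(data, size=n, replace=True).mean()
             for _ in range(n_boots)]
    return float(np.std(means, ddof=1))

rng = np.random.default_rng(0)
data = rng.normal(0.0, 1.0, size=1000)  # true std of the mean ~ 0.0316

for n_boots in (30, 50, 200):
    # Repeat the whole bootstrap 10 times and report the relative scatter
    estimates = [bootstrap_std(data, n_boots, rng) for _ in range(10)]
    rel_scatter = np.std(estimates) / np.mean(estimates)
    print(n_boots, round(rel_scatter, 3))
```

Roughly, the relative scatter of a sample standard deviation goes as 1/sqrt(2(B-1)), so about 13% at B = 30 versus about 5% at B = 200.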

I think that if the comparison isn't consistent, it will end up confusing people and giving them the wrong impression (MBAR uncertainties are larger than histogram uncertainties), which is not something you want to do when introducing new methodologies and ideas.

Yeah, I certainly don't want the reader to think that MBAR has larger uncertainties. The real purpose for considering the uncertainties was to show that the deviations between MBAR and HR are typically within the uncertainties. I can demonstrate this without comparing the MBAR and HR uncertainties. I think I could just explain that the MBAR uncertainties are estimated from a single replicate simulation and, therefore, are larger than those reported in the literature.

Using the t-statistic will correct the bias of using such a small number of samples to calculate statistics, but not the uncertainty.

What I meant is that, while the standard deviation of 5 replicates is meaningless, the 90/95% confidence interval computed with a t-statistic will account for the large uncertainty in the standard deviation. Thus, the confidence intervals should be meaningful despite the small sample size.
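To put a rough number on that: for n = 5 the two-sided 90% t-interval uses t_{0.95, df=4} ≈ 2.132 in place of the normal-approximation z ≈ 1.645, so the interval comes out about 30% wider, which is the compensation described above.

```python
import math

n = 5                  # independent replicate simulations
t_crit = 2.132         # t_{0.95, df=4} for a two-sided 90% interval
z_crit = 1.645         # standard-normal 95th percentile
sample_std = 1.0       # arbitrary units; only the ratio matters

t_half_width = t_crit * sample_std / math.sqrt(n)
z_half_width = z_crit * sample_std / math.sqrt(n)
widening = t_half_width / z_half_width  # ~1.30, i.e. ~30% wider
```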