mostafa-razavi / ITIC-paper

0 stars 0 forks source link

Reviewer 3 Comment 17,18, 19: Finite Size Effects, uncertainty #24

Open mostafa-razavi opened 5 years ago

mostafa-razavi commented 5 years ago

Reviewer 3 Comment 17:

The manuscript states that "we need 3200 ethane molecules to achieve low enough uncertainties..." It's unclear why the authors have coupled the uncertainty to the system size. The authors have apparently chosen to run each system size for the same number of MD steps, but this is unfair and especially so if the MC simulations were also run for the same number of MC steps regardless of system size. MD CPU time per step increases with system size, but the simulation will yield increasing precision with system size (commensurate with the additional effort). In MC, the CPU time and uncertainty are both relatively independent of system size for a given number of steps. It would make for a fairer comparison to run the MD simulations for a number of steps that is inversely proportional to system size. Although the number of steps for MC is described in section 4, the number of MD steps does not seem to be specified.

Our response First, we would like to emphasize that the statement "we need 3200 ethane molecules to achieve low enough uncertainties..." only refers to MD rigid. We modified second paragraph in Section 6 to clearly reflect this point. Furthermore, in light of reviewer’s comment, we replotted Figure 9a and 9b by modifying the number of MD steps depending on the number of molecules. The number of MD steps used in calculating the averages is now inversely proportional to the number of molecules, as specified in the caption of Figure 9. We also increased the number of samples by splitting each run into two blocks, hence eight blocks of data are used to calculate error bars based on 95 % confidence intervals.

image

mostafa-razavi commented 5 years ago

Reviewer 3 Comment 17: Comment 18

I find the behavior in figure 9b and the attribution to lack of equilibration a bit disturbing. United-atom ethane is a very simple molecule and the bond length should not need much time to equilibrate in the vapor phase. Have the authors been able to collect better data using longer equilibration times? Does the divergence at low density at least decrease with increasing equilibration length? This should not be hard for the smallest system size (N=120). If not, then I suspect something is just broken.

Our response: Running long simulations revealed that the behavior in Figure 9b is not due to lack of equilibration. Therefore, we removed the sentence in the manuscript that attributed this behavior lack of equilibration. As discussed in our response to Comment 6.1 of Reviewer 2, the divergence of pressure is due to inaccuracies of intramolecular bonded virial at low densities. In Supplementary Material, we illustrate a method to correct for these inaccuracies if the simulation package provides different virial contributions, i.e. pair virial, bond virial, kintetic virial, etc. Below is the paragraphs containing the above explanation.

image

mostafa-razavi commented 5 years ago

Reviewer 3 Comment 19:

The manuscript describes the relative standard deviation between runs (STD2) being much smaller than the average standard deviation within a run (STD1). Is STD1 then the standard deviation of the samples (how much Z fluctuates from configuration to configuration)? Or is it based on block averages? And STD2 shows how much the average value (from a simulation) fluctuates when the simulation is repeated? It's unclear why these values are being reported in this way, or why STD1 is reported at all. The authors chose to use STD2 for uncertainty (error bars), but this is inappropriate if the value being plotted is the average across the runs. Please compute a proper uncertainty ("standard deviation of the mean", "standard error", etc.) for error bars. Also, computing a standard devation from 4 points is not likely to yield an accurate result. Uncorrelated block averages are more likely to yield a good estimate of the uncertainty.

Our response" We agree with the reviewer that 4 samples is not enough for computing reliable uncertainties, however we believe that block averaging can also be problematic when the block lengths are too small that the data remain correlated. Therefore, we now combine the two approaches, i.e. calculating uncertainties from replicate simulations and from block averaging. In the current text, uncertainties are calculated based on 95 % confidence intervals from eight decorrelated samples, i.e. two blocks on four separate runs. Note that, since most of the data in the table that contained STD1 and STD2 data is plotted in Figure 9, we decided to move it to Supplementary Material.

ramess101 commented 5 years ago

@mostafa-razavi

I like each of these responses. They are very clear, succinct, respectful, and they show that we are not afraid to do a little extra work.

ramess101 commented 5 years ago

@mostafa-razavi

I do have one concern though. This statement:

Note that, since most of the data in the table that contained STD1 and STD2 data is plotted in Figure 9, we decided to move it to Supplementary Material.

I was hoping this meant that you had moved a similar table to the SI, but not the exact same one with STD1 and STD2. The version of supporting information that is on GitHub still has the STD1 and STD2 values. Are we going to change those? We should really only provide the table that uses 4 separate runs with 2 blocks in each. Plus we want to report the 95% confidence intervals so that it accounts for only having 8 independent samples.

ramess101 commented 5 years ago

@mostafa-razavi

The caption to Figure 9 has a typo. B2+B3eff "rho" instead of \rho for the symbol rho.

mostafa-razavi commented 5 years ago

I was hoping this meant that you had moved a similar table to the SI, but not the exact same one with STD1 and STD2. The version of supporting information that is on GitHub still has the STD1 and STD2 values. Are we going to change those? We should really only provide the table that uses 4 separate runs with 2 blocks in each. Plus we want to report the 95% confidence intervals so that it accounts for only having 8 independent samples.

It's in my todo list :)

The caption to Figure 9 has a typo. B2+B3eff "rho" instead of \rho for the symbol rho.

Corrected. Thanks

ramess101 commented 5 years ago

@mostafa-razavi

It's in my todo list :)

OK, let me know if I can help with anything

mostafa-razavi commented 5 years ago

@ramess101 I found this figure in MBAR/ITIC repository. Can we just include this in SI?

image

ramess101 commented 5 years ago

@mostafa-razavi

Yeah I think we just need to include this figure in SI and mention in response

ramess101 commented 5 years ago

@mostafa-razavi

Could you not find a figure that was a little cleaner than this one?

mostafa-razavi commented 5 years ago

@ramess101 No this was the only relevant figure. I found it from here https://github.com/ramess101/MBAR_ITIC/issues/2#issuecomment-323055255

ramess101 commented 5 years ago

@mostafa-razavi

Alright, well in the manuscript we should mention somewhere in Section 6 that a preliminary investigation of finite size effects in the condensed phase is presented in supplementary material.

mostafa-razavi commented 5 years ago

Alright, well in the manuscript we should mention somewhere in Section 6 that a preliminary investigation of finite size effects in the condensed phase is presented in supplementary material.

I'm on it.

mostafa-razavi commented 5 years ago

@jrelliottoh @ramess101

I performed very long simulations of ethane (48 ns) using flexible bond in LAMMPS and replotted Figure 9b. It seems like flexible-bond C2 eventually converges if you run it long enough. I also plotted different pressure and energy contributions of flexible C2 simulations. The need for such long simulation times for a simple molecule such as C2 is still due intramolecular energies. What should we do now? Should be include the new Figure 6b or keep old Figure 6b? Our general recommendation for preferring fixed-MD is still true, because even with 48 ns of simulation flexible-MD still suffers from large uncertainties and noise at low densities for such a simple molecule. I think we need to update Figure 6b and reformulate this paragraph carefully so that we don't have to suffer another round of review. I'm not sure what we should do about the Figure 15 in SI. Should we remove it or keep it?

image

Here is the new plot (Figure 6b was run for 48 ns): image Here is the old plot (Figure 6b was run for 16 ns): image

Plots of different pressure and energy contributions of flexible C2

N=120

Eq-50ns

N=400

Eq-50ns

N=1600

Eq-50ns

N=3200

Eq-50ns

ramess101 commented 5 years ago

My vote would be to mention the need for much longer simulations, move the 48 ns to the SI and get rid of the Figure 15 in SI. Not sure what our ethical obligation it is to mention this in our review, since our previous review satisfied this concern.

On Tuesday, April 2, 2019, mostafa-razavi notifications@github.com wrote:

@jrelliottoh https://github.com/jrelliottoh @ramess101 https://github.com/ramess101

I performed very long simulations of ethane (48 ns) using flexible bond in LAMMPS and replotted Figure 9b. It seems like flexible-bond C2 eventually converges if you run it long enough. I also plotted different pressure and energy contributions of flexible C2 simulations. The need for such long simulation times for a simple molecule such as C2 is still due intramolecular energies. What should we do now? Should be include the new Figure 6b or keep old Figure 6b? Our general recommendation for preferring fixed-MD is still true, because even with 48 ns of simulation flexible-MD still suffers from large uncertainties and noise at low densities for such a simple molecule. I think we need to update Figure 6b and reformulate this paragraph carefully so that we don't have to suffer another round of review. I'm not sure what we should do about the Figure 15 in SI. Should we remove it or keep it?

[image: image] https://user-images.githubusercontent.com/16358113/55443698-b2c66780-5581-11e9-9a9a-14352e13a407.png

Here is the new plot (48 ns): [image: image] https://user-images.githubusercontent.com/16358113/55443424-b3aac980-5580-11e9-8be0-7c5e50217ade.png Here is the old plot (~16 ns): [image: image] https://user-images.githubusercontent.com/16358113/55440210-f9fa2b80-5574-11e9-92e3-e317e2d8193e.png Plots of different pressure and energy contributions of flexible C2 N=120

[image: Eq-50ns] https://user-images.githubusercontent.com/16358113/55440368-67a65780-5575-11e9-9893-d92b92b99019.png N=400

[image: Eq-50ns] https://user-images.githubusercontent.com/16358113/55440377-6ffe9280-5575-11e9-991c-fcae951fc819.png N=1600

[image: Eq-50ns] https://user-images.githubusercontent.com/16358113/55440391-7a209100-5575-11e9-8d75-2b4f6939f747.png N=3200

[image: Eq-50ns] https://user-images.githubusercontent.com/16358113/55440400-83a9f900-5575-11e9-9150-77769aacf2d9.png

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/mostafa-razavi/ITIC-paper/issues/24#issuecomment-479266470, or mute the thread https://github.com/notifications/unsubscribe-auth/AWUvhB1Bl8hloR6vREAa3P-hyebbp7xlks5vc_ImgaJpZM4ZcpkC .

ramess101 commented 5 years ago

From this new plots it seems pretty clear that the energy takes a long time to equilibrate. So maybe we should just replace Figure 9b and emphasize that equilibration/production was 3 times longer.

On Tuesday, April 2, 2019, Richard Messerly ramess101@gmail.com wrote:

My vote would be to mention the need for much longer simulations, move the 48 ns to the SI and get rid of the Figure 15 in SI. Not sure what our ethical obligation it is to mention this in our review, since our previous review satisfied this concern.

On Tuesday, April 2, 2019, mostafa-razavi notifications@github.com wrote:

@jrelliottoh https://github.com/jrelliottoh @ramess101 https://github.com/ramess101

I performed very long simulations of ethane (48 ns) using flexible bond in LAMMPS and replotted Figure 9b. It seems like flexible-bond C2 eventually converges if you run it long enough. I also plotted different pressure and energy contributions of flexible C2 simulations. The need for such long simulation times for a simple molecule such as C2 is still due intramolecular energies. What should we do now? Should be include the new Figure 6b or keep old Figure 6b? Our general recommendation for preferring fixed-MD is still true, because even with 48 ns of simulation flexible-MD still suffers from large uncertainties and noise at low densities for such a simple molecule. I think we need to update Figure 6b and reformulate this paragraph carefully so that we don't have to suffer another round of review. I'm not sure what we should do about the Figure 15 in SI. Should we remove it or keep it?

[image: image] https://user-images.githubusercontent.com/16358113/55443698-b2c66780-5581-11e9-9a9a-14352e13a407.png

Here is the new plot (48 ns): [image: image] https://user-images.githubusercontent.com/16358113/55443424-b3aac980-5580-11e9-8be0-7c5e50217ade.png Here is the old plot (~16 ns): [image: image] https://user-images.githubusercontent.com/16358113/55440210-f9fa2b80-5574-11e9-92e3-e317e2d8193e.png Plots of different pressure and energy contributions of flexible C2 N=120

[image: Eq-50ns] https://user-images.githubusercontent.com/16358113/55440368-67a65780-5575-11e9-9893-d92b92b99019.png N=400

[image: Eq-50ns] https://user-images.githubusercontent.com/16358113/55440377-6ffe9280-5575-11e9-991c-fcae951fc819.png N=1600

[image: Eq-50ns] https://user-images.githubusercontent.com/16358113/55440391-7a209100-5575-11e9-8d75-2b4f6939f747.png N=3200

[image: Eq-50ns] https://user-images.githubusercontent.com/16358113/55440400-83a9f900-5575-11e9-9150-77769aacf2d9.png

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/mostafa-razavi/ITIC-paper/issues/24#issuecomment-479266470, or mute the thread https://github.com/notifications/unsubscribe-auth/AWUvhB1Bl8hloR6vREAa3P-hyebbp7xlks5vc_ImgaJpZM4ZcpkC .

ramess101 commented 5 years ago

@mostafa-razavi and I just spoke. We are going to replace Figure 9b and remove Figure 15 of SI. The key points are: 1) Flexible requires much longer equilibration (orders of magnitude more) 2) Larger uncertainties for same number of production snapshots (note that we cannot compare on a time basis because of the RESPA approach for flexible bonds)

Mostafa is going to update the figure so that they have the same number of frames. I am going to revise the text in the manual and SI and work on the response to the reviewers.

mostafa-razavi commented 5 years ago

Here is the new Figure 9b, using exact number of data point as rigid-MD plot. It didn't change much from the previous plot.

image

ramess101 commented 5 years ago

@mostafa-razavi @jrelliottoh

This is what I have so far. Still need to edit the actual manuscript:

Comment #18 I find the behavior in figure 9b and the attribution to lack of equilibration a bit disturbing. United-atom ethane is a very simple molecule and the bond length should not need much time to equilibrate in the vapor phase. Have the authors been able to collect better data using longer equilibration times? Does the divergence at low density at least decrease with increasing equilibration length? This should not be hard for the smallest system size (N=120). If not, then I suspect something is just broken.

In the first review, we reported that the abnormal behavior in Figure 9b for MD flexible was not due to equilibration. This conclusion was based on a visual inspection of pressure vs time plots for 16 ns simulations. However, after investigating the convergence of the bonded energy vs time for 48 ns simulations, we conclude that equilibration was the actual cause of the peculiar MD flexible results. For example, this figure demonstrates that (at low densities for N = 3200) there is still a significant drift in bonded energies for equilibration times as long as 30-40 ns. The lack of equilibration is much less obvious in the pressure plots due to the extremely large fluctuations in pressure for MD flexible.

By averaging the pressure only after the bonded energy has equilibrated (the last few nanoseconds) we obtain much better agreement between MD flexible and the theoretical trend for Z-1/rho. This can be seen in the updated figure below:

It is still important to note that MD flexible has two primary disadvantages. First, the required equilibration time is 1-2 orders of magnitude larger than MD rigid (which required an equilibration time of less than 1 ns). Second, the uncertainties are much larger than MD rigid for the same number of frames, i.e., MD flexible has larger fluctuations. The discussion on page 11 now reads:

We also removed Figure 15 from the Supporting Information. This figure was added after the second review. The purpose of this figure was to present an empirical approach for correcting the MD flexible issue. However, after updating Figure 9b, Figure 15 of SI is no longer needed.

ramess101 commented 5 years ago

Nice! I will have something for you in an hour

On Tuesday, April 2, 2019, mostafa-razavi notifications@github.com wrote:

Here is the new Figure 9b, using exact number of data point as rigid-MD plot. It didn't change much from the previous plot.

[image: image] https://user-images.githubusercontent.com/16358113/55448860-6c7c0300-5597-11e9-9e77-c76339458d42.png

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/mostafa-razavi/ITIC-paper/issues/24#issuecomment-479306925, or mute the thread https://github.com/notifications/unsubscribe-auth/AWUvhOt92fAWR4uUtRbICNAHyaa0h2uVks5vdBNYgaJpZM4ZcpkC .

ramess101 commented 5 years ago

@mostafa-razavi

OK, I have rewritten the paragraph for the manuscript. Note that I have to !TODO actions, one is for the figure caption and one for the SI. Let me know if you have any concerns.

Comment # 18 I find the behavior in figure 9b and the attribution to lack of equilibration a bit disturbing. United-atom ethane is a very simple molecule and the bond length should not need much time to equilibrate in the vapor phase. Have the authors been able to collect better data using longer equilibration times? Does the divergence at low density at least decrease with increasing equilibration length? This should not be hard for the smallest system size (N=120). If not, then I suspect something is just broken.

In the first review, we reported that the abnormal behavior in Figure 9b for MD flexible was not due to equilibration. This conclusion was based on visual inspection of pressure vs time plots for 16 ns simulations. However, after investigating the convergence of the bonded energy vs time for even longer simulations (48 ns), we conclude that equilibration was the actual cause of the peculiar MD flexible results. For example, this figure demonstrates that (at low densities for N = 3200) there is still a significant drift in bonded energies for equilibration times as long as 30-40 ns. The lack of equilibration is much less obvious in the pressure plots due to the extremely large fluctuations in pressure for MD flexible.

image

By averaging the pressure only after the bonded energy has equilibrated (around 40 ns at the lowest density) we obtain much better agreement between MD flexible and the theoretical trend for Z-1/rho. This can be seen in the updated figure below:

image

!TODO Add to Figure 9 caption: Equilibration times were 1 ns for MD rigid and between 24-47 ns for MD flexible.

!TODO Update values in SI

It is still important to note that MD flexible has two primary disadvantages. First, the required equilibration time is 1-2 orders of magnitude larger than MD rigid (which required an equilibration time of less than 1 ns). Second, the uncertainties are much larger than MD rigid for the same number of frames, i.e., MD flexible has larger fluctuations. The discussion on page 11 now reads:

The effect of using flexible bonds is shown in Figure 9(b). Two key limitations exist for obtaining reliable estimates with flexible bond MD simulations. First, the MD flexible uncertainties are considerably larger than the MD rigid uncertainties. Note that the same number of production steps were used to compute the uncertainties for both MD flexible and MD rigid, for a given value of N. Second, while all of the MD rigid simulations equilibrated in less than 1 ns, the MD flexible systems at the lowest densities required equilibration times between 24-47 ns (with 24 ns corresponding to N = 120). The slow equilibration for MD flexible simulations is presumably due to the small number of intermolecular collisions at low density relative to the large number of intramolecular collisions.

We also removed Figure 15 from the Supporting Information. This figure was added after the second review. The purpose of this figure was to present an empirical approach for correcting the MD flexible issue. However, after updating Figure 9b, Figure 15 of SI is no longer needed.

ramess101 commented 5 years ago

@mostafa-razavi

I just realized that the figures didn't copy over the first time, but I have edited my comment. Let me know if you want to talk about this at all.

mostafa-razavi commented 5 years ago

@ramess101

!TODO Add to Figure 9 caption: Equilibration times were 1 ns for MD rigid and between 24-47 ns for MD flexible.

Done!

!TODO Update values in SI

What values in SI do you mean?

mostafa-razavi commented 5 years ago

I added the above response written by @ramess101 to the end of RevLet3

mostafa-razavi commented 5 years ago

I don't think the insets in Figure 9 is that useful now that the plot is more zoomed in. What do you think?

mostafa-razavi commented 5 years ago

@ramess101

I remember you said something about merging two sections in the manuscript, but I said let's wait till the last moment. Which sections were they?

jrelliottoh commented 5 years ago

Would it be too cluttered to add the 48ns result to the existing figure(s)? Maybe as an inset? It demonstrates that the problem is not with thermo, but MD.

JRE

On Wednesday, April 3, 2019, 4:28:00 AM EDT, mostafa-razavi <notifications@github.com> wrote:  

@ramess101

I remember you said something about merging two sections in the manuscript, but I said let's wait till the last moment. Which sections were they?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or mute the thread.

ramess101 commented 5 years ago

@mostafa-razavi It was section 6 was going to become subsection 5.2 and a new subsection as 5.1 would be something like “ Virial correlations”.

@jrelliotoh we have decided to only use the 48 ns results. Mostafa can you direct Richard to the updated discussion for this section and the review letter?

On Wednesday, April 3, 2019, jrelliottoh notifications@github.com wrote:

Would it be too cluttered to add the 48ns result to the existing figure(s)? Maybe as an inset? It demonstrates that the problem is not with thermo, but MD.

JRE

On Wednesday, April 3, 2019, 4:28:00 AM EDT, mostafa-razavi < notifications@github.com> wrote:

@ramess101

I remember you said something about merging two sections in the manuscript, but I said let's wait till the last moment. Which sections were they?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or mute the thread.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/mostafa-razavi/ITIC-paper/issues/24#issuecomment-479481066, or mute the thread https://github.com/notifications/unsubscribe-auth/AWUvhLu42AVO9MDUXOWOHMlFJ60UZnoGks5vdKeBgaJpZM4ZcpkC .

ramess101 commented 5 years ago

Did we remove from the SI the table of standard deviations for the different low density simulations?

On Wednesday, April 3, 2019, mostafa-razavi notifications@github.com wrote:

@ramess101 https://github.com/ramess101

!TODO Add to Figure 9 caption: Equilibration times were 1 ns for MD rigid and between 24-47 ns for MD flexible.

Done!

!TODO Update values in SI

What values in SI do you mean?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/mostafa-razavi/ITIC-paper/issues/24#issuecomment-479354278, or mute the thread https://github.com/notifications/unsubscribe-auth/AWUvhE8Z90pbv5hvyciViL2vW0L9Hy7dks5vdEaZgaJpZM4ZcpkC .

ramess101 commented 5 years ago

Maybe not as helpful, but to be consistent with panels a) and c) I think we should keep it since we have the space. Unless you want to change the upper limit to -2. That could look better

On Wednesday, April 3, 2019, mostafa-razavi notifications@github.com wrote:

I don't think the insets in Figure 9 is that useful now that the plot is more zoomed in. What do you think?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/mostafa-razavi/ITIC-paper/issues/24#issuecomment-479384760, or mute the thread https://github.com/notifications/unsubscribe-auth/AWUvhA91Jaf-Mha5Lil61A5x8rVti1n4ks5vdGFsgaJpZM4ZcpkC .

mostafa-razavi commented 5 years ago

@ramess101 here is new Figure 9. I removed the insets and changed the symbols for different N. I could limit the Y values on all three plots to -4 for better visibility if we are fine with not seeing part of error bars in figure b

image

mostafa-razavi commented 5 years ago

Did we remove from the SI the table of standard deviations for the different low density simulations?

Yes. we did.

ramess101 commented 5 years ago

Nice. Yeah id try making the lower bound like -4.1 so we only miss the lowest density lower error bar.

On Wednesday, April 3, 2019, mostafa-razavi notifications@github.com wrote:

@ramess101 https://github.com/ramess101 here is new Figure 9. I removed the insets and changed the symbols for different N. I could limit the Y values on all three plots to -4 for better visibility if we are fine with not seeing part of error bars in figure b

[image: image] https://user-images.githubusercontent.com/16358113/55520174-0fd92080-5649-11e9-94fd-a11e04d330cb.png

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/mostafa-razavi/ITIC-paper/issues/24#issuecomment-479697948, or mute the thread https://github.com/notifications/unsubscribe-auth/AWUvhMqIPcQynPAxGvVoKpcbYssEURwRks5vdT2UgaJpZM4ZcpkC .

mostafa-razavi commented 5 years ago

Nice. Yeah id try making the lower bound like -4.1 so we only miss the lowest density lower error bar.

Done!