ramess101 / MBAR_ITIC

Merging the MBAR and ITIC methodologies to optimize a TraPPE force switch model
0 stars 2 forks source link

Basis Functions #6

Open ramess101 opened 7 years ago

ramess101 commented 7 years ago

@mrshirts I am working on including the ability to use Basis Functions with the Mie potential. Could you look over the python code "basis_functions_Mie"? I believe I have this working for the internal energy but pressure is not as straightforward to me.

Expected outcomes: Predicted internal energy: -5264.72725683 Actual internal energy is -5264.71484375 Predicted virial: 100.363121038 Actual virial is 104.601626078 Predicted pressure: 294.28551816 Actual pressure is 290.05674235

For internal energy, the values that I solve for at each time step are sum(r^-lambda) and sum(r^-6). The way I am obtaining these values is by solving the equation Ax=b, where A is a 2x2 matrix of Clam (or C12) and C6, b is the LJ shortrange energies, and x is what we are solving for. This can be found on line 128:

rarray = np.linalg.solve(Cmatrix,U_basis) #First entry of rarray is the sum of r^-lambda and second entry is sum of r^-6

The rarray can be used to predict U with any value of epsilon and sigma for the same lambda. This is on line 149:

U_new_hat = np.linalg.multi_dot([Cmatrix_new,rarray])

where

Clam_new = eps_new * sig_new ** lam_basis #I intentionally ommitted the prefactor (4 for LJ) and just lumped it into the rarray C6_new = eps_new * sig_new ** 6.

Cmatrix_new = np.array([Clam_new,C6_new])

This appears to be working in my code. Notice in the output of basis_functions_Mie.py that the predicted internal energy and actual internal energy are essentially the same.

However, for pressure I am not sure exactly how to construct my basis functions. The approach I have taken gets me in the ballpark for P but is not as accurate as it is for U. Notice that the predicted virial and pressure are different than the actual virial and pressure by about 4 bar (1% error in P).

The methodology I am using is:

  1. Obtain the virials from XX, YY, and ZZ

#Virial XX,YY,ZZ...

  1. Solve Ax=b where b is now the virials and x should now be something like sum(r*du/dr) for the repulsive (lambda) and attractive (6) contributions, with the Clambda and C6 terms divided out (since those make up the A matrix)

rdrXXarray = np.linalg.solve(Cmatrix,VirXX_basis-Pdc_basis) ...

  1. Use this array (rdrXXarray, rdrYYarray, or rdrZZarray) to predict the respective virial for a different value of epsilon and sigma.

VirXX_new_hat = np.linalg.multi_dot([Cmatrix_new,rdrXXarray])+Pdc_new

  1. Take the average of the vir-XX,YY,ZZ predicted values to obtain the overall virial

Vir_new_hat_alt = (VirXX_new_hat + VirYY_new_hat + VirZZ_new_hat)/3.

  1. Convert this virial into pressure using kinetic energy and box volume

P_new_hat_alt = 2./Vbox*(KE/3.-Vir_new_hat_alt)*nm3tom3*kJm3tobar/NA

Do you see anything clearly wrong with this methodology?

ramess101 commented 7 years ago

So I think there is an easier way to do this, but it should give the same answer. Rather than performing two reruns with two different eps/sig and solving a system of equations for sum(r^-lambda) and sum(r^-6), I could just set Clambda as 1 and C6 as 0 and then Clambda as 0 and C6 as 1. Then, the LJ (SR) would be directly proportional to the sum(r^-lambda) or sum(r^-6) and the same should go for the virial. Although this should give the same answer, this approach might be more straightforward, especially when you want to look at sum(r^-lambda) for lots of different lambdas.

ramess101 commented 7 years ago

I just verified that my methodology is identical to using one rerun with just a repulsive and one with just an attractive potential. However, due to round-off errors it is actually more accurate to use the two eps/sig system. In this figure I have plotted the percent error for the two methods, where percent error is the deviation between a real rerun of some system and the basis function predicted U of that system. "Linear algebra" means that we had two eps/sig and we solved for sum(r^-lambda) and sum(r^-6). "Repulsive/Attractive" means that one rerun had just repulsive and one rerun had just attractive.

image

Note that not only does "Repulsive/Attractive" have a larger range of errors (RMS) but it also has a shift (bias). Here I have quantified these results:

Bias: Linear algebra: 6.48593E-06 Repulsive/Attractive: 0.000178315 RMS: Linear algebra: 0.000137564 Repulsive/Attractive: 0.000266342

Obviously these errors are pretty small regardless, but this is just good to keep in mind.

ramess101 commented 7 years ago

I believe my problem with the virial is that there is a virial contribution from the constrained bonds. Going through the Gromacs manual, it is not clear to me how we could get the virial contribution that is just due to LJ. That being said, shouldn't we be able to calculate that directly from the basis functions we obtained with U? In other words, we know that the virial due to the repulsive LJ (SR) should just be -lambda x sum(r^-lambda) where this sum was obtained from U. Similarly, for attractive we would just calculate -6 x sum(r^-6). We would then find the difference between these values and the actual virial to determine the virial from bond constraints, etc. Then we could add that constant contribution back into our virial when evaluating the virial basis functions with a new set of epsilon/sigma. Does that make sense @mrshirts?

ramess101 commented 7 years ago

I have verified that basis functions are working properly for predicting the LJ (SR) and LJ (DC). This dramatically speeds up the parameterization when U is the target property and you use integer values of lambda.

In these three figures I show the leapfrog sampling results where lambda is fixed to a value of 13 and sigma is constrained by sig>sig_TraPPE and rmin<rmin_TraPPE.

First, this is when I do actual reruns at each parameter set:

image

Second, I use the basis functions (which takes a few seconds compared to an hour or so). Note that the range of sigma is different just because the basis functions were obtained at the extreme values of eps/sig:

image

Finally, here I have included a larger range of epsilon just for curiosity (the original bounds were a little too tight)

image

ramess101 commented 7 years ago

Tasks:

  1. Get pressures working
  2. Allow for basis functions to be generated from old data (i.e. without having to go through sequential approach).
  3. Use Bayesian inference instead of standard optimization
jrelliottoh commented 7 years ago

The speedup with basis functions sounds encouraging. Am I correct in surmising that the optimum U_kN from basis functions is at about 111.1, 0.3757? For the pressure, I think we need to write a post-processing routine anyway because gromacs does not separately compute/report the intramolecular energy. Should we just include a computation of the separate components of pressure in that post-processing routine?

ramess101 commented 7 years ago

@jrelliottoh The optimum of ~111 and 0.376 is only for a lambda of 13, with a reference simulation of the TraPPE model, and only internal energy as the target. So no, that optimum is not a very good optimum. It was simply to validate that the basis function analysis was working properly for internal energy. I still need to get this working for pressures. And then we would want to use multiple lambdas and multiple references, etc. The final optimum we get is going to have a lambda around 15-16 with an epsilon closer to 120.

But yes, a post-processing routine that separates intramolecular energy is probably necessary. And it would make sense to separate the pressure contributions as well. How are you planning on doing this? Are you going to have your own code that runs through the trajectory file? We should probably see if @mrshirts knows a good way to separate the different contributions in gromacs. If we find a different package that separates the contributions for us, we could just convert the trajectory files from gromacs into the other simulation package's format. @mrshirts Can mdtraj help us out in this regard?

ramess101 commented 7 years ago

I think we need to write a post-processing routine anyway because gromacs does not separately compute/report the intramolecular energy. Should we just include a computation of the separate components of pressure in that post-processing routine?

@jrelliottoh Gromacs will, in fact, provide the energies from each contribution: bond, angle, dihedral, LJ 1-4, etc. Sorry for telling you otherwise. Hopefully you have not spent too much time coding up a separate intramolecular energy calculator.

mostafa-razavi commented 7 years ago

Gromacs will, in fact, provide the energies from each contribution: bond, angle, dihedral, LJ 1-4, etc.

@ramess101 Where can these be found? Are they reported in log files automatically, or should I somehow tell GROMACS to report them?

ramess101 commented 7 years ago

@mostafa-razavi I did not need to specify any additional terms in my .mdp file or elsewhere. GROMACS simply provided those terms in the log file automatically. I think if the force field and molecule have the corresponding terms GROMACS will return their contributions. Let me know if that does not seem to be the case for you.

mostafa-razavi commented 7 years ago

@ramess101 I attached one of the log files that I have. I cannot find any of those information you mentioned in my log file. Can you please send me a log file of yours that you know contains such information?

nvt_prod.log.zip

ramess101 commented 7 years ago

From what I can tell, you are still just simulating ethane, right? I have attached a log file from butane. You will see that there "Angle" and "Ryckaert-Bell" (dihedral) terms. I did not change anything in my input files, so I think GROMACS just provides those automatically. United-atom ethane just does not have angles and dihedrals so they are omitted.

nvt_prod_butane.zip

ramess101 commented 7 years ago

@mrshirts I am still trying to figure out how to calculate pressures using the basis function approach. Could you look at my basis_functions_Mie.py file when you get a chance? I have modified it slightly since I started this issue, but the outcome is essentially the same. Here are the most recent numbers:

Predicted virial-XX: 221.607275425 Actual virial-XX is 214.462371826 Predicted virial-YY: 172.348184061 Actual virial-YY is 166.006591797 Predicted virial-ZZ: -60.4240148236 Actual virial-ZZ is -66.6640853882 Predicted pressure alternative: 283.496190983 Actual pressure is 290.05674235

The question is, is an error of <10 bar reasonable when using the basis function approach? Or am I implementing this incorrectly?

mrshirts commented 7 years ago

Hard to multitask too much at the conference, but I will work on this on the plane tonight.

On Wed, Sep 13, 2017 at 7:57 AM, Richard Messerly notifications@github.com wrote:

@mrshirts https://github.com/mrshirts I am still trying to figure out how to calculate pressures using the basis function approach. Could you look at my basis_functions_Mie.py file when you get a chance? I have modified it slightly since I started this issue, but the outcome is essentially the same. Here are the most recent numbers:

Predicted virial-XX: 221.607275425 Actual virial-XX is 214.462371826 Predicted virial-YY: 172.348184061 Actual virial-YY is 166.006591797 Predicted virial-ZZ: -60.4240148236 Actual virial-ZZ is -66.6640853882 Predicted pressure alternative: 283.496190983 Actual pressure is 290.05674235

The question is, is an error of <10 bar reasonable when using the basis function approach? Or am I implementing this incorrectly?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/ramess101/MBAR_ITIC/issues/6#issuecomment-329194945, or mute the thread https://github.com/notifications/unsubscribe-auth/AEE31D0FOmoGsiFC9G9uNpZzpeMFP16pks5sh-1igaJpZM4O4EJg .

mrshirts commented 7 years ago

Quick question: these simulations were with ethane, correct? and if so, are constraints on? I wonder if the constraint portion of the virial is doing something non-basis functiony.

ramess101 commented 7 years ago

Yes, this is with ethane and constraints are on. That was my suspicion but I was not sure how GROMACS reports the virials. Do these virials have the constraints already included? I assume so since the pressures can be calculated from the reported virials without any additional terms (other than the ideal gas portion). So is there a way to find the virials without the constraints? Or alternatively, can you find/calculate the constraint contribution to the virials?

mrshirts commented 7 years ago

Virials include constraints. But I don't know you want the non-constrained virial includes forces that get removed. So maybe that's not the issue.

Here's the experiment to see if it matters: can you run some short simulations with a harmonic potential and no constraints, and see if virial by reweighting agrees better? For something like this, all you need is long enough for the simulation to equilibrate, and then the basis function stuff can be done with any single point).

ramess101 commented 7 years ago

Yeah, I will try that.

ramess101 commented 7 years ago

@mrshirts I repeated the process with a harmonic potential instead of LINCS constrained bonds. The results are basically the same, namely, the deviation between virials and pressures is <10 bar. I have updated the file on GitHub. Here are the new expected values:

Predicted internal energy: -5264.10403522 Actual internal energy is -5264.100586 Predicted virial: 547.907121115 Actual virial is 539.431264233 Predicted pressure: -152.236259656 Actual pressure is -143.7788086 Predicted virial-XX: 716.077538818 Actual virial-XX is 709.1787109 Predicted virial-YY: 641.705973369 Actual virial-YY is 633.9317017 Predicted virial-ZZ: 285.937851159 Actual virial-ZZ is 275.1833801 Predicted pressure alternative: -152.236259656 Actual pressure is -143.7788086

It is worth noting that the pressures and virials are very different from what we got previously for constrained bonds. However, if you compare the deviations between predicted and actual you will see that harmonic potential has the same problem as before.

ramess101 commented 7 years ago

I believe I found one error in my methodology. I was not dividing the dispersive correction for the virial by 3. Now the predicted pressure and virial agree very closely, especially when I am not using harmonic bonds.

Here are my results with constrained bonds:

Predicted internal energy: -5264.72725683 Actual internal energy is -5264.71484375 Predicted virial: 103.967785199 Actual virial is 104.601626078 Predicted pressure: 290.689087308 Actual pressure is 290.05674235 Predicted virial-XX: 214.397912403 Actual virial-XX is 214.462371826 Predicted virial-YY: 165.13882104 Actual virial-YY is 166.006591797 Predicted virial-ZZ: -67.6333778452 Actual virial-ZZ is -66.6640853882 Predicted pressure alternative: 290.689087308 Actual pressure is 290.05674235

And here they are for harmonic bonds:

Predicted internal energy: -5264.10403522 Actual internal energy is -5264.100586 Predicted virial: 540.697758093 Actual virial is 539.431264233 Predicted pressure: -145.043363332 Actual pressure is -143.7788086 Predicted virial-XX: 708.868175796 Actual virial-XX is 709.1787109 Predicted virial-YY: 634.496610347 Actual virial-YY is 633.9317017 Predicted virial-ZZ: 278.728488137 Actual virial-ZZ is 275.1833801 Predicted pressure alternative: -145.043363332 Actual pressure is -143.7788086

In either case the pressures are much closer. For harmonic the deviation is about 1.3 bar while for constrained bonds the deviation is 0.6 bar.

To verify that this works for lots of configurations I am going to rerun this analysis for multiple snapshots.

ramess101 commented 7 years ago

OK, I think basis functions are working for pressures now. Here I analyzed all the snapshots from the previous system and found the basis function virials to be within 1.5 bar on average.

It is hard to see this on a parity plot:

image

So I have included an absolute deviation plot here:

image

Clearly the old approach had a bias. This bias is equal to 2/3*(Vir_dc_new - Vir_dc_basis). Now the deviations fluctuate around 0 (actually there is still a small bias of 0.52).

This plot shows that there is no strong negative or positive bias in error depending on the value of the virial:

image

mrshirts commented 7 years ago

Would you consider this problem solved now, or do you want me to look at this some more?

On Wed, Sep 13, 2017 at 3:45 PM, Richard Messerly notifications@github.com wrote:

OK, I think basis functions are working for pressures now. Here I analyzed all the snapshots from the previous system and found the basis function virials to be within 1.5 bar on average.

It is hard to see this on a parity plot:

[image: image] https://user-images.githubusercontent.com/23408516/30404160-74d6c73c-98a2-11e7-9d00-8a94a7baaaa5.png

So I have included an absolute deviation plot here:

[image: image] https://user-images.githubusercontent.com/23408516/30404187-a7e7cfea-98a2-11e7-9338-de72bb24adff.png

Clearly the old approach had a bias. This bias is equal to 2/3*(Vir_dc_new

  • Vir_dc_basis). Now the deviations fluctuate around 0.

This plot shows that there is no strong negative or positive bias in error depending on the value of the virial:

[image: image] https://user-images.githubusercontent.com/23408516/30404233-ddf1818a-98a2-11e7-9136-74939c48c40f.png

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/ramess101/MBAR_ITIC/issues/6#issuecomment-329318189, or mute the thread https://github.com/notifications/unsubscribe-auth/AEE31L7fAa2kITB0Un2ppkfk4z7AAC0Yks5siFscgaJpZM4O4EJg .

ramess101 commented 7 years ago

I believe it is solved. I will do some more tests and let you know if I have anymore concerns. When we have more complicated molecules than ethane we will probably need to discuss this some more.

ramess101 commented 6 years ago

So it turns out that what I thought fixed the problem was actually just a fortuitous (or unlucky depending on how you look at it) cancellation of errors. The key is NOT to remove the dispersive correction of the virial but rather it is to perform a rerun calculation with k = 0 for the harmonic potential. Now my deviations in the virial are very small (notice that before my "fixed" had much larger fluctuations and still had a slight bias):

image

Previously I thought that I needed to account for the dispersive correction to the virial. However, when I tried to implement my basis function code for different non-bonded potentials it gave poor results. For example, in this plot I show that while rr3 (rerun 3) looked good rr4 was poor (this is using my old basis function analysis). This is because rerun 3 and 4 have very similar dispersive corrections but very different repulsive parameters (C12):

image

To find the real problem I looked into a simpler system with no bonded constraints (argon) and I discovered that in fact removing the dispersive correction made things worse. For argon you get excellent prediction for the virial:

image

Note that the larger fluctuations for rr4 are because of the larger absolute value of the virial. The key is that rr4 does not show a strong bias.

image

Instead, if I perform a rerun with k = 0 and do not remove the dispersive corrections I get excellent agreement for both rr3 and rr4:

image

So I believe that now I have solved the basis function problem. I am glad that the solution was a rather simple one, namely, exclude the harmonic contributions in the rerun calculation. I can now implement this approach into my optimization code and validate the contours that I got over the entire parameter space very quickly.

ramess101 commented 6 years ago

Developing a basis function to predict the virial for the non-bonded contributions (previous post) appears to only be half the battle. I thought I would be able to use the 2/3 KE/V (or rho k T) for the ideal gas portion and simply add the non-bonded virial. Unfortunately, this did not work even when I tried accounting for the difference in degrees of freedom between the constrained and unconstrained systems. What I concluded is that we also need to be able to determine the virial from the constraints, which depend on the non-bonded contributions.

I was able to accomplish this task by performing multiple reruns and developing two different basis functions: one for the non-bonded unconstrained virial (presented in previous post) and one for the non-bonded constrained virial.

The algorithm for developing the basis function for the non-bonded constrained virial requires three different types of rerun calculations:

  1. With constraints and with the non-bonded interactions
  2. Without constraints (k=0) and with non-bonded interactions
  3. With constraints and without non-bonded interactions

The first type of rerun was what I was doing initially (at the very beginning of this issue). This gives us the complete virial (which is what we want) but we cannot develop a basis function to predict it. The second type of rerun is what I used in my previous post to develop a basis function for the non-bonded unconstrained virial. The third type of rerun yields a virial that is independent of the non-bonded potential, so we only perform this rerun once. This virial is necessary to developing a basis function for the virial of the non-bonded contribution to the constraint (call this Vir_C,nb)

Vir_C,nb = Vir_1 - Vir_2 - Vir_3

Equations A.1-A.3 from the LINCS derivation are useful for understanding the contributions of the different reruns.

image

Essentially, we are trying to solve for the f_n contribution to delta r_n (A.3). This is the contribution that the non-bonded potential makes to the constraint force. Vir_1 is the complete version of A.2, Vir_2 is A.2 but without the constraint term, Vir_3 is A.3 but without f_n^i (i.e. the non-bonded forces are 0). So the difference of these allows us to solve for the final term in A.3.

After we have developed a second basis function to predict Vir_C,nb we can predict the overall virial using:

Vir_1 = basis_function_1 + basis_function_2 + Vir_3

Again, Vir_3 is constant with respect to the non-bonded potential. basis_function_1 and basis_function_2 predict Vir_2 and Vir_C,nb, respectively.

These figures show that the error from using these two basis functions is reasonably small compared to the total pressure:

image

image

ramess101 commented 6 years ago

@mrshirts

I have validated that my basis functions code is working for ethane by comparing the contour plots of the SSE of U and Z obtained with basis functions with those obtained using reruns.

These plots were obtained using basis functions:

image

image

These plots were obtained performing reruns at each parameter set:

image

image

The contours are practically identical, indicating that the basis function code is working properly.

mrshirts commented 6 years ago

Great!

ramess101 commented 6 years ago

@mrshirts

I was able to implement the basis functions approach for propane without any serious issues. It requires a separate basis function for the CH3-CH3, CH2-CH2, and CH3-CH2 interactions, but the inclusion of angles did not require any additional bookkeeping. Essentially, the angular contributions are lumped in with the bonded contributions that are subtracted and then added back in at the end of the virial calculation. This means that creating basis functions for larger compounds, with torsions, should not be any more complicated.

These plots are for propane (TraPPE-UA) at a single state point (isochore, rho0, T0- i.e. the most dense saturated liquid):

image

image

Notice that, in this example, the error in nonbonded energy is less than 2e-0.5% and the error in pressure is only about 0.05%. These errors are typical of properly implemented basis functions.

ramess101 commented 6 years ago

@mrshirts

I just wanted to let you know that the basis function approach is working for n-octane. This was a bit tricky to code (mainly because of what GROMACS gives you) but it is basically the same as butane except we want a basis function for both the LJ non-bonded intermolecular and intramolecular energies. This is because MBAR needs the total energy (Utotal) while ITIC needs the departure energy (Udep) while GROMACS provides Utotal and LJtotal (inter+intra non-bonded).

The average percent deviations between basis functions and direct simulations for several state points are:

Udep (LJ_inter) = 7e-5 % Utotal = 3e-4 % (this value gets skewed by a few energies that are around 0 at certain state points, the absolute deviation is the same as Udep) pressure = 1e-2% (pressure is always slightly less accurate it appears, but plenty accurate for our purposes)

So now I have a basis function class that is completely generalized for any alkane chain-length. I don't foresee any problems with branched or polar compounds, but I have been wrong before.

The speed-up and memory-saved are immense for a larger molecule such as n-octane, especially since I had to use 800 molecules instead of 400 (see Issue #9). Basis functions take the exact length of time for n-octane as they did for n-butane. This is a BIG DEAL because reruns are MUCH slower for n-octane (a molecule twice as big as n-butane with twice as many molecules in the system requires four times as many evaluations).

Here is the pseudo-code:

Perform direct simulation (with constraints, with exclusions = 3, etc)

Perform rerun with constraints but with VDW parameters (C6, Clam) = 0 for ALL site-types

Loop through interaction site-types (i.e. CH3CH3, CH2CH2, CH3CH2) and perform this process for each:

     For each step below, C6 and Clam = 0 for all other site-types (interactions are calculated for only one site-type)

     1. Perform reruns with constraints, exclusions = 3
     2. Perform reruns without constraints, exclusions = 3
     3. Perform reruns with constraints, exclusions = 8 (for n-octane)
ramess101 commented 6 years ago

Basis functions take the exact length of time for n-octane as they did for n-butane.

To clarify, because there is one additional basis function (LJ intramolecular) for n-octane you do need to perform one additional np.linalg.multi.dot() command. So there is a very slight increase in computational time, but still orders of magnitude faster than reruns.

mrshirts commented 6 years ago

This would be great to document (and perhaps release scripts as examples) for the Bayesian follow up paper, since it's 1) nontrivial and 2) much faster.

On Fri, Feb 9, 2018 at 2:46 PM, Richard Messerly notifications@github.com wrote:

Basis functions take the exact length of time for n-octane as they did for n-butane.

To clarify, because there is one additional basis function (LJ intramolecular) for n-octane you do need to perform one additional np.linalg.multi.dot() command. So there is a very slight increase in computational time, but still orders of magnitude faster than reruns.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/ramess101/MBAR_ITIC/issues/6#issuecomment-364578171, or mute the thread https://github.com/notifications/unsubscribe-auth/AEE31Htc2nB_UXh_ZgZZlau4IKtbbQcpks5tTLyhgaJpZM4O4EJg .

ramess101 commented 6 years ago

@mrshirts

I am working on including a basis functions section for the supporting information of this first manuscript ("Configuration-sampling-based..."). Although basis functions are not essential for this paper, it might be nice to present it in multiple publications. I will let you know when I have a version that you can read. This can be added even after the NIST internal review is completed.

ramess101 commented 6 years ago

@mrshirts

The Supporting Information now contains a thorough discussion about basis functions. When you get a chance to read it (no rush) let me know what you think. Some of the theory needs to be validated but the implementation section is consistent with what I have successfully used with GROMACS.

mrshirts commented 6 years ago

Taking a look today. One thing; I'd include the .gro,.top files etc as separate ASCII files instead of in the supporting information PDF.

mrshirts commented 6 years ago

It SEEMS like the energy components for the non- bonded CH3, CH2, or cross-interactions could be set up with the proper definition of energy groups in the .mdp.

mrshirts commented 6 years ago

Do you want to save the UQ in section S.V for the second paper? I don't know it belongs in the supporting information here - seems like it goes beyond the scope of the paper.

ramess101 commented 6 years ago

Taking a look today. One thing; I'd include the .gro,.top files etc as separate ASCII files instead of in the supporting information PDF

OK, for the NIST review this seemed more convenient. But for the journal that would probably be better.

It SEEMS like the energy components for the non- bonded CH3, CH2, or cross-interactions could be set up with the proper definition of energy groups in the .mdp.

What do you mean exactly? Are you referring to how I use "group" instead of "verlet" cutoffs? Or that I am using vdwtype = user?

Do you want to save the UQ for the second paper? I don't know it belongs in the supporting information here - seems like it goes beyond the scope of the paper.

I agree that it doesn't belong in this paper. The reason I included it was because I wanted to show the butane contours for CH2 in Figure 10. Since I don't have any direct simulations with butane for comparison, I thought the butane contours are not really justified without some analysis. But perhaps it is sufficient to just show that the contours are similar to propane and that we can scan the parameter space very quickly because of this approach.

mrshirts commented 6 years ago

What do you mean exactly? Are you referring to how I use "group" instead of "verlet" cutoffs? Or that I am using vdwtype = user?

No, the mdp options energygrp-excl (which leave out some energy groups) and energygrps (which report interactions between different atoms differently) could be set in rerun to generate some of the energies. I don't know that it can be done NOW, but it could potentially be the place to do this in the future. Actually SOME of this could be done now to reduce the number of reruns required, but probably not all of it.

mrshirts commented 6 years ago

Hmm. Section S.IV.2 generally makes sense, though the algebra is hard to follow. I think that perhaps writing out the equations explicitly (in vector form) like Y = C_X Phi_X^Y, but with explicit terms would help make it easier to see. It might also help after each of these to provide a bit of information as to why a given set of rerun calculations are used, i.e. why they lead to the specific differences that are desired. This would particularly be important for the constraint virial terms.

mrshirts commented 6 years ago

You could potentially reduce the size and complexity of the table by just focusing on the set of 6 required for each atom type term, explaining that the other atom terms are set to zero there. Though it might not end up being that much clearer.

ramess101 commented 6 years ago

No, the mdp options energygrp-excl (which leave out some energy groups) and energygrps (which report interactions between different atoms differently) could be set in rerun to generate some of the energies. I don't know that it can be done NOW, but it could potentially be the place to do this in the future. Actually SOME of this could be done now to reduce the number of reruns required, but probably not all of it.

OK, now I understand what you mean. It does sound like those features could reduce the number of reruns required. I need to become more familiar with them to see exactly what type of output they would provide.

ramess101 commented 6 years ago

Hmm. Section S.IV.2 generally makes sense, though the algebra is hard to follow. I think that perhaps writing out the equations explicitly (in vector form) like Y = C_X Phi_X^Y, but with explicit terms would help make it easier to see.

So do you mean that I should substitute the Y, C_X, and Phi_X^Y matrices directly into the equations?

It might also help after each of these to provide a bit of information as to why a given set of rerun calculations are used, i.e. why they lead to the specific differences that are desired. This would particularly be important for the constraint virial terms.

Yeah, I agree that some explanation is needed. I will work on this.

ramess101 commented 6 years ago

You could potentially reduce the size and complexity of the table by just focusing on the set of 6 required for each atom type term, explaining that the other atom terms are set to zero there. Though it might not end up being that much clearer.

I tried that originally. But being more concise did not appear to be more helpful or clear. Also, I think by just looking at the table you can see that certain simulations are essentially grouped together.

mrshirts commented 6 years ago

I tried that originally. But being more concise did not appear to be more helpful or clear. Also, I think by just looking at the table you can see that certain simulations are essentially grouped together.

Sure, maybe just explicitly stating the grouping -- really, it's a set of 6 simulations per atom type (per lambda). Think of it as a practice draft for the follow up paper.

mrshirts commented 6 years ago

So do you mean that I should substitute the Y, C_X, and Phi_X^Y matrices directly into the equations?

Yeah, that's what I was thinking. OK to leave in matrix form, though.

ramess101 commented 6 years ago

@mrshirts

OK, I made the basis function equations explicit with the matrices in the supporting information. Let me know if you think this is clearer.

I also removed the UQ discussion in the manuscript and from the supporting information.

I have not had time to look into reducing the number of reruns yet. I will also let you know about the constraint rerun issue you emailed Pascal and me.

mrshirts commented 6 years ago

I don't think that you need to worry about reducing the number of reruns. If what you are doing is working for you, then it's great. EVENTUALLY, we could figure out how to reduce them. Maybe as soon as the next paper, but not before.

Constraint rerun issue would be useful so we can make sure that what we put into GROMACS doesn't break functionality.

mrshirts commented 6 years ago

Yeah, I think that helps.