ramess101 / MBAR_GCMC

3 stars 0 forks source link

Energy recalculation #7

Closed ramess101 closed 5 years ago

ramess101 commented 6 years ago

We implement the total energy recalculation from outputted PDB file by GOMC. We might change console output a little bit but I think it is done. It is implemented in the branched called "RecalcFrame" https://github.com/GOMC-WSU/GOMC/tree/RecalcFrame. You can use it for all Ensembles.

To set it up;

1- set the "RunSteps" to "0".

2- Load the output PDB "OutputName_BOX_0.pdb" and PSF "OutputName_merged.psf".

3- If you need to calculate the pressure, use "PressureCalc true 10000".

ramess101 commented 6 years ago

@msoroush

Sorry for the slow response. I finally got around to testing the RecalcFrame branch. The energy recalculation appears to work but I have a few questions/concerns. In brief:

  1. PDB files work for now, but are not ideal
  2. Recomputing energies works, but for some reason the energies are not output to the his.dat file
  3. How can I restart (not recompute) a simulation without needing another equilibration period?

The PDB files will become enormous (around 30 GB for hexane) if we are outputting the configurations at the same frequency that we used to output the histograms (every 200 MC steps). To work around this, I perform hundreds of smaller GCMC simulations sequentially. After each GCMC run, I perform the RecalcFrame using the PDB file for this short run, and then I restart the GCMC run and remove the old PDB file:

  1. Run first simulation
  2. Rerun (recompute the energies)
  3. Restart from previous simulation (load final configuration)
  4. Loop through steps 2 and 3 until finishes total number of steps

Obviously, this is just a work around, and is hopefully only temporary. Once the configurations are stored in a .trr (Gromacs trajectory) file we should not have this storage issue. Any idea how long until that will be ready?

More importantly, I have a question about RecalcFrame. I need to have the number of molecules and the recomputed energy for each frame in the PDB file. Obviously, N does not change upon recalculation and I could extract the recomputed energies from the log file, but I thought it would be easier to get both from the his.dat file. However, for some reason his.dat is empty for the recalculation. I tried setting SampleFreq to 1 (since I need it to output to his.dat for every PDB frame) but this did not solve the problem. When recomputing the energies, does it no longer output to the his.dat file? I did notice that the block average file outputs the energies at the same frequency as the PDB coordinate frequency. I cannot use block averages for generating basis functions, but it appears that the block averages reported for the recalculation are actually from a single PDB frame. So for now I am using the block average file, although it is somewhat misleading to call these block averages (Note that you must set BlockAverageFreq to 1, because the code divides the single frame values by BlockAveragesFreq.)

Also, I am having some issues with the restarts. Specifically, I have not been able to get the restarts to run without including an equilibration period. I was hoping that if the previous run had already equilibrated and adjusted the move sizes, then the restart would have some way of loading those parameters. Is that not the case? I tried setting EqSteps to 0, hoping that it would just skip the equilibration because it already loaded the move sizes, but this did not seem to work. I do not have a lot of experience using GOMC, so perhaps I made a simple mistake for restarting simulations.

To help debug my problem, I have included the .conf files for the start, restart, and rerun (recalculate). (Note that I tried setting PRNG to RESTART in in_restart.conf, but the simulations would not run so I changed this back to RANDOM).

Thanks

in_start.txt in_restart.txt in_rerun.txt

msoroush commented 6 years ago

Hi @ramess101

  1. PDB size: PDB file contains molecule's information of two boxes. If you use less molecule in reservoir, the size of PDB file would decreased at least by a factor of 2.

  2. His.dat file: Currently histogram file is enabled for GCMC simulation only. The energy information will be printed only when simulation steps passes the equilibration steps. The reason that in recalculate energy, hist file is not printed is that your equilibration steps "2000000", is higher than run steps "40000".

  3. Recalculate energy: If you look at the "C6P_510_00_K_u_4127_r1a_BOX_0.pdb" file, you can see at each frame, we print CellBasis information, frame number, and step number that we printed the PDB file. So, all the outputting information is based on this printed step number. Blockaverage in this context make no sense since it prints the instantaneous value. We should turn it off when we recalculate energy. I attached an example file to show how set it up. Rule of thumb is:

  1. Restarting simulation with no EqSteps: I believe it's an issue in GOMC. We should set a flag that if we are using restart, no need that EqSteps be greater than RunSteps and AdjSteps. Currently we get this error: Error: Move adjustment frequency should be smaller than Equilibration steps!
msoroush commented 6 years ago

@ramess101 I hope I answered your question. Please let me know if I missed anything. In addition, would you please add @YounesN ti this repository?

ramess101 commented 6 years ago

@msoroush Thank you for getting back to me.

  1. OK, that will help, but the files would still be about 15 GB. So I am going to continue using my sequential approach where I remove past PDB files.

  2. Alright, that makes sense. I did not think to change EqSteps because RunSteps was already 0. That should fix the his.dat problem.

  3. Yes, the Blockaverage should probably be turned off, but it was working for me when I set BlockAverageFreq to 1 (but this was just a temporary fix).

  4. OK, I am still confused as to how I should restart my simulations. Did you mean "no need that [RunSteps] be greater than [EqSteps] and AdjSteps"? (I think you inverted EqSteps and RunSteps in your explanation.) So for now is there a way for the restart to continue without requiring additional equilibration? Or is that still being implemented?

  5. Welcome @YounesN . Do you want to be added as a collaborator or just "watch" this repository?

YounesN commented 6 years ago

I think I will watch the repository. It should be sufficient. I can always PR if I have to. Thanks

msoroush commented 6 years ago

@ramess101 I made some changes to this branch. Now if recalculate frame is activated (by setting RunSteps 0), it will no longer print block average, PDB for restart file and simulation frames. In addition, it does not print the move info in the console.

Here is the condition for running simulation: -For starting simulation, RunSteps > EqSteps > AdjSteps. -For restarting simulation RunSteps > EqSteps (EqSteps can be any number). -For recalculating energy of frame there is no condition.

We use two tags in restartPDB file to detect if this file is belong to GOMC or not. In addition, we read box dimension information, cell angles, and move adjustment data from these tags. For instance: REMARK GOMC 1.114 1.72566 0.000 100000
CRYST1 35.000 35.000 35.000 90.00 90.00 90.00 P 1 1

So, as long as your PDB file has these tags, you should be okay to restart your simulation.

ramess101 commented 6 years ago

@msoroush

OK, this helped a lot, thanks for the clarification. I will keep you posted.