luancarvalhomartins / PyAutoFEP

PyAutoFEP: an automated FEP workflow for GROMACS integrating enhanced sampling methods

"Fatal error" during rerun step - how should we fix it? #30

Open luancarvalhomartins opened 2 years ago

luancarvalhomartins commented 2 years ago

During the rerun step, gmx mdrun -rerun is used to obtain the potential energy of each sampled conformation of the system under every Hamiltonian. During this process, if the potential energy is too high, an error like the one below (note: energies will vary) will be raised and gmx will halt execution.

Fatal error:
Step 0: The total potential energy is 8.11303e+09, which is extremely high.
The LJ and electrostatic contributions to the energy are 8.11306e+09 and
-31622, respectively. A very high potential energy can be caused by
overlapping interactions in bonded interactions or very large coordinate
values. Usually this is caused by a badly- or non-equilibrated initial
configuration, incorrect interactions or parameters in the topology.

This error is, therefore, a showstopper. gmx mdrun -rerun only checks for this high potential energy in the first frame, so a workaround could be removing the first frame of the trajectories and retrying, which feels a bit hacky. So far, I have found no other solution, though. If anyone has any suggestions about this, feel free to comment.
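For anyone scripting around this, one pragmatic option is to detect the fatal-error message in the mdrun log and only then apply the first-frame workaround. Below is a minimal sketch (the function name `needs_first_frame_workaround` is hypothetical, not part of PyAutoFEP), matching the wording of the error shown above:

```python
import re

# Pattern for the fatal error GROMACS emits when the first rerun frame
# has an extremely high potential energy (wording taken from the log above).
_HIGH_ENERGY_RE = re.compile(
    r"Step 0: The total potential energy is \S+, which is extremely high"
)

def needs_first_frame_workaround(log_text):
    """Return True if the mdrun log shows the step-0 high-energy fatal error."""
    return _HIGH_ENERGY_RE.search(log_text) is not None
```

A run script could read each rerun .log file and, only when this returns True, regenerate the trajectory without frame 0 and retry.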


Note: in case you are affected by this issue and a fix is yet to be committed, use the following steps to automatically prepare trajectories without the first frame and rerun gmx mdrun -rerun, if needed.

> parallel --jobs=1 'printf "0\n" | gmx trjconv -f {}/lambda.trr -s {}/lambda_nocont.tpr -o {}/lambda_skip0.xtc -b 0.002 -ndec 6' ::: lambda*
> parallel --jobs=1 'gmx dump -e rerun/rerun_struct{2}_coord{1}.edr &> /dev/null || gmx mdrun -rerun lambda{1}/lambda_skip0.xtc -s lambda{2}/lambda_nocont.tpr -e rerun/rerun_struct{2}_coord{1}.edr -g rerun/rerun_struct{2}_coord{1}.log' ::: $(seq 0 11) ::: $(seq 0 11)
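The two parallel one-liners above can also be driven from Python. Here is a hypothetical sketch (the function `build_workaround_commands` is not part of PyAutoFEP) that builds the equivalent gmx command lists for all lambda windows; actually invoking gmx via subprocess is left to the caller, and note that gmx trjconv additionally expects the output group ("0" for System) on stdin:

```python
def build_workaround_commands(n_lambdas=12):
    """Build gmx trjconv / mdrun -rerun command lists equivalent to the
    parallel one-liners above, for lambda windows 0..n_lambdas-1."""
    trjconv, reruns = [], []
    for i in range(n_lambdas):
        # Drop frame 0 by starting the output trajectory at t = 0.002 ps.
        trjconv.append(
            ["gmx", "trjconv", "-f", f"lambda{i}/lambda.trr",
             "-s", f"lambda{i}/lambda_nocont.tpr",
             "-o", f"lambda{i}/lambda_skip0.xtc", "-b", "0.002", "-ndec", "6"]
        )
        for j in range(n_lambdas):
            # Rerun the coordinates of window i under the Hamiltonian of window j.
            reruns.append(
                ["gmx", "mdrun", "-rerun", f"lambda{i}/lambda_skip0.xtc",
                 "-s", f"lambda{j}/lambda_nocont.tpr",
                 "-e", f"rerun/rerun_struct{j}_coord{i}.edr",
                 "-g", f"rerun/rerun_struct{j}_coord{i}.log"]
            )
    return trjconv, reruns
```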
baba-hashimoto commented 2 years ago

I asked the GROMACS developers about this case and received the following comment:

"If handled by throwing an exception, the rerun code could catch it and continue."

I don't have the skills to modify GROMACS myself, so I can't handle it, but it seems possible to handle via exception handling. My suggested remedy would be to check the rerun flag somewhere and add logic that prevents line 465 from executing.

Source file: src/gromacs/mdlib/sim_util.cpp (line 465)

https://gitlab.com/gromacs/gromacs/-/issues/4352

luancarvalhomartins commented 2 years ago

Unfortunately, I don't know C++, so I can't write a patch for GROMACS. Let's hope someone suggests something in the bug report you filed, which I'm also tracking.

luancarvalhomartins commented 2 years ago

It has been almost a month since issue https://gitlab.com/gromacs/gromacs/-/issues/4352 was opened, and no further replies have been posted. I am seriously considering implementing the workaround (which I also dislike, BTW) in the run script.

baba-hashimoto commented 2 years ago

It seems the issue has recently been added to the fix list: https://gitlab.com/gromacs/gromacs/-/milestones/132#tab-issues. I have not been able to work on this project recently because I have been busy with other work, but I would like to resume my investigation of membrane protein systems.

luancarvalhomartins commented 2 years ago

That's great news. I hope they commit a patch circumventing this issue so we can avoid using a horrible hack.

I'm also super busy with a manuscript these days. I hope I am able to do more work on PyAutoFEP soon (likely next week). A tutorial for membrane systems is on my priority list.

baba-hashimoto commented 2 years ago

GROMACS 2022.1 has been released, and when I ran the sample I got the following error. There is nothing wrong with the input files: when I ran the same files with GROMACS 2020.6, everything worked fine. I would appreciate it if you could check against 2022.1 when you have time.

[ERROR] Failed to run gmx pdb2gmx. Error code 1.
[ERROR] Command line was: ['gmx', 'pdb2gmx', '-f', 'receptor_data/5q17_processed.pdb', '-p', '/work113/tmp/PyAutoFEP/docs/tutorial01_plumed2022/tutorial/FXR_12-FXR_74/protein/build_system_125508_26042022/protein_step1_125508_26042022.top', '-o', '/work113/tmp/PyAutoFEP/docs/tutorial01_plumed2022/tutorial/FXR_12-FXR_74/protein/build_system_125508_26042022/protein_step1_125508_26042022.pdb']
[ERROR]
[ERROR] stdout:
[ERROR] Select the Force Field:
[ERROR] From current directory:
[ERROR]  1: OPLS-AA/M all-atom force field (2015 aminoacid dihedrals)
[ERROR] From '/home/gromacs-2022.1_plumed/share/gromacs/top':
[ERROR]  2: AMBER03 protein, nucleic AMBER94 (Duan et al., J. Comp. Chem. 24, 1999-2012, 2003)
[ERROR]  3: AMBER94 force field (Cornell et al., JACS 117, 5179-5197, 1995)
[ERROR]  4: AMBER96 protein, nucleic AMBER94 (Kollman et al., Acc. Chem. Res. 29, 461-469, 1996)
[ERROR]  5: AMBER99 protein, nucleic AMBER94 (Wang et al., J. Comp. Chem. 21, 1049-1074, 2000)
[ERROR]  6: AMBER99SB protein, nucleic AMBER94 (Hornak et al., Proteins 65, 712-725, 2006)
[ERROR]  7: AMBER99SB-ILDN protein, nucleic AMBER94 (Lindorff-Larsen et al., Proteins 78, 1950-58, 2010)
[ERROR]  8: AMBERGS force field (Garcia & Sanbonmatsu, PNAS 99, 2782-2787, 2002)
[ERROR]  9: CHARMM27 all-atom force field (CHARM22 plus CMAP for proteins)
[ERROR] 10: GROMOS96 43a1 force field
[ERROR] 11: GROMOS96 43a2 force field (improved alkane dihedrals)
[ERROR] 12: GROMOS96 45a3 force field (Schuler JCC 2001 22 1205)
[ERROR] 13: GROMOS96 53a5 force field (JCC 2004 vol 25 pag 1656)
[ERROR] 14: GROMOS96 53a6 force field (JCC 2004 vol 25 pag 1656)
[ERROR] 15: GROMOS96 54a7 force field (Eur. Biophys. J. (2011), 40, 843-856, DOI: 10.1007/s00249-011-0700-9)
[ERROR] 16: OPLS-AA/L all-atom force field (2001 aminoacid dihedrals)
[ERROR] Using the Oplsaam force field in directory ./oplsaam.ff
[ERROR] Select the Water Model:
[ERROR] 1: TIP4P TIP 4-point, recommended
[ERROR] 2: TIP3P TIP 3-point
[ERROR] 3: TIP5P TIP 5-point (see http://redmine.gromacs.org/issues/1348 for issues)
[ERROR] 4: TIP5P TIP 5-point improved for Ewald sums
[ERROR] 5: SPC simple point charge
[ERROR] 6: SPC/E extended simple point charge
[ERROR] 7: None
[ERROR] going to rename ./oplsaam.ff/aminoacids.r2b
[ERROR] Reading receptor_data/5q17_processed.pdb...
[ERROR] Read '', 3871 atoms
[ERROR] Analyzing pdb file
[ERROR] Splitting chemical chains based on TER records or chain id changing.
[ERROR] There are 2 chains and 0 blocks of water and 236 residues with 3871 atoms
[ERROR] chain #res #atoms
[ERROR]   1 ' '  224  3662
[ERROR]   2 ' '   12   209
[ERROR] there were 2106 atoms with zero occupancy and 1765 atoms with occupancy unequal to one (out of 3871 atoms). Check your pdb file.
[ERROR] Reading residue database... (Oplsaam)
[ERROR]
[ERROR] stderr:
[ERROR] :-) GROMACS - gmx pdb2gmx, 2022.1-plumed_2.8.0 (-:
[ERROR] Executable: /home/gromacs-2022.1_plumed/bin/gmx
[ERROR] Data prefix: /home/gromacs-2022.1_plumed
[ERROR] Working dir: /work113/tmp/PyAutoFEP/docs/tutorial01_plumed2022
[ERROR] Command line:
[ERROR]   gmx pdb2gmx -f receptor_data/5q17_processed.pdb -p /work113/tmp/PyAutoFEP/docs/tutorial01_plumed2022/tutorial/FXR_12-FXR_74/protein/build_system_125508_26042022/protein_step1_125508_26042022.top -o /work113/tmp/PyAutoFEP/docs/tutorial01_plumed2022/tutorial/FXR_12-FXR_74/protein/build_system_125508_26042022/protein_step1_125508_26042022.pdb
[ERROR] Opening force field file ./oplsaam.ff/watermodels.dat
[ERROR] Opening force field file ./oplsaam.ff/aminoacids.r2b
[ERROR] there were 2106 atoms with zero occupancy and 1765 atoms with occupancy unequal to one (out of 3871 atoms). Check your pdb file.
[ERROR] Opening force field file ./oplsaam.ff/atomtypes.atp
[ERROR] Opening force field file ./oplsaam.ff/aminoacids.rtp
[ERROR] Opening force field file ./oplsaam.ff/aminoacids.hdb
[ERROR] Opening force field file ./oplsaam.ff/aminoacids.n.tdb
[ERROR] -------------------------------------------------------
[ERROR] Program:     gmx pdb2gmx, version 2022.1-plumed_2.8.0
[ERROR] Source file: src/gromacs/gmxpreprocess/ter_db.cpp (line 139)
[ERROR] Function:    void read_atom(char, bool, std::__cxx11::string, t_atom, PreprocessingAtomTypes, int*)
[ERROR] Inconsistency in user input:
[ERROR] Atomtype for atom name oplsm_298 not found in terminal data base
[ERROR] For more information and tips for troubleshooting, please check the GROMACS
[ERROR] website at http://www.gromacs.org/Documentation/Errors
[ERROR] -------------------------------------------------------
=================== STACK INFO ===================
  File "../../prepare_dual_topology.py", line 3445, in
    solvate_data=solvate_data, verbosity=arguments.verbose)
  File "../../prepare_dual_topology.py", line 249, in prepare_complex_system
    os_util.run_gmx(gmx_bin, pdb2gmx_list, communicate_str, build_files_dict['pdb2gmx_log'], verbosity=verbosity)
  File "/work113/tmp/PyAutoFEP/os_util.py", line 146, in run_gmx
    msg_verbosity=verbosity_level.error, current_verbosity=verbosity)
  File "/work113/tmp/PyAutoFEP/os_util.py", line 292, in local_print
    formatted_string += '\n{:=^50}\n{}{:=^50}'.format(' STACK INFO ', ''.join(traceback.format_stack()),
=================== STACK INFO ===================

luancarvalhomartins commented 2 years ago

Thanks for testing the GROMACS 2022 version. I created a new issue (#42) to keep track of this, and I am looking into it.

chengbao917 commented 2 years ago

> (quoting the original report and first-frame workaround above)

Yes, I encountered the same error. I checked the results and found that it was caused by poor molecular alignment between pairs, which produced steric clashes between the ligand and the protein and, consequently, extremely high energies. I think it may be necessary to find a proper ligand alignment method.
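The clash hypothesis can be checked cheaply before launching the FEP run. Below is a minimal brute-force sketch (the function names are hypothetical, not part of PyAutoFEP) that flags ligand poses whose closest heavy-atom contact with the protein falls below a cutoff; such overlaps are exactly what produces the huge LJ energies shown in the fatal error above:

```python
import math

def min_interatomic_distance(ligand_xyz, protein_xyz):
    """Smallest pairwise distance (same length units as the inputs) between
    two lists of (x, y, z) coordinates; brute force, fine for small systems."""
    return min(math.dist(a, b) for a in ligand_xyz for b in protein_xyz)

def has_steric_clash(ligand_xyz, protein_xyz, cutoff=0.1):
    """Flag poses whose closest ligand-protein contact is below `cutoff`
    (e.g. 0.1 nm); the cutoff value is an assumption and should be tuned."""
    return min_interatomic_distance(ligand_xyz, protein_xyz) < cutoff
```

Running this over the aligned poses for each perturbation pair would identify the pairs whose alignment needs to be redone before any simulation time is spent.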