grimme-lab / xtb

Semiempirical Extended Tight-Binding Program Package
https://xtb-docs.readthedocs.io/
GNU Lesser General Public License v3.0

Crash with large systems #700

Open aizvorski opened 1 year ago

aizvorski commented 1 year ago

Describe the bug

Systems larger than approx 831-833 atoms always crash. This doesn't seem to depend on what the systems are (tried a few different types of systems, from one long linear molecule to many small ones with different atoms, all behave the same), and also doesn't depend on the coordinates (molecules near each other in different orientations, or very far apart). It also doesn’t seem related to the OpenMP stack size.

To Reproduce

Using the provided water278.xyz file: https://gist.github.com/aizvorski/641a987e7dfa89eba4ce241c68409768#file-water278-xyz

$ OMP_NUM_THREADS=1 OMP_MAX_ACTIVE_LEVELS=1 OMP_STACKSIZE=200G time -v /home/ubuntu/bin/xtb-6.5.1/bin/xtb water278.xyz --gfn 2 --chrg "0"
...
   * xtb version 6.5.1 (579679a) compiled by 'ehlert@majestix' on 2022-07-11
...
          ...................................................
          :                      SETUP                      :
          :.................................................:
          :  # basis functions                1668          :
          :  # atomic orbitals                1668          :
          :  # shells                         1112          :
          :  # electrons                      2224          :
          :  max. iterations                   250          :
          :  Hamiltonian                  GFN2-xTB          :
          :  restarted?                      false          :
          :  GBSA solvation                  false          :
          :  PC potential                    false          :
          :  electronic temp.          300.0000000     K    :
          :  accuracy                    1.0000000          :
          :  -> integral cutoff          0.2500000E+02      :
          :  -> integral neglect         0.1000000E-07      :
          :  -> SCF convergence          0.1000000E-05 Eh   :
          :  -> wf. convergence          0.1000000E-03 e    :
          :  Broyden damping             0.4000000          :
          ...................................................
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image              PC                Routine            Line        Source             
xtb                000000000305452D  Unknown               Unknown  Unknown
xtb                0000000003271BC0  Unknown               Unknown  Unknown
xtb                000000000099DF21  xtb_disp_coordina         396  coordinationnumber.f90
xtb                00000000031D4B83  Unknown               Unknown  Unknown
xtb                0000000003186C16  Unknown               Unknown  Unknown
xtb                0000000003155085  Unknown               Unknown  Unknown
xtb                000000000099DCA0  xtb_disp_coordina         396  coordinationnumber.f90
xtb                000000000099B2C8  xtb_disp_coordina         340  coordinationnumber.f90
xtb                00000000008E7399  xtb_scf_mp_scf_.A         519  scf_module.F90
xtb                00000000006125A3  xtb_xtb_calculato         257  calculator.f90
xtb                000000000041800F  xtb_prog_main_mp_         580  main.F90
xtb                000000000042512B  MAIN__                     55  primary.f90
xtb                00000000004020EE  Unknown               Unknown  Unknown
xtb                0000000003273060  Unknown               Unknown  Unknown
xtb                0000000000401FD7  Unknown               Unknown  Unknown
Command exited with non-zero status 174
    Command being timed: "/home/ubuntu/bin/xtb-6.5.1/bin/xtb water278.xyz --gfn 2 --chrg 0"
    User time (seconds): 0.15
    System time (seconds): 0.03
    Percent of CPU this job got: 97%
    Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.19
    Average shared text size (kbytes): 0
    Average unshared data size (kbytes): 0
    Average stack size (kbytes): 0
    Average total size (kbytes): 0
    Maximum resident set size (kbytes): 108560
    Average resident set size (kbytes): 0
    Major (requiring I/O) page faults: 0
    Minor (reclaiming a frame) page faults: 28220
    Voluntary context switches: 1
    Involuntary context switches: 449
    Swaps: 0
    File system inputs: 0
    File system outputs: 0
    Socket messages sent: 0
    Socket messages received: 0
    Signals delivered: 0
    Page size (bytes): 4096
    Exit status: 174

For comparison, an input file water277.xyz with one less water succeeds: https://gist.github.com/aizvorski/7b4215388491126090ba83b6ae4ab341#file-water277-xyz

$ OMP_NUM_THREADS=1 OMP_MAX_ACTIVE_LEVELS=1 OMP_STACKSIZE=200G time -v /home/ubuntu/bin/xtb-6.5.1/bin/xtb water277.xyz --gfn 2 --chrg "0"
...
   * xtb version 6.5.1 (579679a) compiled by 'ehlert@majestix' on 2022-07-11
...
          ...................................................
          :                      SETUP                      :
          :.................................................:
          :  # basis functions                1662          :
          :  # atomic orbitals                1662          :
          :  # shells                         1108          :
          :  # electrons                      2216          :
          :  max. iterations                   250          :
          :  Hamiltonian                  GFN2-xTB          :
          :  restarted?                       true          :
          :  GBSA solvation                  false          :
          :  PC potential                    false          :
          :  electronic temp.          300.0000000     K    :
          :  accuracy                    1.0000000          :
          :  -> integral cutoff          0.2500000E+02      :
          :  -> integral neglect         0.1000000E-07      :
          :  -> SCF convergence          0.1000000E-05 Eh   :
          :  -> wf. convergence          0.1000000E-03 e    :
          :  Broyden damping             0.4000000          :
          ...................................................

 iter      E             dE          RMSdq      gap      omega  full diag
   1  -1415.6386943 -0.141564E+04  0.204E-07    8.73       0.0  T
   2  -1415.6386943  0.886757E-11  0.119E-07    8.73   29040.2  T
   3  -1415.6386943 -0.106866E-10  0.207E-08    8.73  100000.0  T

   *** convergence criteria satisfied after 3 iterations ***

         #    Occupation            Energy/Eh            Energy/eV
      -------------------------------------------------------------
         1        2.0000           -0.7271272             -19.7861
       ...           ...                  ...                  ...
      1102        2.0000           -0.3682050             -10.0194
      1103        2.0000           -0.3664023              -9.9703
      1104        2.0000           -0.3625255              -9.8648
      1105        2.0000           -0.3584824              -9.7548
      1106        2.0000           -0.3570151              -9.7149
      1107        2.0000           -0.3556497              -9.6777
      1108        2.0000           -0.3359206              -9.1409 (HOMO)
      1109                         -0.0151621              -0.4126 (LUMO)
      1110                         -0.0061251              -0.1667
      1111                          0.0011029               0.0300
      1112                          0.0020212               0.0550
      1113                          0.0029399               0.0800
       ...                                ...                  ...
      1662                          0.4675880              12.7237
      -------------------------------------------------------------
                  HL-Gap            0.3207585 Eh            8.7283 eV
             Fermi-level           -0.1755413 Eh           -4.7767 eV

 SCC (total)                   0 d,  0 h,  0 min, 17.350 sec
 SCC setup                      ...        0 min,  0.037 sec (  0.211%)
 Dispersion                     ...        0 min,  0.080 sec (  0.462%)
 classical contributions        ...        0 min,  0.011 sec (  0.063%)
 integral evaluation            ...        0 min,  0.634 sec (  3.651%)
 iterations                     ...        0 min, 11.684 sec ( 67.342%)
 molecular gradient             ...        0 min,  4.016 sec ( 23.145%)
 printout                       ...        0 min,  0.889 sec (  5.125%)

         :::::::::::::::::::::::::::::::::::::::::::::::::::::
         ::                     SUMMARY                     ::
         :::::::::::::::::::::::::::::::::::::::::::::::::::::
         :: total energy           -1405.892124104588 Eh    ::
         :: gradient norm              0.203225946340 Eh/a0 ::
         :: HOMO-LUMO gap              8.728283439762 eV    ::
         ::.................................................::
         :: SCC energy             -1415.638694336316 Eh    ::
         :: -> isotropic ES            8.569566870483 Eh    ::
         :: -> anisotropic ES         -0.289563022977 Eh    ::
         :: -> anisotropic XC         -0.213130853940 Eh    ::
         :: -> dispersion             -0.253146647874 Eh    ::
         :: repulsion energy           9.734657198491 Eh    ::
         :: add. restraining           0.000000000000 Eh    ::
         :: total charge              -0.000000000003 e     ::
         :::::::::::::::::::::::::::::::::::::::::::::::::::::
...

           -------------------------------------------------
          | TOTAL ENERGY            -1405.892124104588 Eh   |
          | GRADIENT NORM               0.203225946340 Eh/α |
          | HOMO-LUMO GAP               8.728283439762 eV   |
           -------------------------------------------------

------------------------------------------------------------------------
 * finished run on 2022/10/02 at 00:43:41.395     
------------------------------------------------------------------------
 total:
 * wall-time:     0 d,  0 h,  0 min, 18.069 sec
 *  cpu-time:     0 d,  0 h,  0 min, 18.065 sec
 * ratio c/w:     1.000 speedup
 SCF:
 * wall-time:     0 d,  0 h,  0 min, 17.377 sec
 *  cpu-time:     0 d,  0 h,  0 min, 17.376 sec
 * ratio c/w:     1.000 speedup

normal termination of xtb
    Command being timed: "/home/ubuntu/bin/xtb-6.5.1/bin/xtb water277.xyz --gfn 2 --chrg 0"
    User time (seconds): 17.69
    System time (seconds): 0.37
    Percent of CPU this job got: 99%
    Elapsed (wall clock) time (h:mm:ss or m:ss): 0:18.07
    Average shared text size (kbytes): 0
    Average unshared data size (kbytes): 0
    Average stack size (kbytes): 0
    Average total size (kbytes): 0
    Maximum resident set size (kbytes): 594568
    Average resident set size (kbytes): 0
    Major (requiring I/O) page faults: 0
    Minor (reclaiming a frame) page faults: 228439
    Voluntary context switches: 1
    Involuntary context switches: 483
    Swaps: 0
    File system inputs: 0
    File system outputs: 368
    Socket messages sent: 0
    Socket messages received: 0
    Signals delivered: 0
    Page size (bytes): 4096
    Exit status: 0

This does not appear to be due to out-of-memory, or to too-low setting for OMP_STACKSIZE. The machine this was tested on has >200GB memory. The actual memory used when the crash happens (reported by time -v) is just a little over 100MB.

Setting the stack size deliberately very low with largest input system which succeeds, water277.xyz:

GDB backtrace:

$ OMP_NUM_THREADS=1 OMP_MAX_ACTIVE_LEVELS=1 OMP_STACKSIZE=200G gdb /home/ubuntu/bin/xtb-6.5.0/bin/xtb
(gdb) run water278.xyz --gfn 1 --chrg "0"

Program received signal SIGSEGV, Segmentation fault.
0x000000000099cf41 in xtb_disp_coordinationnumber_mp_ncoordlatp_.A ()
(gdb) bt
#0  0x000000000099cf41 in xtb_disp_coordinationnumber_mp_ncoordlatp_.A ()
#1  0x00000000031d3f83 in __kmp_invoke_microtask ()
#2  0x0000000003186016 in __kmp_fork_call ()
#3  0x0000000003154485 in __kmpc_fork_call ()
#4  0x000000000099ccc0 in xtb_disp_coordinationnumber_mp_ncoordlatp_.A ()
#5  0x000000000099a438 in xtb_disp_coordinationnumber_mp_getcoordinationnumberlp_ ()
#6  0x00000000008e6429 in xtb_scf_mp_scf_.A ()
#7  0x0000000000611d33 in xtb_xtb_calculator_mp_singlepoint_.A ()
#8  0x00000000004177f3 in xtb_prog_main_mp_xtbmain_.A ()
#9  0x000000000042492b in MAIN__ ()

Expected behaviour

No crash.

Additional context

Using the xtb 6.5.1 binary downloaded from https://github.com/grimme-lab/xtb/releases/download/v6.5.1/xtb-6.5.1-linux-x86_64.tar.xz

xtb --version gives: version 6.5.1 (579679a) compiled by 'ehlert@majestix' on 2022-07-11

OS: Ubuntu 18.04.4 LTS
Hardware: AMD EPYC 7B13 CPU, 224GB RAM
(also tested on Ubuntu 20.04 LTS, Intel i7-10510U, 48GB RAM: same behavior)
(also tested on xtb-6.5.0 and 6.4.1: same)

aizvorski commented 1 year ago

Update: the exact number of atoms which causes the crash is 834. The number of orbitals doesn't seem to matter, it really is atoms.

Works: 833 helium atoms https://gist.github.com/aizvorski/a6616970339d8447a98989b4d0455db8#file-helium833-xyz

Crashes: 834 helium atoms https://gist.github.com/aizvorski/b7b65913c1a52379937afc76b38c3450#file-helium834-xyz
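To bisect the atom-count threshold on another machine, test inputs of arbitrary size can be generated with a small script. The following is a minimal sketch, not the script used to produce the gists above: the function name `helium_grid_xyz`, the cubic-grid layout, and the 5.0 Å spacing are all assumptions made for illustration.

```python
import itertools


def helium_grid_xyz(n_atoms: int, spacing: float = 5.0) -> str:
    """Return XYZ-format text for n_atoms helium atoms on a cubic grid.

    spacing is in Angstrom; 5.0 is an arbitrary illustrative value.
    """
    if n_atoms < 1:
        raise ValueError("n_atoms must be positive")

    # Smallest cube edge (in grid points) that can hold n_atoms.
    side = 1
    while side ** 3 < n_atoms:
        side += 1

    # Take the first n_atoms grid points in lexicographic order.
    points = itertools.islice(itertools.product(range(side), repeat=3), n_atoms)

    # XYZ format: atom count, comment line, then one "element x y z" line per atom.
    lines = [str(n_atoms), f"{n_atoms} helium atoms on a cubic grid"]
    for i, j, k in points:
        lines.append(f"He {i * spacing:12.6f} {j * spacing:12.6f} {k * spacing:12.6f}")
    return "\n".join(lines) + "\n"
```

Writing `helium_grid_xyz(833)` and `helium_grid_xyz(834)` to two files gives a pair of inputs on either side of the threshold reported here; the threshold on another machine may differ.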

haneug commented 1 year ago

This works fine for me once I set 'ulimit -s unlimited' and 'export OMP_STACKSIZE=4G':

$ xtb he834.xyz --namespace test

* xtb version 6.5.1 (579679a) compiled by 'ehlert@majestix' on 2022-07-11
...
------------------------------------------------------------------------
 * finished run on 2022/10/04 at 09:11:23.662     
------------------------------------------------------------------------
 total:
 * wall-time:     0 d,  0 h,  0 min, 12.147 sec
 *  cpu-time:     0 d,  0 h,  0 min, 58.211 sec
 * ratio c/w:     4.792 speedup
 SCF:
 * wall-time:     0 d,  0 h,  0 min, 11.557 sec
 *  cpu-time:     0 d,  0 h,  0 min, 55.357 sec
 * ratio c/w:     4.790 speedup

normal termination of xtb

aizvorski commented 1 year ago

@haneug I can confirm this, the process stack in ulimit -s was the limiting factor. ulimit -s unlimited works.

I think it's fair to say any SIGSEGV crash is a bug, since it is impossible to distinguish it from other bugs like an out-of-bounds pointer, and there is no indication to the user of what needs to be done to make the calculation succeed.

Since this is likely to be a thing a lot of folks run into, I'm going to suggest one of two things:

awvwgk commented 1 year ago

While educating the users on this setting seems error-prone, there are not really many alternatives, or better put, not many universal solutions. A simple band-aid solution could be a shell wrapper around xtb which sets those values by default.
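A wrapper of this kind could look roughly like the sketch below. It is written in Python rather than shell, and it is not part of xtb: the names `prepare_env`, `raise_stack_limit` and `run_xtb` are hypothetical, and the 4G default is simply the value reported to work earlier in this thread. The child process gets its soft stack limit raised to the hard limit (comparable to what `ulimit -s` does in a shell) and a default `OMP_STACKSIZE` if the user has not set one.

```python
import os
import resource
import subprocess

# Assumption: 4G is the value reported to work earlier in this thread.
DEFAULT_OMP_STACKSIZE = "4G"


def prepare_env(env):
    """Return a copy of env with OMP_STACKSIZE set unless it is already present."""
    new_env = dict(env)
    new_env.setdefault("OMP_STACKSIZE", DEFAULT_OMP_STACKSIZE)
    return new_env


def raise_stack_limit():
    """Raise the soft stack limit to the hard limit for the calling process."""
    soft, hard = resource.getrlimit(resource.RLIMIT_STACK)
    if soft != hard:
        resource.setrlimit(resource.RLIMIT_STACK, (hard, hard))


def run_xtb(args, xtb="xtb"):
    """Run xtb with adjusted limits and return its exit code.

    preexec_fn runs in the child between fork and exec (POSIX only),
    so the new limit is in place before xtb starts.
    """
    result = subprocess.run(
        [xtb, *args],
        env=prepare_env(os.environ),
        preexec_fn=raise_stack_limit,
    )
    return result.returncode
```

For example, `run_xtb(["water278.xyz", "--gfn", "2"])` would launch the failing case from the report with both settings applied. A user-supplied `OMP_STACKSIZE` is left untouched.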

Back to the problem. So far I found a solution for macOS (using -Wl,-stack_size,0x1000000) and Windows (using /STACK:16777216).

On Linux we have the possibility to use the system calls getrlimit(2) / setrlimit(2) to retrieve the current stack limit and warn the user if it is not sufficient (note that system call here does not refer to Fortran's call system but to the usage of a function from the Linux kernel). I don't know whether setrlimit(2) is sufficient to increase the stack size at runtime; this sounds like something a process should not be allowed to do without elevated permissions, but it may be worth a try.
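The detection half of this idea can be sketched with Python's `resource` module, which wraps getrlimit(2). This is an illustration only, not xtb code: the function name `stack_limit_warning` and the `required_bytes` estimate are hypothetical, and how much stack a given calculation actually needs is not established in this thread. On the permissions question, the setrlimit(2) documentation says an unprivileged process may change its soft limit to any value up to the hard limit. Whether doing so helps a process that is already running is not something this sketch settles.

```python
import resource


def stack_limit_warning(required_bytes, limits=None):
    """Return a warning string if the soft stack limit is below required_bytes.

    limits is an optional (soft, hard) pair, mainly for testing; by default
    the values are read with getrlimit(2). Returns None if the limit suffices.
    """
    if limits is None:
        limits = resource.getrlimit(resource.RLIMIT_STACK)
    soft, hard = limits

    # An unlimited soft limit, or one at least as large as the estimate, is fine.
    if soft == resource.RLIM_INFINITY or soft >= required_bytes:
        return None

    if hard == resource.RLIM_INFINITY:
        hint = "try 'ulimit -s unlimited' before starting the program"
    else:
        hint = f"the hard limit is {hard // 1024} kB"
    return (
        f"stack limit is {soft // 1024} kB but about "
        f"{required_bytes // 1024} kB may be needed; {hint}"
    )
```

Called at startup with a rough estimate, this would turn the bare SIGSEGV into a message that tells the user which setting to change.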

The OpenMP stack size issue is more severe; so far I have found no good way to detect a stack that is too small. However, I believe this is a problem that can be solved on the algorithm side. For example, I could restructure most OpenMP regions in s-dftd3 to not put large arrays on the OpenMP stack, which almost completely eliminates issues with stack overflows on both the system and OpenMP stacks. This might be a way for xtb as well. The implementation, however, gets somewhat more verbose about memory allocations.

awvwgk commented 1 year ago

Regarding stack usage, there are many insightful discussions on the use of stack vs. heap arrays on the Fortran Discourse:

That issue actually comes up a lot, not only in xtb. The only surefire method so far seems to be to avoid putting any large arrays on any stack and to do the heap allocation explicitly instead.

aizvorski commented 1 year ago

@awvwgk Thanks, that's a good collection of links! I don't know too much about Fortran specifically, but perhaps using some compiler feature to avoid large arrays on the stack (without having to modify code) might work.

What compiler are release xtb binaries compiled with now?

It looks like gfortran doesn't yet have any way of doing this, but Intel ifx -heap-arrays [size] (docs) and NVIDIA/PGI nvfortran -Mnostack_arrays (docs) might do the job.

(Bonus: ifx and nvfortran can both compile OpenMP code to run on GPU :)

aizvorski commented 1 year ago

@awvwgk About OMP_STACKSIZE: the compiler options to reduce stack use may also apply to OpenMP code, but if not, maybe we could default to OMP_STACKSIZE=physical memory/number of threads? That's only if OMP_STACKSIZE environment variable is unset of course; if it is set, then use the value and maybe warn if it is low.
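The proposed default can be written down in a few lines. The sketch below is an assumption-laden illustration rather than xtb behavior: the function names are hypothetical, the `SC_PHYS_PAGES` query works on Linux but not on every platform, and the result uses the `K` (kilobyte) suffix that `OMP_STACKSIZE` accepts. Since OpenMP environment variables are read when the program starts, a value like this would have to be set by a launcher before xtb runs.

```python
import os


def physical_memory_bytes():
    """Total physical memory in bytes (Linux; relies on sysconf names)."""
    return os.sysconf("SC_PHYS_PAGES") * os.sysconf("SC_PAGE_SIZE")


def default_omp_stacksize(env, physical_bytes, n_threads):
    """Return the OMP_STACKSIZE value to use.

    A value already present in env wins. Otherwise physical memory is
    split evenly across threads and expressed in kilobytes.
    """
    existing = env.get("OMP_STACKSIZE")
    if existing:
        return existing
    per_thread_kb = physical_bytes // max(n_threads, 1) // 1024
    return f"{per_thread_kb}K"
```

A launcher could call `default_omp_stacksize(os.environ, physical_memory_bytes(), os.cpu_count() or 1)` and export the result. Warning on a low user-supplied value, as suggested above, would need a parser for the size suffixes and is left out here.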

awvwgk commented 1 year ago

It looks like gfortran doesn't yet have any way of doing this, but Intel ifx -heap-arrays [size] (docs) and NVIDIA/PGI nvfortran -Mnostack_arrays (docs) might do the job.

Those apply to automatic arrays. Since we don't use automatic arrays in xtb, the option to put them on the heap will not change the program behavior. Maybe providing a custom allocator in the OpenMP directive might do the trick.

(Bonus: ifx and nvfortran can both compile OpenMP code to run on GPU :)

I'm really looking forward to seeing the first LLVM-based Fortran compiler working for a code base using moderately new Fortran features (F2003+).

lsvvt commented 1 year ago

I found how to fix this bug for Windows.
1) Install MSVC
2) Use Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.35.32215\bin\Hostx64\x64\editbin.exe to patch xtb.exe:
editbin.exe /STACK:64000000 xtb.exe