grimme-lab / xtb

Semiempirical Extended Tight-Binding Program Package
https://xtb-docs.readthedocs.io/
GNU Lesser General Public License v3.0

XTB crashes when run with large molecules (MSYS2/MinGW) #439

Open shoubhikraj opened 3 years ago

shoubhikraj commented 3 years ago

Describe the bug

I am trying to run xtb on an 83-atom molecule. However, it crashes immediately after starting, without any error message. For small molecules the crash does not happen.

To Reproduce

  1. Input file: https://pastebin.com/2L7atHsu
  2. C:\msys64\mingw64\bin\xtb.exe spherand0_XTB.xyz -c 0 -u 0 -P 4 (running on cmd prompt)
  3. C:\msys64\mingw64\bin\xtb.exe spherand0_XTB.xyz -c 0 -u 0 -P 4 --verbose > out.log 2>&1
  4. Output file: https://pastebin.com/8EcHwPiu

Expected behaviour

The program should have run as usual. When I try the same commands with an .xyz file of methane or other small molecules, it works without any problems.

Additional context

System: Windows 10, 8GB RAM

Compiler: msys2-mingw64 native Windows compilation with GNU compilers v10.2.0 (all tests passed except expected failures)

The environment variable OMP_STACKSIZE is set to 2G and OMP_NUM_THREADS is set to 4,1. Another odd thing is that when I run the command inside the command prompt, I get two more error messages that are not redirected to the log file for some reason:

libgomp: Invalid value for environmental variable OMP_STACKSIZE
[WARNING] Please study the warnings concerning your input carefully
-1- prog_main_parseArguments: Process number higher than OMP_NUM_THREADS, I hope you know what you are doing.
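
For reference, the OpenMP specification describes OMP_STACKSIZE values as a positive integer with an optional B, K, M or G suffix, where a bare number is read as kilobytes. The sketch below only checks that syntax; it does not explain why libgomp rejected the value in this build, and a runtime may still refuse a syntactically valid size. `parse_omp_stacksize` is a hypothetical helper, not part of xtb or libgomp.

```python
import re

# Unit multipliers for OMP_STACKSIZE as described in the OpenMP specification;
# a bare number is interpreted as kilobytes (1024 bytes).
_UNITS = {"B": 1, "K": 1024, "M": 1024**2, "G": 1024**3}
_PATTERN = re.compile(r"\s*(\d+)\s*([BKMG]?)\s*", re.IGNORECASE)


def parse_omp_stacksize(value):
    """Return the stack size in bytes, or None if the syntax is not valid.

    Hypothetical helper for illustration only.
    """
    match = _PATTERN.fullmatch(value)
    if match is None:
        return None
    size = int(match.group(1))
    if size <= 0:
        return None
    unit = match.group(2).upper() or "K"
    return size * _UNITS[unit]


if __name__ == "__main__":
    for candidate in ("2G", "16M", "2 G", "2GB"):
        print(repr(candidate), "->", parse_omp_stacksize(candidate))
```

By this check, 2G is well-formed, which is consistent with the value being set correctly but not accepted by the runtime for some other reason.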

awvwgk commented 3 years ago

Thanks for testing and reporting. I haven't explored using xtb on Windows beyond building and running the testsuite so far.

It looks like the OMP variables are set, but the value is not recognized by the OpenMP runtime for some reason. The crash also happens at an unexpected place; OMP-related stack overflows usually occur only at the beginning of the SCC iterations. I probably won't be of much help here, so you might have to experiment a bit yourself.

At least the second warning should be harmless, as far as I can tell.

awvwgk commented 3 years ago

The closest setup I can currently get on Linux with GCC 10.2 runs smoothly, even for more than 100 atoms and with an uninitialized environment. From the output, the failure seems to happen around the initialization of the random number seed (which is only used much later, in the dynamics), which is strange.

shoubhikraj commented 3 years ago

@awvwgk Thanks for the messages. #315 mentions that meson can be used with the Intel compilers. Is there any place where I could look up how to do that? I ask because I think the problem with OpenMP is due to the fact that mingw64 is a separate build environment, so the executables depend on MinGW DLLs, which may not be able to detect the Windows environment variables. Compiling with the Intel compilers and Visual Studio/meson may therefore solve the problem. I have tried this, but unfortunately keep running into multiple errors.

awvwgk commented 3 years ago

Does your xtb version work in an MSYS2 terminal? At least that is the environment in which its functionality was tested by the testsuite.

To use xtb outside of the MSYS2 terminal, i.e. in CMD.exe, you might have to link statically against the runtime libraries. Could you try adding -Dfortran_link_args=-static in the configuration step to get rid of the DLL dependencies on the MinGW64 and GCC libraries?

shoubhikraj commented 3 years ago

I have run it through the msys2-mingw64 terminal, but it produces the same result.

Recompiling with -Dfortran_link_args=-static seems to have removed the libgomp OMP_STACKSIZE error, but the program still stops without any error message.

Here's the new output (stdout+stderr): https://pastebin.com/QJ0V8bsM

awvwgk commented 3 years ago

That looks better. It seems like you are now running into a legitimate stack overflow. On Unix there is ulimit -s unlimited to mend this; I am not sure whether there is a runtime equivalent on Windows. I have seen in previous posts about Windows versions of xtb that the stack size has to be provided at compile time, so maybe check your compiler documentation for the right option here.
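
As a point of comparison for the Unix side only, the runtime adjustment mentioned above can also be made from a script. This is a minimal sketch using Python's standard `resource` module, which is not available on native Windows; `raise_stack_limit` is a hypothetical helper name. Unlike `ulimit -s unlimited`, it raises the soft limit only as far as the current hard limit.

```python
import resource  # Unix-only module; not available on native Windows Python


def raise_stack_limit():
    """Raise the soft stack limit to the hard limit and return the new pair."""
    soft, hard = resource.getrlimit(resource.RLIMIT_STACK)
    # An unprivileged process may raise its soft limit up to the hard limit.
    resource.setrlimit(resource.RLIMIT_STACK, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_STACK)


if __name__ == "__main__":
    soft, hard = raise_stack_limit()
    unlimited = resource.RLIM_INFINITY
    print("soft:", "unlimited" if soft == unlimited else soft)
    print("hard:", "unlimited" if hard == unlimited else hard)
```

Child processes started afterwards, for example through `subprocess`, inherit the raised limit.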

Yuandq commented 3 years ago

I have encountered the same problem. Following the discussion above, I have basically solved the problem.

System: Windows 10, CPU AMD Ryzen 7 4700U, 16GB RAM

Compiler: msys2-mingw64 native Windows compilation with GNU compilers v10.2.0

Patch: in /xtb/cmake/CMakeLists.txt, line 4

set(dialect "-fdefault-real-8 -fdefault-double-8 -ffree-line-length-none -fbacktrace")

was changed to

set(dialect "-fdefault-real-8 -fdefault-double-8 -ffree-line-length-none -fbacktrace -Wl,--stack=16777216")

xtb version: 6.4.0

Building xtb with CMake works with the following chain of commands:

  1. cmake -B build -DCMAKE_BUILD_TYPE=Release -G"MSYS Makefiles"
  2. make -C build
  3. make -C build test (all tests passed)
  4. make -C build install

I have tested spherand0_XTB.xyz, and the run now ends with normal termination of xtb:

c:\xtb\bin\xtb.exe spherand0_XTB.xyz -c 0 -u 0 -P 4 --verbose > out.log 2>&1
out.log

awvwgk commented 3 years ago

Thanks for sharing @Yuandq.

For meson this can be done without patching the build files, by including -Dfortran_link_args="-static -Wl,--stack=16777216" in the build configuration step. We might include this by default in the meson build files in the future if we detect a Windows build.
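
To confirm that the --stack value (16777216 bytes, i.e. 16 MiB) actually ended up in the executable, one option is to read the SizeOfStackReserve field from the PE header. The following is a minimal sketch under the assumption that the field offsets follow the Microsoft PE/COFF format description; `read_stack_reserve` is a hypothetical helper and not part of xtb or its build system.

```python
import struct
import sys


def read_stack_reserve(pe_bytes):
    """Return SizeOfStackReserve from the optional header of a PE image.

    Hypothetical helper for illustration only.
    """
    # Offset 0x3C of the DOS header stores the file offset of the PE signature.
    (pe_offset,) = struct.unpack_from("<I", pe_bytes, 0x3C)
    if pe_bytes[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("not a PE image")
    # The optional header follows the 4-byte signature and 20-byte COFF header.
    optional = pe_offset + 24
    (magic,) = struct.unpack_from("<H", pe_bytes, optional)
    if magic == 0x20B:  # PE32+ (64-bit): the field is 8 bytes wide
        (reserve,) = struct.unpack_from("<Q", pe_bytes, optional + 72)
    elif magic == 0x10B:  # PE32 (32-bit): the field is 4 bytes wide
        (reserve,) = struct.unpack_from("<I", pe_bytes, optional + 72)
    else:
        raise ValueError("unknown optional header magic")
    return reserve


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: pass the path of the executable to inspect as the first argument.
    with open(sys.argv[1], "rb") as handle:
        print(read_stack_reserve(handle.read()))
```

Run against the rebuilt xtb.exe, the script should print 16777216 if the linker option took effect; a smaller value would suggest the flag did not reach the link step.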

shoubhikraj commented 3 years ago

@Yuandq @awvwgk Thanks for the help! It works now. However, at the end of normal termination, I get this error message:

Note: The following floating-point exceptions are signalling: IEEE_UNDERFLOW_FLAG IEEE_DENORMAL

Is this usual? Or does this mean there are errors in the calculation?

awvwgk commented 3 years ago

Those warnings should be mostly harmless. Potential floating-point errors are currently handled, but I think the flags are not cleared correctly yet.