Open tobigithub opened 4 years ago
One can also use the netlib LAPACK backend. It is best to remove the whole build directory and start from a fresh unzip, but the reconfigure command for meson and ninja also works.
>meson --reconfigure build_gcc/
>ninja -C build_gcc test
or
>unzip master.zip
>cd xtb-master
>export FC=gfortran-8 CC=gcc-8
>meson setup build_gcc --buildtype release -Dla_backend=netlib --warnlevel 0
>ninja -C build_gcc test
We get the same result, all tests were OK:
Ok: 54
Expected Fail: 5
Fail: 0
Unexpected Pass: 0
Skipped: 3
Timeout: 0
Full log written to /home/tkind/xtb_source/xtb-master/build_gcc/meson-logs/testlog.txt
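A minimal sketch of how such a summary could be checked automatically, assuming the summary lines look exactly like the ones printed here. The helper name summary_ok is hypothetical and not part of meson or ninja; it reads the summary on stdin and succeeds only when both Fail and Timeout are 0.

```shell
# summary_ok is a hypothetical helper, not a meson/ninja command.
# It reads a test summary on stdin and returns success only if
# both "Fail" and "Timeout" are present and equal to 0.
summary_ok() {
  awk -F': *' '
    { sub(/^[ \t]+/, "") }                      # tolerate leading blanks
    $1 == "Fail"    { fail = $2 + 0; has_fail = 1 }
    $1 == "Timeout" { tmo  = $2 + 0; has_tmo  = 1 }
    END { exit !(has_fail && has_tmo && fail == 0 && tmo == 0) }
  '
}

# Numbers copied from the summary above.
summary_ok <<'EOF' && echo "test summary OK"
Ok: 54
Expected Fail: 5
Fail: 0
Unexpected Pass: 0
Skipped: 3
Timeout: 0
EOF
```

Matching on the exact label keeps "Expected Fail: 5" from being mistaken for a real failure.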
Then we can test xtb with the caffeine example:
tkind@instance-1:~/xtb_source/test/caffeine-netlib$ ./xtb --version
-----------------------------------------------------------
| ===================== |
| x T B |
| ===================== |
| S. Grimme |
| Mulliken Center for Theoretical Chemistry |
| University of Bonn |
-----------------------------------------------------------
* xtb version 6.2.3 (unknown) compiled by 'tkind@instance-1' on 2020-03-20
normal termination of xtb
>./xtb --opt extreme caffeine.coord
optimized geometry written to: xtbopt.coord
-------------------------------------------------
| TOTAL ENERGY -42.153937410494 Eh |
| GRADIENT NORM 0.000011585590 Eh/α |
| HOMO-LUMO GAP 3.424625890456 eV |
-------------------------------------------------
------------------------------------------------------------------------
* finished run on 2020/03/20 at 04:21:22.044
------------------------------------------------------------------------
total:
* wall-time: 0 d, 0 h, 0 min, 0.816 sec
* cpu-time: 0 d, 0 h, 0 min, 0.761 sec
* ratio c/w: 0.932 speedup
SCF:
* wall-time: 0 d, 0 h, 0 min, 0.050 sec
* cpu-time: 0 d, 0 h, 0 min, 0.044 sec
* ratio c/w: 0.891 speedup
ANC optimizer:
* wall-time: 0 d, 0 h, 0 min, 0.727 sec
* cpu-time: 0 d, 0 h, 0 min, 0.690 sec
* ratio c/w: 0.950 speedup
normal termination of xtb
Note: The following floating-point exceptions are signalling: IEEE_UNDERFLOW_FLAG IEEE_DENORMAL
While the file size is the same for the netlib and openblas builds, the md5sum is different. The executables also seem small compared to the Intel/MKL-compiled binaries, so maybe they are not statically linked.
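One way to check the static-linking guess is to ask the dynamic loader. This is a minimal sketch, assuming a Linux system where ldd is available; the helper name linkage is hypothetical. ldd returns a non-zero status for anything that is not a dynamic executable, so the helper cannot tell a static binary from a non-ELF file.

```shell
# linkage is a hypothetical helper: prints "dynamic" when ldd can
# resolve shared libraries for the file, otherwise "not dynamic"
# (static binary, non-ELF file, or ldd not installed).
linkage() {
  if ldd "$1" >/dev/null 2>&1; then
    echo "dynamic"
  else
    echo "not dynamic"
  fi
}

# Only inspect ./xtb if it is actually present in this folder.
if [ -x ./xtb ]; then
  linkage ./xtb
  # Show which BLAS/LAPACK libraries are pulled in at run time, if any.
  ldd ./xtb | grep -Ei 'blas|lapack|mkl' || echo "no BLAS/LAPACK shared libs listed"
  md5sum ./xtb
fi
```

If the netlib and openblas binaries list different shared libraries, that alone would explain the differing md5sums.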
Micro Benchmark
I do not claim to understand the different BLAS features, except that the Intel MKL version with AVX2 vectorization is faster. An explanation from Quantum ESPRESSO gives more detail [LINK]. A look into the meson build config reveals "('Static linked binaries are only supported with Intel MKL')".
Also, for gcc the -O3 option together with -march=native needs to be enabled to allow gfortran to vectorize whenever possible (-O3 -march=native), see [GNU fortran compiler flags on stackoverflow].
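A sketch of one way to pass the flag to an existing gcc build directory. It assumes Meson's built-in per-language options fortran_args and c_args and the build_gcc directory used in this thread. In Meson, --buildtype release already implies -O3, so only -march=native is added. Whether xtb's own meson.build adds or overrides optimization flags is not checked here.

```shell
# Extra flag on top of the -O3 that "--buildtype release" already sets.
NATIVE_FLAG="-march=native"

# Assumption: build_gcc was created earlier with "meson setup build_gcc ...".
if command -v meson >/dev/null 2>&1 && [ -d build_gcc ]; then
  meson configure build_gcc \
    -Dfortran_args="$NATIVE_FLAG" -Dc_args="$NATIVE_FLAG"
  ninja -C build_gcc test
else
  # Nothing to configure here; just show what would be run.
  echo "meson configure build_gcc -Dfortran_args=$NATIVE_FLAG -Dc_args=$NATIVE_FLAG"
fi
```

A binary built with -march=native is tuned to the build machine and may not run on an older CPU.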
Example of C54H30.xyz
>wget https://raw.githubusercontent.com/tobigithub/quantum-xtb/master/input-molecules/C54H30.xyz
>tkind@instance-1:~/xtb_source/test/c60-intel$ ./xtb C54H30.xyz --opt extreme
-------------------------------------------------
| TOTAL ENERGY -131.025242691597 Eh |
| GRADIENT NORM 0.000035803133 Eh/α |
| HOMO-LUMO GAP 2.139351180564 eV |
-------------------------------------------------
------------------------------------------------------------------------
* finished run on 2020/03/20 at 06:23:39.398
------------------------------------------------------------------------
total:
* wall-time: 0 d, 0 h, 0 min, 21.496 sec
* cpu-time: 0 d, 0 h, 0 min, 21.338 sec
* ratio c/w: 0.993 speedup
SCF:
* wall-time: 0 d, 0 h, 0 min, 0.449 sec
* cpu-time: 0 d, 0 h, 0 min, 0.444 sec
* ratio c/w: 0.989 speedup
ANC optimizer:
* wall-time: 0 d, 0 h, 0 min, 20.718 sec
* cpu-time: 0 d, 0 h, 0 min, 20.586 sec
* ratio c/w: 0.994 speedup
normal termination of xtb
----
>tkind@instance-1:~/xtb_source/test/c60-gcc-native-vec256$ ./xtb C54H30.xyz --opt extreme
-------------------------------------------------
| TOTAL ENERGY -131.025242602428 Eh |
| GRADIENT NORM 0.000049906715 Eh/α |
| HOMO-LUMO GAP 2.139040221321 eV |
-------------------------------------------------
------------------------------------------------------------------------
* finished run on 2020/03/20 at 07:25:58.338
------------------------------------------------------------------------
total:
* wall-time: 0 d, 0 h, 0 min, 54.423 sec
* cpu-time: 0 d, 0 h, 0 min, 54.005 sec
* ratio c/w: 0.992 speedup
SCF:
* wall-time: 0 d, 0 h, 0 min, 0.487 sec
* cpu-time: 0 d, 0 h, 0 min, 0.479 sec
* ratio c/w: 0.984 speedup
ANC optimizer:
* wall-time: 0 d, 0 h, 0 min, 53.748 sec
* cpu-time: 0 d, 0 h, 0 min, 53.373 sec
* ratio c/w: 0.993 speedup
normal termination of xtb
Note: The following floating-point exceptions are signalling: IEEE_UNDERFLOW_FLAG IEEE_DENORMAL
----
>tkind@instance-1:~/xtb_source/test/c60-gcc-native-vec128$ ./xtb C54H30.xyz --opt extreme
-------------------------------------------------
| TOTAL ENERGY -131.025242602387 Eh |
| GRADIENT NORM 0.000049899084 Eh/α |
| HOMO-LUMO GAP 2.139040182769 eV |
-------------------------------------------------
------------------------------------------------------------------------
* finished run on 2020/03/20 at 07:34:26.943
------------------------------------------------------------------------
total:
* wall-time: 0 d, 0 h, 0 min, 53.942 sec
* cpu-time: 0 d, 0 h, 0 min, 53.507 sec
* ratio c/w: 0.992 speedup
SCF:
* wall-time: 0 d, 0 h, 0 min, 0.482 sec
* cpu-time: 0 d, 0 h, 0 min, 0.474 sec
* ratio c/w: 0.984 speedup
ANC optimizer:
* wall-time: 0 d, 0 h, 0 min, 53.267 sec
* cpu-time: 0 d, 0 h, 0 min, 52.878 sec
* ratio c/w: 0.993 speedup
normal termination of xtb
Note: The following floating-point exceptions are signalling: IEEE_UNDERFLOW_FLAG IEEE_DENORMAL
----
>tkind@instance-1:~/xtb_source/test/c60-gcc-march-native$ ./xtb C54H30.xyz --opt extreme
-------------------------------------------------
| TOTAL ENERGY -131.025242602428 Eh |
| GRADIENT NORM 0.000049906715 Eh/α |
| HOMO-LUMO GAP 2.139040221321 eV |
-------------------------------------------------
------------------------------------------------------------------------
* finished run on 2020/03/20 at 06:25:34.589
------------------------------------------------------------------------
total:
* wall-time: 0 d, 0 h, 0 min, 56.322 sec
* cpu-time: 0 d, 0 h, 0 min, 53.881 sec
* ratio c/w: 0.957 speedup
SCF:
* wall-time: 0 d, 0 h, 0 min, 0.483 sec
* cpu-time: 0 d, 0 h, 0 min, 0.478 sec
* ratio c/w: 0.989 speedup
ANC optimizer:
* wall-time: 0 d, 0 h, 0 min, 55.665 sec
* cpu-time: 0 d, 0 h, 0 min, 53.247 sec
* ratio c/w: 0.957 speedup
normal termination of xtb
Note: The following floating-point exceptions are signalling: IEEE_UNDERFLOW_FLAG IEEE_DENORMAL
-----
>tkind@instance-1:~/xtb_source/test/c60-netlib$ ./xtb C54H30.xyz --opt extreme
-------------------------------------------------
| TOTAL ENERGY -131.025242766277 Eh |
| GRADIENT NORM 0.000047149550 Eh/α |
| HOMO-LUMO GAP 2.139325447768 eV |
-------------------------------------------------
------------------------------------------------------------------------
* finished run on 2020/03/20 at 06:32:19.573
------------------------------------------------------------------------
total:
* wall-time: 0 d, 0 h, 2 min, 18.933 sec
* cpu-time: 0 d, 0 h, 2 min, 18.247 sec
* ratio c/w: 0.995 speedup
SCF:
* wall-time: 0 d, 0 h, 0 min, 0.490 sec
* cpu-time: 0 d, 0 h, 0 min, 0.485 sec
* ratio c/w: 0.989 speedup
ANC optimizer:
* wall-time: 0 d, 0 h, 2 min, 18.269 sec
* cpu-time: 0 d, 0 h, 2 min, 17.606 sec
* ratio c/w: 0.995 speedup
normal termination of xtb
Note: The following floating-point exceptions are signalling: IEEE_UNDERFLOW_FLAG IEEE_DENORMAL
----
>tkind@instance-1:~/xtb_source/test/c60-openblas$ ./xtb C54H30.xyz --opt extreme
-------------------------------------------------
| TOTAL ENERGY -131.025242766277 Eh |
| GRADIENT NORM 0.000047149550 Eh/α |
| HOMO-LUMO GAP 2.139325447768 eV |
-------------------------------------------------
------------------------------------------------------------------------
* finished run on 2020/03/20 at 06:49:33.778
------------------------------------------------------------------------
total:
* wall-time: 0 d, 0 h, 2 min, 19.628 sec
* cpu-time: 0 d, 0 h, 2 min, 18.905 sec
* ratio c/w: 0.995 speedup
SCF:
* wall-time: 0 d, 0 h, 0 min, 0.492 sec
* cpu-time: 0 d, 0 h, 0 min, 0.487 sec
* ratio c/w: 0.990 speedup
ANC optimizer:
* wall-time: 0 d, 0 h, 2 min, 18.957 sec
* cpu-time: 0 d, 0 h, 2 min, 18.259 sec
* ratio c/w: 0.995 speedup
normal termination of xtb
Note: The following floating-point exceptions are signalling: IEEE_UNDERFLOW_FLAG IEEE_DENORMAL
>lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 1
On-line CPU(s) list: 0
Thread(s) per core: 1
Core(s) per socket: 1
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 63
Model name: Intel(R) Xeon(R) CPU @ 2.30GHz
Stepping: 0
CPU MHz: 2300.000
BogoMIPS: 4600.00
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 46080K
NUMA node0 CPU(s): 0
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm invpcid_single pti ssbd ibrs ibpb stibp fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid xsaveopt arat md_clear arch_capabilities
Results (1 Xeon CPU @ 2.3 GHz):
Intel MKL is about 6.5-fold faster than the baseline (21.5 s vs. 138.9 s wall time), gfortran vectorization with -march=native gives about 2.5-fold faster computations (54 to 56 s), and the baseline with just the "-O3" compiler option and netlib/openblas is the slowest. EasyBuild seems to be another way to integrate different packages.
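The fold-changes can be recomputed from the total wall times printed in the runs above. This is a small sketch with awk; the helper name speedup is hypothetical, and the netlib run (2 min 18.933 s = 138.933 s) is taken as the baseline.

```shell
# speedup BASE T: prints BASE / T with two decimals (hypothetical helper).
speedup() {
  awk -v base="$1" -v t="$2" 'BEGIN { printf "%.2f\n", base / t }'
}

# Total wall times in seconds for "./xtb C54H30.xyz --opt extreme".
baseline=138.933   # netlib
echo "intel/mkl:         $(speedup "$baseline" 21.496)x"
echo "gcc native vec256: $(speedup "$baseline" 54.423)x"
echo "gcc native vec128: $(speedup "$baseline" 53.942)x"
echo "gcc march=native:  $(speedup "$baseline" 56.322)x"
echo "openblas:          $(speedup "$baseline" 139.628)x"
```

These are single runs on one virtual CPU, so the ratios are only a rough guide.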
To update compiler settings for gcc meson builds use:
>meson --reconfigure build_gcc/
>ninja -C build_gcc test
Compilation with gcc on Ubuntu (VM instance on GCP)
Compilation can be done with meson/ninja, or with make/cmake. Here is the complete installation procedure from a blank system.
Refer to the build instructions: https://xtb-docs.readthedocs.io/en/latest/development.html
1) Install pip3, ninja and meson. Be aware that meson and ninja are in this case installed via pip3, not natively (apt install meson) at the OS level. In the case below, without the sudo command, meson is installed into "~/.local/bin/meson". See https://github.com/mesonbuild/meson/issues/1613 and https://mesonbuild.com/Quick-guide.html
2) Install the BLAS libraries under Ubuntu. If openblas is not installed, the build may raise the error "ERROR: Fortran library 'openblas' not found".
3) Install gcc and gfortran (version 8.x or later), see https://www.osradar.com/install-gnu-fortran-on-ubuntu-18-04/ The normal "sudo apt-get install gfortran" only installs gfortran 7.5, hence we need gfortran-8; this can also be confirmed by looking into the xtb repository.
4) Get the xtb sources. With wget one gets master.zip (on Windows one gets xtb-master).
5) Make the changes to the system and then build (Ubuntu, gcc). The openblas backend is defined here; the netlib example is shown separately. https://github.com/grimme-lab/xtb/blob/master/meson_options.txt Also see https://github.com/mesonbuild/meson/issues/4890
6) After running ninja with the tests, a number of warnings may occur; I am not sure whether they are relevant or covered by the tests. Compile time on one Intel Xeon core (2.4 GHz) is around 1-2 minutes.
7) Then copy the xtb binary from the folder "/xtb_source/xtb-master/build_gcc/" and run a version test and a caffeine test.
8) Now test the xtb computations with caffeine as an example:
9) Run xtb (just copied into the same folder; observe the binaries, config files and folders when transferring).
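The steps above can be sketched as one script. The package names (python3-pip, libopenblas-dev, gcc-8, gfortran-8) and the master.zip download URL are my assumptions for Ubuntu 18.04 and are not taken from the original commands. The run wrapper is a hypothetical helper: by default (DRY_RUN=1) it only prints each command, so nothing is installed until DRY_RUN=0 is set.

```shell
# run: print the command when DRY_RUN=1 (default), execute it otherwise.
run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then echo "+ $*"; else "$@"; fi
}

run sudo apt-get install -y python3-pip unzip wget        # step 1
run pip3 install --user meson ninja                       # lands in ~/.local/bin
export PATH="$HOME/.local/bin:$PATH"
run sudo apt-get install -y libopenblas-dev               # step 2
run sudo apt-get install -y gcc-8 gfortran-8              # step 3
run wget https://github.com/grimme-lab/xtb/archive/master.zip   # step 4
run unzip master.zip
run cd xtb-master
export FC=gfortran-8 CC=gcc-8                             # step 5
run meson setup build_gcc --buildtype release -Dla_backend=openblas --warnlevel 0
run ninja -C build_gcc test                               # step 6
run ./build_gcc/xtb --version                             # step 7
```

Reviewing the printed commands first and then rerunning with DRY_RUN=0 keeps the sudo steps from running unnoticed.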
Comparing with INTEL compilers and MKL
Basically, the difference between the Intel compiler and gcc is 0.00000002719 Eh, which translates into 0.000017 kcal/mol, a very small error. However, the IEEE_UNDERFLOW_FLAG IEEE_DENORMAL note is somewhat unsettling, reminding me of Intel 387 coprocessor testing times. A quick Google search reveals that there are potential remedies for this warning, which I will not explore.
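The unit conversion quoted above can be checked in one line. This sketch assumes the usual factor 1 Eh = 627.5095 kcal/mol, which is not stated in the original post; the helper name hartree_to_kcal is hypothetical.

```shell
# hartree_to_kcal EH: convert an energy in Hartree to kcal/mol.
# Assumed conversion factor: 1 Eh = 627.5095 kcal/mol.
hartree_to_kcal() {
  awk -v eh="$1" 'BEGIN { printf "%.6f\n", eh * 627.5095 }'
}

# Energy difference between the Intel and gcc builds quoted above.
hartree_to_kcal 0.00000002719
```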
Verdict: compiling xtb with gcc and gfortran-8 works.