tobigithub / quantum-xtb

Benchmarks and examples for Grimme's semiempirical GFNn-xTB

gcc compilation of xtb #19

Open tobigithub opened 4 years ago

tobigithub commented 4 years ago

Compilation of xtb with gcc on Ubuntu (VM instance on GCP)

Compilation can be done with meson/ninja, or with make/cmake. Here is the complete installation procedure from a blank system.

Welcome to Ubuntu 18.04.4 LTS (GNU/Linux 5.0.0-1031-gcp x86_64)
>sudo apt-get update
>sudo apt update
>sudo apt upgrade

Then refer to the build instructions: https://xtb-docs.readthedocs.io/en/latest/development.html

1) Install pip3, ninja and meson. Be aware that meson and ninja are in this case installed via pip3, not natively on the OS level (apt install meson). In the case below, without the sudo command meson is installed into "~/.local/bin/meson". See https://github.com/mesonbuild/meson/issues/1613 and https://mesonbuild.com/Quick-guide.html

>sudo apt install python3-pip
>sudo pip3 install meson
>sudo pip3 install ninja

>pip3 list
meson (0.53.2)
ninja (1.9.0.post1)

2) Install the BLAS libraries under Ubuntu. If openblas is not installed, the build may raise the error "ERROR: Fortran library 'openblas' not found".

>sudo apt-get install libblas-dev liblapack-dev
>sudo apt-get install libopenblas-base libopenblas-dev 

3) Install gcc and gfortran (8.x or newer), see https://www.osradar.com/install-gnu-fortran-on-ubuntu-18-04/ The normal "sudo apt-get install gfortran" only installs gfortran 7.5, hence we need gfortran-8. This can also be confirmed by looking into the xtb repository.

>sudo apt-get install gfortran-8
>sudo apt-get install gcc-8
>gfortran-8 --version
GNU Fortran (Ubuntu 8.3.0-26ubuntu1~18.04) 8.3.0
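
The version requirement in step 3 can be checked in a script. The following is a minimal sketch, assuming the version number is the last field on the first line of the `gfortran --version` banner (as in the output above); the helper name `gfortran_major` is made up for this example.

```shell
# Extract the major version from the first line of a "gfortran --version" banner.
# Assumption: the version (e.g. 8.3.0) is the last whitespace-separated field.
gfortran_major() {
    printf '%s\n' "$1" | awk 'NR == 1 { split($NF, v, "."); print v[1] }'
}

# Banner copied from the output above.
banner='GNU Fortran (Ubuntu 8.3.0-26ubuntu1~18.04) 8.3.0'
major=$(gfortran_major "$banner")

if [ "$major" -ge 8 ]; then
    echo "gfortran major version $major: new enough"
else
    echo "gfortran major version $major: too old, install gfortran-8"
fi
```

On a live system the banner would come from `gfortran-8 --version` instead of the hard-coded string.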

4) Get the xtb sources. With wget one gets master.zip (Windows gets xtb-master).

>mkdir xtb_source
>cd xtb_source
>wget https://codeload.github.com/grimme-lab/xtb/zip/master/xtb-master.zip
>unzip master.zip
>cd xtb-master

5) Set the compiler environment variables and then build (Ubuntu, gcc). The openblas backend is selected here; an example for netlib follows below. See https://github.com/grimme-lab/xtb/blob/master/meson_options.txt and also https://github.com/mesonbuild/meson/issues/4890

>export FC=gfortran-8 CC=gcc-8 
>meson setup build_gcc --buildtype release -Dla_backend=openblas --warnlevel 0
>ninja -C build_gcc test

6) After running ninja with the tests, a number of warnings may occur. I am not sure whether they are relevant or covered by the tests. Compile time on one Intel Xeon core (2.4 GHz) is around 1-2 minutes.

[345/365] Compiling Fortran object 'TESTSUITE/477912c@@xtb_test@exe/assertion.f90.o'.
../TESTSUITE/assertion.f90:2:20:

    use, intrinsic :: iso_fortran_env
                    1
Warning: Use of the NUMERIC_STORAGE_SIZE named constant from intrinsic module ISO_FORTRAN_ENV at (1) is incompatible with option -fdefault-real-8
[353/365] Linking target xtb.
/usr/bin/ld: warning: libgfortran.so.4, needed by /usr/lib/gcc/x86_64-linux-gnu/8/../../../x86_64-linux-gnu/libopenblas.so, may conflict with libgfortran.so.5
[364/365] Linking target TESTSUITE/xtb_test.
/usr/bin/ld: warning: libgfortran.so.4, needed by /usr/lib/gcc/x86_64-linux-gnu/8/../../../x86_64-linux-gnu/libopenblas.so, may conflict with libgfortran.so.5
[364/365] Running all tests.
 1/63 Argparser: print version                OK       0.01 s 
 2/63 Argparser: print help                   OK       0.01 s 
 3/63 Argparser: print license                OK       0.01 s 
 4/63 Argparser: no arguments                 EXPECTEDFAIL 0.02 s 
 5/63 Info                                    OK       0.01 s 
 6/63 Singlepoint                             OK       0.12 s 
 7/63 Geometry opt.                           OK       0.37 s 
 8/63 IP/EA                                   OK       0.17 s 
 9/63 GFN0-xTB                                OK       0.07 s 
10/63 GFN1-xTB                                OK       0.12 s 
11/63 GFN2-xTB/GBSA                           OK       0.12 s 
12/63 Molecule: axis                          OK       0.01 s 
13/63 Molecule: MIC                           OK       0.01 s 
14/63 Wigner-Seitz Cell (3D)                  OK       0.01 s 
15/63 coord 3D                                OK       0.01 s 
16/63 coord 3D                                OK       0.01 s 
17/63 coord 2D                                SKIP     0.01 s 
18/63 coord 1D                                SKIP     0.01 s 
19/63 coord 0D                                OK       0.01 s 
20/63 Xmol  0D                                OK       0.01 s 
21/63 POSCAR                                  OK       0.01 s 
22/63 molfile                                 OK       0.01 s 
23/63 molfile flat                            EXPECTEDFAIL 0.02 s 
24/63 SDF                                     OK       0.01 s 
25/63 SDF flat                                EXPECTEDFAIL 0.02 s 
26/63 SDF no H                                EXPECTEDFAIL 0.02 s 
27/63 PDB                                     OK       0.01 s 
28/63 PDB no H                                EXPECTEDFAIL 0.02 s 
29/63 genFormat                               OK       0.01 s 
30/63 PBC tools: convert                      OK       0.01 s 
31/63 PBC tools: cutoff                       OK       0.01 s 
32/63 Symmetry: Water                         OK       0.01 s 
33/63 Symmetry: Li8                           OK       0.01 s 
34/63 Symmetry: PCl3                          OK       0.01 s 
35/63 Symmetry: large                         OK       0.17 s 
36/63 Thermo: axis                            OK       0.01 s 
37/63 Thermo: calculation                     OK       0.01 s 
38/63 Thermo: print                           OK       0.01 s 
39/63 EEQ model: water                        OK       0.01 s 
40/63 EEQ model: 3D Ewald                     OK       0.01 s 
41/63 EEQ model: GBSA                         OK       0.01 s 
42/63 EEQ model: GBSA (salt)                  OK       0.01 s 
43/63 EEQ model: GBSA (H-bond)                SKIP     0.01 s 
44/63 Dispersion: properties                  OK       0.02 s 
45/63 Dispersion: energies                    OK       0.02 s 
46/63 Dispersion: energies (PBC)              OK       1.72 s 
47/63 Dispersion: API                         OK       0.03 s 
48/63 GFN2-xTB: SCC                           OK       0.02 s 
49/63 GFN2-xTB: API                           OK       0.02 s 
50/63 GFN2-xTB: API (GBSA)                    OK       0.03 s 
51/63 GFN2-xTB: API (GBSA+salt)               OK       0.03 s 
52/63 GFN2-xTB: API (PCEM)                    OK       0.03 s 
53/63 GFN1-xTB: SCC                           OK       0.01 s 
54/63 GFN1-xTB: API                           OK       0.02 s 
55/63 GFN1-xTB: XB                            OK       0.02 s 
56/63 GFN1-xTB: API (GBSA)                    OK       0.02 s 
57/63 GFN1-xTB: API (PCEM)                    OK       0.03 s 
58/63 GFN0-xTB: SP                            OK       0.03 s 
59/63 GFN0-xTB: API                           OK       0.03 s 
60/63 GFN0-xTB: SRB                           OK       0.03 s 
61/63 GFN0-xTB: SP  (PBC)                     OK       0.07 s 
62/63 GFN0-xTB: API (PBC)                     OK       0.03 s 
63/63 GFN0-xTB: SRB (PBC)                     OK       0.17 s 
Ok:                   55
Expected Fail:         5
Fail:                  0
Unexpected Pass:       0
Skipped:               3
Timeout:               0
Full log written to /home/tkind/xtb_source/xtb-master/build_gcc/meson-logs/testlog.txt
tkind@instance-1:~/xtb_source/xtb-master$ 
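
The summary block at the end of the test run can also be checked automatically rather than by eye. This is only a sketch that parses the six summary lines exactly as printed above; the function name `summary_ok` is hypothetical and the input is pasted in as a string.

```shell
# Succeed only if the summary reports zero failures and zero timeouts.
summary_ok() {
    printf '%s\n' "$1" | awk -F: '
        /^Fail:/    { fail = $2 + 0; seen = 1 }
        /^Timeout:/ { timeout = $2 + 0 }
        END { if (seen && fail == 0 && timeout == 0) exit 0; exit 1 }
    '
}

# Summary copied from the run above.
summary='Ok:                   55
Expected Fail:         5
Fail:                  0
Unexpected Pass:       0
Skipped:               3
Timeout:               0'

if summary_ok "$summary"; then
    echo "test suite clean"
else
    echo "test suite has failures or timeouts"
fi
```

The "Expected Fail" line is deliberately ignored, because the pattern only matches lines that start with "Fail:".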

7) Then copy the xtb binary from the folder "/xtb_source/xtb-master/build_gcc/", run a version check, and run a caffeine test.

tkind@instance-1:~/xtb_source/test/caffeine$ ./xtb --version
      -----------------------------------------------------------      
     |                   =====================                   |     
     |                           x T B                           |     
     |                   =====================                   |     
     |                         S. Grimme                         |     
     |          Mulliken Center for Theoretical Chemistry        |     
     |                    University of Bonn                     |     
      -----------------------------------------------------------      
   * xtb version 6.2.3 (unknown) compiled by 'tkind@instance-1' on 2020-03-20
normal termination of xtb
tkind@instance-1:~/xtb_source/test/caffeine$ 

8) Now testing the xtb computations with caffeine as an example:

>cat caffeine.coord
$coord
    2.02799738646442      0.09231312124713     -0.14310895950963       C
    4.75011007621000      0.02373496014051     -0.14324124033844       N
    6.33434307654413      2.07098865582721     -0.14235306905930       C
    8.72860718071825      1.38002919517619     -0.14265542523943       N
    8.65318821103610     -1.19324866489847     -0.14231527453678       C
    6.23857175648671     -2.08353643730276     -0.14218299370797       C
    5.63266886875962     -4.69950321056008     -0.13940509630299       C
    3.44931709749015     -5.48092386085491     -0.14318454855466       O
    7.77508917214346     -6.24427872938674     -0.13107140408805       N
   10.30229550927022     -5.39739796609292     -0.13672168520430       C
   12.07410272485492     -6.91573621641911     -0.13666499342053       O
   10.70038521493902     -2.79078533715849     -0.14148379504141       N
   13.24597858727017     -1.76969072232377     -0.14218299370797       C
    7.40891694074004     -8.95905928176407     -0.11636933482904       C
    1.38702118184179      2.05575746325296     -0.14178615122154       H
    1.34622199478497     -0.86356704498496      1.55590600570783       H
    1.34624089204623     -0.86133716815647     -1.84340893849267       H
    5.65596919189118      4.00172183859480     -0.14131371969009       H
   14.67430918222276     -3.26230980007732     -0.14344911021228       H
   13.50897177220290     -0.60815166181684      1.54898960808727       H
   13.50780014200488     -0.60614855212345     -1.83214617078268       H
    5.41408424778406     -9.49239668625902     -0.11022772492007       H
    8.31919801555568     -9.74947502841788      1.56539243085954       H
    8.31511620712388     -9.76854236502758     -1.79108242206824       H
$end

9) Running xtb (the binary was just copied into the same folder; keep track of the binaries, config files and folders when transferring)

>./xtb --opt extreme caffeine.coord 
...
optimized geometry written to: xtbopt.coord

           -------------------------------------------------
          | TOTAL ENERGY              -42.153937410501 Eh   |
          | GRADIENT NORM               0.000011578351 Eh/α |
          | HOMO-LUMO GAP               3.424625518833 eV   |
           -------------------------------------------------

------------------------------------------------------------------------
 * finished run on 2020/03/20 at 03:39:39.545     
------------------------------------------------------------------------
 total:
 * wall-time:     0 d,  0 h,  0 min,  0.767 sec
 *  cpu-time:     0 d,  0 h,  0 min,  0.720 sec
 * ratio c/w:     0.940 speedup
 SCF:
 * wall-time:     0 d,  0 h,  0 min,  0.041 sec
 *  cpu-time:     0 d,  0 h,  0 min,  0.037 sec
 * ratio c/w:     0.907 speedup
 ANC optimizer:
 * wall-time:     0 d,  0 h,  0 min,  0.687 sec
 *  cpu-time:     0 d,  0 h,  0 min,  0.657 sec
 * ratio c/w:     0.957 speedup

normal termination of xtb
Note: The following floating-point exceptions are signalling: IEEE_UNDERFLOW_FLAG IEEE_DENORMAL

Comparing with INTEL compilers and MKL

gcc+gfortran-8 (openblas)
           -------------------------------------------------
          | TOTAL ENERGY              -42.153937410494 Eh   |
          | GRADIENT NORM               0.000011585590 Eh/a |
          | HOMO-LUMO GAP               3.424625890456 eV   |
           -------------------------------------------------
intel (MKL)

           -------------------------------------------------
          | TOTAL ENERGY              -42.153937383304 Eh   |
          | GRADIENT NORM               0.000043689684 Eh/a |
          | HOMO-LUMO GAP               3.424638764696 eV   |
           -------------------------------------------------

Difference is 0.00000002719 Eh

The difference between the Intel and gcc builds is 0.00000002719 Eh, which translates into 0.000017 kcal/mol, a very small error. However, the IEEE_UNDERFLOW_FLAG IEEE_DENORMAL note is somewhat unsettling, reminding me of Intel 387 coprocessor testing times. A quick Google search reveals that there are potential remedies for this warning, which I will not explore.
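
The unit conversion above can be reproduced with a few lines of awk. This is a minimal sketch, assuming the usual conversion factor of roughly 627.5095 kcal/mol per Hartree; the function name `energy_delta` is made up for this example.

```shell
# Absolute energy difference in Hartree and in kcal/mol.
# Assumption: 1 Eh is taken as 627.5095 kcal/mol.
energy_delta() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        d = a - b
        if (d < 0) d = -d
        printf "%.11f Eh = %.6f kcal/mol\n", d, d * 627.5095
    }'
}

# Total energies from the gcc (openblas) and Intel (MKL) runs above.
energy_delta -42.153937410494 -42.153937383304
```

For the two caffeine energies this gives the same 0.00000002719 Eh and 0.000017 kcal/mol quoted above.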

Verdict: compiling xtb with gcc and gfortran-8 works.

tobigithub commented 4 years ago

One can also use the netlib LAPACK backend. It is best to remove the whole build directory and do a fresh unzip. The reconfigure command for meson and ninja also works.

>meson --reconfigure build_gcc/
>ninja -C build_gcc test

or

>unzip master.zip
>cd xtb-master

>export FC=gfortran-8 CC=gcc-8 
>meson setup build_gcc --buildtype release -Dla_backend=netlib --warnlevel 0
>ninja -C build_gcc test

Again no test fails, although the summary reports 54 OK here instead of 55:

Ok:                   54
Expected Fail:         5
Fail:                  0
Unexpected Pass:       0
Skipped:               3
Timeout:               0
Full log written to /home/tkind/xtb_source/xtb-master/build_gcc/meson-logs/testlog.txt

Then we can test xtb with the caffeine example:

tkind@instance-1:~/xtb_source/test/caffeine-netlib$ ./xtb --version
      -----------------------------------------------------------      
     |                   =====================                   |     
     |                           x T B                           |     
     |                   =====================                   |     
     |                         S. Grimme                         |     
     |          Mulliken Center for Theoretical Chemistry        |     
     |                    University of Bonn                     |     
      -----------------------------------------------------------      

   * xtb version 6.2.3 (unknown) compiled by 'tkind@instance-1' on 2020-03-20

normal termination of xtb

>./xtb --opt extreme caffeine.coord 
optimized geometry written to: xtbopt.coord
           -------------------------------------------------
          | TOTAL ENERGY              -42.153937410494 Eh   |
          | GRADIENT NORM               0.000011585590 Eh/α |
          | HOMO-LUMO GAP               3.424625890456 eV   |
           -------------------------------------------------
------------------------------------------------------------------------
 * finished run on 2020/03/20 at 04:21:22.044     
------------------------------------------------------------------------
 total:
 * wall-time:     0 d,  0 h,  0 min,  0.816 sec
 *  cpu-time:     0 d,  0 h,  0 min,  0.761 sec
 * ratio c/w:     0.932 speedup
 SCF:
 * wall-time:     0 d,  0 h,  0 min,  0.050 sec
 *  cpu-time:     0 d,  0 h,  0 min,  0.044 sec
 * ratio c/w:     0.891 speedup
 ANC optimizer:
 * wall-time:     0 d,  0 h,  0 min,  0.727 sec
 *  cpu-time:     0 d,  0 h,  0 min,  0.690 sec
 * ratio c/w:     0.950 speedup
normal termination of xtb
Note: The following floating-point exceptions are signalling: IEEE_UNDERFLOW_FLAG IEEE_DENORMAL

While the file size is the same for the netlib and openblas builds, the md5sum differs. The executables also seem small compared to the Intel/MKL compiled binaries, so they may not be statically linked.
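
Whether a binary is statically linked can be checked with `ldd` instead of guessing from the file size. The sketch below classifies the text that `ldd` prints; it assumes glibc's `ldd`, which reports "not a dynamic executable" (or "statically linked" for static PIE files). The helper name `link_kind` and the sample library line are illustrative only.

```shell
# Classify the text printed by ldd for a binary.
link_kind() {
    case "$1" in
        "") echo unknown ;;
        *"not a dynamic executable"*|*"statically linked"*) echo static ;;
        *) echo dynamic ;;
    esac
}

# Usage on the real binaries (run in the folder that holds xtb):
#   link_kind "$(ldd ./xtb 2>&1)"
#   md5sum ./xtb

# Illustrative sample line, not taken from an actual run.
sample='libopenblas.so.0 => /usr/lib/x86_64-linux-gnu/libopenblas.so.0'
link_kind "$sample"
```

A dynamically linked gcc build would list libopenblas and libgfortran here, which would also fit the linker warnings shown during the build.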

tobigithub commented 4 years ago

Micro benchmark: I do not claim to understand the different BLAS features, except that the Intel MKL version with AVX2 vectorization is faster. An explanation from Quantum ESPRESSO gives more detail [LINK]. A look into the meson build config reveals "('Static linked binaries are only supported with Intel MKL')"

Also, for gcc the options "-O3 -march=native" need to be enabled to allow gfortran vectorization whenever possible, see [GNU fortran compiler flags on stackoverflow]
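
One possible way to hand these flags to gfortran is meson's built-in `fortran_args` option. The sketch below only prints the command so it can be inspected first. It is an assumption that this matches how the builds below were configured; the build directory name `build_gcc_native` and the helper `meson_native_cmd` are made up for this example.

```shell
# Print (not run) a meson setup command that passes extra flags to gfortran.
# Assumptions: fortran_args is meson's built-in per-language argument option;
# the build directory name "build_gcc_native" is hypothetical.
meson_native_cmd() {
    flags=$1
    printf 'meson setup build_gcc_native --buildtype release -Dla_backend=openblas -Dfortran_args="%s"\n' "$flags"
}

meson_native_cmd "-O3 -march=native"
meson_native_cmd "-O3 -march=native -mprefer-vector-width=256"
```

The printed line can be pasted into the shell after `export FC=gfortran-8 CC=gcc-8`, followed by `ninja -C build_gcc_native test`.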

Example of C54H30.xyz

>wget https://raw.githubusercontent.com/tobigithub/quantum-xtb/master/input-molecules/C54H30.xyz

>tkind@instance-1:~/xtb_source/test/c60-intel$ ./xtb C54H30.xyz  --opt extreme

           -------------------------------------------------
          | TOTAL ENERGY             -131.025242691597 Eh   |
          | GRADIENT NORM               0.000035803133 Eh/a |
          | HOMO-LUMO GAP               2.139351180564 eV   |
           -------------------------------------------------
------------------------------------------------------------------------
 * finished run on 2020/03/20 at 06:23:39.398     
------------------------------------------------------------------------
 total:
 * wall-time:     0 d,  0 h,  0 min, 21.496 sec
 *  cpu-time:     0 d,  0 h,  0 min, 21.338 sec
 * ratio c/w:     0.993 speedup
 SCF:
 * wall-time:     0 d,  0 h,  0 min,  0.449 sec
 *  cpu-time:     0 d,  0 h,  0 min,  0.444 sec
 * ratio c/w:     0.989 speedup
 ANC optimizer:
 * wall-time:     0 d,  0 h,  0 min, 20.718 sec
 *  cpu-time:     0 d,  0 h,  0 min, 20.586 sec
 * ratio c/w:     0.994 speedup
normal termination of xtb

----
>tkind@instance-1:~/xtb_source/test/c60-gcc-native-vec256$ ./xtb C54H30.xyz --opt extreme
           -------------------------------------------------
          | TOTAL ENERGY             -131.025242602428 Eh   |
          | GRADIENT NORM               0.000049906715 Eh/α |
          | HOMO-LUMO GAP               2.139040221321 eV   |
           -------------------------------------------------
------------------------------------------------------------------------
 * finished run on 2020/03/20 at 07:25:58.338     
------------------------------------------------------------------------
 total:
 * wall-time:     0 d,  0 h,  0 min, 54.423 sec
 *  cpu-time:     0 d,  0 h,  0 min, 54.005 sec
 * ratio c/w:     0.992 speedup
 SCF:
 * wall-time:     0 d,  0 h,  0 min,  0.487 sec
 *  cpu-time:     0 d,  0 h,  0 min,  0.479 sec
 * ratio c/w:     0.984 speedup
 ANC optimizer:
 * wall-time:     0 d,  0 h,  0 min, 53.748 sec
 *  cpu-time:     0 d,  0 h,  0 min, 53.373 sec
 * ratio c/w:     0.993 speedup
normal termination of xtb
Note: The following floating-point exceptions are signalling: IEEE_UNDERFLOW_FLAG IEEE_DENORMAL
----
>tkind@instance-1:~/xtb_source/test/c60-gcc-native-vec128$ ./xtb C54H30.xyz --opt extreme
           -------------------------------------------------
          | TOTAL ENERGY             -131.025242602387 Eh   |
          | GRADIENT NORM               0.000049899084 Eh/α |
          | HOMO-LUMO GAP               2.139040182769 eV   |
           -------------------------------------------------

------------------------------------------------------------------------
 * finished run on 2020/03/20 at 07:34:26.943     
------------------------------------------------------------------------
 total:
 * wall-time:     0 d,  0 h,  0 min, 53.942 sec
 *  cpu-time:     0 d,  0 h,  0 min, 53.507 sec
 * ratio c/w:     0.992 speedup
 SCF:
 * wall-time:     0 d,  0 h,  0 min,  0.482 sec
 *  cpu-time:     0 d,  0 h,  0 min,  0.474 sec
 * ratio c/w:     0.984 speedup
 ANC optimizer:
 * wall-time:     0 d,  0 h,  0 min, 53.267 sec
 *  cpu-time:     0 d,  0 h,  0 min, 52.878 sec
 * ratio c/w:     0.993 speedup

normal termination of xtb
Note: The following floating-point exceptions are signalling: IEEE_UNDERFLOW_FLAG IEEE_DENORMAL
----
>tkind@instance-1:~/xtb_source/test/c60-gcc-march-native$ ./xtb C54H30.xyz  --opt extreme
           -------------------------------------------------
          | TOTAL ENERGY             -131.025242602428 Eh   |
          | GRADIENT NORM               0.000049906715 Eh/a |
          | HOMO-LUMO GAP               2.139040221321 eV   |
           -------------------------------------------------
------------------------------------------------------------------------
 * finished run on 2020/03/20 at 06:25:34.589     
------------------------------------------------------------------------
 total:
 * wall-time:     0 d,  0 h,  0 min, 56.322 sec
 *  cpu-time:     0 d,  0 h,  0 min, 53.881 sec
 * ratio c/w:     0.957 speedup
 SCF:
 * wall-time:     0 d,  0 h,  0 min,  0.483 sec
 *  cpu-time:     0 d,  0 h,  0 min,  0.478 sec
 * ratio c/w:     0.989 speedup
 ANC optimizer:
 * wall-time:     0 d,  0 h,  0 min, 55.665 sec
 *  cpu-time:     0 d,  0 h,  0 min, 53.247 sec
 * ratio c/w:     0.957 speedup
normal termination of xtb
Note: The following floating-point exceptions are signalling: IEEE_UNDERFLOW_FLAG IEEE_DENORMAL

-----

>tkind@instance-1:~/xtb_source/test/c60-netlib$ ./xtb C54H30.xyz  --opt extreme
           -------------------------------------------------
          | TOTAL ENERGY             -131.025242766277 Eh   |
          | GRADIENT NORM               0.000047149550 Eh/a |
          | HOMO-LUMO GAP               2.139325447768 eV   |
           -------------------------------------------------
------------------------------------------------------------------------
 * finished run on 2020/03/20 at 06:32:19.573     
------------------------------------------------------------------------
 total:
 * wall-time:     0 d,  0 h,  2 min, 18.933 sec
 *  cpu-time:     0 d,  0 h,  2 min, 18.247 sec
 * ratio c/w:     0.995 speedup
 SCF:
 * wall-time:     0 d,  0 h,  0 min,  0.490 sec
 *  cpu-time:     0 d,  0 h,  0 min,  0.485 sec
 * ratio c/w:     0.989 speedup
 ANC optimizer:
 * wall-time:     0 d,  0 h,  2 min, 18.269 sec
 *  cpu-time:     0 d,  0 h,  2 min, 17.606 sec
 * ratio c/w:     0.995 speedup
normal termination of xtb
Note: The following floating-point exceptions are signalling: IEEE_UNDERFLOW_FLAG IEEE_DENORMAL

----

>tkind@instance-1:~/xtb_source/test/c60-openblas$ ./xtb C54H30.xyz  --opt extreme
           -------------------------------------------------
          | TOTAL ENERGY             -131.025242766277 Eh   |
          | GRADIENT NORM               0.000047149550 Eh/a |
          | HOMO-LUMO GAP               2.139325447768 eV   |
           -------------------------------------------------
------------------------------------------------------------------------
 * finished run on 2020/03/20 at 06:49:33.778     
------------------------------------------------------------------------
 total:
 * wall-time:     0 d,  0 h,  2 min, 19.628 sec
 *  cpu-time:     0 d,  0 h,  2 min, 18.905 sec
 * ratio c/w:     0.995 speedup
 SCF:
 * wall-time:     0 d,  0 h,  0 min,  0.492 sec
 *  cpu-time:     0 d,  0 h,  0 min,  0.487 sec
 * ratio c/w:     0.990 speedup
 ANC optimizer:
 * wall-time:     0 d,  0 h,  2 min, 18.957 sec
 *  cpu-time:     0 d,  0 h,  2 min, 18.259 sec
 * ratio c/w:     0.995 speedup
normal termination of xtb
Note: The following floating-point exceptions are signalling: IEEE_UNDERFLOW_FLAG IEEE_DENORMAL
>lscpu
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              1
On-line CPU(s) list: 0
Thread(s) per core:  1
Core(s) per socket:  1
Socket(s):           1
NUMA node(s):        1
Vendor ID:           GenuineIntel
CPU family:          6
Model:               63
Model name:          Intel(R) Xeon(R) CPU @ 2.30GHz
Stepping:            0
CPU MHz:             2300.000
BogoMIPS:            4600.00
Hypervisor vendor:   KVM
Virtualization type: full
L1d cache:           32K
L1i cache:           32K
L2 cache:            256K
L3 cache:            46080K
NUMA node0 CPU(s):   0
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm invpcid_single pti ssbd ibrs ibpb stibp fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid xsaveopt arat md_clear arch_capabilities

Results (1 Xeon CPU @ 2.3 GHz):

  1. Intel MKL: 21.496 sec
  2. gfortran-8 (netlib/openblas), "-O3 -march=native -mprefer-vector-width=256": 54 sec
  3. gfortran-8 (netlib/openblas), "-O3 -march=native -mprefer-vector-width=128": 54 sec
  4. gfortran-8 (netlib/openblas), "-O3 -march=native": 56.322 sec
  5. gfortran-8 (netlib/openblas), "-O3": 2 min, 19.628 sec

Intel MKL is about 6.5-fold faster than the baseline, and gfortran with "-march=native" vectorization is about 2.5-fold faster than the baseline. The baseline with just the "-O3" compiler option and netlib/openblas is the slowest. EasyBuild seems to be another way to integrate different packages.
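
The fold-changes can be recomputed from the wall-times listed above. This is a small sketch using awk, with the slowest "-O3" run (2 min 19.628 sec = 139.628 sec) as the baseline; the function name `speedup` and the short labels are made up for this example.

```shell
# Speedup of a run relative to a baseline wall-time, rounded to one decimal.
speedup() {
    awk -v base="$1" -v t="$2" 'BEGIN { printf "%.1f\n", base / t }'
}

# Wall-times in seconds, taken from the results list above.
baseline=139.628
for entry in "intel-mkl:21.496" "gfortran-march-native:56.322" "gfortran-O3:139.628"; do
    name=${entry%%:*}
    t=${entry#*:}
    printf '%-24s %8s s  %sx\n' "$name" "$t" "$(speedup "$baseline" "$t")"
done
```

Using the slowest run as the reference keeps every ratio at or above 1, which makes the list easier to read.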

tobigithub commented 4 years ago

To update the compiler settings for gcc meson builds, use:

>meson --reconfigure build_gcc/
>ninja -C build_gcc test