About the way parallelisation works

grimme-lab / xtb

Semiempirical Extended Tight-Binding Program Package

https://xtb-docs.readthedocs.io/

GNU Lesser General Public License v3.0


About the way parallelisation works #613

Open xiki-tempula opened 2 years ago

xiki-tempula commented 2 years ago

Could someone explain how xTB parallelization works? Thanks. I was under the impression that xTB is parallelised only through OpenMP and that the number of OpenMP threads is controlled with -P or --parallel. However, for my GFN2-xTB calculation, -P 1 runs much faster than using all the threads with -P 16. This is not too surprising, as OpenMP can be problematic. What baffles me is that even when I specify one thread with -P 1, top still shows all 16 cores in use. I think this should be explained in the docs, as I cannot find it there, and the default of using all the threads doesn't seem to be optimal.

ntampellini commented 2 years ago

I would love that too! I am having the same issue on a cluster, where "-P 1" is much faster than using more threads (GFN2-xTB, 102 atoms, xtb version 6.4.1).

kjelljorner commented 2 years ago

Might be related to https://github.com/grimme-lab/xtb/issues/401. The solution there was to switch to MKL.

TyBalduf commented 1 year ago

I found that changing OMP_NUM_THREADS changed the time for calculations even if -Dopenmp=false was used when building xTB. It seems that even without -P num, xTB tries to use the environment variable and, if it isn't set, defaults to the maximum number of available threads. There must be something off with the logic for when it decides to use OpenMP.
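The fallback described above can be illustrated with a small Python sketch. This is not xTB's or any OpenMP runtime's actual code: the function name effective_threads is hypothetical, and it only models the rule "use OMP_NUM_THREADS if it is set, otherwise take every core the OS reports".

```python
import os


def effective_threads(env=None):
    """Model the default described above: OMP_NUM_THREADS if set, else all cores.

    Hypothetical helper for illustration only; not part of xTB or OpenMP.
    """
    env = os.environ if env is None else env
    value = env.get("OMP_NUM_THREADS", "").strip()
    if value:
        # OpenMP accepts a comma-separated list for nested parallel levels;
        # this sketch only looks at the first (outermost) entry.
        return int(value.split(",")[0])
    # Variable unset: assume the runtime uses every core the OS reports.
    return os.cpu_count() or 1
```

Calling effective_threads() with no argument inspects the current environment, which is a quick way to see what an unconfigured job would likely pick up.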

awvwgk commented 1 year ago

That can depend on the backend used for the linear algebra. If xtb is compiled with -Dopenmp=false, the xtb library is built without OpenMP pragmas and without ways to adjust the number of threads, e.g. in MKL (when used via mkl_rt). In this case the linear algebra backend can still parallelize.

TyBalduf commented 1 month ago

Just wanted to mention here that, as noted above, OMP_NUM_THREADS (or backend-specific variables such as MKL_NUM_THREADS for Intel MKL) will affect the parallelization even if xTB was built with -Dopenmp=false or you don't specify --parallel <N>, as it still affects the linear algebra backend.

Often, if OMP_NUM_THREADS (or some equivalent) is not set, the default behavior of OpenMP is to pull in all available threads on your machine. So if you don't want multithreading for whatever reason (e.g. you are running several xTB calculations simultaneously and don't want them competing for the same cores), make sure this environment variable is set even if you don't specify --parallel.
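For anyone scripting this, here is a minimal Python sketch of pinning the thread count before launching xtb. The helper names pinned_env and run_xtb and the input file name are hypothetical. Setting MKL_NUM_THREADS is an assumption that only matters if your build links against MKL. It also assumes the xtb executable is on your PATH.

```python
import os
import subprocess


def pinned_env(base_env=None, threads=1):
    """Return a copy of the environment with thread-count variables set.

    Hypothetical helper: OMP_NUM_THREADS is the variable discussed in this
    thread; MKL_NUM_THREADS is included on the assumption that MKL is the backend.
    """
    env = dict(os.environ if base_env is None else base_env)
    for name in ("OMP_NUM_THREADS", "MKL_NUM_THREADS"):
        env[name] = str(threads)
    return env


def run_xtb(xyz_file, threads=1):
    """Run xtb on one geometry file with both --parallel and the env pinned."""
    # Assumes "xtb" is on PATH; --parallel is the flag discussed in this thread.
    cmd = ["xtb", xyz_file, "--parallel", str(threads)]
    return subprocess.run(cmd, env=pinned_env(threads=threads), check=True)
```

Something like run_xtb("molecule.xyz", threads=1) would then start a job with the thread count set both on the command line and in the environment, so the linear algebra backend should not grab extra cores when several jobs run side by side.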