coltonbh opened 4 months ago
Oops! I misspoke, but I'm leaving this here for context. I was thinking you were using OpenMPI, and my reasoning above was about starting new MPI processes as ranks, but you are using OpenMP. Regardless, the facts of the setup I shared are still correct! For some reason, setting `OMP_NUM_THREADS=1` gives a 10x performance increase, and unless you set this variable, `xtb-python` spends most of its time spawning processes rather than doing calculations. Not sure why process spawning is related to the number of threads, but it is!
Also, this variable somehow gets fixed inside xtb the minute any part of the xtb library is imported. So one needs to set it with `os.environ['OMP_NUM_THREADS'] = '1'` (the value must be a string) before importing anything from the xtb library. Then things work as expected.
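A minimal sketch of the ordering that matters, assuming a fresh interpreter in which nothing from xtb has been imported yet. Note that `os.environ` only accepts strings, so the value has to be `"1"`; assigning the integer `1` raises `TypeError` and leaves the environment unchanged.

```python
import os

# os.environ values must be str; an int is rejected with TypeError.
try:
    os.environ["OMP_NUM_THREADS"] = 1  # type: ignore[assignment]
except TypeError as exc:
    print(f"rejected int value: {exc}")

# Set the string value *before* anything from xtb is imported, so it is
# already in place when the xtb shared library (and its OpenMP runtime) loads.
os.environ["OMP_NUM_THREADS"] = "1"

# import xtb  # only after the assignment above
```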
Thanks again for your time working out the kinks in a very important package where speed matters! :P
More details: it appears to be spawning threads within threads. If I set this variable to 16 (I have a 16-core machine), I actually get 47 (randomly?!) threads spawned on my machine, and the resource contention kills the performance. If I set it to 1, I get 16 threads. So threads are spawning threads, and that is causing the issue.
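To check thread counts like these from inside Python, here is a hedged, Linux-only sketch (assumption: `/proc` is mounted). Each entry in `/proc/self/task` is one kernel thread of the current process, which includes native OpenMP worker threads that `threading.active_count()` does not report. The helper name `count_os_threads` is hypothetical.

```python
import os


def count_os_threads():
    """Return the number of OS threads in this process, or None if unknown.

    Linux-only: every entry under /proc/self/task is one kernel thread,
    including native OpenMP workers that Python's threading module
    does not track.
    """
    task_dir = "/proc/self/task"
    if not os.path.isdir(task_dir):
        return None  # not Linux, or /proc is not mounted
    return len(os.listdir(task_dir))


if __name__ == "__main__":
    print("OMP_NUM_THREADS =", os.environ.get("OMP_NUM_THREADS"))
    print("OS threads now:", count_os_threads())
    # ... run an xtb calculation here, then call count_os_threads() again
```

Comparing the count after a calculation with `OMP_NUM_THREADS` unset versus set to `1` should show whether extra worker threads are being created.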
This fixes the performance issue, but of course it isn't optimal, since it has to be used wherever you first import anything from the xtb library:
```python
import os
from contextlib import contextmanager


@contextmanager
def set_env_variable(var_name, value):
    """Context manager to set an environment variable temporarily.

    Args:
        var_name: The name of the environment variable.
        value: The value to set the environment variable to.
    """
    original_value = os.environ.get(var_name)
    try:
        os.environ[var_name] = str(value)
        yield
    finally:
        # Restore the previous state. Compare against None so that an
        # originally empty string is restored instead of deleted.
        if original_value is not None:
            os.environ[var_name] = original_value
        else:
            os.environ.pop(var_name, None)
```
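Then wherever you first import xtb:

```python
# Everything imported inside the block sees OMP_NUM_THREADS=1;
# the previous value is restored when the block exits.
with set_env_variable("OMP_NUM_THREADS", "1"):
    import xtb
    import xtb.interface
    import xtb.libxtb
    from xtb.utils import Solvent
```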
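If wrapping every first import in `set_env_variable` is awkward, another stdlib-only option is to run the calculation in a child interpreter whose environment already contains `OMP_NUM_THREADS`, so the setting is in place before xtb is imported there. A minimal sketch; `run_isolated` is a hypothetical helper, and the child code string is a placeholder for the real xtb calculation.

```python
import os
import subprocess
import sys


def run_isolated(code, omp_threads=1):
    """Run `code` in a fresh Python process with OMP_NUM_THREADS preset.

    The variable is in the child's environment from the start, so it is
    already set when the child imports xtb and loads its OpenMP runtime.
    Returns the child's stdout as a string.
    """
    env = dict(os.environ, OMP_NUM_THREADS=str(omp_threads))
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the child fails
    )
    return result.stdout


if __name__ == "__main__":
    # Placeholder child code; replace with the actual xtb calculation.
    child = "import os; print(os.environ['OMP_NUM_THREADS'])"
    print(run_isolated(child).strip())
```

Starting an interpreter has its own overhead, so for many small molecules it is probably better to batch several calculations into one child process than to start one per molecule.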
@awvwgk these are really all issues with the underlying `xtb` library. Should I file issues there? Or is this all being built anew in the `tblite` library?
Describe the bug
I understand this library is no longer maintained. I haven't tested this with tblite, but I wanted to add my results here for your reference.
When I run a "typical" calculation on a molecule with ~40-60 atoms, the calculation is 10x slower than if I set `OMP_NUM_THREADS=1`. If I profile the code, most of the time is spent spawning new processes rather than actually doing calculations.

For such lightweight calculations I'd recommend threads over processes, since the overhead of spawning new processes is higher than the cost of the calculation itself. Not sure if you've already updated the `tblite` implementation to use threads over processes, but I'd certainly recommend it ;P

The performance hit is impressive. If OpenMP is here to stay in the implementation, I'd suggest an easy API for passing in the `OMP_NUM_THREADS` variable, probably with a default of 1. :car::dash: The current implementation does not offer this, and since new processes are spawned outside of the Python interpreter, `xtb-python` unfortunately does not respect setting `os.environ['OMP_NUM_THREADS'] = '1'`, so programmatically controlling this important variable is rather challenging (still looking for a solution).

Thx!
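The suggested default could look roughly like the hypothetical sketch below: a wrapper package sets `OMP_NUM_THREADS` only if the user has not already chosen a value, and does so before the compiled library is loaded. `apply_omp_default` is an illustrative name, not part of xtb-python.

```python
import os


def apply_omp_default(default=1, environ=None):
    """Set OMP_NUM_THREADS to `default` unless the user already set it.

    Intended to run at the very top of a wrapper package's __init__.py,
    before the compiled library (and its OpenMP runtime) is loaded.
    Returns the value in effect afterwards.
    """
    env = os.environ if environ is None else environ
    # setdefault leaves an existing user choice untouched.
    return env.setdefault("OMP_NUM_THREADS", str(default))


# Hypothetical __init__.py usage:
# apply_omp_default()
# from . import libxtb  # load the compiled library only afterwards
```

Using `setdefault` keeps the fast single-threaded default while still letting anyone who exports `OMP_NUM_THREADS` in their shell opt into more threads.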