grimme-lab / xtb-python

Python API for the extended tight binding program package
https://xtb-python.readthedocs.io
GNU Lesser General Public License v3.0
102 stars 30 forks source link

[BUG] - Default settings cause a 10x performance decrease relative to `OMP_NUM_THREADS=1` #115

Open coltonbh opened 4 months ago

coltonbh commented 4 months ago

Describe the bug

I understand this library is no longer maintained; however, I haven't tested this on tblite but wanted to add my results here for your reference.

When I run a "typical" calculation on a molecule with ~40-60 atoms or so the calculation is 10x slower than if I set OMP_NUM_THREADS=1. If I profile the code most of the time is spent spawning new process and not actually doing calcultions.

For such lightweight calculations I'd recommend threads over processes as the overhead of spawning new processes is actually higher than the calculation itself. Not sure if you've already updated the tblite implementation to use threads over processes, but I'd certainly recommend this ;P

The performance hit is impressive. If OpenMP is here to stay in the implementation I'd suggest an easy API for passing in the OMP_NUM_THREADS variable and probably set it to 1 by default. :car::dash: . The current implementation does not offer this possibility and since new processes are spawned outside of the python interpreter unfortunately xtb-python does not respect setting os.environ['OMP_NUM_THREADS': 1] so programatically controlling this important variable is rather challenging (still looking for a solution).

Thx!

coltonbh commented 4 months ago

Oops! Misspoke but leaving this here for context! I was thinking you were using OpenMPI and my thinking above was in the context of starting new MPI processes as ranks, but you are using OpenMP. Regardless the facts of the setup I shared are still the correct! For some reason setting OMP_NUM_THREADS=1 creates a 10x performance increase and xtb-python spends the majority of times spawning processes rather than doing calculations unless you set this variable. Not sure why process spawning is related to the number of threads, but it is!

Also, this variable gets set inside xtb somehow the minute any bit of the xtb library is imported. So one needs to set this with os.environ['OMP_NUM_THREADS'] = 1 before importing any of the xtb library. Then things work as expected.

Thanks again for your time working out the kinks in a very important package where speed matters! :P

coltonbh commented 4 months ago

More details. It appears to be spawning threads within threads. So if I set this variable to 16 (I have a 16 more machine) I actually get 47 (randomly?!) threads spawned on my machine. The resource contention kills the performance. If I set this variable to 1, I get 16 threads spawned. So there are threads spawning threads and this is causing the issue.

coltonbh commented 4 months ago

This fixes the perf issue, but of course isn't optimal and has to be used wherever you first import anything from the xtb library:

from contextlib import contextmanager
import os

@contextmanager
def set_env_variable(var_name, value):
    """Context manager to set an environment variable temporarily.

    Args:
        var_name: The name of the environment variable.
        value: The value to set the environment variable to.
    """
    original_value = os.environ.get(var_name)
    try:
        os.environ[var_name] = str(value)
        yield
    finally:
        if original_value:
            os.environ[var_name] = original_value
        else:
            del os.environ[var_name]

Then wherever you first import xtb:

with set_env_variable("OMP_NUM_THREADS", "1"):
    import xtb
    import xtb.interface
    import xtb.libxtb
    from xtb.utils import Solvent

@awvwgk these are really all issues with the underlying xtb library. Should I file issues there? Or is this all built anew with the tblite library?