igmk / pamtra

Passive and Active Microwave TRAnsfer model
GNU General Public License v3.0

BLAS+LAPACK thread safety #24

Open DaveOri opened 4 years ago

DaveOri commented 4 years ago

It happens to me once per year, and every time I have to dig into it and find the very same problem: "When I use pamtra in passive simulations in parallel mode, the performance is quite horrible".

The issue is caused by a BLAS library installed on the system that is not thread-safe. In my case (Ubuntu 16.04) it is libopenblas-base, a high-performance implementation that enables multithreading. If this version of BLAS is used at runtime, the parallel jobs compete for resources (each of them tries to exploit multithreading) and performance degrades as the number of parallel jobs grows.

DIAGNOSIS:

QUICK SOLUTION:

CAVEATS:

ACTIONS TO CONSIDER:

mariomech commented 4 years ago

I can confirm this and add that it also applies to Ubuntu 20.04 installations. There, the packages that slow down both parallel and non-parallel simulations, with CPU usage showing more than 100 %, are libopenblas0-pthread and libopenblas0. Uninstalling them brings pamtra back to normal behaviour.

maahn commented 4 years ago

I guess using the OpenBLAS multithreading still makes sense if someone uses only a single Pamtra process. Why don't we tell OpenBLAS not to use multithreading when we use pyPamtra's parallel feature? E.g., we could set the environment variable from Python. See https://github.com/xianyi/OpenBLAS/wiki/faq#multi-threaded

mariomech commented 4 years ago

Max is right, setting export OPENBLAS_NUM_THREADS=1 before calling python fixed it.

DaveOri commented 4 years ago

I agree that there are a limited number of scenarios where this behavior might even be useful for pamtra users, as Max suggested. But I also think that a user with such particular needs is expert enough to solve it if the issue is mentioned in the documentation.

The best option could be setting the environment variables at runtime, if possible (again Max's suggestion). For example, one should be able to call runParallelPamtra() with an argument multithreading=True, which enables multithreading. If it is left False, the default, pamtra sets some environment variables prior to execution with os.environ['OPENBLAS_NUM_THREADS'] = '1' (the value has to be a string).

If this works, I think we should mention the problem in the documentation in any case, for example in a troubleshooting section. That's because the problem can reappear if someone installs another high-performance BLAS library, which will have a different envvar to set (see https://en.wikipedia.org/wiki/Basic_Linear_Algebra_Subprograms). Maybe we can even make pamtra accept a dictionary of envvars to set when creating the pyPamtra object. That way it will be much easier for users and developers to test workarounds for similar problems.
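A minimal sketch of that idea, under stated assumptions: the helper name apply_thread_env, its arguments and the default dictionary are hypothetical and not part of pyPamtra. OPENBLAS_NUM_THREADS, MKL_NUM_THREADS and OMP_NUM_THREADS are the thread-count variables read by OpenBLAS, Intel MKL and OpenMP runtimes; which ones matter depends on the BLAS actually installed.

```python
import os

# Thread-count variables read by common BLAS/OpenMP runtimes.
# The selection is an assumption; extend it for other libraries.
DEFAULT_THREAD_ENV = {
    "OPENBLAS_NUM_THREADS": "1",
    "MKL_NUM_THREADS": "1",
    "OMP_NUM_THREADS": "1",
}


def apply_thread_env(extra_env=None, multithreading=False):
    """Hypothetical helper: set envvars and return their previous values.

    With multithreading=True the thread-limiting defaults are skipped and
    only the user-supplied variables are applied.
    """
    settings = {} if multithreading else dict(DEFAULT_THREAD_ENV)
    settings.update(extra_env or {})
    previous = {}
    for name, value in settings.items():
        previous[name] = os.environ.get(name)  # None if it was unset
        os.environ[name] = str(value)  # os.environ only accepts strings
    return previous


previous = apply_thread_env()
# import pyPamtra  # only after the environment has been prepared
```

The returned dictionary lets a caller restore the original environment afterwards. The variables only take effect if they are set before the BLAS library is loaded into the process.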

MeraX commented 3 years ago

I just stumbled into the same problem when using runPamtra() from the python3 branch in python3 on my Ubuntu 20.04 office PC. export OPENBLAS_NUM_THREADS=1 again fixed the situation, and I would support setting this or any similar envvar in runPamtra and runParallelPamtra.

MeraX commented 3 years ago

Besides setting the envvar, I could also solve the issue by installing libopenblas-serial-dev libopenblas0-serial. However, I don't like to depend on such subtle differences in the environment.

Also, I did a bit of debugging and found that, in my case, the code hangs at CALL DGETRF( M, N, MATRIX2, LDA, IPIV, INFO ) https://github.com/igmk/pamtra/blob/master/src/radmat.f90#L220. It is an LU decomposition of a 32x32 matrix, so nothing very demanding. I investigated further and it seems that certain BLAS or LAPACK routines in openblas have or had issues when dealing with small matrices in multi-threading mode.

Therefore, I would suggest that we solve this issue just with respect to openblas. DaveOri, it's true that the problem could reappear when someone uses a different BLAS, but that BLAS could have other untested issues as well. To handle this, it could be worthwhile to note in the docs that we suggest using PAMTRA with openblas.

For me, the remaining question is: shall we rather set os.environ['OPENBLAS_NUM_THREADS'] = "1" in Python, or call openblas_set_num_threads(1) from the Fortran code? (The latter should be put in an #ifdef condition.)
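For comparison, the runtime call can also be reached from Python through ctypes. The following is only a sketch: the helper set_openblas_threads is hypothetical, it assumes the system loader can find a shared library named openblas, and it only influences PAMTRA if the Fortran extension is linked against that same shared library. openblas_set_num_threads itself is part of the OpenBLAS C interface.

```python
import ctypes
import ctypes.util


def set_openblas_threads(n_threads=1, library_name="openblas"):
    """Hypothetical helper: call openblas_set_num_threads if available.

    Returns True if the call was made, False if the library or the
    symbol could not be found.
    """
    path = ctypes.util.find_library(library_name)
    if path is None:
        return False  # no such shared library visible to the loader
    try:
        lib = ctypes.CDLL(path)
        setter = lib.openblas_set_num_threads
    except (OSError, AttributeError):
        return False  # library not loadable or symbol not exported
    setter.argtypes = [ctypes.c_int]
    setter.restype = None
    setter(n_threads)
    return True
```

Unlike the environment variable, this call does not depend on import order, but it ties the Python side to one specific BLAS implementation.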

MeraX commented 3 years ago

I did some tests, and setting os.environ['OPENBLAS_NUM_THREADS'] = "1" in runPamtra is already too late. It has to be set before importing from libWrapper. However, I would expect that openblas_set_num_threads(1) should work during runtime.
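One way around the import-order constraint is to start the simulation in a fresh interpreter whose environment already contains the variable, which is what export OPENBLAS_NUM_THREADS=1 does from the shell. A minimal sketch using only the standard library; the function name run_with_single_threaded_blas and the child snippet are illustrative, and a real child script would import pyPamtra instead of just printing the variable.

```python
import os
import subprocess
import sys


def run_with_single_threaded_blas(code):
    """Run Python code in a child process with OPENBLAS_NUM_THREADS=1."""
    # Copy the parent environment and add the variable for the child only.
    env = dict(os.environ, OPENBLAS_NUM_THREADS="1")
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# The child sees the variable before any of its own imports run.
child_code = "import os; print(os.environ['OPENBLAS_NUM_THREADS'])"
print(run_with_single_threaded_blas(child_code))  # prints 1
```

The parent's environment is left untouched, so other code in the calling process keeps its own BLAS threading settings.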

In principle, I would love to have a code structure like the following in runPamtra, to keep modifications to the environment as small as possible. But for now, such a solution seems impossible.

import os

old_OPENBLAS_NUM_THREADS = os.environ.get('OPENBLAS_NUM_THREADS', None)
os.environ['OPENBLAS_NUM_THREADS'] = "1"  # avoid deadlock with parallel version of openBLAS library

try:
    pass  # <regular runPamtra code>
finally:
    # restore the caller's environment
    if old_OPENBLAS_NUM_THREADS is None:
        del os.environ['OPENBLAS_NUM_THREADS']
    else:
        os.environ['OPENBLAS_NUM_THREADS'] = old_OPENBLAS_NUM_THREADS

DaveOri commented 3 weeks ago

Just want to mention that this problem is reported by other libraries as well, and they solved it by switching to a different backend for parallel multiprocessing in Python: https://joblib.readthedocs.io/en/latest/parallel.html