markovmodel / PyEMMA

πŸš‚ Python API for Emma's Markov Model Algorithms πŸš‚
http://pyemma.org
GNU Lesser General Public License v3.0
307 stars 118 forks source link

Use spawn() for multiprocessing instead of fork() #1565

Closed sents closed 2 years ago

sents commented 2 years ago

This PR changes the multiprocessing method used for the pool in estimate_param_scan.

The issue

The reason for this PR is the following issue: When using pyemma with n_jobs>1 and OpenBLAS I observed the worker processes getting stuck in lapack calls, idling forever. When using netlib blas or n_jobs=1 this problem did not occur.

This is likely due to OpenMP not supporting use in the main process AND forked processes, as explained here.

The fix

Setting the multiprocess start method to spawn for the pool fixes this issue.

This is a breaking change

Using spawn requires guarding the main process with

if __name__ == "__main__":
    #code 

For scripts this is best practice, but this PR would still break scripts which do not do this.

Common interactive contexts such as IPython or jupyter-notebook do this automatically and do not require an explicit if __name__ == "__main__": statement.

Checklist

codecov[bot] commented 2 years ago

Codecov Report

Merging #1565 (9c07fb5) into devel (109044a) will increase coverage by 0.01%. The diff coverage is 40.00%.

@@            Coverage Diff             @@
##            devel    #1565      +/-   ##
==========================================
+ Coverage   91.94%   91.95%   +0.01%     
==========================================
  Files         229      229              
  Lines       26642    26645       +3     
==========================================
+ Hits        24496    24502       +6     
+ Misses       2146     2143       -3     
Impacted Files Coverage Ξ”
pyemma/_base/estimator.py 81.25% <40.00%> (-0.24%) :arrow_down:
pyemma/coordinates/data/util/traj_info_backends.py 93.72% <0.00%> (+1.79%) :arrow_up:

Continue to review full report at Codecov.

Legend - Click here to learn more Ξ” = absolute <relative> (impact), ΓΈ = not affected, ? = missing data Powered by Codecov. Last update 109044a...9c07fb5. Read the comment docs.

clonker commented 2 years ago

Hi @sents, thank you very much for your contribution! I would agree that spawn is preferable here, especially since it causes the program to hang. However to limit breakage of other people's codes, perhaps this should only performed on linux platforms (you can check the platform.system() for that). MacOS and windows are - per your linked article and since we compile with clang and MSVC - not affected.

sents commented 2 years ago

I've looked into multiprocess.context to see what happens on the other platforms. The default context for sys.platform == "win32" is spawn already. For sys.platform == "darwin" the default context is still fork, although this is listed as a bug in the issue tracker.

As this is not in multiprocess yet I'll add check for only linux. The default in multiprocess will probably change to spawn for macOS at some point.

clonker commented 2 years ago

Thanks! I'll merge once CI gives green light.