open2c / cooltools

The tools for your .cool's
MIT License

cooltools uses both multi-threading and multi-processing based parallelization and can take too many cores #182

Open sergpolly opened 4 years ago

sergpolly commented 4 years ago

message when running cooltools compute-expected

INFO:numexpr.utils:NumExpr defaulting to 8 threads.

--nproc does not seem to affect or control it ...

sergpolly commented 3 years ago

https://mitrocketscience.blogspot.com/2018/11/automatic-mulit-threading-with-python.html
https://stackoverflow.com/questions/17053671/how-do-you-stop-numpy-from-multithreading
https://stackoverflow.com/questions/30791550/limit-number-of-threads-in-numpy/31622299#31622299

sergpolly commented 2 years ago

Still an issue! It seems to have something to do with the way numpy (and the underlying math libraries) are installed.

golobor commented 2 years ago

OK, it's a serious issue, but it comes in two different scenarios:
-- MKL-based multithreading inside numpy
-- numexpr-based multithreading inside pandas

In this specific case, we're dealing with the latter, and this multithreading can potentially be disabled:
NumExpr: https://stackoverflow.com/questions/59445147/weird-bug-in-pandas-and-numpy-regarding-multithreading
Numpy: https://stackoverflow.com/questions/30791550/limit-number-of-threads-in-numpy (+NumExpr)
https://github.com/joblib/threadpoolctl

(A simple solution: set all the relevant environment variables at the top of the __init__.py of the CLI: https://github.com/open2c/cooltools/blob/master/cooltools/cli/__init__.py )
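A minimal sketch of the environment-variable approach, assuming the usual variable names read by OpenMP, OpenBLAS, MKL and NumExpr when they are first loaded. The helper `limit_library_threads` is hypothetical (it is not part of cooltools), and it only has an effect if it runs before numpy/pandas are imported:

```python
import os

# Variables that common threaded backends read when they are first loaded.
# Which ones matter depends on how numpy/pandas were built and installed.
THREAD_ENV_VARS = (
    "OMP_NUM_THREADS",
    "OPENBLAS_NUM_THREADS",
    "MKL_NUM_THREADS",
    "NUMEXPR_NUM_THREADS",
)


def limit_library_threads(n_threads=1, overwrite=False):
    """Hypothetical helper: cap implicit threading via environment variables.

    By default, values already set by the user are left untouched.
    """
    for name in THREAD_ENV_VARS:
        if overwrite or name not in os.environ:
            os.environ[name] = str(n_threads)
    return {name: os.environ[name] for name in THREAD_ENV_VARS}


# Must run before the first `import numpy` / `import pandas` in the process.
limit_library_threads(1)
```

Not overwriting existing values keeps an explicit user setting (e.g. `MKL_NUM_THREADS=4` exported in the shell) in charge.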

Another generalizable solution is to give users control over the number of processes and the number of threads per process, in each cooltool.
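A rough sketch of what per-tool control could look like, assuming a `multiprocessing.Pool` and the optional threadpoolctl package linked above. The names `threads_per_process`, `init_worker` and `run_parallel` are made up for illustration and are not cooltools API:

```python
import multiprocessing as mp
import os

try:
    # Optional dependency; limits BLAS/OpenMP thread pools at runtime.
    from threadpoolctl import threadpool_limits
except ImportError:
    threadpool_limits = None


def threads_per_process(total_cores, nproc):
    """Split a core budget evenly across processes (at least 1 thread each)."""
    return max(1, total_cores // max(1, nproc))


def init_worker(n_threads):
    """Runs once in every worker process before it receives any task."""
    # Only effective if numexpr has not been imported in this process yet.
    os.environ["NUMEXPR_NUM_THREADS"] = str(n_threads)
    if threadpool_limits is not None:
        # Called as a plain function, this sets the limit for the whole process.
        threadpool_limits(limits=n_threads)


def run_parallel(func, items, nproc, n_threads):
    """Map func over items with nproc workers, each capped at n_threads."""
    with mp.Pool(nproc, initializer=init_worker, initargs=(n_threads,)) as pool:
        return pool.map(func, items)
```

With a budget of 8 cores and `--nproc 4`, `threads_per_process(8, 4)` gives 2 threads per worker, so the total stays within the budget.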

But we do need to check whether this multithreading is actually causing any real issues.