uellue opened this issue 1 year ago
@uellue This is really useful information and a great help!
I've been suspicious of something like this happening but have never gotten around to determining why that is the case. I would imagine that dask would be very interested in this as well. Is this a problem with dask-distributed as well? I think I usually get fairly good performance with 2-4 threads per process using the distributed backend, but the scheduling seems quite a bit slower than I feel it should be.
Yes, we had the same issue with dask-distributed. It is not so apparent on small machines, but a big machine will come to a crawling halt. I'm not sure if it will happen with native Dask array operations. To be tested! I'll open an issue in Dask for discussion.
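One way to test this (a sketch under my own assumptions, not something given in this thread) would be to time a BLAS-heavy Dask array computation under the threaded scheduler with and without OpenMP/OpenBLAS threading, each run in a fresh interpreter:

```python
# Hedged sketch for testing the interaction with native Dask array operations.
# Run once as-is and once with the environment variables uncommented; they
# must be set before NumPy is imported, so use a fresh process for each run.
import os
import time

# os.environ["OMP_NUM_THREADS"] = "1"
# os.environ["OPENBLAS_NUM_THREADS"] = "1"

import dask.array as da

x = da.random.random((8_000, 8_000), chunks=(1_000, 1_000))
start = time.time()
(x @ x.T).sum().compute(scheduler="threads")  # BLAS-heavy, many chunks in parallel
print(f"elapsed: {time.time() - start:.1f} s")
```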
@uellue Sounds like some better context managers are in order for hyperspy/pyxem. Thanks for bringing this up!
By the way, I am planning on making a couple of changes to the orientation mapping code in the next week or two, mostly to simplify the method and let it use dask-distributed so it can use multiple GPUs. Are there any changes you might be interested in seeing?
When matching a larger dataset following example 11 but using lazy arrays, running this as the very first cell before any numerical library is loaded speeds up the calculation on a machine with 24 cores:
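The original cell is not reproduced here; as a hedged reconstruction of the kind of setting described (disabling OpenMP/OpenBLAS internal multithreading via environment variables before NumPy/SciPy are imported), it would look something like this:

```python
# Assumed reconstruction, not the verbatim cell from the issue:
# disable OpenMP/OpenBLAS internal multithreading before any numerical
# library is imported, so that only Dask-level parallelism remains.
import os

os.environ["OMP_NUM_THREADS"] = "1"
os.environ["OPENBLAS_NUM_THREADS"] = "1"
os.environ["MKL_NUM_THREADS"] = "1"
```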
When @sk1p profiled the system under load without this setting, it spent most of its time in sched_yield instead of doing useful work. With this setting enabled (no OpenMP multithreading) it was mostly doing useful work. I didn't benchmark the difference precisely because I ran out of patience, but it is roughly a factor of 10.

Some routines in SciPy and NumPy are multi-threaded internally, for example through OpenBLAS. It seems that Dask's/pyxem's parallelism in combination with OpenMP/OpenBLAS threading leads to oversubscription of the CPU or some other kind of scheduling issue. Restricting to only one level of parallelism resolves this.
FYI we encountered a similar issue in LiberTEM. In order to avoid setting the environment variable and disabling threading altogether, we implemented a few context managers to set the thread count to 1 in code blocks that run in parallel: https://github.com/LiberTEM/LiberTEM/blob/master/src/libertem/common/threading.py
Maybe that can be useful in HyperSpy/pyxem? Perhaps this should actually be handled in Dask.
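As an illustration of the idea (not the LiberTEM implementation itself, which covers more than BLAS), a minimal context manager along these lines could be built on threadpoolctl:

```python
# Minimal sketch, assuming threadpoolctl is installed; not LiberTEM's actual code.
# The idea: limit BLAS/OpenMP pools to one thread only inside the code that
# Dask already executes in parallel, instead of disabling threading globally.
from contextlib import contextmanager

import numpy as np
from threadpoolctl import threadpool_limits


@contextmanager
def single_threaded_blas():
    # threadpool_limits reduces the OpenBLAS/MKL/OpenMP pool sizes for the
    # duration of the block and restores the previous limits on exit.
    with threadpool_limits(limits=1):
        yield


def process_tile(tile: np.ndarray) -> np.ndarray:
    # Hypothetical per-chunk function as it might be mapped over a Dask array:
    # the BLAS-heavy matmul runs single-threaded, avoiding oversubscription.
    with single_threaded_blas():
        return tile @ tile.T
```

Wrapping only the per-chunk work like this keeps library threading available outside the parallel section while avoiding oversubscription inside it.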