Closed by goplanid.
@anton-malakhov
Hi @goplanid,
To guarantee parallelism in the inner loop, you could use TBB in the outer loop only. In the inner loop, you could launch numjobs threads (e.g., with std::thread) in myfunction_, with each thread performing an InnerLoopTask.
You can prevent oversubscription by throttling down the oneTBB concurrency (e.g., to hardware_concurrency / numjobs).
@goplanid is this issue still relevant?
If anyone encounters this issue in the future, please open a new issue with a link to this one.
Brief Description: I am trying out this OpenBLAS PR [https://github.com/OpenMathLib/OpenBLAS/pull/4577] with TBB. I first register a callback in my code to change the threading backend dynamically: instead of creating its own threads, OpenBLAS passes the work to the registered callback. I use TBB to run gemm, and I also want to use TBB to execute the callback.
Issue: I am facing a deadlock in OpenBLAS (multiple threads get stuck in the inner_threads function in OpenBLAS). OpenBLAS appears to deadlock when it is used with fewer threads than the number of available threads.
Below are my test code and the steps to reproduce the issue.
Build command: g++ -std=c++11 -o tbb_nested tbb_nested.cpp -ltbb -lpthread -I/home/openblas/include -L/home/openblas/lib -lopenblas -Wl,-rpath,/home/openblas/lib
Help needed: As you can see, I have the following case of nested parallelism:
outer loop: tbb::parallel_for(tbb::blocked_range(0, 2), MatrixMultiplicationTask(A,B,C));
inner loop: tbb::parallel_for(tbb::blocked_range(0, numjobs), innerLoopTask);
In the code above, the outer loop (Level 1) runs 2 iterations, and each outer iteration runs numjobs iterations of the inner loop. My code has a dependency such that innerLoopTask can only operate when exactly numjobs threads are used. What is the best nested-parallelism solution TBB provides for this problem? Kindly advise.