mmuckley / torchkbnufft

A high-level, easy-to-deploy non-uniform Fast Fourier Transform in PyTorch.
https://torchkbnufft.readthedocs.io/
MIT License

Potential issue with shell-based thread management #36

Open mmuckley opened 2 years ago

mmuckley commented 2 years ago

I made a change in #27 that could cause problems, so I want to document it here. Specifically, in this commit the following lines were commented out:

if USING_OMP and cpu_device:
    torch.set_num_threads(num_threads)

I removed these lines because in PyTorch 1.8 they caused a severe performance regression: at the Python level, PyTorch did not seem to handle switching the number of available threads well. The regression was too large to keep them.

The downside is that shell-based OMP thread management may be ignored within forks. OMP specifies how many threads each process can spawn but does not enforce a global limit across processes, so if your limit is 8 and you fork 8 times, each of those forks could create 8 new threads (up to 64 in total) and lead to oversubscription.
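For anyone hitting this, here is a rough sketch of one possible workaround: split the thread budget across workers and set it once per worker at startup. This is not what torchkbnufft does, and I have not checked whether a single call per process avoids the PyTorch 1.8 regression. The helper names (`read_omp_limit`, `per_worker_threads`, `worst_case_threads`, `limit_worker_threads`) are hypothetical, and reading the limit from the `OMP_NUM_THREADS` environment variable is an assumption about how the shell-level limit was set.

```python
import os


def read_omp_limit(default: int) -> int:
    """Read the shell-level OMP limit, falling back to `default`.

    Assumes the limit was set via OMP_NUM_THREADS. The variable may hold a
    comma-separated list for nested parallelism; only the first entry is used.
    """
    raw = os.environ.get("OMP_NUM_THREADS", "")
    first = raw.split(",")[0].strip()
    if first.isdigit() and int(first) > 0:
        return int(first)
    return default


def worst_case_threads(omp_limit: int, num_forks: int) -> int:
    """Upper bound on threads if every fork spawns `omp_limit` threads."""
    return omp_limit * num_forks


def per_worker_threads(total_threads: int, num_workers: int) -> int:
    """Split a global thread budget evenly, giving each worker at least one."""
    return max(1, total_threads // num_workers)


def limit_worker_threads(num_workers: int) -> int:
    """Call once at the start of each forked worker (hypothetical helper)."""
    import torch  # imported lazily so the helpers above work without PyTorch

    total = read_omp_limit(default=os.cpu_count() or 1)
    budget = per_worker_threads(total, num_workers)
    torch.set_num_threads(budget)  # caps intra-op CPU threads in this process
    return budget
```

With a limit of 8 and 8 forks, `worst_case_threads(8, 8)` gives 64, while `per_worker_threads(8, 8)` gives each worker a single thread, keeping the total at 8.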

In general it's a niche issue that I hope doesn't affect most people. I haven't been able to figure out how to fix it. If anyone decides to tackle this, the answer may be to just go to C++, as mentioned in Issue #28.