We currently do not configure BLAS in any way. This can lead to very bad behavior when we sample several chains in parallel:
Many BLAS implementations default to one worker thread per hardware thread in the machine. But when we sample in parallel with multiprocessing, each chain gets its own independent thread pool, so we end up starting `chains * hardware_threads` worker threads. Combined with the spinlocking that some BLAS implementations seem to do, this can lead to terrible performance.
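The following is a back-of-the-envelope illustration of the oversubscription. It assumes each chain runs in its own process and that every BLAS pool defaults to one thread per hardware thread. For example, 4 chains on a machine with 16 hardware threads start 64 BLAS worker threads that compete for 16 hardware threads. `total_blas_threads` is a hypothetical helper for illustration only.

```python
import os


def total_blas_threads(chains: int, threads_per_pool: int) -> int:
    """Hypothetical helper: total BLAS worker threads when every chain has its own pool."""
    return chains * threads_per_pool


chains = 4
hardware_threads = os.cpu_count() or 1

# Default behavior described above: each pool starts one thread per hardware thread.
without_limit = total_blas_threads(chains, hardware_threads)
# Capping each pool so the total stays at roughly the machine size.
with_limit = total_blas_threads(chains, max(1, hardware_threads // chains))

print(f"{hardware_threads} hardware threads, {chains} chains:")
print(f"  {without_limit} BLAS threads without a limit")
print(f"  {with_limit} BLAS threads with a per-chain limit")
```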
This PR adds a `blas_cores` argument to `pm.sample()` and uses `threadpoolctl` to control how many BLAS worker threads we start.
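Below is a minimal sketch of the `threadpoolctl` mechanism. It assumes `numpy` and `threadpoolctl` are installed. Each worker process wraps its work in `threadpool_limits(limits=..., user_api="blas")`, which caps the BLAS pools loaded in that process. `run_chain`, the matrix workload, and the even split of a `blas_cores` budget across processes are illustrative assumptions. This is not the code in this PR.

```python
import multiprocessing as mp

import numpy as np
from threadpoolctl import threadpool_info, threadpool_limits


def run_chain(args):
    """Hypothetical stand-in for one chain: BLAS-heavy work under a thread limit."""
    seed, blas_threads = args
    # The limit applies to the BLAS libraries loaded in *this* process only,
    # so each worker has to set it for itself.
    with threadpool_limits(limits=blas_threads, user_api="blas"):
        rng = np.random.default_rng(seed)
        a = rng.standard_normal((200, 200))
        total = 0.0
        for _ in range(10):
            total += float(np.trace(a @ a.T))  # matrix product dispatches to BLAS
        # Report the thread counts threadpoolctl sees inside the limited block.
        blas_threads_seen = [
            info["num_threads"]
            for info in threadpool_info()
            if info["user_api"] == "blas"
        ]
    return total, blas_threads_seen


if __name__ == "__main__":
    chains = cores = 4
    blas_cores = 8  # assumed total budget of BLAS threads across all chains
    per_chain = max(1, blas_cores // cores)
    with mp.Pool(processes=cores) as pool:
        results = pool.map(run_chain, [(seed, per_chain) for seed in range(chains)])
    for seed, (_, seen) in enumerate(results):
        print(f"chain {seed}: BLAS thread counts inside the limit = {seen}")
```

If `threadpoolctl` does not detect the BLAS library (some builds are not supported), the reported list is simply empty.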
If it is set to `None`, we do nothing and keep the current behavior of using whatever the BLAS implementation picks by default. If it is set to `"auto"` (the default), we use the `cores` argument to guess a reasonable number of BLAS worker threads. If it is set to an integer, we use that as the total number of BLAS worker threads.
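The three settings can be summarized as a small resolution function. `threads_per_process` is a hypothetical helper, not a PyMC function. Two details are assumptions: that `"auto"` resolves to a total of one BLAS thread per sampling process (`blas_cores = cores`), and that an integer budget is split evenly across processes with floor division.

```python
from typing import Literal, Optional, Union

BlasCores = Union[int, Literal["auto"], None]


def threads_per_process(blas_cores: BlasCores, cores: int) -> Optional[int]:
    """Hypothetical helper: BLAS thread limit per sampling process, or None for 'leave as is'."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    if blas_cores is None:
        # Do nothing: keep whatever default the BLAS library chooses.
        return None
    if blas_cores == "auto":
        # Assumption: one BLAS thread in total per sampling process.
        blas_cores = cores
    if isinstance(blas_cores, bool) or not isinstance(blas_cores, int):
        raise TypeError("blas_cores must be None, 'auto', or an int")
    if blas_cores < cores:
        # Assumption: every process needs at least one BLAS thread.
        raise ValueError("blas_cores must be at least cores")
    # Split the total budget evenly; any remainder is left unused here.
    return blas_cores // cores


print(threads_per_process(None, 4))    # None
print(threads_per_process("auto", 4))  # 1
print(threads_per_process(8, 4))       # 2
```

Splitting a total budget rather than setting a per-process count keeps the overall number of BLAS threads bounded no matter how many chains run in parallel.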
See, for instance, here for a model that shows the bad behavior without this PR.
Type of change
📚 Documentation preview 📚: https://pymc--7318.org.readthedocs.build/en/7318/