pymc-devs / pymc

Bayesian Modeling and Probabilistic Programming in Python
https://docs.pymc.io/
Other
8.47k stars 1.97k forks

Add blas_cores argument to pm.sample #7318

Closed aseyboldt closed 1 month ago

aseyboldt commented 1 month ago

Description

We currently do not configure BLAS in any way. This can lead to very bad behavior when we sample several chains in parallel. Many BLAS implementations default to one worker thread per hardware thread on the machine, but when we sample in parallel with multiprocessing, each chain gets its own independent thread pool, so we end up starting chains × hardware threads worker threads. Combined with the spinlocking that some BLAS implementations seem to do, this can lead to terrible performance.
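To make the oversubscription arithmetic concrete, here is a minimal sketch that counts how many BLAS worker threads get started when every parallel chain owns its own pool. The helper names and the 16-thread machine are hypothetical illustrations, not part of PyMC.

```python
def blas_threads_started(parallel_chains: int, threads_per_pool: int) -> int:
    """Total BLAS worker threads when each chain process owns its own pool."""
    return parallel_chains * threads_per_pool


def oversubscription_factor(parallel_chains: int, hardware_threads: int,
                            threads_per_pool: int) -> float:
    """How many BLAS workers compete for each hardware thread (1.0 = no oversubscription)."""
    return blas_threads_started(parallel_chains, threads_per_pool) / hardware_threads


# Hypothetical machine with 16 hardware threads (os.cpu_count() on a real one).
hw = 16

# Default behavior: each of 4 chain processes sizes its BLAS pool to the whole machine.
print(blas_threads_started(4, hw))           # 64
print(oversubscription_factor(4, hw, hw))    # 4.0

# Capping each pool at hw // 4 threads brings the total back to the hardware limit.
print(oversubscription_factor(4, hw, hw // 4))  # 1.0
```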

This PR adds a blas_cores argument to pm.sample() and uses threadpoolctl to control how many BLAS worker threads we start.
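A minimal sketch of capping BLAS threads around one chain's work with threadpoolctl, assuming threadpoolctl is installed (the sketch falls back to doing nothing if it is not). threadpool_limits(limits=..., user_api="blas") is threadpoolctl's own context manager; blas_limit and run_chain are hypothetical helpers, not PyMC's implementation. The limit only applies to the process that sets it, so with multiprocessing it has to be entered inside each chain's worker process.

```python
import contextlib

try:
    from threadpoolctl import threadpool_limits
except ImportError:  # threadpoolctl not installed: run without any limit
    threadpool_limits = None


def blas_limit(n_threads):
    """Context manager capping BLAS worker threads in the *current* process.

    n_threads=None means "leave the BLAS default alone".
    """
    if n_threads is None or threadpool_limits is None:
        return contextlib.nullcontext()
    return threadpool_limits(limits=n_threads, user_api="blas")


def run_chain(draw_fn, n_draws, n_threads):
    """Hypothetical per-chain worker: any BLAS calls made by draw_fn
    inside the with-block see the capped thread pool."""
    with blas_limit(n_threads):
        return [draw_fn(i) for i in range(n_draws)]


print(run_chain(lambda i: i * i, 3, 1))  # [0, 1, 4]
```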

If it is set to None, we don't do anything and keep the current behavior of using whatever the BLAS implementation defaults to. If it is set to "auto" (the default), we use the cores argument to guess a reasonable number of BLAS worker threads. If it is set to an integer, we use that as the total number of BLAS worker threads.
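The three modes of blas_cores can be sketched as a small resolution function. This is a hypothetical illustration, not PyMC's code: in particular, the "auto" rule used here (total budget equals cores) and the even split of the budget across concurrently running chains are assumptions.

```python
def resolve_blas_cores(blas_cores, cores):
    """Return the total BLAS thread budget, or None to leave BLAS untouched.

    Assumption: "auto" simply uses `cores` as the total budget.
    """
    if blas_cores is None:
        return None  # keep whatever the BLAS implementation does by default
    if blas_cores == "auto":
        return cores
    if isinstance(blas_cores, bool) or not isinstance(blas_cores, int) or blas_cores < 1:
        raise ValueError(
            f"blas_cores must be None, 'auto' or a positive int, got {blas_cores!r}"
        )
    return blas_cores


def per_chain_threads(total, cores):
    """Split a total budget across the `cores` chains running at once (at least 1 each)."""
    if total is None:
        return None
    return max(1, total // cores)


print(per_chain_threads(resolve_blas_cores("auto", 4), 4))  # 1
print(per_chain_threads(resolve_blas_cores(8, 4), 4))       # 2
```

Each chain process could then pass its per-chain value as limits to threadpoolctl.threadpool_limits. Floor division keeps the total at or below the budget whenever the budget is at least cores; the one-thread floor only exceeds it when a smaller budget is requested.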

See for instance here for a model that shows bad behavior without this PR.

📚 Documentation preview 📚: https://pymc--7318.org.readthedocs.build/en/7318/

ricardoV94 commented 1 month ago

pre-commit failed