Inspect and control the number of threads when BLAS is implemented by Apple Accelerate

joblib / threadpoolctl

Python helpers to limit the number of threads used in native libraries that handle their own internal threadpool (BLAS and OpenMP implementations)

BSD 3-Clause "New" or "Revised" License


Issue #136 · Open · opened by ogrisel 1 year ago

ogrisel commented 1 year ago

Follow-up on #135.

I am not sure whether this is possible.

ogrisel commented 1 year ago

I have not conducted an extensive evaluation yet, but it seems that we do not suffer from oversubscription when calling vecLib's GEMM from within OpenMP threads (for instance, in scikit-learn's KMeans). So perhaps Grand Central Dispatch has some automatic mechanism that prevents the oversubscription we typically observe with other threaded BLAS libraries.

ogrisel commented 7 months ago

At least we could detect that Accelerate is linked, even if we cannot inspect or set the number of threads.
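A minimal sketch of what such detection could look like, assuming macOS and assuming that checking for "Accelerate.framework" in the paths of images loaded by dyld is a good enough heuristic. It queries dyld's `_dyld_image_count` / `_dyld_get_image_name` through ctypes. The helper names `loaded_image_paths` and `accelerate_is_loaded` are hypothetical and not part of threadpoolctl's API.

```python
import ctypes
import ctypes.util
import sys


def loaded_image_paths():
    """Return the file paths of all images (executable, dylibs, frameworks)
    currently loaded in this process, as reported by dyld.
    macOS only: returns an empty list on other platforms."""
    if sys.platform != "darwin":
        return []
    libc = ctypes.CDLL(ctypes.util.find_library("c"))
    libc._dyld_image_count.restype = ctypes.c_uint32
    libc._dyld_get_image_name.argtypes = [ctypes.c_uint32]
    libc._dyld_get_image_name.restype = ctypes.c_char_p
    paths = []
    for i in range(libc._dyld_image_count()):
        name = libc._dyld_get_image_name(i)
        if name is not None:  # an image may be unloaded between the two calls
            paths.append(name.decode("utf-8", errors="replace"))
    return paths


def accelerate_is_loaded():
    """Heuristic (assumption): report whether any loaded image lives inside
    Accelerate.framework, which also bundles vecLib's libBLAS/libLAPACK."""
    return any("Accelerate.framework" in p for p in loaded_image_paths())
```

Usage: import NumPy (or SciPy) first so that its BLAS is loaded, then call `accelerate_is_loaded()`. A positive result only says that the framework is loaded in the process, not necessarily that NumPy's BLAS calls are routed through it.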

ogrisel commented 7 months ago

Apparently, vecLib can be told to use fewer threads via the VECLIB_MAXIMUM_THREADS environment variable.

EDIT: it does not seem to have much effect on NumPy workloads (matmul and SVD) linked against Accelerate on an M1 Mac host.
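A minimal sketch for comparing a workload under different VECLIB_MAXIMUM_THREADS values. Each run uses a fresh interpreter so that the variable is already in the environment when Accelerate is loaded; this assumes vecLib reads it at initialization rather than on every call. The helper `run_with_veclib_threads` and the `MATMUL_BENCH` snippet are hypothetical, and the benchmark assumes NumPy is installed and linked against Accelerate.

```python
import os
import subprocess
import sys

# Hypothetical benchmark run in the child interpreter; assumes NumPy is
# installed there and uses Accelerate as its BLAS.
MATMUL_BENCH = """
import time
import numpy as np
a = np.random.rand(2000, 2000)
tic = time.perf_counter()
a @ a
print(f"{time.perf_counter() - tic:.3f}")
"""


def run_with_veclib_threads(n_threads, code):
    """Run `code` in a new Python process with VECLIB_MAXIMUM_THREADS set to
    `n_threads` and return the child's stdout."""
    env = dict(os.environ, VECLIB_MAXIMUM_THREADS=str(n_threads))
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Usage: `for n in (1, 2, 4, 8): print(n, run_with_veclib_threads(n, MATMUL_BENCH).strip())`. If the timings barely change across values, that is consistent with the observation above that the variable has little effect on these workloads.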