scientific-python / summit-2024


SPEC: Improve parallel API uniformity and features across the ecosystem #13

Open stefanv opened 6 months ago

stefanv commented 6 months ago

See https://thomasjpfan.github.io/parallelism-python-libraries-design/

Across libraries, we should have standard mechanisms and naming for parallel concepts and features.

See also https://discuss.scientific-python.org/t/terminology-for-parameters-controlling-parallel-computation/1016/5

betatim commented 6 months ago

Updated the title to include "parallelism"; it seemed a bit broad without it :D

thomasjpfan commented 5 months ago

Here are the overarching questions around a parallelism API:

  1. What should the keyword parameter be called? `workers`, `n_jobs`, etc.
  2. Should it be added to every function call that does anything in parallel? For example, `np.linalg.matmul(..., workers=?)`.
  3. What to do with operators that are not function calls? `A_array @ B_array` can run in parallel through a multithreaded BLAS, but there is no place to pass a keyword. The solution is a context manager, such as threadpoolctl.
  4. Should libraries configure each other when it comes to parallelism? For example, scikit-learn uses threadpoolctl to prevent oversubscription when NumPy's BLAS threads are combined with joblib's `n_jobs` workers.
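To make questions 1–4 concrete, here is a minimal pure-Python sketch of how they could fit together. Everything in it is hypothetical and not any library's actual API: the `workers` keyword name, the `parallel_config` context manager, the `resolve_workers` precedence rule (explicit keyword, then context default, then all CPUs), the joblib-style negative-value convention, and the `outer_map` helper that pins nested calls to one worker to avoid oversubscription. For native BLAS/OpenMP thread pools, threadpoolctl's `threadpool_limits` plays the role that `parallel_config` plays here.

```python
import contextvars
import os
from concurrent.futures import ThreadPoolExecutor
from contextlib import contextmanager

# Hypothetical ecosystem-wide default; None means "let the library decide".
_default_workers = contextvars.ContextVar("default_workers", default=None)


@contextmanager
def parallel_config(workers):
    """Hypothetical: set the default seen by resolve_workers, e.g. for code
    paths (such as operators) that cannot accept a keyword argument."""
    token = _default_workers.set(workers)
    try:
        yield
    finally:
        _default_workers.reset(token)


def resolve_workers(workers=None):
    """Explicit keyword wins, then the context default, then all CPUs."""
    n_cpus = os.cpu_count() or 1
    if workers is None:
        workers = _default_workers.get()
    if workers is None:
        return n_cpus
    if workers == 0:
        raise ValueError("workers must be nonzero")
    if workers < 0:
        # Assumed joblib-style convention: -1 means all CPUs, -2 all but one.
        return max(1, n_cpus + 1 + workers)
    return workers


def parallel_sum_of_squares(chunks, workers=None):
    """Hypothetical library function exposing a uniform `workers` keyword."""
    n = resolve_workers(workers)
    with ThreadPoolExecutor(max_workers=n) as pool:
        return sum(pool.map(lambda c: sum(x * x for x in c), chunks))


def outer_map(func, items, workers=None):
    """Hypothetical outer-level parallelism (question 4): each task runs with
    nested parallelism limited to 1 worker, so outer x inner stays bounded."""
    n = resolve_workers(workers)

    def run_one(item):
        # The context var is set in the worker thread running this task.
        with parallel_config(1):
            return func(item)

    with ThreadPoolExecutor(max_workers=n) as pool:
        return list(pool.map(run_one, items))
```

Usage: `parallel_sum_of_squares(chunks, workers=4)` uses the keyword, while `with parallel_config(2): ...` changes the default for everything inside the block, including calls that take no keyword. Inside `outer_map`, nested calls see `workers == 1`, which mirrors what scikit-learn does with threadpoolctl around joblib workers.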