Executorlib - Functional programming for performance in Python
Executorlib - the minimalist’s solution for up-scaling Python functions for high-performance computing
Arguments for using executorlib
Up-scale your Python program beyond a single computer - Based on the concurrent.futures Executor API from the Python standard library, combined with high-performance computing (HPC) job schedulers such as SLURM and Flux, executorlib lets you distribute your Python functions over multiple compute nodes.
Parallelize your Python code one function at a time - Executorlib lets you assign dedicated resources such as CPU cores, threads, or GPUs to each Python function, so you can accelerate your code one function at a time.
Permanent caching of intermediate results to accelerate rapid prototyping - To speed up the development of machine-learning pipelines and simulation workflows, executorlib caches intermediate results for iterative development in interactive environments such as Jupyter notebooks.
Table of contents
Motivation - The aim is to support users in up-scaling their workflows from a local machine to HPC resources
Compare to ThreadPoolExecutor and ProcessPoolExecutor - applying a function
Explain map() versus a for-loop
Similar approaches in literature
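For the planned map-versus-for-loop comparison, a short stdlib sketch can illustrate the point: Executor.map applies the same function to every element of an iterable, like a for-loop, but the individual calls can run concurrently. This uses only the standard ThreadPoolExecutor, not executorlib's API.

```python
from concurrent.futures import ThreadPoolExecutor


def square(x):
    return x * x


# sequential for-loop version
serial = [square(x) for x in range(5)]

# Executor.map applies the same function to every element, but the
# calls can run concurrently across the pool's workers; results are
# returned in input order, just like the for-loop
with ThreadPoolExecutor(max_workers=2) as exe:
    parallel = list(exe.map(square, range(5)))

print(serial == parallel)  # both yield [0, 1, 4, 9, 16]
```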
Installation
Minimal installation - pyzmq and cloudpickle
Optional dependencies
Plot - pygraphviz, matplotlib, networkx, IPython
pysqa - also has its own optional dependencies
Flux
Flux core
Flux sched - GPU support
OpenMPI vs. MPICH - PMI
Cache - HDF5
MPI - mpi4py
Installation from source
Local Testing
Parallel functions - resource specification
MPI parallel functions - including MPI oversubscribe
Thread parallelism - Max workers or max cores
Combined
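The "max workers or max cores" distinction above can be demonstrated with a stdlib-only sketch: the max_workers argument caps how many function calls run at the same time. The counting helpers here are illustrative, not part of executorlib.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

lock = threading.Lock()
active = 0  # calls currently running
peak = 0    # highest concurrency observed


def task(_):
    global active, peak
    with lock:
        active += 1
        peak = max(peak, active)
    time.sleep(0.05)  # simulate work so calls overlap
    with lock:
        active -= 1


# max_workers caps how many submitted calls execute concurrently
with ThreadPoolExecutor(max_workers=3) as exe:
    list(exe.map(task, range(10)))

print(peak)  # never exceeds 3
```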
Dependencies - Plot graph
Performance optimization
Block allocation & Init function
Caching
Disable dependency check
HPC Submission
Pysqa
Threads
MPI
GPU
Dependencies - explain file-based dependencies - disable dependency check
Cache - Clear cache
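The "Cache - Clear cache" item above can be illustrated with a stdlib sketch of the underlying idea: results are stored on disk under a hash of the function inputs, so rerunning with identical inputs reads the stored file instead of recomputing, and clearing the cache just removes those files. All names and the cache location below are illustrative assumptions, not executorlib's actual implementation.

```python
import hashlib
import pickle
import shutil
from pathlib import Path

CACHE_DIR = Path("cache_demo")  # illustrative location, not executorlib's default


def cached(func):
    # store each result under a hash of the function name and arguments
    def wrapper(*args):
        key = hashlib.sha256(pickle.dumps((func.__name__, args))).hexdigest()
        path = CACHE_DIR / f"{key}.pkl"
        if path.exists():
            # identical inputs: read the stored result instead of recomputing
            return pickle.loads(path.read_bytes())
        result = func(*args)
        CACHE_DIR.mkdir(exist_ok=True)
        path.write_bytes(pickle.dumps(result))
        return result
    return wrapper


def clear_cache():
    # clearing the cache simply removes the stored result files
    shutil.rmtree(CACHE_DIR, ignore_errors=True)


@cached
def expensive(x):
    return x ** 2


print(expensive(4))  # computed, then written to disk
print(expensive(4))  # read back from the cache file
clear_cache()
```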
HPC Allocation
Flux
Combined parallelism
Nested flux executors
Jupyter integration
SLURM
Combined parallelism
Additional SLURM Arguments
Nested - using Flux inside SLURM
Performance Optimization
Block allocation & Init function
Use temporary working directory
Disable dependency check
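The "use temporary working directory" optimization above can be sketched with the standard tempfile module: each function call runs with its working directory set to a fresh temporary directory, so scratch files never accumulate and are deleted automatically. The helper below is an illustration of the concept, not executorlib's interface.

```python
import os
import tempfile
from pathlib import Path


def run_in_tmpdir(func, *args):
    # execute a function with its working directory set to a fresh
    # temporary directory; scratch files are deleted automatically
    cwd = os.getcwd()
    with tempfile.TemporaryDirectory() as tmp:
        os.chdir(tmp)
        try:
            return func(*args)
        finally:
            os.chdir(cwd)


def write_scratch(n):
    # writes a scratch file into the current working directory
    Path("scratch.txt").write_text("x" * n)
    return Path("scratch.txt").stat().st_size


print(run_in_tmpdir(write_scratch, 8))  # 8; scratch.txt is gone afterwards
```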
Application
ASE - DFT - convergence test - classical file-based submission - dependencies
Note: SLURM is the de facto standard job scheduler, so the documentation should explain the setup with SLURM first and only afterwards highlight the additional capabilities of Flux.