pyMBE-dev / pyMBE

pyMBE provides tools to facilitate building up molecules with complex architectures in the Molecular Dynamics software ESPResSo. For up-to-date API documentation, please check our website:
https://pymbe-dev.github.io/pyMBE/pyMBE.html
GNU General Public License v3.0

Leverage multiprocessing in sample scripts #64

Closed jngrad closed 3 weeks ago

jngrad commented 4 months ago

Several sample scripts and functional tests spend most of their runtime in a hot loop that iterates over a pH range and starts independent simulations, typically via a subprocess. These are ideal candidates for parallelization, since the intermediate simulations do not share information with one another and can be executed in any order.

For illustration, consider the dialysis test and its modified version in 166baf4779776dd1cf1928dff233510c13fc7fd2: the original runtime is 10 min, which goes down to 5 min 30 s with 2 threads or 3 min 30 s with 4 threads. The runtime is not divided exactly by the number of threads, because some pH values require longer sampling. GitHub Actions runners have 2 CPU cores, but since we use a Makefile to schedule Python tests, we can only run one test at a time. Running the long tests with 2 cores could help shave off 10 to 15 min in the biweekly CI.
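To make the scaling explicit, the timings quoted above can be converted into a speed-up and a parallel efficiency. This is only a small sketch: `speedup_and_efficiency` is a hypothetical helper written for this comment, not part of pyMBE.

```python
def speedup_and_efficiency(serial_s, parallel_s, n_workers):
    """Return (speed-up, parallel efficiency) for a given wall-clock time."""
    speedup = serial_s / parallel_s
    return speedup, speedup / n_workers


# Timings quoted above for the dialysis test, converted to seconds.
serial = 10 * 60
timings = {2: 5 * 60 + 30, 4: 3 * 60 + 30}

for n_workers, parallel in timings.items():
    speedup, efficiency = speedup_and_efficiency(serial, parallel, n_workers)
    print(f"{n_workers} threads: speed-up {speedup:.2f}x, efficiency {efficiency:.0%}")
```

This gives roughly 1.82x (91% efficiency) with 2 threads and 2.86x (71% efficiency) with 4 threads, which is consistent with a few long-running pH values dominating the tail of the run.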

Adapting existing scripts to leverage multiprocessing is usually only a matter of moving the body of a for loop into a function that is passed to a multiprocessing pool. There are a few caveats:

Users could choose the number of threads with an argparse argument like --threads=8 and a sensible default value like 2 or 4. A larger value like 6 wouldn't really be sound, because on a 4-core machine with hyperthreading enabled (i.e. 8 logical cores), the last 2 threads would land on a hyperthread and compete with the 4 other threads, resulting in a marginal speed-up compared to using just 4 threads. I would personally recommend 4 as the default value[^ncores-default], but I'm open to other suggestions.
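As a rough sketch of the refactoring described above, the loop body moves into a function that is handed to a `multiprocessing.Pool`, and the pool size comes from a `--threads` argument. The names `simulate_one_ph` and `PKA`, the pH range, and the Henderson-Hasselbalch stand-in calculation are all assumptions made for illustration; an actual sample script would launch its simulation in that function instead. Note that the workers are processes, even though the option is called `--threads`.

```python
import argparse
import multiprocessing

PKA = 4.0  # hypothetical acid constant, only used by the stand-in calculation


def simulate_one_ph(pH):
    """Former body of the for loop: one independent run at a given pH."""
    # Placeholder: a real script would start the simulation here
    # (e.g. via subprocess) and return or store its results.
    alpha = 1.0 / (1.0 + 10.0 ** (PKA - pH))  # ideal degree of ionization
    return pH, alpha


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Scan a pH range in parallel")
    parser.add_argument("--threads", type=int, default=4,
                        help="number of worker processes (default: 4)")
    return parser.parse_args(argv)


def main(argv=None):
    args = parse_args(argv)
    pH_range = [2.0, 3.0, 4.0, 5.0, 6.0, 7.0]  # hypothetical pH values
    # pool.map returns results in input order, regardless of which
    # worker finishes first.
    with multiprocessing.Pool(processes=args.threads) as pool:
        results = pool.map(simulate_one_ph, pH_range)
    for pH, alpha in results:
        print(f"pH = {pH:.1f}, alpha = {alpha:.3f}")


# The guard is required on platforms that start workers by re-importing
# the main module (e.g. Windows and macOS).
if __name__ == "__main__":
    main()
```

Such a script could then be called with, for example, `--threads=2` on CI and left at its default on a desktop machine.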

[^ncores-default]: I couldn't find market share analyses that were detailed enough to allow me to estimate the median number of physical cores in consumer-grade CPUs in 2024. Nevertheless, Intel and AMD made 4-core CPUs available for desktop computers in 2008, and it's probably reasonable to assume that most desktop computers built after 2020 have 4 or more physical cores. This guess includes recent architectures like the Intel i7 12700K and the Apple M1, which have a mix of performance cores (high power, high frequency, for gaming and video decoding) and energy-efficient cores (low power, low frequency, for background applications).

jngrad commented 3 weeks ago

Fixed by #87.