rrwick / Trycycler

A tool for generating consensus long-read assemblies for bacterial genomes
GNU General Public License v3.0

cpu count vs available cpu :: potential overcommit #56

Closed EricDeveaud closed 1 year ago

EricDeveaud commented 1 year ago

Trycycler uses multiprocessing.cpu_count() (in trycycler/misc.py) to get the number of CPUs. multiprocessing.cpu_count() returns the number of CPUs in the machine, but this is not the same as the number of CPUs available to the process. For example, the process may be running under taskset or under a batch scheduler like Slurm, either of which can restrict it to a subset of the machine's CPUs.

see:

$ nproc
96
$ taskset -c 1 nproc
1
$ taskset -c 1 python3 -c "import multiprocessing; print(multiprocessing.cpu_count())"
96

I would suggest using len(os.sched_getaffinity(0)) instead of multiprocessing.cpu_count():

$ python3 -c "import os; print(len(os.sched_getaffinity(0)))"
96
$ taskset -c 1 python3 -c "import os; print(len(os.sched_getaffinity(0)))"
1

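NB: Python on macOS does not have os.sched_getaffinity, so a portable way to code it would be:

import multiprocessing
import os

try:
    # CPUs this process is allowed to run on (honours taskset and similar limits)
    num_cpus = len(os.sched_getaffinity(0))
except AttributeError:
    # os.sched_getaffinity is missing on this platform: use the machine-wide count
    num_cpus = multiprocessing.cpu_count()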
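
For illustration, the same fallback could be wrapped in a small helper and used to pick a default thread count. This is only a sketch: the names available_cpu_count and default_thread_count and the cap of 16 are hypothetical, and this is not necessarily how trycycler/misc.py is organised.

```python
import multiprocessing
import os


def available_cpu_count():
    """Return the number of CPUs this process may use (hypothetical helper)."""
    try:
        # Where supported (e.g. Linux), this reflects the process's affinity mask,
        # such as one set by taskset.
        return len(os.sched_getaffinity(0))
    except AttributeError:
        # os.sched_getaffinity does not exist on this platform (e.g. macOS),
        # so fall back to the machine-wide CPU count.
        return multiprocessing.cpu_count()


def default_thread_count(cap=16):
    """Pick a default thread count; the cap is an assumed, illustrative limit."""
    return max(1, min(available_cpu_count(), cap))


if __name__ == "__main__":
    print(default_thread_count())
```

Run under taskset -c 1, this should report 1 on Linux rather than the full core count of the machine.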

regards

Eric

rrwick commented 1 year ago

Thanks! I've made this change in c084ad9. Should be in the next release (coming soon).