marbl / parsnp

Parsnp was designed to align the core genome of hundreds to thousands of bacterial genomes within a few minutes to a few hours. Input can be draft assemblies, finished genomes, or a mix of both, and output includes variant (SNP) calls, a core-genome phylogeny and multi-alignments. Parsnp uses the contextual information that multi-alignments provide around SNP sites for filtering and cleaning, and relies on existing tools for recombination detection/filtering and phylogenetic reconstruction.

CPU/thread count: avoid oversubscription #120

Closed (EricDeveaud closed this 1 year ago)

EricDeveaud commented 2 years ago

Hello

parsnp uses multiprocessing.cpu_count() to get the number of available CPUs, but that call returns the number of CPUs in the machine, which is not necessarily the number of CPUs available to the process. This happens, for example, when running under taskset or under a batch scheduler such as Slurm.

see:

$ nproc
96
$ taskset -c 1 nproc
1
$ taskset -c 1 python3 -c "import multiprocessing; print(multiprocessing.cpu_count())"
96

I would suggest using len(os.sched_getaffinity(0)) instead of multiprocessing.cpu_count():

$ python3 -c "import os; print(len(os.sched_getaffinity(0)))"
96
$ taskset -c 1 python3 -c "import os; print(len(os.sched_getaffinity(0)))"
1

regards

Eric
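The suggestion above could be wrapped in a small helper. This is a minimal sketch, not parsnp's actual code; the name available_cpus is hypothetical. One assumption worth stating: os.sched_getaffinity only exists on some platforms (notably Linux), so the sketch falls back to os.cpu_count() elsewhere, which reports every CPU in the machine just like multiprocessing.cpu_count().

```python
import os


def available_cpus() -> int:
    """Return the number of CPUs this process may actually run on.

    Hypothetical helper for illustration only.
    """
    # sched_getaffinity(0) reports the CPU set the current process is pinned
    # to, so it reflects taskset and scheduler-imposed CPU restrictions.
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    # Platforms without affinity support (e.g. macOS, Windows): fall back to
    # the machine-wide count; cpu_count() may return None, so default to 1.
    return os.cpu_count() or 1


if __name__ == "__main__":
    print(available_cpus())
```

Run as `taskset -c 1 python3 available_cpus.py` on Linux, this should print 1, matching the `nproc` output shown earlier.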

EricDeveaud commented 2 years ago

OK, this only affects a warning, but the message would be more accurate.
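Since the CPU count here only feeds a warning, a check along these lines might look like the sketch below. It is an assumption-laden illustration, not parsnp's implementation: the function name warn_if_oversubscribed and the message wording are hypothetical. It only warns and leaves the requested thread count unchanged; capping the count instead would be a separate design choice.

```python
import logging
import os


def warn_if_oversubscribed(requested: int) -> bool:
    """Warn if more threads are requested than CPUs usable by this process.

    Hypothetical helper for illustration; returns True if a warning was logged.
    """
    # Prefer the process's CPU affinity mask (respects taskset/Slurm pinning);
    # fall back to the machine-wide count where affinity is unsupported.
    if hasattr(os, "sched_getaffinity"):
        usable = len(os.sched_getaffinity(0))
    else:
        usable = os.cpu_count() or 1
    if requested > usable:
        logging.warning(
            "%d threads requested but only %d CPU(s) are available to this "
            "process; the run may be oversubscribed.",
            requested,
            usable,
        )
        return True
    return False
```

Under `taskset -c 1`, a call such as `warn_if_oversubscribed(8)` would log the warning, whereas a check based on multiprocessing.cpu_count() on the 96-CPU machine above would stay silent.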

bkille commented 1 year ago

Thanks for the tip @EricDeveaud! I'll be sure to keep this in mind for the future as well.