jiarong / VirSorter2

customizable pipeline to identify viral sequences from (meta)genomic data
GNU General Public License v2.0
227 stars 31 forks

Default number of threads #124

Closed jbd closed 2 years ago

jbd commented 2 years ago

Hello,

when -j is not specified, the default number of threads is multiprocessing.cpu_count(), which is the number of CPUs in the machine. This is not necessarily the number of CPUs available to the process, for example when virsorter is run under taskset or a batch scheduler like Slurm:

$ taskset -c 1 nproc
1
$ taskset -c 1 python3 -c "import multiprocessing; print(multiprocessing.cpu_count())"
96

If I run virsorter on a 96-core machine in a single-core Slurm allocation, 96 threads will be spawned and compete for a single core. A solution would be to use os.sched_getaffinity instead of multiprocessing.cpu_count on platforms that support it:

$ python3 -c "import os; print(len(os.sched_getaffinity(0)))"
96
$ taskset -c 1 python3 -c "import os; print(len(os.sched_getaffinity(0)))"
1
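
A minimal sketch of how the default could be computed, assuming a hypothetical helper named available_cpu_count (this name is not from VirSorter2 and is only for illustration). It uses os.sched_getaffinity where the platform provides it and otherwise falls back to multiprocessing.cpu_count:

```python
import multiprocessing
import os


def available_cpu_count():
    """Return the number of CPUs the current process may run on.

    Hypothetical helper for illustration; not part of VirSorter2.
    """
    # os.sched_getaffinity only exists on some platforms (e.g. Linux).
    if hasattr(os, "sched_getaffinity"):
        # 0 means "the calling process"; the result is a set of CPU ids.
        return len(os.sched_getaffinity(0))
    # Fallback: total number of CPUs in the machine.
    return multiprocessing.cpu_count()


if __name__ == "__main__":
    print(available_cpu_count())
```

Run under taskset -c 1 on Linux, this should take the affinity branch and report the restricted count rather than the machine total.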

What do you think?

jiarong commented 2 years ago

Good suggestion, thanks!