wwylab / MuSE

Somatic point mutation caller
GNU General Public License v2.0

MuSE is using too many threads #8

Status: Closed (skchronicles closed this issue 1 year ago)

skchronicles commented 1 year ago

Hello there,

I just wanted to start off by saying thank you for creating and maintaining MuSE. It is an awesome somatic caller!

Recently, while running MuSE/2.0.1 (exact commit 105c5be01c43dbe8cc29803e2ab87f35f6f3589a), I noticed that it is using more threads than I told it to use. [screenshot of thread usage]

For the job above, I ran MuSE with -n 32; however, it is using 41/42 threads. I am running MuSE on a cluster with other users, so its over-allocation impacts other people's jobs.
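For anyone who wants to reproduce this check without a screenshot, here is a minimal sketch. It assumes Linux with `/proc` mounted and a process named exactly `MuSE`, and it reads the `Threads:` field the kernel reports for each process. The helper name `thread_count` is hypothetical and not part of MuSE.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the current thread count of a process.
# Linux only: reads the "Threads:" line from /proc/<pid>/status.
# Usage: thread_count <pid>
thread_count() {
  local pid="$1"
  awk '/^Threads:/ {print $2}' "/proc/${pid}/status"
}

# Example: report the thread count of every running process named "MuSE".
# pgrep -x matches the exact process name; the loop does nothing if none run.
for pid in $(pgrep -x MuSE); do
  echo "MuSE PID ${pid}: $(thread_count "$pid") threads"
done
```

Comparing this number with the `-n` value passed to `MuSE call` shows how many threads run in addition to the requested workers.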

Here is a log file containing the commands that were run:

$ MuSE call \
        -n 32 \
        -f GRCh38p11.fasta \
        -O muse/somatic/WGS_NCI_T_S1 \
        BAM/WGS_NCI_T_S1.recal.bam BAM/WGS_NCI_N_S2.recal.bam

$ MuSE sump \
        -n 32 \
        -G \
        -I muse/somatic/WGS_NCI_T_S1.MuSE.txt \
        -O muse/somatic/WGS_NCI_T_S1.muse.tmp.vcf \
        -D GATK_resource_bundle/hg38bundle/dbsnp_146.hg38.vcf.gz

If you could please take some time to look into this issue, that would be amazing! Thank you again for your time, and have a wonderful day.

Best Regards, @skchronicles

skchronicles commented 1 year ago

I was curious whether setting -n 22 (ten fewer than before, since it was over-allocating by ten threads) would work around the issue, and it appears to have worked. [screenshot of thread usage]

jiyunmaths commented 1 year ago

Hi @skchronicles, thanks for using MuSE! Sorry for the confusion. When MuSE runs, it actually uses 9 more threads than the number specified in the command (e.g., MuSE call -n 32). These 9 threads are used for parsing BAM chunks, unzipping BAM chunks, merging reads from the tumor and normal samples into the same queue, writing candidate SNVs to a file, etc. The 32 threads specified by the user are for processing reads and inferring candidate SNVs, which takes the majority of the CPU resources. In my experience, even though 41/42 threads are used, CPU usage does not exceed 3200% (32 CPU cores) by much. Please go ahead and test it, and let me know if it uses many more CPU cores than that. I will update the README to clear up this confusion.
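One way to run that test is to sample the process's accumulated CPU time twice and divide by the elapsed wall-clock time. This is a sketch assuming Linux `/proc/<pid>/stat`, where `utime` and `stime` are fields 14 and 15 per proc(5); the function names `cpu_ticks` and `cpu_percent` are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: estimate a process's CPU usage in percent
# (100 = one fully busy core, 3200 = 32 cores) over a short interval.

# Print utime + stime (in clock ticks) for a PID. The "pid (comm) " prefix is
# stripped first because comm may contain spaces; after that, utime and stime
# are fields 12 and 13.
cpu_ticks() {
  sed 's/^.*) //' "/proc/$1/stat" | awk '{print $12 + $13}'
}

# Usage: cpu_percent <pid> [interval_seconds]  (interval must be an integer)
cpu_percent() {
  local pid="$1" interval="${2:-5}"
  local hz t0 t1
  hz=$(getconf CLK_TCK)   # clock ticks per second
  t0=$(cpu_ticks "$pid")
  sleep "$interval"
  t1=$(cpu_ticks "$pid")
  # CPU-seconds consumed / wall-clock seconds * 100, rounded down.
  echo $(( (t1 - t0) * 100 / (hz * interval) ))
}

# Example (assumes a single running process named "MuSE"):
#   cpu_percent "$(pgrep -x MuSE | head -n 1)" 10
```

Sampling twice is deliberate: `ps -o %cpu` reports CPU time divided by the process's whole lifetime, so it can hide short bursts above 3200%.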

skchronicles commented 1 year ago

Hey @jiyunmaths,

Thank you for your prompt and detailed response!

Would it be possible to add some logic to prevent this in a future release (no rush)? That way, users would not need to do anything on their side. Instead of allocating N threads (e.g., 32) for processing reads and inferring candidate SNVs, MuSE could spawn N-9 threads (e.g., 32-9 = 23), so that the total stays at N.

If not, that's totally okay. I will run MuSE with -n N-9 moving forward.
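The N-9 workaround can be wrapped in a small job-script helper. This is a minimal sketch assuming a SLURM cluster (`SLURM_CPUS_PER_TASK`) with `nproc` as a fallback; the function `muse_n_for_allocation` and the variable `MUSE_AUX_THREADS` are hypothetical and are not MuSE options.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not a MuSE option): choose a value for MuSE's -n so that
# the -n worker threads plus MuSE's auxiliary threads fit the CPU allocation.
# MUSE_AUX_THREADS defaults to 9, the number given by the maintainer; set it to
# 10 for extra headroom if you see 42 threads with -n 32, as reported here.
muse_n_for_allocation() {
  # CPUs: first argument, else SLURM_CPUS_PER_TASK, else all CPUs (nproc).
  local cpus="${1:-${SLURM_CPUS_PER_TASK:-$(nproc)}}"
  local aux="${MUSE_AUX_THREADS:-9}"
  local n=$(( cpus - aux ))
  # Never ask MuSE for fewer than one worker thread.
  if [ "$n" -lt 1 ]; then
    n=1
  fi
  echo "$n"
}

# Example job-script usage (paths taken from the commands above):
#   N=$(muse_n_for_allocation)        # e.g. 32 allocated CPUs -> -n 23
#   MuSE call -n "$N" -f GRCh38p11.fasta -O muse/somatic/WGS_NCI_T_S1 \
#       BAM/WGS_NCI_T_S1.recal.bam BAM/WGS_NCI_N_S2.recal.bam
```

The floor of one worker keeps very small allocations from producing a zero or negative `-n`, at the cost of slightly exceeding the allocation in that case.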

Thank you again for your time and help. I appreciate it!

Best Regards, @skchronicles

jiyunmaths commented 1 year ago

@skchronicles Thanks for the suggestion! I will consider updating the flag in a future release.

skchronicles commented 1 year ago

Okay, that sounds good.

Have a great weekend, @skchronicles