maotian06 closed this issue 1 year ago
@maotian06 Thanks for reporting this issue. In my experience working with PCAWG WGS data using MuSE, I have not encountered such a large file from MuSE call. My *MuSE.txt files from WGS data are usually ~1 GB. Could you first check whether this is common in your dataset? Is this sample a hypermutator? Thanks.
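One way to check whether large outputs are common in a dataset is to list the size of every *MuSE.txt file and flag the unusually large ones. The snippet below is only a rough sketch: the function names, the directory layout (all outputs somewhere under one root directory), and the 5 GB threshold are assumptions for illustration, not part of MuSE.

```python
from pathlib import Path


def muse_output_sizes_gb(root, pattern="*MuSE.txt"):
    """Return {path: size in GB} for every file under root matching pattern.

    Hypothetical helper: assumes the MuSE call outputs sit somewhere under root.
    """
    return {
        str(p): p.stat().st_size / 1e9
        for p in Path(root).rglob(pattern)
        if p.is_file()
    }


def flag_large(sizes_gb, threshold_gb=5.0):
    """Return (path, size) pairs above threshold_gb, largest first.

    The 5 GB default is an arbitrary assumption; adjust it to your data.
    """
    large = [(path, gb) for path, gb in sizes_gb.items() if gb > threshold_gb]
    return sorted(large, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    import sys

    # Scan the directory given on the command line, or the current one.
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    for path, gb in flag_large(muse_output_sizes_gb(root)):
        print(f"{gb:8.2f} GB  {path}")
```

If most files fall well below the threshold and only one or two are flagged, the large output is probably specific to those samples rather than typical of the dataset.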
@jiyunmaths It is a real sample that we normally use to test the limits of our pipeline. It is larger than average, but not an extreme case, and I believe WGS samples will get even larger in the future. It is not a hypermutator either; the sample quality was good.
Another thing I noticed is the running time. When I processed WXS samples using MuSE, it was really fast compared to other tools, but for WGS it is much slower. I used the -12 option to specify the threads. Should I adjust that?
Thanks so much!
@maotian06 Based on our tests, MuSE may take 5-6 h for WGS data with threads=12. If you increase the number of threads to 28 or more, it may take 1-2 h, although the large size of your MuSE.txt may make it take a bit longer. I have also optimized the second step of MuSE so that it is now faster, and I will release it soon. Thanks.
When I run MuSE on one large normal/tumor paired WGS sample, it reports the following error:
Those WGS BAM files are realigned and recalibrated, and their size is ~400 GB. The *.txt output is around 12 GB.