WGS sample error: `-G` switches to `-E` due to not enough data points #3

wwylab / MuSE

Somatic point mutation caller

GNU General Public License v2.0

Closed: maotian06 closed this issue 1 year ago

maotian06 commented 1 year ago

When I ran MuSE on one large tumor/normal paired WGS sample, it reported the following error:

Not enough data points for model fitting. Automatically switch to option -E.

The WGS BAM files are realigned and recalibrated, and their size is ~400 GB. The *.txt output is around 12 GB.

jiyunmaths commented 1 year ago

@maotian06 Thanks for reporting this issue. In my experience running MuSE on PCAWG WGS data, I have not encountered such a large file from `MuSE call`. My *MuSE.txt files from WGS data are usually ~1 GB. Could you first check whether files this large are common in your dataset? Is this sample a hypermutator? Thanks.
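One way to run that check across a dataset is sketched below. This is only an illustration, not part of MuSE: the function name `summarize_call_outputs`, the directory layout, and the 5 GiB flagging threshold are all assumptions chosen for the example. The `*.MuSE.txt` pattern follows the file naming mentioned above.

```python
from pathlib import Path

GIB = 1024 ** 3  # bytes per GiB


def summarize_call_outputs(root, pattern="*.MuSE.txt", flag_above_gib=5.0):
    """List matching files under `root` with their sizes, largest first.

    Returns a list of (path, size_in_gib, flagged) tuples, where `flagged`
    is True when the file is larger than `flag_above_gib`.
    The 5 GiB default is an arbitrary assumption, not a MuSE limit.
    """
    rows = []
    for path in Path(root).rglob(pattern):
        if path.is_file():
            size_gib = path.stat().st_size / GIB
            rows.append((str(path), size_gib, size_gib > flag_above_gib))
    # Sort by size so that outliers appear at the top of the report.
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows
```

Calling `summarize_call_outputs("/path/to/results")` (a hypothetical path) and printing the rows shows whether a ~12 GB output is an outlier or typical for the cohort.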

maotian06 commented 1 year ago

@jiyunmaths It is a real sample that we normally use to test the limits of our pipeline. It is larger than average, but not an extreme case, and I believe WGS samples will get even larger in the future. It is not a hypermutator either, and the sample quality was good.

Another thing I noticed is the running time. When I processed WXS samples with MuSE, it was really fast compared to other tools, but for WGS it is much slower. I used the -12 option to specify the number of threads. Should I adjust that?

Thanks so much!

jiyunmaths commented 1 year ago

@maotian06 Based on our tests, MuSE may take 5-6 h for WGS data with threads=12. If you increase the number of threads to 28 or more, it may take 1-2 h, although with a MuSE.txt file this large it may take a bit longer. I have also optimized the second step of MuSE, so it is now faster. I will release it soon. Thanks.