wwylab / MuSE

Somatic point mutation caller
GNU General Public License v2.0

MuSE is unable to run on a BAM in hg38 but is able to run on the same BAM in hs37d5 #17

Open Mathieu179 opened 5 months ago

Mathieu179 commented 5 months ago

I tried running MuSE (V1 and V2) on a sample aligned against hg38, but it didn't work. The same sample is successfully analyzed by MuSE when it is aligned against hs37d5. I use GATK's best practices to pre-process my sample.

On hg38, MuSE (V2) doesn't return an error message but stays stuck at the same position indefinitely. MuSE (V1) fails on hg38 with the error message: segmentation fault.

More information below:

Command:

ref=/hg38-gatk/Homo_sapiens_assembly38.fasta

/bin/MuSE-2 call -f $ref "${pathin}${cancer}-tumour.bam" "${pathin}${cancer}-normal.bam" -n ${threads} -O ${pathout}${cancer}

/bin/MuSE-2 sump -I "${pathout}${cancer}.MuSE.txt" -G -O "${pathout}${cancer}.vcf" -D /hg38-gatk/Homo_sapiens_assembly38.dbsnp138.vcf.gz

Input:

tumour source: https://dcc.icgc.org/repositories/files/FI8519
tumour ID: PCAWG.e22e63de-c436-43c0-a595-022622c1fe06.bam
normal source: https://dcc.icgc.org/repositories/files/FI8518
normal ID: PCAWG.53420f15-0856-4359-957d-3295ef631f6a.bam

PS: Is there a container with MuSE installed on it? (I can't build one.)

jiyunmaths commented 5 months ago

@Mathieu179 Thanks for reporting the issue. We will address it as soon as possible. The BAM files are really helpful for us to debug. By the way, here is a Docker image of MuSE2 on Docker Hub: https://hub.docker.com/r/jiyunmathbham/muse2. Please try it.

jiyunmaths commented 5 months ago

Hi @Mathieu179, we downloaded the BAM files from ICGC. We need to remap the data to hg38, which may take a while. Can you figure out on which chromosome the error occurs? Would it be possible for you to extract the reads from tumor and normal for that chromosome only and run MuSE on them to see whether the error persists? If so, could you share the BAM files for that chromosome with us so we can debug? Thank you.
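A minimal sketch of the per-chromosome extraction suggested above, assuming samtools is installed and the input BAMs are coordinate-sorted and indexed. The helper names `run` and `extract_chrom`, the `DRY_RUN` switch, and the output file names are illustrative assumptions and are not part of MuSE.

```shell
#!/usr/bin/env bash
# Sketch: subset a BAM to one chromosome so the problem can be reproduced on less data.
# Assumes samtools is on PATH and the input BAM is coordinate-sorted and indexed.

# Print the command instead of running it when DRY_RUN is set (hypothetical helper).
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# extract_chrom <input.bam> <chromosome> <output.bam>   (hypothetical helper name)
extract_chrom() {
  local in_bam=$1 chrom=$2 out_bam=$3
  run samtools view -b -o "$out_bam" "$in_bam" "$chrom"   # keep only reads on <chromosome>
  run samtools index "$out_bam"                            # index the subset BAM
}
```

For example, `extract_chrom "${pathin}${cancer}-tumour.bam" chr1 "${pathin}${cancer}-tumour.chr1.bam"`, followed by the same call for the normal BAM, would produce two smaller files on which the `MuSE-2 call` command can be rerun unchanged.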

Mathieu179 commented 4 months ago

Sorry for the wait. When n = 10, I have this log:

[14:51:53] chr1:-1 BamRead 0 processQSize 0 writeQSize 0 readPool 0
[14:51:54] chr1:883000 BamRead 0 processQSize 31 writeQSize 307 readPool 0
[14:51:55] chr1:2588000 BamRead 37 processQSize 26 writeQSize 50 readPool 186634
[14:51:56] chr1:2650000 BamRead 2 processQSize 400 writeQSize 805 readPool 0
[14:51:57] chr1:2650000 BamRead 7 processQSize 739 writeQSize 1883 readPool 0
[14:51:58] chr1:2652000 BamRead 2 processQSize 963 writeQSize 3053 readPool 0
[14:51:59] chr1:2652000 BamRead 1 processQSize 1107 writeQSize 4187 readPool 0
[14:52:00] chr1:2652000 BamRead 1 processQSize 1134 writeQSize 5357 readPool 0
[14:52:01] chr1:2655000 BamRead 0 processQSize 1100 writeQSize 6497 readPool 0
[14:52:02] chr1:2655000 BamRead 2 processQSize 852 writeQSize 7603 readPool 0
[14:52:03] chr1:2655000 BamRead 1 processQSize 582 writeQSize 8713 readPool 0
[14:52:04] chr1:2656000 BamRead 1 processQSize 271 writeQSize 9763 readPool 0
[14:52:05] chr1:2656000 BamRead 200000 processQSize 0 writeQSize 10000 readPool 0
[14:52:06] chr1:2656000 BamRead 200000 processQSize 0 writeQSize 10000 readPool 0
[14:52:07] chr1:2656000 BamRead 200000 processQSize 0 writeQSize 10000 readPool 0
[14:52:08] chr1:2656000 BamRead 200000 processQSize 0 writeQSize 10000 readPool 0
[14:52:09] chr1:2656000 BamRead 200000 processQSize 0 writeQSize 10000 readPool 0

...

The same line (BamRead 200000 processQSize 0 writeQSize 10000 readPool 0) then repeats indefinitely.
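To pin down where the run stalls without reading the log by eye, something like the following could be used. It is a rough sketch that assumes the log has been saved to a file with one entry per line, in the same format as the entries quoted in this comment. The function name `find_stall` and the repeat threshold are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Sketch: report the first position at which the MuSE progress log stops changing.
# Assumes one log entry per line, in the format:
#   [HH:MM:SS] chr:pos BamRead N processQSize N writeQSize N readPool N

# find_stall <min_repeats> < logfile   (hypothetical helper name)
# Prints chr:pos of the first entry whose position and counters repeat
# <min_repeats> times in a row; exits with status 1 if no stall is found.
find_stall() {
  awk -v min="${1:-3}" '
    {
      # Drop the timestamp field so only position and counters are compared.
      state = $0
      sub(/^[^ \t]+[ \t]+/, "", state)
      if (state == prev) { count++ } else { count = 1; prev = state }
      if (count == min) { print $2; found = 1; exit }
    }
    END { if (!found) exit 1 }
  '
}
```

Running `find_stall 3 < muse.log` on the log above (with `muse.log` as a hypothetical file name) would report `chr1:2656000`, the position at which the counters stop changing.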

I will try to extract the reads from tumor and normal for that chromosome and run MuSE on them.

jiyunmaths commented 1 month ago

Hi @Mathieu179, can you confirm: did you run MuSE 2 on this sample multiple times, and did the issue occur every time? Your answer and the QNAMEs (read IDs) will help me fix this issue. Thank you.
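One possible way to collect the requested QNAMEs is sketched below. It reads SAM text on standard input and lists the unique read names of alignments that start inside a given window. The function name `qnames_in_window` is hypothetical, and the window around the stalled position is an assumption, not something MuSE reports.

```shell
#!/usr/bin/env bash
# Sketch: list the unique read names (QNAME, SAM column 1) of alignments whose
# start position (POS, column 4) falls inside a window on one chromosome
# (RNAME, column 3). Reads SAM text on stdin.

# qnames_in_window <chrom> <start> <end>   (hypothetical helper name)
qnames_in_window() {
  awk -F '\t' -v chrom="$1" -v lo="$2" -v hi="$3" '
    /^@/ { next }                                   # skip SAM header lines
    $3 == chrom && $4 >= lo && $4 <= hi { print $1 }
  ' | sort -u
}
```

Assuming samtools is available and the BAM is indexed, a call such as `samtools view tumour.bam chr1:2656000-2657000 | qnames_in_window chr1 2656000 2657000` would list the read IDs near the position where the log stops advancing. The file name and the end coordinate 2657000 are guesses for illustration only.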