Open hisplan opened 4 years ago
I don't know if this was resolved at any point, but I was looking at a similar question on my own.
I think this is a small bug in how the diff tool handles multi-mapped reads, as you guessed. Looking at the code (https://github.com/statgen/bamUtil/blob/017721cc07948558395e4934ec10d0f91407c5eb/src/Diff.cpp#L577), it appears that in the output the "read from file 1" is tagged with the CIGAR from the mismatching "read from file 2" (plus other annotations from the remaining columns).
That is, following your example, c.bam is telling you that the read with CIGAR 46M1I44M from a.bam is mismatched to the read with CIGAR 42M1I48M from b.bam. However, after inspecting a.bam and b.bam, you can see that both files contain the pair of reads with the 42M1I48M and 46M1I44M CIGARs.
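One way to do that inspection for a single read is sketched below. This is only a rough helper, not anything from bamUtil: the function name cigars_for_read is made up for illustration, and it assumes the BAM files have already been dumped to text with samtools view, so that the read name is column 1 and the CIGAR is column 6.

```shell
# Hypothetical helper: print the sorted, unique CIGAR strings (SAM column 6)
# recorded for one read name (SAM column 1) in a text SAM file.
# Usage: cigars_for_read READ_NAME FILE.sam
cigars_for_read() {
    awk -F '\t' -v name="$1" '$1 == name { print $6 }' "$2" | sort -u
}
```

Running it with the same read name on a.sam and on b.sam should print the same list of CIGARs if both files really hold the same alignments for that read.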
If c.bam is small enough, you could use a brute-force grep on the extracted SAM files to clean it out. For example, first cut the leftmost 15 columns of c.sam, then grep them against b.sam:
$ samtools view b.bam > b.sam
$ samtools view c.bam | cut -d $'\t' -f1-15 > c-query.sam
$ grep -F -v -f c-query.sam b.sam > c.clean.sam
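If the goal is only to confirm that the two files hold the same records regardless of order, a sort-based comparison might be an alternative to the grep approach. The sketch below is an assumption-laden illustration, not part of bamUtil: sam_unmatched is a made-up name, it expects text SAM files (e.g. from samtools view), and it treats two records as equal only if the whole line matches.

```shell
# Hypothetical helper: print alignment records that occur in only one of two
# SAM files, ignoring the order of records within each file.
# Usage: sam_unmatched A.sam B.sam
sam_unmatched() {
    local tmpdir
    tmpdir=$(mktemp -d) || return 1
    # Drop header lines (they start with "@") and sort, so record order no longer matters.
    grep -v '^@' "$1" | LC_ALL=C sort > "$tmpdir/a"
    grep -v '^@' "$2" | LC_ALL=C sort > "$tmpdir/b"
    # comm -3 hides lines common to both inputs; lines found only in the
    # second file are printed with a leading tab.
    LC_ALL=C comm -3 "$tmpdir/a" "$tmpdir/b"
    rm -rf "$tmpdir"
}
```

Empty output would mean every record in one file has an identical counterpart in the other. Sorting files of several GB takes a while and needs temporary disk space, so this is best tried on a subset first.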
Hi,
I have two BAM files that I'd like to compare. Each is about 5.6GB. I expect them to be identical (I'm sort of doing a reproducibility test).
When I ran with the following command:
It generated three files:
I tried to see what actually differs between the two, but they look identical to me. My suspicion is that it has something to do with the multi-mapped reads. Do you have any idea how to resolve this?