Closed GoogleCodeExporter closed 8 years ago
I emailed the following to the BSMAP google code site:
----
I am using BSMAP for the first time, and I am wondering about the BAM flags
0x0040 and 0x0080. The way I understand the spec, these should be used to mean
paired-end read1 and paired-end read2, respectively. So in Illumina naming,
the "/1" should always go to 0x0040 and the "/2" should always go to 0x0080.
In BSMAP v1.15, it seems like this is not getting set correctly. It seems like
the "first" sorted read, which is forward strand relative to the genome
assembly, is always set to 0x0040, and the second sorted read is always 0x0080.
For instance, I get the following output (omitting sequences):
HWI-ST550_0085:7:1101:7291:57138#0/2 0x63 chr11 363691 255 105M
= 363705 119 NM:i:0 ZS:Z:--
HWI-ST550_0085:7:1101:7291:57138#0/1 0x93 chr11 363705 255 105M
= 363691 119 NM:i:0 ZS:Z:-+
HWI-ST550_0085:7:1101:11239:54670#0/1 0x63 chr11 368130 255 105M
= 368362 337 NM:i:1 ZS:Z:++
HWI-ST550_0085:7:1101:11239:54670#0/2 0x93 chr11 368362 255 105M
= 368130 337 NM:i:0 ZS:Z:+-
It seems to me that the correct flags for these, in order, should be
0xA3,0x53,0x63,0x93
Do you agree? For some applications, including bisulfite, it's important to be
able to distinguish the two ends.
Thanks very much,
Ben Berman, USC.
----
Original comment by benb...@gmail.com
on 19 Apr 2011 at 1:36
This issue is what was throwing exceptions in picard. I turned off all
validation to get most things running, but some tools that have specific PE
"modes" seem to fail.
Picards complains:
Exception in thread "main" java.lang.RuntimeException: SAM validation error:
WARNING: Record 19, Read name HWI-ST550_0085:7:1101:6030:54164#0/1, Paired read
should be marked as first of pair or second of pair.
Original comment by zack...@gmail.com
on 19 Apr 2011 at 7:24
This may be the case for some of them, but I went back to my original bsmap sam
file, and it does set the 0x0040 or 0x0080 flags for the correctly paired
alignments. But it seems to be setting them wrong. I'm not sure what the
original flags were for one you show above -- would be good to look it up.
Original comment by benb...@gmail.com
on 19 Apr 2011 at 1:31
The one you show above indeed doesn't have this flag, but this is the minority.
They are ones that are mapped but not mapped in a correct pair. I believe
this is wrong as well (0x0040 and 0x0080 flags should still be used regardless
of alignment).
samtools view -x ResultCount_B031CABXX_7_NIC1254A2.bam | grep
0085:7:1101:6030:54164
sHWI-ST550_0085:7:1101:6030:54164#0/2 0x21
chr3 150025300 255 99M = 150025397 0 CCTCATCTTTTTCATCTTCTTCATCTTCCTCTTCCTTCTTTT
TCTTACTTTTTTCAAACTTAATAACCCCTTTTTTTACTACATCAAACTTTCCTTAAC ggggggggfgggffggfgfged
dfgffeed_ddcdfgggfggggfedee\LLMML^da\_cddcf_eececdee_bb]ee^ZaXd\\cc_ddada[b_c NM
:i:7 ZS:Z:++
HWI-ST550_0085:7:1101:6030:54164#0/1 0x11
chr3 150025397 255 105M = 150025300 0 ACCCAAAATACAACAATATCCTTTTCATATTATTCTTTCAA
CTTCACGACCTTCTTTTCGTAAATCTACTTATCATCTACAACAATATTATTTCCTATCTCTCCA e\aVeffdefbffaf
`]`ZN`ddd_cbc`cad`Nd_eadfbecdadeedfdc_ecee_feefffefe^ffdfade]fdffdffffbfdffcfcef
bccc^b_bbb NM:i:0 ZS:Z:-+
Original comment by benb...@gmail.com
on 19 Apr 2011 at 1:40
---------- Forwarded message ----------
From: Ben Berman <benbfly@gmail.com>
Date: Tue, Apr 19, 2011 at 8:43 AM
Subject: Re: bsmap - Google Groups: Message Pending
[{IISIh5fX69_5NioCeWIwAUjO-5bdPoNB0}]
To: Yuanxin Xi <yxi@bcm.edu>
Cc: bsmap <bsmap+msgappr@googlegroups.com>, "Li, Wei" <wl1@bcm.tmc.edu>, zack
Ramjan <ramjan@usc.edu>
Thanks for the quick reply Yuanxin. I know the spec is not very clear, but
after looking back through some of the SAMTOOLS discussion board and code, I'm
pretty sure what I said is correct (the 0x40 should correspond to the /1 read
in Illumina, and 0x80 to the /2 read). It doesn't seem like there would be
much use for flags that only tell you which is first in mapping coordinates
(it's redundant because you could just look at the coordinates).
Also, in some cases where there is a bad alignment, I noticed that BSMAP
doesn't set either 0x0040 or 0x0080, and Picard will complain about this:
Exception in thread "main" java.lang.RuntimeException: SAM validation error:
WARNING: Record 19, Read name HWI-ST550_0085:7:1101:6030:54164#0/1, Paired read
should be marked as first of pair or second of pair.
On Tue, Apr 19, 2011 at 8:16 AM, Yuanxin Xi <yxi@bcm.edu> wrote:
Hi Ben,
According to samtools spec 1.4, flag 0x40 refers to the first segment of the
template and 0x80 refers to the last segment of the template. So my
understanding is that it depends on the mapping coordinates, instead of which
read set in pair-end seq. For properly paired reads, BSMAP set the read with
lower genomic coordinate with 0x40 and the other with 0x80.
The first two reads in your example maps to the '-+' and '--' strands, which is
the bisulfite converted Crick strand and its reverse complementary strand
respectively. Although the '-+' read comes from forward strand, its genomic
location is behind the '--' read, so '--' read is set 0x40. One difference
between bisulfite sequencing and normal sequencing is that the bisulfite
converted crick strand no longer reverse-complements the bisulfite Watson
strand, so there're actually 4 distinct reference strands.
Best regards,
Yuanxin
Original comment by benb...@gmail.com
on 19 Apr 2011 at 3:43
In the meantime, make correction script to fix BSMAP SAM output.
Original comment by zack...@gmail.com
on 19 Apr 2011 at 8:45
I made a GATK script to correct BSMAP SAM output. I tested it on a small test
file, Zack you may have to test it on a large file. Location:
http://uecgatk.svn.sourceforge.net/viewvc/uecgatk/trunk/src/edu/usc/epigenome/ue
cgatk/benWalkers/BsmapPairedEndFixerWalker.java?view=markup
Original comment by benb...@gmail.com
on 20 Apr 2011 at 11:08
Original comment by benb...@gmail.com
on 21 Apr 2011 at 1:04
Original comment by benb...@gmail.com
on 21 Apr 2011 at 1:04
--------- Forwarded message ----------
From: Yuanxin Xi <yxi@bcm.edu>
Date: Thu, Apr 21, 2011 at 8:04 AM
Subject: Re: bsmap - Google Groups: Message Pending
[{IISIh5fX69_5NioCeWIwAUjO-5bdPoNB0}]
To: Ben Berman <benbfly@gmail.com>
Cc: bsmap <bsmap+msgappr@googlegroups.com>, "Li, Wei" <wl1@bcm.edu>, zack
Ramjan <ramjan@usc.edu>
HI Ben,
I confirmed that you are right about the 0x40/0x80 flag. I've made the change
in the latest BSMAP release v1.25. Thanks for pointing this out.
Best,
Yuanxin
Original comment by benb...@gmail.com
on 21 Apr 2011 at 4:01
bsmap 1.25 deployed and in use at hpcc.
are we ready to close this ticket?
Original comment by zack...@gmail.com
on 5 May 2011 at 12:34
I confirmed that the 0x0040 and 0x0080 flags were set correctly in
ResultCount_B03BUABXX_8_NIC1252A27.bam. So that means it's working?
Original comment by benb...@gmail.com
on 5 May 2011 at 12:41
the flags are being set correctly. picard and gatk are handling inputs
correctly.
Original comment by zack...@gmail.com
on 12 Jul 2011 at 6:35
Original issue reported on code.google.com by
benb...@gmail.com
on 19 Apr 2011 at 1:25