Closed DepledgeLab closed 6 years ago
As an update, I tried mapping the reads with GMAP and filtering on their AS:A:+ / AS:A:- flags and ended up with the same plot. I think there is something amiss with the actual PacBio data and I will contact the authors to find out.
There are three types of strands. I am not sure what you are looking for.
When you use gmap and force the alignment to the sense strand, 1 and 2 are always identical. When you apply -uf
to minimap2, ts
is always +
.
BTW, see also #88
What I am trying to do is determine from which strand the original RNA originates (i.e. stranded RNA-Seq). This is simple with Illumina and nanopore as I just filter on the SAM flag field (shown in the upper and lower panels of the figure above). However, filtering on SAM flag for PacBio IsoSeq reads (aligned with minimap2) does not result in the same figure (middle panel), nor does filtering for XS tag when using gmap.
Not all iso-seq reads come from the transcript strand. For non-full-length reads, you should not use -uf
. After alignment, you need to jointly look at the ts
tag and the SAM flag field: if you see ts:A:+, the transcript strand is the alignment strand from the SAM flag field; if ts:A:-, the transcript strand is the opposite of the alignment strand. Similarly with gmap, you should not apply option -z
.
I see what you mean but the issue i'm having is that the ts tag does not appear in the SAM output if I remove the -uf option
e.g. if i run the command as
minimap2-2.7_x64-linux/minimap2 -ax splice --secondary=no reference_sequence.fasta pacbio.fastq > output.sam
the only tags in the output file are NM:i, ms:i, AS:i, nn:i, tp:A:P, cm:i, s1:i, s2:i:0, dv:f
On Thu, Jan 18, 2018 at 1:49 PM, Heng Li notifications@github.com wrote:
Not all iso-seq reads come from the transcript strand. For non-full-length reads, you should not use -uf. After alignment, you need to jointly look at the ts tag and the SAM flag field: if you see ts:A:+, the transcript strand is the alignment strand from the SAM flag field; if ts:A:-, the transcript strand is the opposite of the alignment strand. Similarly with gmap, you should not apply option -z.
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/lh3/minimap2/issues/95#issuecomment-358744045, or mute the thread https://github.com/notifications/unsubscribe-auth/Ae_cisbGtdmnhwmsww4pSKcgHrAcbp1Uks5tL5IpgaJpZM4RiHFz .
You only see ts
when the read is spliced and there is at least one canonical junction (i.e. GT-AG); otherwise it is not possible to infer the strand. Gmap and other RNA-seq mappers work the same way.
I'm trying to map some PacBio IsoSeq data (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE97785) to compare versus my own data generated by Illumina and other platforms.
minimap2-2.7_x64-linux/minimap2 -ax splice -uf --secondary=no ../../../reference_sequences/HSV1-st17.fasta pacbio.fastq > hsv1.pacbio.minimap2.sam
The issue i'm having is that the resulting mapping are not reflective of read data after I sort for 0 and 16 flags which otherwise works on Illumina and other data types.
The attached plot shows nanopore, pacbio and illumina data and shows the issue clearly.
I'd appreciate any thoughts you have...