stephenturner / oneliners

Useful bash one-liners for bioinformatics.

bam to fastq one liner #13

Open LukeBraidwood opened 9 years ago

LukeBraidwood commented 9 years ago

Hey,

Thanks very much for putting these explanations and tools up. I think the one-liner you have given for converting BAM to FASTQ is inappropriate (or should be described differently). The problem is that your awk prints fields 1, 10, and 11 of each alignment record in the BAM (QNAME, SEQ, and QUAL).
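For reference, here is a minimal Python sketch of what printing fields 1, 10 and 11 as a FASTQ record amounts to. It assumes tab-separated SAM text such as the output of `samtools view`. The function name `naive_sam_to_fastq` and the example record are hypothetical and are not the repository's actual one-liner.

```python
def naive_sam_to_fastq(sam_line: str) -> str:
    """Build a FASTQ record from SAM fields 1 (QNAME), 10 (SEQ) and 11 (QUAL).

    The FLAG field (column 2) is ignored, so reverse-strand reads are
    emitted exactly as they are stored in the SAM/BAM file.
    """
    fields = sam_line.rstrip("\n").split("\t")
    qname, seq, qual = fields[0], fields[9], fields[10]
    return f"@{qname}\n{seq}\n+\n{qual}\n"


# Hypothetical record for a read aligned to the reverse strand (FLAG 16).
record = "read1\t16\tchr1\t100\t60\t6M\t*\t0\t0\tAAACCG\tABCDEF"
print(naive_sam_to_fastq(record), end="")
```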

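Field 10 is called SEQ and holds the read (segment) sequence. However, SEQ is always stored relative to the plus strand of the reference: for a read aligned to the reverse strand (FLAG bit 0x10 set), SEQ is the reverse complement of the sequenced read and QUAL is stored reversed (http://chagall.med.cornell.edu/NGScourse/SAM.pdf, http://genome.sph.umich.edu/wiki/SAM). The one-liner therefore does not recover the original reads for anything aligned to the reverse strand, which makes it inappropriate for stranded BAMs.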
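Following that description, a strand-aware conversion has to check bit 0x10 of FLAG (column 2) and undo the reversal before writing each record. The Python sketch below assumes SAM text input (e.g. from `samtools view`); the function names and example records are hypothetical, and it does not filter out secondary or supplementary alignments.

```python
# Complement table for the bases usually seen in SEQ; N maps to N.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse the string."""
    return seq.translate(_COMPLEMENT)[::-1]


def sam_to_fastq(sam_line: str) -> str:
    """Build a FASTQ record, restoring the original orientation of reverse-strand reads."""
    fields = sam_line.rstrip("\n").split("\t")
    qname, flag, seq, qual = fields[0], int(fields[1]), fields[9], fields[10]
    if flag & 0x10:  # 0x10: read aligned to the reverse strand
        seq = reverse_complement(seq)
        qual = qual[::-1]  # qualities are reversed, never complemented
    return f"@{qname}\n{seq}\n+\n{qual}\n"


# Hypothetical records: one reverse-strand read (FLAG 16), one forward read (FLAG 0).
reverse_read = "read1\t16\tchr1\t100\t60\t6M\t*\t0\t0\tAAACCG\tABCDEF"
forward_read = "read2\t0\tchr1\t200\t60\t6M\t*\t0\t0\tAAACCG\tABCDEF"
print(sam_to_fastq(reverse_read), end="")
print(sam_to_fastq(forward_read), end="")
```

QUAL is only reversed, not complemented, because each quality score belongs to a base position rather than to a nucleotide.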

Thanks,

Luke

stephenturner commented 9 years ago

Thanks. Suggestion / pull request welcomed.

Stephen



LukeBraidwood commented 9 years ago

Dear Stephen,

Sorry for the slow reply; I just remembered this exchange. I'm currently using the SamToFastq tool from Picard tools, which has an option (RE_REVERSE) to reverse-complement reads aligned to the negative strand back to their original orientation: http://broadinstitute.github.io/picard/command-line-overview.html#SamToFastq

Cheers,

Luke
