vgteam / vg

tools for working with genome variation graphs
https://biostars.org/tag/vg/

When converting GAM to BAM to SAM, why do all reads in the SAM file have only either a FLAG of 0, 4, or 16? #3289

Open ac2278 opened 3 years ago

ac2278 commented 3 years ago

After aligning reads for 285 samples to my genome graph using vg map, I converted each GAM file into a BAM file (using vg surject) then converted each BAM file into a SAM file (using samtools view). When inspecting the second column of each SAM file, I notice that only three FLAGs appear: 0, 4, and 16. Why is this the case? I would expect a greater variation of FLAGs to be present.

https://www.samformat.info/sam-format-flag

adamnovak commented 3 years ago

It sounds like you may not have mapped your reads as paired (-i). If you have a bunch of single-end reads, and you aren't emitting secondary mappings (which we don't by default), you only ever get forward strand mapped reads (0), reverse strand mapped reads (16), and unmapped reads (4), and never any combinations.
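To make the flag values above concrete, here is a minimal sketch that decodes a SAM FLAG integer into its component bits. The bit values follow the SAM specification; the names `SAM_FLAG_BITS` and `decode_flag` and the short labels are made up for this illustration and are not part of vg or samtools.

```python
# Bit values as defined in the SAM specification; the labels are informal.
SAM_FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse_strand",
    0x20: "mate_reverse_strand",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag):
    """Return the labels of all bits set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAG_BITS.items() if flag & bit]


# 0, 4 and 16 are the values seen in the single-end output;
# 99 and 147 are typical values for a properly paired read and its mate.
for flag in (0, 4, 16, 99, 147):
    print(flag, decode_flag(flag))
```

An empty list for FLAG 0 means no bits are set, i.e. an unpaired read mapped to the forward strand. None of 0, 4, or 16 has the 0x1 (paired) bit set, which is why no pair-related combinations show up.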

ac2278 commented 3 years ago

Thanks for the help, @adamnovak.

Hmm, vg seems to recognize that my input fastq files are paired without the -i argument.

This is the command I used to map my paired reads for each sample: vg map -x platinum_maf0.10.xg -g platinum_maf0.10.gcsa -f paired_trim_1.fq -f paired_trim_2.fq > platinum_maf0.10.gam

When I look at alignment statistics using vg stats -a platinum_maf0.10.gam, I see that vg recognized the inputs as paired (see 'Total properly paired'): [screenshot of vg stats -a output]

Could you explain what the -i argument does (what does 'fastq or GAM is interleaved paired-ended' mean)?


glennhickey commented 3 years ago

An interleaved file is when read 2n+1 is the mate of read 2n (bwa mem -p reads such files). If you input 2 fastq inputs with -f or one interleaved input with -i, vg map will produce an interleaved GAM. This can then be surjected with vg surject -i to preserve pairing information.
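The interleaved layout described above can be sketched in a few lines of Python. This is only an illustration of the record ordering (record 2n from the first file, record 2n+1 from the second), not how vg or bwa implement it; the helpers `read_fastq` and `interleave` and the toy reads are hypothetical.

```python
import io
from itertools import islice


def read_fastq(handle):
    """Yield FASTQ records as 4-tuples: (name, sequence, separator, qualities)."""
    while True:
        record = tuple(line.rstrip("\n") for line in islice(handle, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        yield record


def interleave(reads_1, reads_2):
    """Alternate records so that record 2n+1 is the mate of record 2n."""
    for first, second in zip(reads_1, reads_2):
        yield first
        yield second


# Toy in-memory stand-ins for a pair of FASTQ files.
fq1 = io.StringIO("@a/1\nACGT\n+\nFFFF\n@b/1\nGGCC\n+\nFFFF\n")
fq2 = io.StringIO("@a/2\nTTAA\n+\nFFFF\n@b/2\nCCGG\n+\nFFFF\n")

interleaved = list(interleave(read_fastq(fq1), read_fastq(fq2)))
for record in interleaved:
    print("\n".join(record))
```

Note that `zip` stops at the shorter input, so this sketch silently drops unmatched reads; a real tool would need to check that both files hold the same number of records.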

adamnovak commented 3 years ago

Oh yeah, that's probably it. vg surject needs to be told that the GAM file is paired (-i); we haven't taught it to autodetect that from the GAM cross-references that ought to be there.
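One way to check whether pairing information survived surjection is to tally the FLAG column of the resulting SAM and see whether any value has the paired bit (0x1) set. The sketch below assumes plain tab-separated SAM text; the function names and the example records are invented for illustration.

```python
from collections import Counter


def flag_counts(sam_lines):
    """Count FLAG values (column 2) over SAM alignment lines, skipping headers."""
    counts = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        counts[int(fields[1])] += 1
    return counts


def has_pairing_flags(counts):
    """True if any observed FLAG has the 0x1 (paired) bit set."""
    return any(flag & 0x1 for flag in counts)


# Made-up records standing in for surjected output.
example = [
    "@HD\tVN:1.6\tSO:unsorted",
    "read1\t99\tchr1\t100\t60\t4M\t=\t200\t104\tACGT\tFFFF",
    "read1\t147\tchr1\t200\t60\t4M\t=\t100\t-104\tTGCA\tFFFF",
    "read2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tFFFF",
]
counts = flag_counts(example)
print(dict(counts))
print(has_pairing_flags(counts))
```

To run it on a real file, pass an open file handle instead of the list, e.g. `flag_counts(open("sample.sam"))`, where `sample.sam` is a placeholder name. A SAM that contains only 0, 4, and 16 would make `has_pairing_flags` return False.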
