ac2278 opened this issue 3 years ago.
It sounds like you may not have mapped your reads as paired (-i). If you have a bunch of single-end reads, and you aren't emitting secondary mappings (which we don't by default), you only ever get forward-strand mapped reads (FLAG 0), reverse-strand mapped reads (16), and unmapped reads (4). None of the pair-related bits (1, 2, 8, 32, 64, 128) are ever set, so you never see any combinations.
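Decoding the FLAG bits makes this concrete. The sketch below is a small POSIX shell helper (decode_flag is a hypothetical name, not part of vg or samtools) that lists which bits are set in a FLAG value, using the bit meanings from the SAM specification. The single-end values 0, 4 and 16 carry none of the pair-related bits, whereas typical properly paired records such as 99 and 147 do.

```sh
#!/bin/sh
# Hypothetical helper: print the names of the SAM FLAG bits set in a decimal FLAG.
# Bit meanings follow the FLAG field definition in the SAM specification.
decode_flag() {
  flag=$1
  out=""
  for pair in \
    1:paired 2:proper_pair 4:unmapped 8:mate_unmapped \
    16:reverse 32:mate_reverse 64:read1 128:read2 \
    256:secondary 512:qc_fail 1024:duplicate 2048:supplementary
  do
    bit=${pair%%:*}    # numeric value before the colon
    name=${pair#*:}    # bit name after the colon
    if [ $(( flag & bit )) -ne 0 ]; then
      out="$out $name"
    fi
  done
  # No bits set means FLAG 0: a mapped, forward-strand, single-end record.
  if [ -z "$out" ]; then
    echo "none"
  else
    echo "${out# }"
  fi
}

# The three values seen in the single-end output, plus two typical paired ones.
for f in 0 4 16 99 147; do
  printf '%s: %s\n' "$f" "$(decode_flag "$f")"
done
```

Running it prints, for example, "99: paired proper_pair mate_reverse read1" and "16: reverse". Recent samtools releases also include a samtools flags subcommand that decodes FLAG values, if you prefer not to write your own.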
Thanks for the help, @adamnovak.
Hmm, vg seems to recognize that my input FASTQ files are paired without the -i argument.
This is the command I used to map my paired reads for each sample:
vg map -x platinum_maf0.10.xg -g platinum_maf0.10.gcsa -f paired_trim_1.fq -f paired_trim_2.fq > platinum_maf0.10.gam
When I look at alignment statistics using vg stats -a platinum_maf0.10.gam, I see that vg recognized the inputs as paired (the output reports a 'Total properly paired' count).
Could you explain what the -i argument does (what does 'fastq or GAM is interleaved paired-ended' mean)?
An interleaved file is one in which read 2n+1 is the mate of read 2n (bwa mem -p reads such files). If you give vg map two FASTQ inputs with -f, or one interleaved input with -i, it will produce an interleaved GAM. That GAM can then be surjected with vg surject -i to preserve the pairing information.
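The definition of an interleaved file can be checked mechanically. This is a hedged sketch, not a vg tool: check_interleaved is a hypothetical helper, and it assumes 4-line FASTQ records whose mate names are identical once an optional trailing /1 or /2 is stripped (naming conventions vary between sequencers and pipelines).

```sh
#!/bin/sh
# Hypothetical helper: verify that record 2n+1 is the mate of record 2n
# (records counted from 0). Exits non-zero and reports the first problems found.
check_interleaved() {
  awk '
    NR % 4 == 1 {                  # header line of each 4-line FASTQ record
      name = $1                    # first whitespace-separated field
      sub(/^@/, "", name)
      sub(/\/[12]$/, "", name)     # drop a trailing /1 or /2 mate suffix
      rec = (NR - 1) / 4           # 0-based record index
      if (rec % 2 == 0) {
        first = name
      } else if (name != first) {
        printf "record %d (%s) is not the mate of record %d (%s)\n", rec, name, rec - 1, first
        bad = 1
      }
    }
    END {
      # An interleaved file must contain an even number of complete records.
      if ((NR / 4) % 2 != 0) { print "odd or incomplete number of records"; bad = 1 }
      exit bad
    }
  ' "$@"
}
```

For example, check_interleaved reads.fq && echo "looks interleaved". Two separate -f files would each fail this check on their own, which is why vg map needs either both files or the -i flag to know which reads are mates.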
Oh yeah, that's probably it. Surject needs to be told that the GAM file is supposed to be paired (-i); we haven't taught it to autodetect that based on the GAM cross-references that ought to be there.
(Adam Novak, replying by email to Glenn Hickey's comment of 5/14/21.)
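One way to confirm that surject kept the pairing is to count how many output records have the 0x1 (paired) FLAG bit set. This is a sketch under assumptions: count_paired is a hypothetical helper, and the vg surject invocation in the comments only illustrates how one might call it; the -x, -i and -b options should be checked against vg surject --help for your vg version.

```sh
#!/bin/sh
# Hypothetical helper: count SAM records with and without the 0x1 "paired" bit.
# Header lines (starting with @) are skipped. Reads the given files or stdin.
# Assumed usage after surjecting an interleaved GAM (verify options for your vg):
#   vg surject -x platinum_maf0.10.xg -i -b platinum_maf0.10.gam > out.bam
#   samtools view out.bam | count_paired
count_paired() {
  awk -F '\t' '
    /^@/ { next }                            # skip header lines
    { if ($2 % 2 == 1) paired++; else single++ }   # odd FLAG means bit 0x1 is set
    END { printf "paired %d\nsingle %d\n", paired, single }
  ' "$@"
}
```

If the surjection was run without -i, the paired count would be 0, matching the 0/4/16-only FLAGs described in this issue.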
After aligning reads for 285 samples to my genome graph using vg map, I converted each GAM file into a BAM file (using vg surject), then converted each BAM file into a SAM file (using samtools view). When inspecting the second column of each SAM file, I notice that only three FLAGs appear: 0, 4, and 16. Why is this the case? I would expect a greater variety of FLAGs to be present (see https://www.samformat.info/sam-format-flag).
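To reproduce this observation across many samples, a tally of the FLAG column is enough. A minimal sketch, assuming plain SAM text as input; flag_counts is a hypothetical helper name, and cut -f2 | sort | uniq -c gives the same information.

```sh
#!/bin/sh
# Hypothetical helper: tally each distinct FLAG value (SAM column 2), skipping
# header lines. Prints "FLAG<TAB>count" lines sorted numerically by FLAG.
flag_counts() {
  awk -F '\t' '!/^@/ { n[$2]++ } END { for (f in n) printf "%s\t%d\n", f, n[f] }' "$@" |
    sort -n -k1,1
}
```

For example, samtools view sample.bam | flag_counts streams the BAM without writing an intermediate SAM file. For output surjected without -i, one would expect only the rows 0, 4 and 16, as reported above.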