vgteam / vg

tools for working with genome variation graphs
https://biostars.org/tag/vg/
Other
1.07k stars 191 forks source link

Variant Calling from HPRC Pangenome #4263

Open bipinbalan opened 2 months ago

bipinbalan commented 2 months ago

I used GraphAligner to align HiFi reads of the sample to the HPRC pangenome. The "vg stats" command revealed an alignment percentage of 90.16% (6056984/6717677). However, upon examining the generated VCF file, I noticed it only contained headers with no reported variants. Below are the steps I followed in my analysis.

Step1 : Download HPRC pangenome files from Amazon Web Services.

https://s3-us-west-2.amazonaws.com/human-pangenomics/pangenomes/freeze/freeze1/minigraph-cactus/hprc-v1.0-mc-chm13.gfa.gz https://s3-us-west-2.amazonaws.com/human-pangenomics/pangenomes/freeze/freeze1/minigraph-cactus/hprc-v1.0-mc-chm13.xg

Step2 : Map the HiFi reads of the sample to the HPRC pangenome using GraphAligner _GraphAligner -t 30 -g hprc-v1.0-mc-chm13.gfa -f Sample.hifireads.fastq.gz -a Sample.gam -x vg

Step3 : Executed following vg commands to perform variant calling vg pack -t 30 -x hprc-v1.0-mc-chm13.xg -g Sample.gam -Q 5 -o Sample.pack vg call -t 30 hprc-v1.0-mc-chm13.xg -k Sample.pack > Sample.vcf

Any assistance on how to troubleshoot this issue will be much appreciated.

glennhickey commented 2 months ago

Could be that the -Q 5 option is filtering all mappings. You may want to verify your gam has MAPQ values (vg view -a | grepmapping_quality`). If this is the issue (they default to 0 if unspecified), you cna rerun without the filter, or try changing your GraphAligner options to output MAPQs.

jeizenga commented 2 months ago

I vaguely recall that GraphAligner might not be able to output mapping qualities. It would be worth checking me on this though.

glennhickey commented 2 months ago

It used to not output MAPQs but from what I understand newer versions should have them https://github.com/maickrau/GraphAligner/issues/75#issuecomment-1435513385