vgteam / vg

tools for working with genome variation graphs
https://biostars.org/tag/vg/

How to get all alignment results? #4169

Open SZ-qing opened 9 months ago

SZ-qing commented 9 months ago

Hi, when I align my short reads to the human pangenome graph, the result for each read is only a single path.

  1. What I want is all of the sequences that align to each read, both those with mismatches and those that match the full read. How can I get them?

  2. How can I tell from the results which annotated regions the reads align to, such as exons, CDS regions, or introns?

Shell:
vg giraffe -Z hprc-v1.1-mc-grch38.gbz -p -f ./small_sim.fq -o json --max-multimaps 10 >small_sim_aln_M10.json

Results: image
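
For readers who want to inspect every reported alignment per read, here is a minimal Python sketch. It assumes the JSON output holds one alignment record per line and uses the field names from vg's Alignment schema (`name`, `score`, `mapping_quality`, `is_secondary`, `path.mapping[].edit[]`). The helper names `summarize_edits` and `group_by_read` and the sample records are hypothetical and are not part of vg.

```python
import json
from collections import defaultdict


def summarize_edits(aln):
    """Count aligned bases by edit type for one alignment record."""
    counts = {"match": 0, "mismatch": 0, "insertion": 0, "deletion": 0, "other": 0}
    for mapping in aln.get("path", {}).get("mapping", []):
        for edit in mapping.get("edit", []):
            # Zero-valued fields may be omitted from the JSON, so default to 0.
            from_len = int(edit.get("from_length", 0))
            to_len = int(edit.get("to_length", 0))
            if from_len == to_len:
                # Equal lengths: a substitution if a replacement sequence is given.
                kind = "mismatch" if edit.get("sequence") else "match"
                counts[kind] += from_len
            elif from_len == 0:
                counts["insertion"] += to_len
            elif to_len == 0:
                counts["deletion"] += from_len
            else:
                # Unequal, non-zero lengths: leave unclassified.
                counts["other"] += max(from_len, to_len)
    return counts


def group_by_read(lines):
    """Group alignment records (one JSON object per line) by read name."""
    by_read = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        aln = json.loads(line)
        mappings = aln.get("path", {}).get("mapping", [])
        by_read[aln.get("name", "")].append({
            "score": int(aln.get("score", 0)),
            "mapq": int(aln.get("mapping_quality", 0)),
            "secondary": bool(aln.get("is_secondary", False)),
            # 64-bit IDs may be written as strings, so normalize to str.
            "nodes": [str(m.get("position", {}).get("node_id", "")) for m in mappings],
            "edits": summarize_edits(aln),
        })
    return dict(by_read)


# Hypothetical records for illustration only; real input would be the lines
# of a file such as small_sim_aln_M10.json.
sample = [
    '{"name": "read1", "score": 150, "mapping_quality": 60, "path": {"mapping": '
    '[{"position": {"node_id": "12"}, "edit": [{"from_length": 10, "to_length": 10}]}]}}',
    '{"name": "read1", "score": 140, "is_secondary": true, "path": {"mapping": '
    '[{"position": {"node_id": "57"}, "edit": [{"from_length": 4, "to_length": 4}, '
    '{"from_length": 1, "to_length": 1, "sequence": "T"}, '
    '{"from_length": 5, "to_length": 5}]}]}}',
]

for read_name, alignments in group_by_read(sample).items():
    for aln in alignments:
        print(read_name, aln["score"], aln["secondary"], aln["nodes"], aln["edits"])
```

If each read appears only once in the grouped result, the mapper reported a single alignment for it, regardless of the --max-multimaps limit.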

SZ-qing commented 9 months ago

When I add --ref-paths, the results do not change:
vg giraffe -Z hprc-v1.1-mc-grch38.gbz -p -f ./small_sim.fq -o json --max-multimaps 10 --ref-paths ./all_graph_paths.txt >small_sim_aln_M10_allpaths.json

jeizenga commented 9 months ago

I don't know of any tool that does exactly what you're describing. However, the --ref-paths option is only relevant for SAM/BAM output, so it's expected that it would not affect the GAM. If you want to annotate path positions in the GAM, you can use vg annotate -x hprc-v1.1-mc-grch38.gbz -p -a, but this method definitely has some failure cases. Also, it only annotates positions on reference paths, so you will not get positions for other haplotypes.
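
As a rough illustration of reading the annotated positions back out, the Python sketch below assumes the annotated GAM has been converted to one JSON record per line (for example with vg view -a -j) and that each record carries a `refpos` list whose entries have `name`, `offset`, and `is_reverse` fields, as in vg's Position schema. The function name `reference_positions` and the sample record are hypothetical.

```python
import json


def reference_positions(lines):
    """Yield (read name, path name, offset, strand) for each refpos entry."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        aln = json.loads(line)
        # Reads with no position on a reference path have no refpos entries.
        for pos in aln.get("refpos", []):
            strand = "-" if pos.get("is_reverse", False) else "+"
            # Offsets are 64-bit and may be written as strings in the JSON.
            yield (aln.get("name", ""), pos.get("name", ""),
                   int(pos.get("offset", 0)), strand)


# Hypothetical annotated record for illustration only.
sample = [
    '{"name": "read1", "refpos": [{"name": "GRCh38#0#chr1", "offset": "10500"}]}',
    '{"name": "read2"}',
]

for read_name, path_name, offset, strand in reference_positions(sample):
    print(read_name, path_name, offset, strand)
```

The resulting path name and offset pairs could then be compared against a gene annotation on the same reference to see whether a read falls in an exon, CDS, or intron. That comparison is not performed by the sketch.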