vgteam / vg

tools for working with genome variation graphs
https://biostars.org/tag/vg/

can vg gamcompare handle alignments without a refpos #4173

Closed HongboDoll closed 6 months ago

HongboDoll commented 7 months ago

Hi vg team,

I simulated some reads from a graph (one sample VCF + reference genome) using vg sim and annotated them with their positions using vg annotate -m.

All simulated reads had a "refpos", but when I mapped these reads to another graph for mapping evaluation, some reads could not be assigned a refpos.

I guess that in this case vg gamcompare will report this as an incorrect alignment.

Since my evaluation graph contains many non-reference paths, such as large insertions/deletions, how can I account for reads that cannot be aligned to reference paths?

Thanks in advance

jltsiren commented 7 months ago

You can try providing a distance index with option -d / --distance-index. It will then use the distance index for determining the distance between the two alignments. The algorithm is not 100% foolproof, because the distance index is based on directed distances. In extreme cases, two alignments can be overlapping at some positions, but the positions that get compared are not reachable from each other.
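To illustrate the caveat about directed distances, here is a toy sketch in Python. It is not vg's distance index or the gamcompare algorithm; the four-node bubble, the `directed_distance` helper, and the choice of comparing each alignment's first node are all assumptions made for illustration. Two alignments share node 4, but their start nodes sit on opposite branches of the bubble, so neither is reachable from the other.

```python
from collections import deque

def directed_distance(edges, source, target):
    """Hop count from source to target following edge direction, or None if unreachable."""
    queue = deque([(source, 0)])
    seen = {source}
    while queue:
        node, dist = queue.popleft()
        if node == target:
            return dist
        for nxt in edges.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None

# Hypothetical bubble: node 1 branches to 2 or 3, and both rejoin at node 4.
edges = {1: [2, 3], 2: [4], 3: [4], 4: []}

truth_path = [2, 4]   # simulated read placed on one branch
mapped_path = [3, 4]  # mapped read placed on the other branch

shared = set(truth_path) & set(mapped_path)
forward = directed_distance(edges, truth_path[0], mapped_path[0])
backward = directed_distance(edges, mapped_path[0], truth_path[0])

print("shared nodes:", shared)
print("start-to-start distance:", forward, backward)
```

In this toy case the alignments overlap, yet the start-to-start distance is undefined in both directions, which is the kind of extreme case where a distance-based comparison could call a read incorrect.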

HongboDoll commented 7 months ago

Hi Jouni,

Thank you, but which distance index should I use: the one from my evaluation graph or the one from the graph used for read simulation?

jltsiren commented 7 months ago

You use a distance index for the graph you want to measure the distances in. The alignments must be valid in that graph.

The usual way to do it is the following:

Another way would be converting the reference positions for the simulated reads to alignments to the linear reference and then those alignments to alignments in the evaluation graph. It should be possible to do that by first using vg surject for the simulated reads in the simulation graph and then vg inject for the result in the evaluation graph. (But I've never tried that myself.) After that, you should be able to compare the mapped reads and the result of vg inject by using the distance index of the evaluation graph.
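The surject/inject route could be sketched roughly as below. This is an untested dry run that only prints the command lines. All file names are hypothetical placeholders. The use of `-x` for the graph, `-b` for BAM output from vg surject, and the argument order for vg gamcompare (mapped reads first, truth second) are assumptions that should be checked against each subcommand's `--help` for your vg version.

```python
import shlex

def build_pipeline(sim_graph, eval_graph, eval_dist, sim_gam, mapped_gam):
    """Return (argv, stdout_file) pairs for the three steps; nothing is executed."""
    surjected = "sim.surjected.bam"   # hypothetical intermediate file
    injected = "sim.injected.gam"     # hypothetical intermediate file
    return [
        # 1. Project the simulated reads onto the linear reference (simulation graph).
        (["vg", "surject", "-x", sim_graph, "-b", sim_gam], surjected),
        # 2. Lift the linear alignments into the evaluation graph.
        (["vg", "inject", "-x", eval_graph, surjected], injected),
        # 3. Compare mapped reads against the injected truth using the
        #    evaluation graph's distance index.
        (["vg", "gamcompare", "-d", eval_dist, mapped_gam, injected], "compared.gam"),
    ]

steps = build_pipeline("sim.xg", "eval.xg", "eval.dist", "sim.gam", "mapped.gam")
for argv, out in steps:
    print(shlex.join(argv), ">", out)
```

Both graphs need to contain the same reference paths for the inject step to make sense, since the linear alignments are only meaningful relative to those paths.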

HongboDoll commented 7 months ago

Thank you very much, I will try.