can vg gamcompare handle alignments without a refpos

HongboDoll commented 7 months ago

Hi vg team,

I was trying to simulate some reads from a graph (one sample vcf + reference genome) using vg sim and added their positions using vg annotate -m.

All simulated reads had a "refpos", but when I mapped these reads to another graph for mapping evaluation, some of the resulting alignments could not be assigned a refpos.

I guess that in this case vg gamcompare will report them as incorrect alignments.

My evaluation graph contains many non-reference paths, such as large insertions and deletions. How can I include reads that cannot be aligned to reference paths in the evaluation?

Thanks in advance

jltsiren commented 7 months ago

You can try providing a distance index with option -d / --distance-index. It will then use the distance index for determining the distance between the two alignments. The algorithm is not 100% foolproof, because the distance index is based on directed distances. In extreme cases, two alignments can be overlapping at some positions, but the positions that get compared are not reachable from each other.
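As a rough sketch, the comparison with a distance index could look like the following. The file names (eval.dist, mapped.gam, truth.gam) and the 100 bp range are placeholders, not values from this thread, and the run helper is a hypothetical wrapper that only prints the command unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch only: all file names below are hypothetical placeholders.
set -eu

DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute them

# run OUT CMD...: print CMD and, unless DRY_RUN=1, execute it.
# OUT is the file that receives stdout, or "-" for no redirection.
run() {
    _out="$1"; shift
    if [ "$_out" = "-" ]; then
        echo "+ $*"
        [ "$DRY_RUN" = "1" ] || "$@"
    else
        echo "+ $* > $_out"
        [ "$DRY_RUN" = "1" ] || "$@" > "$_out"
    fi
}

DIST=eval.dist      # distance index of the graph both GAM files are valid in
MAPPED=mapped.gam   # alignments produced by the mapper
TRUTH=truth.gam     # simulated alignments (the truth set)
RANGE=100           # distance within which a mapped read counts as correct

# gamcompare takes the mapped alignments first and the truth set second.
run compared.gam vg gamcompare -r "$RANGE" -d "$DIST" "$MAPPED" "$TRUTH"
```

Because the distances are directed, a small number of reads may still be reported as incorrect even though the two alignments overlap.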

HongboDoll commented 7 months ago

Hi Jouni,

Thank you, but which distance index should I use? The one for my evaluation graph, or the one for the graph used for read simulation?

jltsiren commented 7 months ago

You use a distance index for the graph you want to measure the distances in. The alignments must be valid in that graph.

The usual way to do it is the following:

1. Simulate reads from a specific sample in the full graph.
2. Create a reference graph by removing that sample from the full graph (using vg gbwt -R).
3. Map reads to the reference graph.
4. Build a distance index for the full graph.
5. Compare the simulated and mapped alignments using the distance index for the full graph.
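The steps above can be sketched as a script. Everything concrete here is an assumption for illustration: the file names, the sample name SAMPLE, the read count and length, the 100 bp range, and the choice of vg giraffe as the mapper (with ref.gbz, ref.min and ref.dist assumed to have been built from the reduced GBWT beforehand). The run helper is a hypothetical wrapper that only prints each command unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch only: file names, SAMPLE and numeric parameters are placeholders.
set -eu

DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute them

# run OUT CMD...: print CMD and, unless DRY_RUN=1, execute it.
# OUT is the file that receives stdout, or "-" for no redirection.
run() {
    _out="$1"; shift
    if [ "$_out" = "-" ]; then
        echo "+ $*"
        [ "$DRY_RUN" = "1" ] || "$@"
    else
        echo "+ $* > $_out"
        [ "$DRY_RUN" = "1" ] || "$@" > "$_out"
    fi
}

SAMPLE=SAMPLE   # hypothetical name of the sample to simulate from and remove

evaluate() {
    # 1. Simulate reads from one sample in the full graph; -a writes them as GAM.
    run sim.gam vg sim -x full.xg -g full.gbwt -m "$SAMPLE" -n 100000 -l 150 -a
    # 2. Remove that sample's haplotypes to get the GBWT of the reference graph.
    run - vg gbwt -R "$SAMPLE" -o ref.gbwt full.gbwt
    # 3. Map the reads to the reference graph (mapper and its indexes are assumptions).
    run mapped.gam vg giraffe -Z ref.gbz -m ref.min -d ref.dist -G sim.gam
    # 4. Build a distance index for the full graph.
    run - vg index -j full.dist full.xg
    # 5. Compare mapped and simulated alignments using the full-graph distance index.
    run compared.gam vg gamcompare -r 100 -d full.dist mapped.gam sim.gam
}

evaluate
```

Using the full-graph index in step 5 matters because the simulated alignments may touch nodes that only exist on the removed sample's haplotypes.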

Another way would be to convert the simulated reads to alignments against the linear reference, and then convert those into alignments in the evaluation graph. It should be possible to do that by first using vg surject for the simulated reads in the simulation graph and then vg inject for the result in the evaluation graph. (But I've never tried that myself.) After that, you should be able to compare the mapped reads and the result of vg inject by using the distance index of the evaluation graph.
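A minimal, untested sketch of that surject/inject route follows. The file names and the 100 bp range are placeholders, it assumes both graphs contain the same reference paths under the same names, and the run helper is a hypothetical wrapper that only prints each command unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch only and untested: all file names below are hypothetical placeholders.
set -eu

DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute them

# run OUT CMD...: print CMD and, unless DRY_RUN=1, execute it.
# OUT is the file that receives stdout, or "-" for no redirection.
run() {
    _out="$1"; shift
    if [ "$_out" = "-" ]; then
        echo "+ $*"
        [ "$DRY_RUN" = "1" ] || "$@"
    else
        echo "+ $* > $_out"
        [ "$DRY_RUN" = "1" ] || "$@" > "$_out"
    fi
}

reproject() {
    # Project the simulated reads onto the linear reference of the simulation graph.
    run sim.bam vg surject -x sim_graph.xg -b sim.gam
    # Lift the linear alignments into the evaluation graph.
    run truth_in_eval.gam vg inject -x eval_graph.xg sim.bam
    # Compare against the mapped reads with the evaluation graph's distance index.
    run compared.gam vg gamcompare -r 100 -d eval.dist mapped.gam truth_in_eval.gam
}

reproject
```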

HongboDoll commented 7 months ago

Thank you very much, I will try it.

vgteam / vg

can vg gamcompare handle alignments without a refpos #4173