vgteam / vg

tools for working with genome variation graphs
https://biostars.org/tag/vg/

Pack table generated from GAM vs. GAF drastically different #3546

Open briannadon opened 2 years ago

briannadon commented 2 years ago

1. What were you trying to do? Get (and understand!) a packed coverage table from an alignment generated with GraphAligner against my variation graph.

I have a set of reads from a locus aligned to a graph of variation at that locus. After aligning, I convert the GAM to GAF format with vg convert -G so that I can inspect the alignments manually. I use the GAM to create a packed edge coverage table with vg pack -D. However, when I also made a packed edge coverage table from the GAF with vg pack -a, I noticed that the coverages were very different.

2. What did you want to happen? Identical output.

3. What actually happened? Here is a diff of the files' heads. I have cut out the 2nd and 4th columns as they are not informative in my case (GAF edge table on left, GAM on the right):

from.id to.id   coverage                                        from.id to.id   coverage
1       5       0                                               1       5       0
2       5       2                                               2       5       2
3       5       408                                           | 3       5       728
3       6       0                                             | 3       6       80
4       5       2                                               4       5       2
5       7       8                                             | 5       7       90
5       8       396                                           | 5       8       642
6       8       6                                             | 6       8       100
7       10      8                                             | 7       10      92
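A side-by-side diff works for the head of the files, but joining the two tables by edge makes it easier to list every edge whose coverage disagrees. The following is a minimal sketch, assuming each table has a header row and that the last column is the coverage; the function names are hypothetical and are not part of vg.

```python
def read_edge_table(text):
    """Parse an edge table into {edge key: coverage}.

    Every column except the last is treated as part of the edge key,
    so this works on the full table or on one with columns cut out.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    table = {}
    for line in lines[1:]:  # skip the header row
        fields = line.split()
        table[tuple(fields[:-1])] = int(fields[-1])
    return table


def differing_edges(gaf_table, gam_table):
    """Return {edge key: (GAF coverage, GAM coverage)} where the two differ.

    An edge missing from one table is reported with None on that side.
    """
    diffs = {}
    for edge in sorted(set(gaf_table) | set(gam_table)):
        gaf_cov = gaf_table.get(edge)
        gam_cov = gam_table.get(edge)
        if gaf_cov != gam_cov:
            diffs[edge] = (gaf_cov, gam_cov)
    return diffs
```

Calling differing_edges(read_edge_table(open("gaf_edges.tsv").read()), read_edge_table(open("gam_edges.tsv").read())) would list only the disagreeing edges; the file names here are placeholders.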

4. If you got a line like Stack trace path: /somewhere/on/your/computer/stacktrace.txt, please copy-paste the contents of that file here: N/A

5. What data and command can the vg dev team use to make the problem happen? I cannot share data, but I am sure this is relatively easy to recreate by trying to pack a GAM, converting to GAF, and then packing again.
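The three steps could be scripted roughly as below. This is a sketch only: the file names are placeholders, the vg convert call follows the form given later in this thread, and the -x (graph) and -g (GAM input) flags of vg pack are assumptions that should be checked against vg pack --help for the installed version.

```python
import subprocess


def repro_commands(graph, gam, gaf="alignment.gaf"):
    """Build the three command lines: pack the GAM, convert to GAF, pack the GAF."""
    return [
        ["vg", "pack", "-x", graph, "-g", gam, "-D"],  # -x and -g are assumed flags
        ["vg", "convert", graph, "-G", gam],
        ["vg", "pack", "-x", graph, "-a", gaf, "-D"],
    ]


def run_repro(graph, gam, gaf="alignment.gaf"):
    """Run each command, writing its stdout to the matching output file."""
    outputs = ["gam_edges.tsv", gaf, "gaf_edges.tsv"]
    for cmd, out in zip(repro_commands(graph, gam, gaf), outputs):
        with open(out, "wb") as handle:
            subprocess.run(cmd, stdout=handle, check=True)
```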

6. What does running vg version say?

vg version v1.37.0 "Monchio"
Compiled with g++ (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0 on Linux
Linked against libstd++ 20200808
Built by anovak@octagon

briannadon commented 2 years ago

As an aside, any hints as to what these edge tables actually represent and how they're calculated would be greatly appreciated. Are they simply raw counts of how many reads aligned along each edge in my graph?
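One way to test the raw-count reading is to count edge traversals directly from the GAF and compare the result with the vg pack -D table. The sketch below assumes the GAF path column (the sixth field) is written as oriented steps such as >3>5>8. It is not vg's implementation, and the function name is hypothetical. Whether vg pack counts edges this way is exactly the open question.

```python
import re
from collections import Counter

# One oriented step in a GAF path: '>' (forward) or '<' (reverse) plus a node ID.
STEP = re.compile(r"([<>])([^<>\s]+)")


def edge_counts_from_gaf(lines):
    """Count how many times reads traverse each edge, ignoring read direction."""
    counts = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6 or not fields[5].startswith((">", "<")):
            continue  # skip short lines and paths that are not oriented steps
        steps = [(node, sign == "<") for sign, node in STEP.findall(fields[5])]
        for (a, a_rev), (b, b_rev) in zip(steps, steps[1:]):
            forward = ((a, a_rev), (b, b_rev))
            backward = ((b, not b_rev), (a, not a_rev))
            # The same edge read from either strand maps to one canonical key.
            counts[min(forward, backward)] += 1
    return counts
```

If these counts match one table but not the other, that would narrow down which input path is being counted differently.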

briannadon commented 2 years ago

Hate to be a bother, but the issue has come up again for me, and I'm curious whether anyone could offer any insight into how the packed coverages are calculated and what they mean.

glennhickey commented 2 years ago

The coverages should be identical whether you are using GAM or GAF. Also, they seem suspiciously high. Is it possible for you to share the data to reproduce?

briannadon commented 2 years ago

They are high coverage because the sequencing is a result of targeted hybrid-capture sequencing of an HLA locus. We expect very high coverage for this data. I'll ask about sharing data.

jiadong324 commented 2 years ago

@briannadon

Hi, I also want to convert GAM to GAF. What is the full command for the conversion? I tried vg convert -G **.gam > **.gaf, but it failed.

Thanks!

glennhickey commented 2 years ago

@jiadong324 vg convert graph.vg -G alignment.gam > alignment.gaf

jiadong324 commented 2 years ago

@glennhickey Thanks!