vgteam / sequenceTubeMap

displays multiple genomic sequences in the form of a tube map
MIT License

How to interpret variation graphs in SequenceTubeMap #450

Open gforg34 opened 2 months ago

gforg34 commented 2 months ago

Hi vgteam,

This is my first time using SequenceTubeMap and vg, and I would like, if possible, an explanation of the following figures. I generated a vg (variation graph) from FASTA sequences of 4 different species, and I don't know how to interpret the figures below. The first figure was produced from the vg.xg and gbwt.xg index files. Apart from the reference sequence, none of the sequences has an actual name; instead, the sequence/path names shown are thread 0, thread 1 and thread 2. Were the FASTA headers renamed when the variation graph was created? Do you know how to identify which sequence corresponds to which thread? Or am I looking at something else? As far as I understand, a "thread" refers to a specific sequence or path within the variation graph.
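One way to check which names were actually stored in the graph is to export it to GFA and list its path records. The sketch below assumes a GFA 1.0/1.1 text file. It reads P-lines (named paths) and W-lines (walks identified by sample, haplotype index and sequence ID). The function name `list_gfa_path_names` and the example species names are hypothetical. The sketch only reads the GFA text, not the GBWT index.

```python
from typing import Dict, List, Tuple


def list_gfa_path_names(gfa_text: str) -> Dict[str, List]:
    """Hypothetical helper: collect path names (P-lines) and walk IDs (W-lines) from GFA text."""
    paths: List[str] = []
    walks: List[Tuple[str, str, str]] = []
    for line in gfa_text.splitlines():
        fields = line.split("\t")
        if fields[0] == "P" and len(fields) >= 2:
            # P-line: P <PathName> <SegmentNames> <Overlaps>
            paths.append(fields[1])
        elif fields[0] == "W" and len(fields) >= 4:
            # W-line (GFA 1.1): W <SampleId> <HapIndex> <SeqId> <SeqStart> <SeqEnd> <Walk>
            walks.append((fields[1], fields[2], fields[3]))
    return {"paths": paths, "walks": walks}


if __name__ == "__main__":
    # Tiny made-up graph with two named paths (species names are illustrative only).
    example = (
        "H\tVN:Z:1.0\n"
        "S\t1\tACGT\n"
        "S\t2\tGG\n"
        "L\t1\t+\t2\t+\t0M\n"
        "P\tspeciesA\t1+,2+\t*\n"
        "P\tspeciesB\t1+\t*\n"
    )
    print(list_gfa_path_names(example))
```

If the FASTA header names appear among the P-line or W-line names, they were preserved in the graph itself. Any renaming would then have to come from how the haplotype index was built.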

[Figure 1: graph(2)]

In the next figure, I used the vg file instead of the vg.xg and gbwt.xg files, which produced a different output. This time, I see multiple paths/haplotypes in addition to the expected four. What are these additional paths? Are they alternative haplotypes generated by vg? What is the proper way to visualize the variation graph?
[Figure 2: graph(1)]

Any help will be valuable. Thank you for your time.