lh3 / minigraph

Sequence-to-graph mapper and graph generator
https://lh3.github.io/minigraph
MIT License

Question about the coordinates in bubble output #50

egoltsman opened this issue 3 years ago (status: open)

egoltsman commented 3 years ago

Dear Dr. Li,

When running minigraph --call, how are the positions of the "alleles" (bubble vertices) determined? I assume that the reference sequences are mapped to the graph and that this is how the coordinates are calculated. Is that correct? Below is an example in which I queried the graph with the same sequence that was used as the reference sample during graph construction (minigraph -xasm --call foo.gfa Bd21C1.fa):

Bd21C1  481920  481939  >s1     >s3     >s2:19:+:Bd21C1:481915:481941

Notice that the allele coordinates differ from the bubble position, even though both presumably refer to the same sequence. Is that a mapping artifact?

Also, what happens when a sequence maps to the bubble in two or more places?

Thank you

lh3 commented 2 years ago

Sorry for the very late response. The coordinates in the last column are the positions of the closest minimizers, which usually span a larger interval than the bubble itself. When constructing graphs, minigraph tries to find the tightest bubble, but it does not do so in --call mode. This is why you see the discrepancy.
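To make the discrepancy in the example concrete, here is a small Python sketch (not part of minigraph). It assumes that the last column is colon-separated as path, path length, strand, query contig, query start and query end, which is how the example line reads. The names parse_call_line, anchor_slack and CallRecord are hypothetical. The slack comparison is only meaningful when the query is the reference sequence itself, as in the question.

```python
from dataclasses import dataclass


@dataclass
class CallRecord:
    """One line of `minigraph --call` output (field meanings assumed from the example)."""
    ctg: str       # reference contig the bubble lies on
    start: int     # bubble start on the reference
    end: int       # bubble end on the reference
    source: str    # bubble source vertex, e.g. ">s1"
    sink: str      # bubble sink vertex, e.g. ">s3"
    path: str      # allele path through the bubble, e.g. ">s2"
    path_len: int  # length of the allele path
    strand: str
    q_ctg: str     # query contig the allele was called from
    q_start: int   # query interval bounded by the nearest minimizer anchors (assumed)
    q_end: int


def parse_call_line(line: str) -> CallRecord:
    """Parse one tab/space-separated call line into a CallRecord."""
    ctg, start, end, source, sink, allele = line.split()
    # Split the first three fields from the left and the last two from the
    # right, so a query contig name containing ':' is kept intact.
    path, path_len, strand, rest = allele.split(":", 3)
    q_ctg, q_start, q_end = rest.rsplit(":", 2)
    return CallRecord(ctg, int(start), int(end), source, sink,
                      path, int(path_len), strand,
                      q_ctg, int(q_start), int(q_end))


def anchor_slack(rec: CallRecord) -> tuple[int, int]:
    """Extra bases the minimizer-anchored query interval adds on each side
    of the bubble (only meaningful when query == reference)."""
    return rec.start - rec.q_start, rec.q_end - rec.end


if __name__ == "__main__":
    rec = parse_call_line("Bd21C1\t481920\t481939\t>s1\t>s3\t>s2:19:+:Bd21C1:481915:481941")
    print(rec.end - rec.start)      # bubble span on the reference
    print(rec.q_end - rec.q_start)  # minimizer-anchored span on the query
    print(anchor_slack(rec))        # (left, right) slack
```

For the line in the question, the bubble spans 19 bp (matching the path length of >s2), while the minimizer-anchored interval spans 26 bp: 5 bp extra on the left and 2 bp on the right.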