lh3 / minigraph

Sequence-to-graph mapper and graph generator
https://lh3.github.io/minigraph
MIT License

question about minimum shared sequence to add genome to graph #14

Closed phiweger closed 4 years ago

phiweger commented 4 years ago

hi,

thank you for this amazing tool. what is the minimum overlap two sequences must have for minigraph to construct a node shared by the two underlying genomes?

i am asking because when i construct a de Bruijn graph (DBG) from a couple of genomes i used for testing, i find several 31-mers that are shared among them (i.e. nodes connecting the corresponding colors). however, when i run these genomes through minigraph, they remain disconnected, both with default parameters and when reducing the minimum variant length.
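
for reference, a minimal Python sketch of the kind of shared-k-mer check described above. this is an illustration only, not the DBG tool actually used: the function names (`revcomp`, `canonical_kmers`, `shared_kmers`) are hypothetical, and treating k-mers as canonical (strand-independent) is an assumption.

```python
def revcomp(seq):
    """Return the reverse complement of a DNA string."""
    comp = str.maketrans("ACGTacgt", "TGCAtgca")
    return seq.translate(comp)[::-1]


def canonical_kmers(seq, k=31):
    """Collect the canonical k-mers of seq (the smaller of each k-mer and its reverse complement)."""
    seq = seq.upper()
    kmers = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        # Skip windows containing ambiguous bases such as N.
        if set(kmer) <= set("ACGT"):
            kmers.add(min(kmer, revcomp(kmer)))
    return kmers


def shared_kmers(seq_a, seq_b, k=31):
    """Return the canonical k-mers present in both sequences."""
    return canonical_kmers(seq_a, k) & canonical_kmers(seq_b, k)
```

a non-empty result from `shared_kmers(genome_a, genome_b)` only shows that the two sequences have at least one exact 31-bp match; it says nothing about whether a longer alignment exists between them.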

kind regards

phiweger commented 4 years ago

i investigated a little further, and the problem seems to be that minigraph does not add sequences to the graph if they diverge too much from the reference genome, i.e. the first genome provided when building the graph. is that correct?

# genome of a different species but w/ a known shared region
[M::ggen_map::0.354*2.99] mapped 0 sequence(s) to the graph

is it possible to add everything to the graph and not discard anything?

lh3 commented 4 years ago

minigraph only inserts sequences that are contained in a linear alignment. Unfortunately, it doesn't work for your case.

phiweger commented 4 years ago

thank you for the quick response! would it be possible to add this as an option? my use case is bacterial pangenomics, where you regularly get only 20% sequence overlap (the core genome). if i understand correctly, the remaining 80% (the accessory genome) will be lost, correct?

lh3 commented 4 years ago

Sorry, minigraph is not designed for this use case.