lh3 / minigraph

Sequence-to-graph mapper and graph generator
https://lh3.github.io/minigraph
MIT License

Minigraph does not find variants in simulated data #119

Open agolicz opened 4 days ago

agolicz commented 4 days ago

Hello, we are trying to understand minigraph's behavior.

We built a minigraph graph from 6 simulated assemblies (only SVs > 100 bp, no SNPs, simulated with VISOR). The resulting graph has very few, extremely large nodes, and many of the SVs used in the simulation did not end up in the graph despite being longer than 100 bp. We ended up with fewer than 10 SVs per chromosome, whereas the simulations used more than 3,700 SVs per chromosome on average. We verified that the simulated data was not the issue, because it works well with other pangenome graph building pipelines (Minigraph-Cactus and PGGB). Minigraph also works perfectly well with real-world data, so something specific to the simulated data seems to be going on, and we do not know what. Could you help explain this?

It looks like others may have faced a similar problem: https://github.com/lh3/minigraph/issues/62
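
For anyone wanting to reproduce the "few, very large nodes" observation on their own graph, here is a minimal sketch of one way to summarize an rGFA file. It is not part of minigraph; the function name `summarize_rgfa` and the `min_len` parameter are made up for illustration. It assumes the graph is in rGFA, where each segment (`S` line) carries an `SR:i` rank tag (0 for the reference, higher for segments added from later assemblies). Counting non-reference segments of at least `min_len` bp is only a rough proxy for the number of SVs captured: deletions relative to the reference appear as extra links rather than new segments, so they are not counted here.

```python
from collections import Counter


def summarize_rgfa(lines, min_len=100):
    """Summarize the S-lines of an rGFA graph given as an iterable of text lines.

    Hypothetical helper for illustration only; min_len is an assumed cutoff.
    """
    n_segments = 0
    longest = 0
    by_rank = Counter()   # number of segments per SR:i rank
    nonref_long = 0       # segments with rank > 0 and length >= min_len

    for line in lines:
        if not line.startswith("S\t"):
            continue  # skip links (L), walks (W) and anything else
        fields = line.rstrip("\n").split("\t")
        seq = fields[2]

        # Optional tags look like TAG:TYPE:VALUE, e.g. SR:i:1
        tags = {}
        for field in fields[3:]:
            parts = field.split(":", 2)
            if len(parts) == 3:
                tags[parts[0]] = parts[2]

        # Prefer an explicit LN:i tag; otherwise use the sequence length
        # (a "*" placeholder without LN counts as length 0).
        if "LN" in tags:
            length = int(tags["LN"])
        else:
            length = 0 if seq == "*" else len(seq)

        rank = int(tags.get("SR", 0))  # assumption: missing SR means reference

        n_segments += 1
        longest = max(longest, length)
        by_rank[rank] += 1
        if rank > 0 and length >= min_len:
            nonref_long += 1

    return {
        "n_segments": n_segments,
        "longest": longest,
        "by_rank": dict(by_rank),
        "nonref_long": nonref_long,
    }


# Tiny made-up graph: two reference segments and one 150 bp insertion.
sample = [
    "S\ts1\tACGT\tSN:Z:chr1\tSO:i:0\tSR:i:0\n",
    "S\ts2\t" + "A" * 150 + "\tSN:Z:asm1#chr1\tSO:i:4\tSR:i:1\n",
    "S\ts3\tGG\tSN:Z:chr1\tSO:i:4\tSR:i:0\n",
    "L\ts1\t+\ts2\t+\t0M\tSR:i:1\n",
]
stats = summarize_rgfa(sample)
print(stats)
```

On a real graph the same function can be fed an open file handle, for example `summarize_rgfa(open("graph.gfa"))`, where `graph.gfa` is a placeholder path. Comparing `nonref_long` per chromosome against the number of simulated SVs gives a quick sense of how much of the simulated variation made it into the graph.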

jp-jong commented 3 days ago

Similar to #118. I also ran into this problem when I did my own simulation (although with a very small simulated dataset).