Closed by ptrebert 3 years ago
This sometimes happens when the homopolymer length consensus picks different lengths on the two sides of an edge. For example, if there's a short node with just one minimizer, the consensus can pick a short homopolymer length in that node but a longer length at the other end of a neighboring node, which makes the overlap longer than the short node. This is also why the overlap lengths are not identical between the two edge lines.
You can sidestep this by using the parameter --blunt, which will create a graph without edge overlaps. After that you should clean the graph with vg. If you're building from contigs instead of reads, you might also try --no-hpc to disable homopolymer consensus.
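To see whether a given graph is affected, the check described above (an edge overlap longer than one of the nodes it connects) can be sketched in a few lines of Python. This is only an illustrative sketch, not part of MBG or GraphAligner: the function name `find_overlong_overlaps` is hypothetical, and it assumes GFA1 `S` and `L` lines with overlaps written as a plain `<n>M` CIGAR. Edges with any other overlap format are skipped.

```python
import re

# Assumption: overlaps are simple "<n>M" CIGAR strings; anything else is ignored.
OVERLAP_RE = re.compile(r"^(\d+)M$")


def find_overlong_overlaps(gfa_lines):
    """Return (from, to, overlap, node, node_length) for every edge whose
    overlap is longer than one of the two nodes it connects."""
    lengths = {}
    edges = []
    for line in gfa_lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "S" and len(fields) >= 3:
            name, seq = fields[1], fields[2]
            if seq != "*":
                lengths[name] = len(seq)
            else:
                # Sequence omitted: fall back to the optional LN:i: tag.
                for tag in fields[3:]:
                    if tag.startswith("LN:i:"):
                        lengths[name] = int(tag[5:])
        elif fields[0] == "L" and len(fields) >= 6:
            match = OVERLAP_RE.match(fields[5])
            if match:
                edges.append((fields[1], fields[3], int(match.group(1))))

    problems = []
    for src, dst, overlap in edges:
        # dict.fromkeys de-duplicates the node pair for self-loops.
        for node in dict.fromkeys((src, dst)):
            if node in lengths and overlap > lengths[node]:
                problems.append((src, dst, overlap, node, lengths[node]))
    return problems


# Made-up toy graph: node 1 is 5 bp long, but one edge claims a 7 bp overlap.
example = [
    "H\tVN:Z:1.0",
    "S\t1\tACGTA",
    "S\t2\tACGTACGTAC",
    "L\t1\t+\t2\t+\t7M",
    "L\t2\t+\t1\t+\t3M",
]

for problem in find_overlong_overlaps(example):
    print(problem)
```

To run it on a real file, pass an open file handle instead of the toy list, for example `find_overlong_overlaps(open("graph.gfa"))`.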
Thanks for the explanation. Do you happen to have any type of empirical recommendations/best practices for cleaning a blunt graph built from HiFi reads?
I've used vg:
vg view -Fv graph.gfa | vg mod -n -U 100 - | vg view - > blunt-graph.gfa
It produces graphs with reasonable topologies, but it also removes coverage information from the nodes.
Thanks
This is fixed in MBG v1.0.4
Hi Mikko,
I created a GFA using MBG v1.0.3 (via bioconda), and GraphAligner aborts the subsequent alignment immediately with the following message:
The dataset is quite large, but sharing it may not be necessary if this is a simple off-by-one error?
Best, Peter