Hybrid assembly gives more contigs than "long-read only"

Hi, I recently encountered this issue too. In my data, I think the 'breaking' is caused by errors in the short-read assembly.

To test it, you could also try long-read-only mode (input only the long reads to Unicycler), which I think will generate fewer than 12 contigs.
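To compare the two runs, a small script like the one below can count the contigs in each output. This is only a sketch: it assumes each run wrote an `assembly.fasta` into its output directory, and the directory names in the usage comment are made up.

```python
from pathlib import Path


def contig_lengths(fasta_text):
    """Return the length of every record in FASTA-formatted text."""
    lengths = []
    current = None  # length of the record being read, None before the first header
    for line in fasta_text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif line and current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def summarize(fasta_path):
    """Contig count, total size and longest contig for one assembly file."""
    lengths = contig_lengths(Path(fasta_path).read_text())
    return {
        "contigs": len(lengths),
        "total_bp": sum(lengths),
        "longest": max(lengths, default=0),
    }


# Hypothetical usage, directory names are assumptions:
# print(summarize("hybrid_out/assembly.fasta"))
# print(summarize("long_only_out/assembly.fasta"))
```

If the long-read-only run has fewer contigs but a similar total size, that would point to the short-read anchors as the source of the extra breaks.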

However, hybrid mode only uses the short-read assembly as anchors and maps the long reads to these anchors for bridging. It doesn't use the long reads as the backbone, so the bridging is not always accurate. The 'breaking' issue appears when a short-read-assembled contig can be mapped to two different places in the long-read assembly. When this happens, the long-read data wrongly connects/bridges the short-read-assembled contigs (which are also the anchors), which generates a wrong contig. This is likely to happen if your genome has many repeats.
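To make the "two different places" case concrete, here is a toy check that flags anchor contigs occurring more than once in a long-read assembly. It is not what Unicycler does internally: it uses exact string matching on both strands, which only works for an illustration. On real data you would need an aligner that tolerates mismatches. All names and sequences are made up.

```python
def reverse_complement(seq):
    """Reverse complement of a DNA string (A, C, G, T, N only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def count_placements(anchor, reference):
    """Count exact, possibly overlapping, occurrences of anchor on both strands."""
    anchor = anchor.upper()
    reference = reference.upper()
    # A set, so an anchor equal to its own reverse complement is searched once.
    queries = {anchor, reverse_complement(anchor)}
    total = 0
    for query in queries:
        start = reference.find(query)
        while start != -1:
            total += 1
            start = reference.find(query, start + 1)
    return total


def ambiguous_anchors(anchors, long_read_contigs):
    """Names of anchors that can be placed at more than one location."""
    flagged = []
    for name, seq in anchors.items():
        hits = sum(count_placements(seq, ref) for ref in long_read_contigs.values())
        if hits > 1:
            flagged.append(name)
    return flagged


# Toy data: "rep" occurs twice in contig c1, "uniq" occurs once in c2.
anchors = {"rep": "GATTACA", "uniq": "GGCATC"}
long_read_contigs = {"c1": "CCGATTACACCCGATTACAGG", "c2": "TTGGCATCTT"}
flagged = ambiguous_anchors(anchors, long_read_contigs)
```

Anchors flagged this way are the ones I would inspect first, since a long read bridging out of a repeated anchor can be attached at either copy.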

I haven't resolved this issue yet, but I think removing or correcting these wrong contigs will help.

You could start by restricting the length of the anchors with the parameter min_anchor_seg_len. However, in my data, even very long contigs can be wrongly assembled, so I am still looking for a solution.
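As a rough sketch, the runs to compare could be assembled like this. The read file names, output directories and the threshold of 5000 are placeholders I picked for illustration, not recommended values, and it is worth confirming the flags against `unicycler --help` for your version.

```python
import shlex


def unicycler_command(out_dir, long_reads, short_1=None, short_2=None,
                      min_anchor_seg_len=None):
    """Build a Unicycler command line as a list of arguments.

    Leaving out the short reads gives a long-read-only run.
    """
    cmd = ["unicycler"]
    if short_1 and short_2:
        cmd += ["-1", short_1, "-2", short_2]
    cmd += ["-l", long_reads, "-o", out_dir]
    if min_anchor_seg_len is not None:
        cmd += ["--min_anchor_seg_len", str(min_anchor_seg_len)]
    return cmd


if __name__ == "__main__":
    # All paths and the 5000 bp threshold are assumptions for this example.
    runs = [
        unicycler_command("long_only_out", "long.fastq.gz"),
        unicycler_command("hybrid_out", "long.fastq.gz",
                          "short_1.fastq.gz", "short_2.fastq.gz"),
        unicycler_command("hybrid_anchor5k_out", "long.fastq.gz",
                          "short_1.fastq.gz", "short_2.fastq.gz",
                          min_anchor_seg_len=5000),
    ]
    for cmd in runs:
        print(shlex.join(cmd))  # pass cmd to subprocess.run(cmd, check=True) to execute
```

Printing the commands first makes it easy to check them before starting the assemblies, which can take a long time.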

rrwick / Unicycler

Hybrid assembly gives more contigs than "long-read only" #304