Open alekseyzimin opened 3 years ago
This is with version 1.1.11
Aleksey Zimin, LRScaf builds an assembly graph from the alignments and uses it for scaffolding. At a divergence node, if no long reads bridge the unique nodes, LRScaf breaks the path at the divergence node.
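To make the behavior described above concrete, here is a toy sketch of path extension that stops at a divergence node unless a long read bridges the unique nodes on either side. It is not LRScaf's code. The graph layout, the `bridges` set and the `walk` function are hypothetical names chosen for illustration, and whether LRScaf handles divergence exactly this way is an assumption.

```python
def walk(graph, bridges, start):
    """Extend a path from `start`, breaking at unbridged divergence nodes.

    graph   : dict mapping a node to the list of its successors (assumed layout)
    bridges : set of (prev, next) pairs that a long read spans across the
              node between them (assumed representation)
    """
    path = [start]
    visited = {start}
    prev, node = None, start
    while True:
        successors = graph.get(node, [])
        if not successors:
            break  # dead end: nothing left to extend into
        if len(successors) == 1:
            nxt = successors[0]
        else:
            # Divergence node: continue only if exactly one successor is
            # supported by a long read bridging prev -> successor.
            candidates = [s for s in successors if (prev, s) in bridges]
            if len(candidates) != 1:
                break  # no unambiguous bridge: the path ends here
            nxt = candidates[0]
        if nxt in visited:
            break  # avoid walking around a cycle
        path.append(nxt)
        visited.add(nxt)
        prev, node = node, nxt
    return path


# Toy graph: unique nodes A and B both lead into R, which diverges to C and D.
toy_graph = {"A": ["R"], "B": ["R"], "R": ["C", "D"]}
toy_bridges = {("A", "C")}  # only A -> (R) -> C is spanned by a long read

path_from_a = walk(toy_graph, toy_bridges, "A")
path_from_b = walk(toy_graph, toy_bridges, "B")
```

In this toy model the walk from `A` continues through `R` to `C`, while the walk from `B` stops with `R` as its last node, so `R` ends up in two paths.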
Hello,
I am the developer of the MaSuRCA assembler. I am looking for a good long-read scaffolder, and your paper showed nice results. However, when I ran your scaffolder on a human genome assembly produced by MaSuRCA (about 1200 contigs, contig N50 of ~9 Mbp), it duplicated many contigs in the scaffolds, resulting in a much larger final assembly (3.24 Gbp vs. 2.85 Gbp). This is not the correct behavior. A scaffolder should output about the same amount of sequence as its input, give or take losses from merging contigs. Contigs should never be duplicated exactly unless there is a very good reason for it, and if they are, the duplicates must be resolved by remapping the reads and redoing the consensus. I found that the duplicated contigs were always at the ends of paths in nodePaths.info. My assembly, the config XML, the PAF output of minimap, and the LRScaf output are posted here:
ftp://ftp.ccb.jhu.edu/pub/alekseyz/lrscaf_debug/
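The duplication check described above can be sketched roughly as follows. This is a minimal illustration, assuming the scaffold paths have already been parsed into lists of contig IDs. The layout of nodePaths.info is not reproduced here, and the function names and toy data are hypothetical.

```python
from collections import defaultdict


def find_duplicated_contigs(paths):
    """Return {contig: [(path_index, position, at_end), ...]} for every
    contig that occurs more than once across all paths."""
    occurrences = defaultdict(list)
    for path_index, path in enumerate(paths):
        last = len(path) - 1
        for position, contig in enumerate(path):
            at_end = position == 0 or position == last
            occurrences[contig].append((path_index, position, at_end))
    return {c: occ for c, occ in occurrences.items() if len(occ) > 1}


def excess_length(paths, contig_lengths):
    """Total bases added to the output by reusing contigs."""
    duplicated = find_duplicated_contigs(paths)
    return sum(contig_lengths[c] * (len(occ) - 1)
               for c, occ in duplicated.items())


# Toy data (made up for illustration, not taken from the posted files).
example_paths = [["c1", "c2", "c3"], ["c3", "c4"], ["c5", "c6", "c1"]]
example_lengths = {"c1": 100, "c2": 200, "c3": 300,
                   "c4": 400, "c5": 500, "c6": 600}

duplicated = find_duplicated_contigs(example_paths)
all_at_ends = all(at_end
                  for occ in duplicated.values()
                  for _, _, at_end in occ)
extra_bases = excess_length(example_paths, example_lengths)
```

On real data, `extra_bases` could be compared with the difference between the output and input assembly sizes to see how much of the growth comes from exact contig reuse.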
Best, Aleksey Zimin