marbl / verkko

Telomere-to-telomere assembly of accurate long reads (PacBio HiFi, Oxford Nanopore Duplex, HERRO corrected Oxford Nanopore Simplex) and Oxford Nanopore ultra-long reads.

Potential graphaligner/resolution instability #176

Open skoren opened 1 year ago

skoren commented 1 year ago

I have two runs of HG002, using the same input data, that resolve the same genomic region slightly differently. The HiFi graphs look similar. The new run: [image: Screen Shot 2023-06-19 at 11 41 10 AM]

vs. the old: [image: Screen Shot 2023-06-19 at 11 40 33 AM]

The nodes in question are utig1-43293 and utig1-59495 in the new graph, and utig1-43264 and utig1-59466 in the old graph. In the old graph, both haplotypes are resolved and phased. In the new run, however, the paternal haplotype is broken and has no valid path: [image: Screen Shot 2023-06-19 at 12 03 02 PM]

The assemblies are in globus under hg002/asm (original) and hg002/asm_fromscratch (new).

skoren commented 9 months ago

We have another couple of examples from HG002 where adding ONT data makes some parts of the assembly break. These are on the T2T globus under HG002-Duplex/duplex_50x/asm_3cellONT and asm. Here, chr19 maternal (mat-0000003 in asm_3cellONT) is complete, gapless, and has both telomeres, but adding more ONT coverage (mat-0000009 in asm) makes it lose the telomere on one side. In another example from the same assembly, chr7 paternal (pat-0001315 in asm_3cellONT and pat-0001411 in asm) gains a gap around node utig1-28406 with the increased coverage.

skoren commented 8 months ago

And another example under globus for HG002 (asm_newcorrection_repeatbranch vs asm_newcorrection_repeatbranch_rerun), where the region around node utig1-60013 is not resolved in the rerun. The graphs look identical (same node lengths, counts, and edges), and the ONT read alignments in this region (alns-ont-mapqfilter.gaf) are the same. However, in one run this node has 3 paths using it, whereas in the rerun it has 5, which leads to an unresolved tangle in the final graph.
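For anyone who wants to repeat this kind of comparison, here is a minimal sketch (not part of verkko) of tallying the alignment paths that traverse a given node in two GAF files. It assumes the standard GAF layout, where column 6 holds the path as oriented node IDs such as `>utig1-1<utig1-2`. The function names `paths_through_node` and `diff_runs` are hypothetical, and a path and its reverse complement are counted separately.

```python
import re
from collections import Counter

# GAF column 6 holds the path as oriented node IDs, e.g. ">utig1-1<utig1-2".
NODE_RE = re.compile(r"[<>]([^<>]+)")

def paths_through_node(gaf_lines, node):
    """Count how many alignments use each oriented path that traverses `node`."""
    counts = Counter()
    for line in gaf_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue  # skip blank or truncated records
        path = fields[5]
        # Compare whole node IDs so utig1-6001 does not match utig1-60013.
        if node in NODE_RE.findall(path):
            counts[path] += 1
    return counts

def diff_runs(gaf_a, gaf_b, node):
    """Return {path: (count_in_a, count_in_b)} for paths whose counts differ."""
    a = paths_through_node(gaf_a, node)
    b = paths_through_node(gaf_b, node)
    return {p: (a[p], b[p]) for p in sorted(set(a) | set(b)) if a[p] != b[p]}
```

Each argument can be an open file handle, for example `diff_runs(open(run1_gaf), open(run2_gaf), "utig1-60013")`, where `run1_gaf` and `run2_gaf` stand for the two copies of alns-ont-mapqfilter.gaf. An empty result would be consistent with the alignments being identical, which would point at a later step as the source of the differing path counts.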

skoren commented 4 months ago

I've added some examples of regions that are not resolved in the v2.0 release but were resolved in at least one previous run (from 1.3 to 1.4.1 to v2.0 before the correction changes) in hg002/asm_verkko_v2_tip. There's a folder for each chromosome with a brief description of which assembly and which reads were relevant to the previous resolution. @maickrau, could you take a look at this? The previous example from chr1 is also still present in the v2.0 assembly (comment from Jan 3).