marbl / verkko

Telomere-to-telomere assembly of accurate long reads (PacBio HiFi, Oxford Nanopore Duplex, HERRO corrected Oxford Nanopore Simplex) and Oxford Nanopore ultra-long reads.

ONT&HiFi's assembly with several Ns gap #203

Closed hq66 closed 1 year ago

hq66 commented 1 year ago

Hi~ Verkko is a user-friendly and powerful tool. I assembled with ONT, HiFi and Hi-C data, and then used 3D-DNA to anchor the contigs into chromosomes. When I counted runs of Ns to detect gaps in the final assembly, I found some with irregular lengths, such as 1,000 bp, 25,065 bp, 111,741 bp and so on. 3D-DNA usually links contigs that do not overlap with 500 Ns, so the other lengths look irregular. In the Juicebox Hi-C matrix these runs also fall inside contigs, which suggests Juicer did not treat them as gaps. How were these irregular runs of Ns generated, and what do they mean? Should they be regarded as gaps? Thanks.

skoren commented 1 year ago

When verkko has phasing information (like Hi-C or trio), it can generate scaffolds when the resolution is ambiguous but long range structure is clear. The size of the gap in this case is the estimate of the sequence that could not be resolved. In some other cases, there may be a gap in one haplotype but not in the other. Here, the gap size will be the estimate of missing sequence based on the other haplotype. In some cases, a fixed gap of 5kb is used when an estimate is not possible. This is described in more detail in the verkko manuscript: https://www.nature.com/articles/s41587-023-01662-6.
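To see which kinds of gaps an assembly contains, it can help to tally N runs by length. The following is a minimal sketch, not part of verkko: the function names `read_fasta` and `n_runs` are made up for illustration, and the interpretation of the lengths (500 for 3D-DNA joins, 5000 for verkko's fixed gap, anything else for an estimated size) is only a heuristic based on the numbers mentioned in this thread.

```python
import re
from collections import Counter


def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            parts = line[1:].split()
            name, chunks = (parts[0] if parts else ""), []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def n_runs(sequence):
    """Return (start, length) for every run of N/n; start is 0-based."""
    return [(m.start(), m.end() - m.start())
            for m in re.finditer(r"[Nn]+", sequence)]


# Tiny in-memory demo; for a real assembly, pass an open file handle instead.
demo = [">chr1", "ACGT" + "N" * 500 + "ACGT", "GG" + "N" * 5000 + "TT"]
length_counts = Counter()
for name, seq in read_fasta(demo):
    for start, length in n_runs(seq):
        length_counts[length] += 1
print(length_counts)
```

Runs of exactly 500 would then be candidates for 3D-DNA joins, and the remaining lengths candidates for gaps that verkko itself introduced.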

Recent versions of verkko (1.4.1+) report the reason for each gap in the 8-hic*/*.paths.tsv files, like so:

haplotype2_from_utig4-1296  utig4-1296-,[N10591N:ambig_bubble],utig4-3142-  HAPLOTYPE2
haplotype2_from_utig4-1990  utig4-1990-,[N5000N:ambig_path],utig4-3120+     HAPLOTYPE2
haplotype2_from_utig4-223   utig4-223-,[N44884N:alt-utig4-504],utig4-282-   HAPLOTYPE2

and you can view the context around the gap in the noseq.gfa file output by verkko.
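The gap annotations can be pulled out of those lines with a short script. This is only a sketch: the layout (whitespace-separated name, comma-separated path, assignment) and the `[N<size>N:<reason>]` pattern are inferred from the three example lines above, not from a format specification, and `parse_path_line` is a hypothetical helper name. Real files may contain a header or other line types that need extra handling.

```python
import re

# Assumed gap token shape, inferred from the examples: [N<size>N:<reason>]
GAP = re.compile(r"^\[N(\d+)N(?::(.+))?\]$")


def parse_path_line(line):
    """Return (name, assignment, gaps) for one paths.tsv-style line.

    Each gap is a dict with its size, reason label, and flanking nodes.
    Returns None for lines with fewer than two fields.
    """
    fields = line.split()
    if len(fields) < 2:
        return None
    name, path = fields[0], fields[1]
    assignment = fields[2] if len(fields) > 2 else None
    nodes = path.split(",")
    gaps = []
    for i, node in enumerate(nodes):
        match = GAP.match(node)
        if match:
            gaps.append({
                "size": int(match.group(1)),
                "reason": match.group(2),
                "left": nodes[i - 1] if i > 0 else None,
                "right": nodes[i + 1] if i + 1 < len(nodes) else None,
            })
    return name, assignment, gaps


example = ("haplotype2_from_utig4-223   "
           "utig4-223-,[N44884N:alt-utig4-504],utig4-282- HAPLOTYPE2")
name, assignment, gaps = parse_path_line(example)
for gap in gaps:
    print(name, gap["size"], gap["reason"], gap["left"], gap["right"])
```

The flanking node names reported for each gap are the ones to look up in the noseq.gfa file when inspecting the surrounding graph.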