marbl / verkko

Telomere-to-telomere assembly of accurate long reads (PacBio HiFi, Oxford Nanopore Duplex, HERRO-corrected Oxford Nanopore Simplex) and Oxford Nanopore ultra-long reads.

Cases to study #27

Closed: snurk closed this issue 2 years ago.

snurk commented 2 years ago
skoren commented 2 years ago

Following up on the latest bonobo assembly: it has 3 gaps on chrY and 4 on chrX, based on alignments to a Flye trio assembly. The tentative layout for chrY is: utig4-1203-,utig4-131-,utig4-1139+,utig4-1057-. Looking at the gap between utig4-1139 and utig4-1057, there are supporting ONT reads joining the unitigs (considering only reads >150 kb):

319251a5-e087-45c3-a41f-2bc334b71a6c    524237  45      404802  +       >utig1-25267>utig1-25268<utig1-25267    8509455 3879312 4284779 396373  410176  60      NM:i:13803      AS:f:312829     dv:f:0.0336514  id:f:0.966349
319251a5-e087-45c3-a41f-2bc334b71a6c    524237  402475  524236  +       <utig1-17442    1649295 0       121919  119472  123248  60      NM:i:3776       AS:f:96612.8    dv:f:0.0306374  id:f:0.969363

b1296b29-be31-4e9b-b46e-b662bfdd6d41    342464  57      256410  +       >utig1-17442    1649295 1392109 1649295 251894  259583  60      NM:i:7689       AS:f:205144     dv:f:0.0296206  id:f:0.970379
b1296b29-be31-4e9b-b46e-b662bfdd6d41    342464  254167  342442  +       >utig1-25267<utig1-25268<utig1-25267    8509455 4224677 4313145 86674   89385   60      NM:i:2711       AS:f:70219.7    dv:f:0.0303295  id:f:0.969671

a16f8fdd-b4c6-43fb-9a43-1b912304b79c    204704  70      162131  +       >utig1-25267>utig1-25268<utig1-25267    8509455 4122470 4284782 160185  163427  60      NM:i:3242       AS:f:140469     dv:f:0.0198376  id:f:0.980162
a16f8fdd-b4c6-43fb-9a43-1b912304b79c    204704  159821  204704  +       <utig1-17442    1649295 26      44922   44483   45166   60      NM:i:683        AS:f:40334.2    dv:f:0.015122   id:f:0.984878
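The records above are GAF alignments: each read has two alignments, ordered along the read, that land on different unitigs. A minimal sketch of pulling such joining reads out of a GAF file might look like the following. It assumes only the standard first 12 GAF columns, treats two consecutive alignments of the same read on disjoint node sets as a join, and uses hypothetical helper names (`bridging_reads`, `path_nodes`); it is not how verkko itself collects this evidence.

```python
import re
from collections import defaultdict

MIN_READ_LEN = 150_000  # only reads longer than 150 kb, as in the analysis above

# First 12 (fixed) GAF columns of the three reads quoted above; tags dropped.
SAMPLE = [
    "319251a5-e087-45c3-a41f-2bc334b71a6c 524237 45 404802 + >utig1-25267>utig1-25268<utig1-25267 8509455 3879312 4284779 396373 410176 60",
    "319251a5-e087-45c3-a41f-2bc334b71a6c 524237 402475 524236 + <utig1-17442 1649295 0 121919 119472 123248 60",
    "b1296b29-be31-4e9b-b46e-b662bfdd6d41 342464 57 256410 + >utig1-17442 1649295 1392109 1649295 251894 259583 60",
    "b1296b29-be31-4e9b-b46e-b662bfdd6d41 342464 254167 342442 + >utig1-25267<utig1-25268<utig1-25267 8509455 4224677 4313145 86674 89385 60",
    "a16f8fdd-b4c6-43fb-9a43-1b912304b79c 204704 70 162131 + >utig1-25267>utig1-25268<utig1-25267 8509455 4122470 4284782 160185 163427 60",
    "a16f8fdd-b4c6-43fb-9a43-1b912304b79c 204704 159821 204704 + <utig1-17442 1649295 26 44922 44483 45166 60",
]


def parse_gaf_line(line):
    """Keep the GAF fields needed here: read name, read length, query start, path."""
    f = line.split()  # works for tab- or space-separated records
    return {"read": f[0], "read_len": int(f[1]), "q_start": int(f[2]), "path": f[5]}


def path_nodes(path):
    """'>utig1-25267<utig1-25268' -> ['utig1-25267', 'utig1-25268'].

    A path with no '<'/'>' (a single stable name) is returned as one node.
    """
    return re.findall(r"[<>]([^<>]+)", path) or [path]


def bridging_reads(lines, min_len=MIN_READ_LEN):
    """Return (read, left_path, right_path) for consecutive alignments of one
    read that land on disjoint sets of nodes, i.e. reads joining two unitigs."""
    by_read = defaultdict(list)
    for line in lines:
        if not line.strip():
            continue
        rec = parse_gaf_line(line)
        if rec["read_len"] > min_len:
            by_read[rec["read"]].append(rec)
    joins = []
    for read, alns in by_read.items():
        alns.sort(key=lambda r: r["q_start"])  # order alignments along the read
        for a, b in zip(alns, alns[1:]):
            if set(path_nodes(a["path"])).isdisjoint(path_nodes(b["path"])):
                joins.append((read, a["path"], b["path"]))
    return joins


if __name__ == "__main__":
    for read, left, right in bridging_reads(SAMPLE):
        print(read, left, "->", right)
```

On the three reads above, every reported join pairs the utig1-25267/utig1-25268 path with utig1-17442, which is the connection that was not made in the graph.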

However, node utig1-25268 is a loop, so it isn't considered a dead end and is therefore not joined to utig1-17442. Looking at the gap between utig4-131 and utig4-1139, utig4-131 is not a dead end because it has a connection, but only to the other haplotype. Lastly, the other side of utig4-131 is not a dead end either; it is also connected to the other haplotype.
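Both failure modes come down to the same check: read-based joins are only considered at dead ends, and an end that has any link at all (a loop back into the node, or an edge into the other haplotype) does not qualify. The sketch below illustrates this in a bidirected, GFA-style graph, assuming a dead end is simply a node end with no links. The `Link` type, the function names, the `ignore_nodes` option and all node names are hypothetical, and this is not verkko's implementation.

```python
from collections import namedtuple

# Hypothetical GFA-style link: frm/frm_ori -> to/to_ori, orientations '+' or '-'.
Link = namedtuple("Link", "frm frm_ori to to_ori")


def end_is_linked(node, end, links, ignore_nodes=frozenset()):
    """True if some link touches the `end` ('left' or 'right') of `node`.

    Leaving node+ or entering node- uses the right end; leaving node- or
    entering node+ uses the left end. Links whose partner node is in
    `ignore_nodes` (e.g. unitigs of the other haplotype) are skipped.
    """
    for l in links:
        if node not in (l.frm, l.to):
            continue
        partner = l.to if l.frm == node else l.frm
        if partner in ignore_nodes:
            continue
        uses_right = (l.frm == node and l.frm_ori == "+") or (l.to == node and l.to_ori == "-")
        uses_left = (l.frm == node and l.frm_ori == "-") or (l.to == node and l.to_ori == "+")
        if (end == "right" and uses_right) or (end == "left" and uses_left):
            return True
    return False


def dead_ends(node, links, ignore_nodes=frozenset()):
    """List the ends of `node` that have no (non-ignored) links."""
    return [e for e in ("left", "right") if not end_is_linked(node, e, links, ignore_nodes)]


# Toy graph with made-up node names.
LINKS = [
    Link("loop", "+", "loop", "+"),  # self-loop: both ends of 'loop' are used
    Link("hapA", "+", "hapB", "+"),  # hapA's right end reaches only hapB (other haplotype)
]

if __name__ == "__main__":
    print(dead_ends("loop", LINKS))                          # no dead end to join from
    print(dead_ends("hapA", LINKS))                          # right end is not a dead end
    print(dead_ends("hapA", LINKS, ignore_nodes={"hapB"}))   # ignoring hapB frees it
```

The `ignore_nodes` variant only shows what the check would report if links into the other haplotype were disregarded; whether that is appropriate is a separate design question.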

For chrX, the tentative layout is utig4-1069+,utig1-22529,utig4-896-,utig4-893-,utig4-894+,utig4-893-,utig4-895+,utig4-1458+,utig4-1459+,utig4-1483-,utig4-1461-,utig4-1462+,utig4-1483-,utig4-1477-. There are a few unresolved tangles due to length. The gap between utig4-1069 and utig4-896 is unclear and is missing about 200 kb, which may be filled by utig1-22529, but this unitig is not in the final graph. The gap between utig4-895 and utig4-1458 is not at a dead end.
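In the chrX layout some unitigs (utig4-893, utig4-1483) are visited more than once, which is one way a path through an unresolved tangle shows up. A small sketch for parsing such comma-separated layouts and listing repeated unitigs follows; the function names are hypothetical, an entry without a trailing `+`/`-` (such as utig1-22529) is given orientation `None`, and a repeated node is only a heuristic hint of a tangle, not how verkko detects one.

```python
from collections import Counter

CHR_Y_LAYOUT = "utig4-1203-,utig4-131-,utig4-1139+,utig4-1057-"
CHR_X_LAYOUT = (
    "utig4-1069+,utig1-22529,utig4-896-,utig4-893-,utig4-894+,utig4-893-,"
    "utig4-895+,utig4-1458+,utig4-1459+,utig4-1483-,utig4-1461-,utig4-1462+,"
    "utig4-1483-,utig4-1477-"
)


def parse_layout(layout):
    """Split 'utig4-1069+,utig1-22529,...' into (node, orientation) pairs.

    Entries without a trailing '+' or '-' get orientation None.
    """
    steps = []
    for item in layout.split(","):
        item = item.strip()
        if not item:
            continue
        if item[-1] in "+-":
            steps.append((item[:-1], item[-1]))
        else:
            steps.append((item, None))
    return steps


def repeated_nodes(steps):
    """Unitigs that appear more than once in the layout, sorted by name."""
    counts = Counter(node for node, _ in steps)
    return sorted(node for node, c in counts.items() if c > 1)


if __name__ == "__main__":
    for name, layout in (("chrY", CHR_Y_LAYOUT), ("chrX", CHR_X_LAYOUT)):
        steps = parse_layout(layout)
        print(name, len(steps), "steps, repeated:", repeated_nodes(steps))
```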

skoren commented 2 years ago

Resolved by read correction; still need to investigate the missed haplotype connections once the run is done.

skoren commented 2 years ago

This was resolved by correction; the remaining issues appear to be related to HiFi coverage.