snurk closed 2 years ago
Following up on the latest bonobo asm: it has 3 gaps on chrY and 4 on chrX, based on alignments to a Flye trio assembly. The tentative layout for Y is: utig4-1203-,utig4-131-,utig4-1139+,utig4-1057-
Looking at the gap between utig4-1139 and utig4-1057, there are supporting ONT reads joining the unitigs (looking only at reads >150kb):
319251a5-e087-45c3-a41f-2bc334b71a6c 524237 45 404802 + >utig1-25267>utig1-25268<utig1-25267 8509455 3879312 4284779 396373 410176 60 NM:i:13803 AS:f:312829 dv:f:0.0336514 id:f:0.966349
319251a5-e087-45c3-a41f-2bc334b71a6c 524237 402475 524236 + <utig1-17442 1649295 0 121919 119472 123248 60 NM:i:3776 AS:f:96612.8 dv:f:0.0306374 id:f:0.969363
b1296b29-be31-4e9b-b46e-b662bfdd6d41 342464 57 256410 + >utig1-17442 1649295 1392109 1649295 251894 259583 60 NM:i:7689 AS:f:205144 dv:f:0.0296206 id:f:0.970379
b1296b29-be31-4e9b-b46e-b662bfdd6d41 342464 254167 342442 + >utig1-25267<utig1-25268<utig1-25267 8509455 4224677 4313145 86674 89385 60 NM:i:2711 AS:f:70219.7 dv:f:0.0303295 id:f:0.969671
a16f8fdd-b4c6-43fb-9a43-1b912304b79c 204704 70 162131 + >utig1-25267>utig1-25268<utig1-25267 8509455 4122470 4284782 160185 163427 60 NM:i:3242 AS:f:140469 dv:f:0.0198376 id:f:0.980162
a16f8fdd-b4c6-43fb-9a43-1b912304b79c 204704 159821 204704 + <utig1-17442 1649295 26 44922 44483 45166 60 NM:i:683 AS:f:40334.2 dv:f:0.015122 id:f:0.984878
However, node utig1-25268 is a loop, so it isn't considered a dead end and is therefore not joined to utig1-17442. Looking at the gap between utig4-131 and utig4-1139, utig4-131 is not a dead end because it has a connection, though only to the other haplotype. Lastly, the other side of utig4-131 is not a dead end either and is connected to the other haplotype.
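For anyone wanting to repeat the check on another gap, here is a minimal sketch of how the bridging reads could be pulled out of the alignments. It assumes the lines above are GAF records (read name, read length, read start/end, strand, path, ...) and that a read "bridges" two nodes if its alignments together touch both. The function name `bridging_reads` and the `min_read_len` parameter are made up for illustration and are not part of any tool.

```python
import re
from collections import defaultdict

# Oriented path steps in a GAF path column look like ">utig1-25267" or "<utig1-17442".
NODE_RE = re.compile(r"[<>]([^<>]+)")


def path_nodes(path):
    """Return the set of node names in a GAF path column."""
    if path and path[0] in "<>":
        return set(NODE_RE.findall(path))
    # A path may also be a single node name without an orientation prefix.
    return {path}


def bridging_reads(gaf_lines, node_a, node_b, min_read_len=150_000):
    """Names of reads longer than min_read_len whose alignments touch both nodes."""
    nodes_by_read = defaultdict(set)
    for line in gaf_lines:
        fields = line.split()
        if len(fields) < 6:
            continue  # skip blank or truncated lines
        read_name, read_len, path = fields[0], int(fields[1]), fields[5]
        if read_len <= min_read_len:
            continue
        nodes_by_read[read_name] |= path_nodes(path)
    return sorted(
        name for name, nodes in nodes_by_read.items()
        if node_a in nodes and node_b in nodes
    )


# The six alignment records quoted in this issue.
GAF_LINES = [
    "319251a5-e087-45c3-a41f-2bc334b71a6c 524237 45 404802 + >utig1-25267>utig1-25268<utig1-25267 8509455 3879312 4284779 396373 410176 60 NM:i:13803 AS:f:312829 dv:f:0.0336514 id:f:0.966349",
    "319251a5-e087-45c3-a41f-2bc334b71a6c 524237 402475 524236 + <utig1-17442 1649295 0 121919 119472 123248 60 NM:i:3776 AS:f:96612.8 dv:f:0.0306374 id:f:0.969363",
    "b1296b29-be31-4e9b-b46e-b662bfdd6d41 342464 57 256410 + >utig1-17442 1649295 1392109 1649295 251894 259583 60 NM:i:7689 AS:f:205144 dv:f:0.0296206 id:f:0.970379",
    "b1296b29-be31-4e9b-b46e-b662bfdd6d41 342464 254167 342442 + >utig1-25267<utig1-25268<utig1-25267 8509455 4224677 4313145 86674 89385 60 NM:i:2711 AS:f:70219.7 dv:f:0.0303295 id:f:0.969671",
    "a16f8fdd-b4c6-43fb-9a43-1b912304b79c 204704 70 162131 + >utig1-25267>utig1-25268<utig1-25267 8509455 4122470 4284782 160185 163427 60 NM:i:3242 AS:f:140469 dv:f:0.0198376 id:f:0.980162",
    "a16f8fdd-b4c6-43fb-9a43-1b912304b79c 204704 159821 204704 + <utig1-17442 1649295 26 44922 44483 45166 60 NM:i:683 AS:f:40334.2 dv:f:0.015122 id:f:0.984878",
]

if __name__ == "__main__":
    for name in bridging_reads(GAF_LINES, "utig1-25268", "utig1-17442"):
        print(name)
```

This only checks that a read touches both nodes; it does not check that the two alignments are adjacent on the read or consistent in orientation, so hits would still need to be looked at by eye.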
For chrX, the tentative layout is: utig4-1069+,utig1-22529,utig4-896-,utig4-893-,utig4-894+,utig4-893-,utig4-895+,utig4-1458+,utig4-1459+,utig4-1483-,utig4-1461-,utig4-1462+,utig4-1483-,utig4-1477-
There are a few unresolved tangles due to length. The gap between utig4-1069 and utig4-896 is unclear; about 200kb is missing, which may be filled by utig1-22529, but this unitig is not in the final graph. The gap between utig4-895 and utig4-1458 is not a dead end.
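As a rough aid for reading layouts like the one above, the sketch below splits a comma-separated layout into (unitig, orientation) steps and flags unitigs that appear more than once or that carry no orientation. Treating a repeated unitig as a hint of an unresolved tangle is my assumption, not something the assembler reports, and the helper names `parse_layout`, `repeated_unitigs`, and `unoriented_unitigs` are hypothetical.

```python
from collections import Counter


def parse_layout(layout):
    """Split 'utig4-1069+,utig1-22529,...' into (name, orientation) tuples.

    Orientation is '+' or '-', or None when the step has no trailing sign.
    """
    steps = []
    for token in layout.split(","):
        token = token.strip()
        if not token:
            continue
        if token[-1] in "+-":
            steps.append((token[:-1], token[-1]))
        else:
            steps.append((token, None))
    return steps


def repeated_unitigs(steps):
    """Unitigs visited more than once in the layout, with their visit counts."""
    counts = Counter(name for name, _ in steps)
    return {name: n for name, n in counts.items() if n > 1}


def unoriented_unitigs(steps):
    """Unitigs listed without a +/- orientation."""
    return [name for name, orient in steps if orient is None]


# The tentative chrX layout quoted in this issue.
CHRX_LAYOUT = (
    "utig4-1069+,utig1-22529,utig4-896-,utig4-893-,utig4-894+,utig4-893-,"
    "utig4-895+,utig4-1458+,utig4-1459+,utig4-1483-,utig4-1461-,utig4-1462+,"
    "utig4-1483-,utig4-1477-"
)

if __name__ == "__main__":
    steps = parse_layout(CHRX_LAYOUT)
    print("repeated:", repeated_unitigs(steps))
    print("no orientation:", unoriented_unitigs(steps))
```

On the chrX layout this points at utig4-893 and utig4-1483 as visited twice, and at utig1-22529 as the only step without an orientation.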
Resolved by read correction; still need to investigate the missed haplotype connections once the run is done.
Was resolved by correction; the remaining issues appear to be HiFi coverage related.