skoren closed this issue 2 years ago
Still present at approximately 34 Mbp in compressed space; need to check in the corrected graph.
This still appears to be an issue in my latest run with corrected reads. I see reads mapping across a gap in the HiFi-resolved graph, indicating an overlap, but the nodes are not connected in the final graph. The run is on biowulf under /data/korens/test/mikko_cns/chm1/snakemake/bri_asm/. Details on the nodes are below:
Candidate joining reads:
17989a43-e3cf-44c8-aaa5-e71cda54fb55
4e7c0ea0-f6ba-4ab9-ab0a-a0d0d601975c
7f3ab207-1829-4b2a-b683-a134ba3efce0
8075751a-bdb3-491e-aa7e-937618a16634
c94415bf-0411-4287-8bd3-4929e1e58c50
cffbe44a-0f54-40c2-b934-8a3bd8371eae
d3c46a82-e680-4d7a-a222-7fd184a9d42f
e0f42659-f2ce-41d1-9082-611c509ff470
Mappings in the all.gaf file:
4e7c0ea0-f6ba-4ab9-ab0a-a0d0d601975c 26583 151 17652 + >utig1-69059 1122934 1105404 1122934 17322 17638 60 NM:i:316 AS:f:15396.4 dv:f:0.0179159 id:f:0.982084
d3c46a82-e680-4d7a-a222-7fd184a9d42f 39635 10990 39628 + <utig1-69059 1122934 0 28762 28195 29026 60 NM:i:831 AS:f:23103.5 dv:f:0.0286295 id:f:0.97137
d3c46a82-e680-4d7a-a222-7fd184a9d42f 39635 21 12523 + <utig1-63255>utig1-63254>utig1-63254>utig1-63254 251460 238870 251454 12295 12706 60 NM:i:411 AS:f:9764.74 dv:f:0.0323469 id:f:0.967653
17989a43-e3cf-44c8-aaa5-e71cda54fb55 24149 24 16805 + >utig1-69059 1122934 1106055 1122934 16538 17018 60 NM:i:480 AS:f:13584.2 dv:f:0.0282054 id:f:0.971795
cffbe44a-0f54-40c2-b934-8a3bd8371eae 50294 14521 50282 + <utig1-63254<utig1-63254<utig1-63254>utig1-63255 251460 7 35853 35332 36114 60 NM:i:782 AS:f:30552.9 dv:f:0.0216537 id:f:0.978346
cffbe44a-0f54-40c2-b934-8a3bd8371eae 50294 25 16118 + >utig1-69059 1122934 1106783 1122934 15875 16273 60 NM:i:398 AS:f:13442.3 dv:f:0.0244577 id:f:0.975542
c94415bf-0411-4287-8bd3-4929e1e58c50 60410 27393 60399 + <utig1-63254<utig1-63254<utig1-63254>utig1-63255 251460 7 33067 32455 33416 60 NM:i:961 AS:f:26605.7 dv:f:0.0287587 id:f:0.971241
c94415bf-0411-4287-8bd3-4929e1e58c50 60410 24 28966 + >utig1-69059 1122934 1093899 1122927 28539 29261 60 NM:i:722 AS:f:24133.5 dv:f:0.0246745 id:f:0.975326
e0f42659-f2ce-41d1-9082-611c509ff470 43701 27 27822 + <utig1-63255>utig1-63254>utig1-63254 251438 223521 251420 27366 28157 60 NM:i:791 AS:f:22526.9 dv:f:0.0280925 id:f:0.971908
e0f42659-f2ce-41d1-9082-611c509ff470 43701 26283 43701 + <utig1-69059 1122934 0 17552 17104 17714 60 NM:i:610 AS:f:13355.4 dv:f:0.034436 id:f:0.965564
7f3ab207-1829-4b2a-b683-a134ba3efce0 94860 25242 94860 + <utig1-63254<utig1-63254<utig1-63254>utig1-63255 251460 7 69967 68899 70386 60 NM:i:1487 AS:f:59714.6 dv:f:0.0211264 id:f:0.978874
7f3ab207-1829-4b2a-b683-a134ba3efce0 94860 22 26801 + >utig1-69059 1122934 1096091 1122916 26549 26960 60 NM:i:411 AS:f:24041.7 dv:f:0.0152448 id:f:0.984755
8075751a-bdb3-491e-aa7e-937618a16634 30537 31 20780 + >utig1-69059 1122934 1102068 1122934 20427 21039 60 NM:i:612 AS:f:16673.1 dv:f:0.0290888 id:f:0.970911
8075751a-bdb3-491e-aa7e-937618a16634 30537 19213 30537 + <utig1-63254<utig1-63254<utig1-63254>utig1-63255 251460 7 11333 11006 11526 60 NM:i:520 AS:f:7860.8 dv:f:0.0451154 id:f:0.954885
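To pull candidate joining reads like the ones listed above out of a larger all.gaf, a per-read check can be enough: keep reads that have at least one alignment whose path touches each side of the gap. The sketch below is a hypothetical helper, not verkko's own logic. It assumes the standard GAF column order (read name in column 1, path in column 6, MAPQ in column 12) and uses a few records copied from the excerpt above with the tags dropped. It does not check that the alignments actually reach the dead-end tips of the nodes.

```python
import re
from collections import defaultdict

# A few GAF records copied from the all.gaf excerpt (tags dropped).
SAMPLE_GAF = """\
4e7c0ea0-f6ba-4ab9-ab0a-a0d0d601975c 26583 151 17652 + >utig1-69059 1122934 1105404 1122934 17322 17638 60
d3c46a82-e680-4d7a-a222-7fd184a9d42f 39635 10990 39628 + <utig1-69059 1122934 0 28762 28195 29026 60
d3c46a82-e680-4d7a-a222-7fd184a9d42f 39635 21 12523 + <utig1-63255>utig1-63254>utig1-63254>utig1-63254 251460 238870 251454 12295 12706 60
cffbe44a-0f54-40c2-b934-8a3bd8371eae 50294 14521 50282 + <utig1-63254<utig1-63254<utig1-63254>utig1-63255 251460 7 35853 35332 36114 60
cffbe44a-0f54-40c2-b934-8a3bd8371eae 50294 25 16118 + >utig1-69059 1122934 1106783 1122934 15875 16273 60
"""


def path_nodes(path):
    """Node names in a GAF path string such as '<utig1-63255>utig1-63254'."""
    return set(re.findall(r"[<>]([^<>]+)", path))


def joining_reads(gaf_text, side_a, side_b, min_mapq=0):
    """Reads with at least one alignment touching side_a and one touching side_b."""
    sides = defaultdict(set)
    for line in gaf_text.strip().splitlines():
        f = line.split()
        read, path, mapq = f[0], f[5], int(f[11])
        if mapq < min_mapq:
            continue  # skip low-confidence alignments
        nodes = path_nodes(path)
        if nodes & side_a:
            sides[read].add("a")
        if nodes & side_b:
            sides[read].add("b")
    return sorted(r for r, s in sides.items() if s == {"a", "b"})


candidates = joining_reads(SAMPLE_GAF, {"utig1-69059"}, {"utig1-63254", "utig1-63255"})
print(candidates)
```

On this subset, 4e7c0ea0 is dropped because it only aligns to utig1-69059, while d3c46a82 and cffbe44a are reported as joining reads.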
Node utig1-63254 ends up in utig4-4815, while node utig1-69059 ends up in utig4-5641. Mapping these unitigs to the reference with mashmap shows that they are adjacent on chrX with a small overlap (consistent with the mappings in the GAF):
utig4-5641 34416084 34390000 34416083 - chrY 44220853 719 29866 99.2108
utig4-5641 34416084 32690000 34389999 - chrY 44220853 54721 1757405 99.4759
utig4-5641 34416084 31920000 33469999 - chrX 107944777 936940 2484527 99.686
utig4-5641 34416084 31450000 31929999 - chrX 107944777 2451309 2930944 99.745
utig4-5641 34416084 29130000 31449999 - chrX 107944777 3157132 5471779 99.8401
utig4-5641 34416084 1110000 29119999 - chrX 107944777 5477993 33496347 99.8846
utig4-5641 34416084 1080000 1129999 + chrX 107944777 33477123 33525432 98.774
utig4-5641 34416084 430000 1099999 - chrX 107944777 33506048 34183720 99.9217
utig4-5641 34416084 410000 429999 - chrX 107944777 34170381 34189293 100
utig4-5641 34416084 340000 389999 - chrX 107944777 34206235 34265858 99.9638
utig4-5641 34416084 0 329999 - chrX 107944777 34277704 34607548 99.9739
utig4-4815 251394 0 251393 - chrX 107944777 34605476 34856848 99.9717
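The size of that overlap can be read off the two chrX records directly. The sketch below assumes the mashmap column order shown above (query, query length, start, end, strand, target, target length, start, end, identity); the helper names are hypothetical. Because mashmap reports approximate segment boundaries, the result is only a rough estimate, not a precise overlap length.

```python
# Two mashmap records copied from the output above:
# query, qlen, qstart, qend, strand, target, tlen, tstart, tend, identity
MASHMAP = """\
utig4-5641 34416084 0 329999 - chrX 107944777 34277704 34607548 99.9739
utig4-4815 251394 0 251393 - chrX 107944777 34605476 34856848 99.9717
"""


def parse_mashmap(text):
    """Keep only the fields needed to compare reference intervals."""
    rows = []
    for line in text.strip().splitlines():
        f = line.split()
        rows.append({"query": f[0], "strand": f[4], "target": f[5],
                     "tstart": int(f[7]), "tend": int(f[8])})
    return rows


def reference_overlap(a, b):
    """Overlap (positive) or gap (negative) in bp between two target intervals."""
    if a["target"] != b["target"]:
        raise ValueError("records map to different reference sequences")
    return min(a["tend"], b["tend"]) - max(a["tstart"], b["tstart"])


left, right = parse_mashmap(MASHMAP)
overlap = reference_overlap(left, right)
print(f"{left['query']} / {right['query']} on {left['target']}: {overlap} bp")
```

This gives an overlap of about 2 kb. Assuming the usual PAF-style convention that on the "-" strand the query start corresponds to the target end, the overlapping ends are the start of utig4-5641 and the end of utig4-4815.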
However, these nodes are in two disconnected components in the graph. It would be good to trace why they are not being joined/resolved by the ONT step.
There is also a second gap 3 Mbp away in the latest corrected assembly that was not present in my (admittedly old) uncorrected assembly (/data/rautiainenma/CHM1_test_20210930). The gap is in the HiFi graph between nodes utig1-25789 and utig1-34476. In the alignments I only see one ONT read joining them (db77a086-4d9d-48dd-9454-e80301a870bf). However, in the old run there is another read (55419a7f-5227-4546-8345-d6de25abe0b5) that connects the two dead ends. In the latest run, as far as I can see, only part of that read (bases 61-36342, aligned to utig1-25789) is mapped; the segment that aligned to node 19541 in the old run (bases 39465-82800) is unmapped. Not sure why that mapping is lost. Here are the mappings in the old run:
db77a086-4d9d-48dd-9454-e80301a870bf 121561 36800 121552 + >19541 561967 0 84780 83039 85875 60 NM:i:2836 AS:f:75308.1 dv:f:0.0330247 id:f:0.966975
db77a086-4d9d-48dd-9454-e80301a870bf 121561 66 33671 + <70316 110851 77220 110851 32893 34066 60 NM:i:1173 AS:f:29698.9 dv:f:0.0344332 id:f:0.965567
55419a7f-5227-4546-8345-d6de25abe0b5 82816 39465 82800 + >19541 561967 0 43295 42255 43775 60 NM:i:1520 AS:f:38273.4 dv:f:0.034723 id:f:0.965277
55419a7f-5227-4546-8345-d6de25abe0b5 82816 61 36342 + <70316 110851 74488 110851 35563 36739 60 NM:i:1176 AS:f:32364.9 dv:f:0.0320096 id:f:0.96799
And in the new run:
db77a086-4d9d-48dd-9454-e80301a870bf 121561 36800 121553 + <utig1-34476 561967 0 84781 83138 85834 60 NM:i:2696 AS:f:66797.6 dv:f:0.0314095 id:f:0.968591
db77a086-4d9d-48dd-9454-e80301a870bf 121561 66 33671 + >utig1-25789 110849 77220 110849 32946 34043 60 NM:i:1097 AS:f:26299 dv:f:0.032224 id:f:0.967776
55419a7f-5227-4546-8345-d6de25abe0b5 82816 61 36342 + >utig1-25789 110849 74488 110849 35703 36711 60 NM:i:1008 AS:f:29567.7 dv:f:0.0274577 id:f:0.972542
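The lost mapping shows up clearly if you list which parts of read 55419a7f are not covered by any alignment in each run. The sketch below assumes GAF query coordinates are 0-based and half-open (as in PAF). It only reports unaligned stretches of at least 1,000 bp, a threshold chosen arbitrarily for illustration. The helper names are hypothetical.

```python
# GAF records for read 55419a7f from the old and new runs (tags dropped).
OLD_GAF = """\
55419a7f-5227-4546-8345-d6de25abe0b5 82816 39465 82800 + >19541 561967 0 43295 42255 43775 60
55419a7f-5227-4546-8345-d6de25abe0b5 82816 61 36342 + <70316 110851 74488 110851 35563 36739 60
"""
NEW_GAF = """\
55419a7f-5227-4546-8345-d6de25abe0b5 82816 61 36342 + >utig1-25789 110849 74488 110849 35703 36711 60
"""


def query_intervals(gaf_text):
    """Map read name -> (read length, list of aligned [start, end) query intervals)."""
    reads = {}
    for line in gaf_text.strip().splitlines():
        f = line.split()
        name, qlen, qstart, qend = f[0], int(f[1]), int(f[2]), int(f[3])
        reads.setdefault(name, (qlen, []))[1].append((qstart, qend))
    return reads


def uncovered(qlen, intervals, min_len=1000):
    """Query intervals of at least min_len bp not covered by any alignment."""
    gaps, pos = [], 0
    for start, end in sorted(intervals):
        if start - pos >= min_len:
            gaps.append((pos, start))
        pos = max(pos, end)  # furthest query position covered so far
    if qlen - pos >= min_len:
        gaps.append((pos, qlen))
    return gaps


READ = "55419a7f-5227-4546-8345-d6de25abe0b5"
old_gaps = uncovered(*query_intervals(OLD_GAF)[READ])
new_gaps = uncovered(*query_intervals(NEW_GAF)[READ])
print("old run unaligned:", old_gaps)
print("new run unaligned:", new_gaps)
```

In the old run only a ~3 kb stretch between the two alignments is uncovered, whereas in the new run everything after base 36342 is unaligned.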
Resolved in beta version
ChrX in CHM1 is split by a 1.5 kb gap spanned by ONT reads. Need to confirm that the latest version fixes the gap.