Closed (mrvollger closed this issue 2 years ago)
Data is available on Biowulf in /data/korens/test/mikko_cns/chm1/mitchell_bug43/
The issue seems to be a lost connection from the HiFi to ONT-resolved graph:
Blue denotes chr16 and red denotes chr17, based on alignments to CHM13 in the ONT graph. The ONT-resolved node marked in red in the HiFi graph becomes utig2-29896. This node includes utig1-54488, which causes the connection between utig1-67301 (aka utig2-29896) and utig1-67299 (aka utig2-28347) to be lost.
Looking at the ONT alignments, both resolutions seem to be supported, so I'm not sure why one is chosen over the other:
% grep utig1-67301 4-processONT/alns-ont-filter-trim.gaf |grep -v utig1-56431 |awk '{l=split($6, a, ">|<"); if (l>2) print $0}'|grep utig1-54488
4d142adc-8d35-4a12-9892-fd198955a2d6 165329 1551 163681 + <utig1-67301<utig1-54488>utig1-54486<utig1-19900>utig1-19899<utig1-12478>utig1-12477>utig1-12481>utig1-53743<utig1-13190>utig1-13188>utig1-13192>utig1-21216 325100 152812 316356 155552 171011 60 NM:i:15459 AS:f:62173.1 dv:f:0.0903977 id:f:0.909602
278e1347-30b2-49ba-965c-8ab79c184614 52242 1519 50742 + <utig1-67301<utig1-54488>utig1-54486 199307 141863 191300 50858 53184 60 NM:i:2326 AS:f:36731.8 dv:f:0.043735 id:f:0.956265
cd71c36e-65e1-4079-bf2d-38ee8470a990 57150 1525 55628 + <utig1-67301<utig1-54488>utig1-54486 199307 121425 175823 56359 57870 60 NM:i:1511 AS:f:47039.7 dv:f:0.0261102 id:f:0.97389
f8ab3c19-721c-41ac-b8fe-253220ae5918 94004 1523 92488 + <utig1-67301<utig1-54488>utig1-54486<utig1-19900>utig1-19899<utig1-12478>utig1-12477 263501 152232 244148 90840 96425 60 NM:i:5585 AS:f:56768.9 dv:f:0.0579207 id:f:0.942079
168b8cc8-c9b0-4c2a-bfc3-2237f2a94c42 34771 1521 33266 + <utig1-54486>utig1-54488>utig1-67301 199307 21558 53384 34172 35171 60 NM:i:999 AS:f:28091.7 dv:f:0.0284041 id:f:0.971596
f9afbfd2-4475-4669-a845-aa1913106522 29478 1522 27978 + <utig1-54486>utig1-54488>utig1-67301 199307 15399 42012 28612 30082 60 NM:i:1470 AS:f:19665.8 dv:f:0.0488664 id:f:0.951134
040deef9-5765-4ae5-8b67-627e94f76108 35835 1522 34334 + <utig1-54486>utig1-54488>utig1-67301 199307 25223 58203 35185 36339 60 NM:i:1154 AS:f:28126.4 dv:f:0.0317565 id:f:0.968243
03c062f0-fc40-45e8-85c6-65a01ca4476a 44094 1502 42553 + <utig1-67301<utig1-54488>utig1-54486 199307 145780 187056 43404 44668 60 NM:i:1264 AS:f:35632.8 dv:f:0.0282977 id:f:0.971702
0390412b-f55d-4dc0-8dab-33516520db08 58672 1523 57147 + <utig1-54486>utig1-54488>utig1-67301 199307 17331 73340 57610 59593 60 NM:i:1983 AS:f:45417.2 dv:f:0.0332757 id:f:0.966724
1c8ff06b-3423-458b-abf8-7823d903f68f 103056 1535 101548 + <utig1-67301<utig1-54488>utig1-54486<utig1-19900>utig1-19899<utig1-12478 234815 133473 234292 100351 105207 60 NM:i:4856 AS:f:70672 dv:f:0.0461566 id:f:0.953843
17ec5f4a-6ce4-4279-b5e2-e1be96569c4e 55269 1526 53762 + <utig1-19899>utig1-19900<utig1-54486>utig1-54488>utig1-67301 233130 21794 74089 54080 55982 60 NM:i:1902 AS:f:42568.7 dv:f:0.0339752 id:f:0.966025
1176c537-1420-4b4f-b152-0e17e373a267 69820 1521 68312 + <utig1-19899>utig1-19900<utig1-54486>utig1-54488>utig1-67301 233130 29591 96570 68223 70873 60 NM:i:2650 AS:f:52142 dv:f:0.0373908 id:f:0.962609
b636457a-e7c7-4ce3-898b-667ac8a9c7fd 29323 1523 27703 + <utig1-67301<utig1-54488>utig1-54486 199307 145349 171656 28666 29608 60 NM:i:942 AS:f:22906.3 dv:f:0.0318157 id:f:0.968184
21507e8c-ac66-47b4-b564-74f4b78d2f8c 74433 1552 72925 + <utig1-67301<utig1-54488>utig1-54486 199307 108895 180745 72199 75994 60 NM:i:3795 AS:f:49098.3 dv:f:0.0499382 id:f:0.950062
e19cdabf-0f22-4ed7-80b8-e78e9500d60b 42511 1532 40958 + <utig1-67301<utig1-54488 169253 129010 168839 41470 43374 60 NM:i:1904 AS:f:29745.4 dv:f:0.0438973 id:f:0.956103
2c053df8-8257-4172-91a0-c8813d7d7f5a 23541 1570 22041 + <utig1-67301<utig1-54488>utig1-54486 199307 155029 175534 22659 23984 60 NM:i:1325 AS:f:14646.5 dv:f:0.0552452 id:f:0.944755
5efd0f90-a0bf-4fad-ac1b-e98ae0a2c93d 49408 1529 47895 + <utig1-67301<utig1-54488>utig1-54486 199307 138271 184834 47962 50369 60 NM:i:2407 AS:f:33335.4 dv:f:0.0477873 id:f:0.952213
6e1e5777-c185-493a-a053-b0a4c99802db 79883 1522 78383 + <utig1-54486>utig1-54488>utig1-67301 199307 4469 81776 78566 81057 60 NM:i:2491 AS:f:63270.9 dv:f:0.0307315 id:f:0.969269
b5c8d4f8-d86e-4153-9e2c-de014346d7c9 69524 1561 68024 + <utig1-67301<utig1-54488>utig1-54486 199307 128953 195382 68036 70309 60 NM:i:2273 AS:f:54324.8 dv:f:0.0323287 id:f:0.967671
a303d13a-099f-47ba-b5e4-3df09907947a 50485 1520 43816 + <utig1-67301<utig1-54488>utig1-54486 199307 140105 183377 43452 47142 60 NM:i:3690 AS:f:20720.6 dv:f:0.0782742 id:f:0.921726
a85221f0-1952-40af-a95f-196f9af2e0e4 72357 1524 70855 + <utig1-54486>utig1-54488>utig1-67301 199307 23507 93085 71028 73380 60 NM:i:2352 AS:f:56666.7 dv:f:0.0320523 id:f:0.967948
bc54a96c-22de-463e-a342-4b2d22557482 68101 7131 66601 + <utig1-54486>utig1-54488>utig1-67301 199307 14444 74128 61478 63265 60 NM:i:1787 AS:f:50568.6 dv:f:0.0282463 id:f:0.971754
89c21ef8-7347-4423-961e-6686584f7eb5 45635 1567 44135 + <utig1-67301<utig1-54488>utig1-54486 199307 126694 169630 43351 46983 60 NM:i:3632 AS:f:21378.9 dv:f:0.0773046 id:f:0.922695
f16ab9ce-6727-4ccc-a1c8-b498376cbee2 135448 1557 133702 + <utig1-67301<utig1-54488>utig1-54486<utig1-19900>utig1-19899 233130 69629 202834 127570 139998 60 NM:i:12428 AS:f:52374.5 dv:f:0.0887727 id:f:0.911227
6406a89c-ea2c-4744-8bf6-fdee9267cacd 75225 1560 73626 + <utig1-67301<utig1-54488>utig1-54486<utig1-19900>utig1-19899 233130 138064 210357 72550 76689 60 NM:i:4139 AS:f:47500.3 dv:f:0.0539712 id:f:0.946029
185a294e-bc26-4af4-9c23-6eb16a8ff654 39310 1526 37810 + <utig1-67301<utig1-54488>utig1-54486 199307 148109 184741 38355 40129 60 NM:i:1774 AS:f:27469.2 dv:f:0.0442074 id:f:0.955793
d22b0582-efc5-4200-8f95-210e95dfecc0 24035 1526 22527 + <utig1-67301<utig1-54488>utig1-54486 199307 150638 171785 23700 24335 60 NM:i:635 AS:f:19771.9 dv:f:0.0260941 id:f:0.973906
db46551b-4d66-4f1d-a37f-984e2714e4b3 50473 2402 48973 + <utig1-67301<utig1-54488>utig1-54486<utig1-19900>utig1-19899 233130 154788 201883 46990 51364 60 NM:i:4374 AS:f:20440.2 dv:f:0.0851569 id:f:0.914843
3a4465c1-d5a7-4471-b98b-1a5b214d4096 167251 1567 165751 + <utig1-13188>utig1-13190<utig1-53743<utig1-12481<utig1-12477>utig1-12478<utig1-19899>utig1-19900<utig1-54486>utig1-54488>utig1-67301 308584 3333 168258 161361 171019 60 NM:i:9658 AS:f:102862 dv:f:0.0564733 id:f:0.943527
7c32db83-12dd-409f-bcae-c3d9f9a3c937 86793 13320 85287 + <utig1-67301<utig1-54488>utig1-54486 199307 111829 184181 73357 76296 60 NM:i:2939 AS:f:55393.3 dv:f:0.038521 id:f:0.961479
09ad15d0-5080-49b9-9b4f-38dc312a31a9 115114 1564 112773 + <utig1-54486>utig1-54488>utig1-67301 199307 18374 131658 108983 118536 60 NM:i:9553 AS:f:50586 dv:f:0.0805916 id:f:0.919408
a24ce0d7-9dbb-45bb-8d4f-5e5ab7aab14b 49592 1546 48091 + <utig1-67301<utig1-54488>utig1-54486 199307 125490 172501 48261 50684 60 NM:i:2423 AS:f:33407.8 dv:f:0.047806 id:f:0.952194
with 32 entries vs
% grep utig1-67301 4-processONT/alns-ont-filter-trim.gaf |grep -v utig1-56431 |awk '{l=split($6, a, ">|<"); if (l>2) print $0}'|grep utig1-67299
f83c6b3c-8cc1-4c73-83d4-ab0bddad7166 13133 1521 11625 + >utig1-67299>utig1-67301 194339 25607 35796 12902 13303 60 NM:i:401 AS:f:10433.3 dv:f:0.0301436 id:f:0.969856
49b0aa13-3b7d-45f8-9cf9-665110560b0c 36816 3441 35236 + >utig1-67299>utig1-67301 194339 20159 51957 33962 35320 60 NM:i:1358 AS:f:25750.7 dv:f:0.0384485 id:f:0.961552
f5b2ceb9-a15b-46c1-a14f-51074f48ffa4 38272 1516 36718 + <utig1-67301<utig1-67299 194339 145031 180393 37554 38736 60 NM:i:1182 AS:f:30329.9 dv:f:0.0305143 id:f:0.969486
47f052ca-cd17-402a-b530-9646e75b979b 122990 1526 121490 + >utig1-67299>utig1-67301 194339 26017 146357 119731 125110 60 NM:i:5379 AS:f:87139.9 dv:f:0.0429942 id:f:0.957006
7147b6a9-3d25-4c8b-8aa1-242be488e974 64289 1518 62789 + >utig1-53902<utig1-67300>utig1-67299>utig1-67301 216540 7128 68698 63165 65146 60 NM:i:1981 AS:f:51077.5 dv:f:0.0304086 id:f:0.969591
db630cd6-aac9-4a57-b2cf-0c889b2cf241 46628 1524 45110 + >utig1-67299>utig1-67301 194339 15891 59717 45496 47409 60 NM:i:1913 AS:f:33845.4 dv:f:0.040351 id:f:0.959649
c27853ae-995f-4d04-a666-307c19e7b089 53630 1523 52123 + >utig1-67299>utig1-67301 194339 16131 67001 52974 54221 60 NM:i:1247 AS:f:45295 dv:f:0.0229985 id:f:0.977002
bdd014a7-7589-47de-b199-c80b3b68071c 64386 1551 60780 + <utig1-67301<utig1-67299>utig1-67300 199821 134666 194404 59522 64020 60 NM:i:4498 AS:f:32272.3 dv:f:0.0702593 id:f:0.929741
871efd15-db23-4f9d-aafe-0eb0585732c5 62791 1579 60177 + >utig1-67299>utig1-67301 194339 17256 76125 59439 63008 60 NM:i:3569 AS:f:37828.5 dv:f:0.0566436 id:f:0.943356
3e9c26b7-230a-4553-b156-ca3cdd67e3ed 109925 1615 108303 + >utig1-67299>utig1-67301 194339 3636 110537 106154 111928 60 NM:i:5774 AS:f:71233.2 dv:f:0.0515867 id:f:0.948413
227100ac-80e6-41bc-814a-cd60fb7ee0b7 83967 1526 82457 + <utig1-67301<utig1-67299>utig1-67300<utig1-53902 216540 127329 208580 82589 85073 60 NM:i:2484 AS:f:67387.6 dv:f:0.0291985 id:f:0.970802
c0b7fa70-7a3e-4e00-a8b7-b8c1b871d1a4 42816 1532 41316 + <utig1-67301<utig1-67299>utig1-67300 199821 154474 194477 41300 43793 60 NM:i:2493 AS:f:26180.6 dv:f:0.0569269 id:f:0.943073
458ba829-fc38-49ec-9fb8-fa7c159b07e8 22194 1544 20694 + >utig1-67299>utig1-67301 194339 16269 35567 21748 22521 60 NM:i:773 AS:f:17001.8 dv:f:0.0343235 id:f:0.965676
0abff07d-4cc2-47da-a683-d22c74e0cd11 28565 1528 27065 + >utig1-67299>utig1-67301 194339 22564 48223 27980 29001 60 NM:i:1021 AS:f:21737.1 dv:f:0.0352057 id:f:0.964794
d47d0ec0-6d8d-471f-b9ea-d434b85ec3fe 80446 1557 78944 + >utig1-67299>utig1-67301 194339 10453 88169 78661 81646 60 NM:i:2985 AS:f:60506.9 dv:f:0.0365603 id:f:0.96344
1210650d-2edf-4a99-a800-62f498cbdfe3 52547 1522 51047 + >utig1-67299>utig1-67301 194339 10818 60711 51635 53356 60 NM:i:1721 AS:f:41063.1 dv:f:0.032255 id:f:0.967745
c6cb342b-7827-49a9-bf19-a45524ccc7d9 23513 1531 22013 + >utig1-67299>utig1-67301 194339 25972 46607 23086 23830 60 NM:i:744 AS:f:18527 dv:f:0.0312211 id:f:0.968779
cd7a9442-0846-4268-837f-89c51260cea7 75074 1524 73563 + <utig1-67301<utig1-67299>utig1-67300<utig1-53902 216540 131152 203384 73907 75950 60 NM:i:2043 AS:f:61432.6 dv:f:0.0268993 id:f:0.973101
978cd24f-0868-4c44-98c8-584e472e9d3d 38349 1521 36796 + >utig1-67299>utig1-67301 194339 25166 60506 37810 38636 60 NM:i:826 AS:f:32773.8 dv:f:0.021379 id:f:0.978621
ac7c8d01-ff02-4255-85d7-17769f9de2db 53795 1555 52295 + <utig1-67301<utig1-67299 194339 125606 176749 50935 55564 60 NM:i:4629 AS:f:22910.9 dv:f:0.0833093 id:f:0.916691
3aa40bf9-0953-47f6-b819-76fe8dc83814 105188 1556 103466 + <utig1-53900>utig1-53898>utig1-53902<utig1-67300>utig1-67299>utig1-67301 218328 1148 102992 101279 106931 60 NM:i:5652 AS:f:67267.7 dv:f:0.0528565 id:f:0.947143
f0ef3169-e480-4743-a1c1-1105c1506233 32261 1522 30760 + >utig1-67299>utig1-67301 194339 8965 38334 31863 32579 60 NM:i:716 AS:f:27469.4 dv:f:0.0219773 id:f:0.978023
c3192174-3ec1-4073-aa84-0804e6f22efe 62617 1523 61109 + <utig1-67301<utig1-67299>utig1-67300<utig1-53902 216540 142405 202324 61211 63669 60 NM:i:2458 AS:f:46215.7 dv:f:0.0386059 id:f:0.961394
6e198916-4db2-42a9-b7ac-8f2f29b0a6b5 58123 1522 56615 + <utig1-67301<utig1-67299>utig1-67300<utig1-53902 216540 158955 214240 57342 58769 60 NM:i:1427 AS:f:48589.2 dv:f:0.0242815 id:f:0.975718
with 24 entries.
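The same comparison can be scripted rather than counted by eye. Below is a minimal sketch, not part of the pipeline, that tallies how many reads place each neighbour directly next to a chosen node in the GAF path column (column 6). The function names `path_nodes` and `count_adjacent_support` are made up for illustration, and the three sample lines are the first 12 columns of alignments pasted above. Note that this requires the two nodes to be adjacent in the path, which is stricter than the grep above (that only requires both names to appear on the line), so counts may differ on other data.

```python
import re
from collections import Counter

# One path step in GAF column 6: an orientation sign followed by a node name.
STEP = re.compile(r"([<>])([^<>\s]+)")

def path_nodes(path):
    """Return the node names of a GAF path in traversal order (orientation dropped)."""
    return [name for _, name in STEP.findall(path)]

def count_adjacent_support(gaf_lines, anchor, exclude=()):
    """Count, per neighbour, the reads whose path puts that neighbour next to `anchor`.

    Reads whose path touches any node in `exclude` are skipped, mirroring the
    `grep -v` in the commands above.
    """
    support = Counter()
    for line in gaf_lines:
        fields = line.split()  # GAF is tab-separated; split() also copes with spaces
        if len(fields) < 6:
            continue
        nodes = path_nodes(fields[5])
        if anchor not in nodes or any(n in exclude for n in nodes):
            continue
        neighbours = set()  # count each read at most once per neighbour
        for left, right in zip(nodes, nodes[1:]):
            if left == anchor:
                neighbours.add(right)
            elif right == anchor:
                neighbours.add(left)
        support.update(neighbours)
    return support

if __name__ == "__main__":
    # First 12 columns of three alignments from this thread.
    sample = [
        "278e1347-30b2-49ba-965c-8ab79c184614 52242 1519 50742 + <utig1-67301<utig1-54488>utig1-54486 199307 141863 191300 50858 53184 60",
        "168b8cc8-c9b0-4c2a-bfc3-2237f2a94c42 34771 1521 33266 + <utig1-54486>utig1-54488>utig1-67301 199307 21558 53384 34172 35171 60",
        "f83c6b3c-8cc1-4c73-83d4-ab0bddad7166 13133 1521 11625 + >utig1-67299>utig1-67301 194339 25607 35796 12902 13303 60",
    ]
    counts = count_adjacent_support(sample, "utig1-67301", exclude={"utig1-56431"})
    for node, n in sorted(counts.items()):
        print(node, n)
```

To run it on the real data, pass the open file handle for 4-processONT/alns-ont-filter-trim.gaf as `gaf_lines` instead of the sample list.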
The crosslink removal was incorrectly removing the within-chr17 connection. Commit 1a68d3e fixes that and keeps the within-chr17 connection, but it still doesn't remove the cross-chromosome connections, because both of them have higher ONT support than the within-chr17 connection.
I think keeping all the connections is correct here, given the read evidence.
@mrvollger the issue in that region seems to be a chromosome rearrangement in CHM1, where chr16 and chr17 have a translocation between them in a subset of the cells. There's good ONT evidence for both versions. If you have a chance to re-run with the latest from GitHub to confirm the final graph has all 4 traversals, that'd be good.
I am surprised by the cross-chromosome connections. I don't see any SDs here, so I wonder why it's happening. Maybe an SD is missing from T2T.
Forgot about that. Thanks! And yeah I'll rerun as soon as I get a chance!
I assume this is resolved
sorry, yes it is.
I got a CHM1 assembly to run from start to end! Thanks so much for developing this and thanks @skoren for all the help in running.
I did notice that in both the ~Dec assembly you shared with us and the one I just finished there is a break on chr16 (chr16:88,754,899-88,760,270). I don't see any SDs, repeats, or GA tracts there, so I thought it might be a bug. Let me know if you want to investigate and, if so, which files I should share.