marbl / verkko

Telomere-to-telomere assembly of accurate long reads (PacBio HiFi, Oxford Nanopore Duplex, HERRO corrected Oxford Nanopore Simplex) and Oxford Nanopore ultra-long reads.

Break at ~chr16:88,754,899-88,760,270 #43

Status: Closed (mrvollger closed this 2 years ago)

mrvollger commented 2 years ago

I got a CHM1 assembly to run from start to end! Thanks so much for developing this and thanks @skoren for all the help in running.

I did notice that both the ~Dec assembly you shared with us and the one I just finished have a break on chr16 (chr16:88,754,899-88,760,270). I don't see any SDs, repeats, or GA tracts there, so I thought it might be a bug. Let me know if you want to investigate and, if so, which files I should share.

skoren commented 2 years ago

Data is available on biowulf in /data/korens/test/mikko_cns/chm1/mitchell_bug43/

skoren commented 2 years ago

The issue seems to be a lost connection from the HiFi to ONT-resolved graph:

[Two screenshots of the assembly graphs, taken 2022-02-02]

Blue denotes chr16 and red chr17 based on alignments to CHM13 in the ONT graph. The ONT-resolved node marked in red in the HiFi graph becomes utig2-29896. This node includes utig1-54488, which causes the connection between utig1-67301 (aka utig2-29896) and utig1-67299 (aka utig2-28347) to be lost.

Looking at ONT alignments, it seems both resolutions are supported so I'm not sure why one is chosen:

% grep utig1-67301 4-processONT/alns-ont-filter-trim.gaf |grep -v utig1-56431 |awk '{l=split($6, a, ">|<"); if (l>2) print $0}'|grep utig1-54488
4d142adc-8d35-4a12-9892-fd198955a2d6    165329  1551    163681  +       <utig1-67301<utig1-54488>utig1-54486<utig1-19900>utig1-19899<utig1-12478>utig1-12477>utig1-12481>utig1-53743<utig1-13190>utig1-13188>utig1-13192>utig1-21216    325100  152812  316356  155552  171011  60      NM:i:15459      AS:f:62173.1    dv:f:0.0903977  id:f:0.909602
278e1347-30b2-49ba-965c-8ab79c184614    52242   1519    50742   +       <utig1-67301<utig1-54488>utig1-54486    199307  141863  191300  50858   53184   60      NM:i:2326       AS:f:36731.8    dv:f:0.043735   id:f:0.956265
cd71c36e-65e1-4079-bf2d-38ee8470a990    57150   1525    55628   +       <utig1-67301<utig1-54488>utig1-54486    199307  121425  175823  56359   57870   60      NM:i:1511       AS:f:47039.7    dv:f:0.0261102  id:f:0.97389
f8ab3c19-721c-41ac-b8fe-253220ae5918    94004   1523    92488   +       <utig1-67301<utig1-54488>utig1-54486<utig1-19900>utig1-19899<utig1-12478>utig1-12477    263501  152232  244148  90840   96425   60      NM:i:5585       AS:f:56768.9    dv:f:0.0579207  id:f:0.942079
168b8cc8-c9b0-4c2a-bfc3-2237f2a94c42    34771   1521    33266   +       <utig1-54486>utig1-54488>utig1-67301    199307  21558   53384   34172   35171   60      NM:i:999        AS:f:28091.7    dv:f:0.0284041  id:f:0.971596
f9afbfd2-4475-4669-a845-aa1913106522    29478   1522    27978   +       <utig1-54486>utig1-54488>utig1-67301    199307  15399   42012   28612   30082   60      NM:i:1470       AS:f:19665.8    dv:f:0.0488664  id:f:0.951134
040deef9-5765-4ae5-8b67-627e94f76108    35835   1522    34334   +       <utig1-54486>utig1-54488>utig1-67301    199307  25223   58203   35185   36339   60      NM:i:1154       AS:f:28126.4    dv:f:0.0317565  id:f:0.968243
03c062f0-fc40-45e8-85c6-65a01ca4476a    44094   1502    42553   +       <utig1-67301<utig1-54488>utig1-54486    199307  145780  187056  43404   44668   60      NM:i:1264       AS:f:35632.8    dv:f:0.0282977  id:f:0.971702
0390412b-f55d-4dc0-8dab-33516520db08    58672   1523    57147   +       <utig1-54486>utig1-54488>utig1-67301    199307  17331   73340   57610   59593   60      NM:i:1983       AS:f:45417.2    dv:f:0.0332757  id:f:0.966724
1c8ff06b-3423-458b-abf8-7823d903f68f    103056  1535    101548  +       <utig1-67301<utig1-54488>utig1-54486<utig1-19900>utig1-19899<utig1-12478        234815  133473  234292  100351  105207  60      NM:i:4856       AS:f:70672      dv:f:0.0461566  id:f:0.953843
17ec5f4a-6ce4-4279-b5e2-e1be96569c4e    55269   1526    53762   +       <utig1-19899>utig1-19900<utig1-54486>utig1-54488>utig1-67301    233130  21794   74089   54080   55982   60      NM:i:1902       AS:f:42568.7    dv:f:0.0339752  id:f:0.966025
1176c537-1420-4b4f-b152-0e17e373a267    69820   1521    68312   +       <utig1-19899>utig1-19900<utig1-54486>utig1-54488>utig1-67301    233130  29591   96570   68223   70873   60      NM:i:2650       AS:f:52142      dv:f:0.0373908  id:f:0.962609
b636457a-e7c7-4ce3-898b-667ac8a9c7fd    29323   1523    27703   +       <utig1-67301<utig1-54488>utig1-54486    199307  145349  171656  28666   29608   60      NM:i:942        AS:f:22906.3    dv:f:0.0318157  id:f:0.968184
21507e8c-ac66-47b4-b564-74f4b78d2f8c    74433   1552    72925   +       <utig1-67301<utig1-54488>utig1-54486    199307  108895  180745  72199   75994   60      NM:i:3795       AS:f:49098.3    dv:f:0.0499382  id:f:0.950062
e19cdabf-0f22-4ed7-80b8-e78e9500d60b    42511   1532    40958   +       <utig1-67301<utig1-54488        169253  129010  168839  41470   43374   60      NM:i:1904       AS:f:29745.4    dv:f:0.0438973  id:f:0.956103
2c053df8-8257-4172-91a0-c8813d7d7f5a    23541   1570    22041   +       <utig1-67301<utig1-54488>utig1-54486    199307  155029  175534  22659   23984   60      NM:i:1325       AS:f:14646.5    dv:f:0.0552452  id:f:0.944755
5efd0f90-a0bf-4fad-ac1b-e98ae0a2c93d    49408   1529    47895   +       <utig1-67301<utig1-54488>utig1-54486    199307  138271  184834  47962   50369   60      NM:i:2407       AS:f:33335.4    dv:f:0.0477873  id:f:0.952213
6e1e5777-c185-493a-a053-b0a4c99802db    79883   1522    78383   +       <utig1-54486>utig1-54488>utig1-67301    199307  4469    81776   78566   81057   60      NM:i:2491       AS:f:63270.9    dv:f:0.0307315  id:f:0.969269
b5c8d4f8-d86e-4153-9e2c-de014346d7c9    69524   1561    68024   +       <utig1-67301<utig1-54488>utig1-54486    199307  128953  195382  68036   70309   60      NM:i:2273       AS:f:54324.8    dv:f:0.0323287  id:f:0.967671
a303d13a-099f-47ba-b5e4-3df09907947a    50485   1520    43816   +       <utig1-67301<utig1-54488>utig1-54486    199307  140105  183377  43452   47142   60      NM:i:3690       AS:f:20720.6    dv:f:0.0782742  id:f:0.921726
a85221f0-1952-40af-a95f-196f9af2e0e4    72357   1524    70855   +       <utig1-54486>utig1-54488>utig1-67301    199307  23507   93085   71028   73380   60      NM:i:2352       AS:f:56666.7    dv:f:0.0320523  id:f:0.967948
bc54a96c-22de-463e-a342-4b2d22557482    68101   7131    66601   +       <utig1-54486>utig1-54488>utig1-67301    199307  14444   74128   61478   63265   60      NM:i:1787       AS:f:50568.6    dv:f:0.0282463  id:f:0.971754
89c21ef8-7347-4423-961e-6686584f7eb5    45635   1567    44135   +       <utig1-67301<utig1-54488>utig1-54486    199307  126694  169630  43351   46983   60      NM:i:3632       AS:f:21378.9    dv:f:0.0773046  id:f:0.922695
f16ab9ce-6727-4ccc-a1c8-b498376cbee2    135448  1557    133702  +       <utig1-67301<utig1-54488>utig1-54486<utig1-19900>utig1-19899    233130  69629   202834  127570  139998  60      NM:i:12428      AS:f:52374.5    dv:f:0.0887727  id:f:0.911227
6406a89c-ea2c-4744-8bf6-fdee9267cacd    75225   1560    73626   +       <utig1-67301<utig1-54488>utig1-54486<utig1-19900>utig1-19899    233130  138064  210357  72550   76689   60      NM:i:4139       AS:f:47500.3    dv:f:0.0539712  id:f:0.946029
185a294e-bc26-4af4-9c23-6eb16a8ff654    39310   1526    37810   +       <utig1-67301<utig1-54488>utig1-54486    199307  148109  184741  38355   40129   60      NM:i:1774       AS:f:27469.2    dv:f:0.0442074  id:f:0.955793
d22b0582-efc5-4200-8f95-210e95dfecc0    24035   1526    22527   +       <utig1-67301<utig1-54488>utig1-54486    199307  150638  171785  23700   24335   60      NM:i:635        AS:f:19771.9    dv:f:0.0260941  id:f:0.973906
db46551b-4d66-4f1d-a37f-984e2714e4b3    50473   2402    48973   +       <utig1-67301<utig1-54488>utig1-54486<utig1-19900>utig1-19899    233130  154788  201883  46990   51364   60      NM:i:4374       AS:f:20440.2    dv:f:0.0851569  id:f:0.914843
3a4465c1-d5a7-4471-b98b-1a5b214d4096    167251  1567    165751  +       <utig1-13188>utig1-13190<utig1-53743<utig1-12481<utig1-12477>utig1-12478<utig1-19899>utig1-19900<utig1-54486>utig1-54488>utig1-67301    308584  3333    168258  161361  171019  60      NM:i:9658       AS:f:102862     dv:f:0.0564733  id:f:0.943527
7c32db83-12dd-409f-bcae-c3d9f9a3c937    86793   13320   85287   +       <utig1-67301<utig1-54488>utig1-54486    199307  111829  184181  73357   76296   60      NM:i:2939       AS:f:55393.3    dv:f:0.038521   id:f:0.961479
09ad15d0-5080-49b9-9b4f-38dc312a31a9    115114  1564    112773  +       <utig1-54486>utig1-54488>utig1-67301    199307  18374   131658  108983  118536  60      NM:i:9553       AS:f:50586      dv:f:0.0805916  id:f:0.919408
a24ce0d7-9dbb-45bb-8d4f-5e5ab7aab14b    49592   1546    48091   +       <utig1-67301<utig1-54488>utig1-54486    199307  125490  172501  48261   50684   60      NM:i:2423       AS:f:33407.8    dv:f:0.047806   id:f:0.952194

with 32 entries vs

% grep utig1-67301 4-processONT/alns-ont-filter-trim.gaf |grep -v utig1-56431 |awk '{l=split($6, a, ">|<"); if (l>2) print $0}'|grep utig1-67299
f83c6b3c-8cc1-4c73-83d4-ab0bddad7166    13133   1521    11625   +       >utig1-67299>utig1-67301        194339  25607   35796   12902   13303   60      NM:i:401        AS:f:10433.3    dv:f:0.0301436  id:f:0.969856
49b0aa13-3b7d-45f8-9cf9-665110560b0c    36816   3441    35236   +       >utig1-67299>utig1-67301        194339  20159   51957   33962   35320   60      NM:i:1358       AS:f:25750.7    dv:f:0.0384485  id:f:0.961552
f5b2ceb9-a15b-46c1-a14f-51074f48ffa4    38272   1516    36718   +       <utig1-67301<utig1-67299        194339  145031  180393  37554   38736   60      NM:i:1182       AS:f:30329.9    dv:f:0.0305143  id:f:0.969486
47f052ca-cd17-402a-b530-9646e75b979b    122990  1526    121490  +       >utig1-67299>utig1-67301        194339  26017   146357  119731  125110  60      NM:i:5379       AS:f:87139.9    dv:f:0.0429942  id:f:0.957006
7147b6a9-3d25-4c8b-8aa1-242be488e974    64289   1518    62789   +       >utig1-53902<utig1-67300>utig1-67299>utig1-67301        216540  7128    68698   63165   65146   60      NM:i:1981       AS:f:51077.5    dv:f:0.0304086  id:f:0.969591
db630cd6-aac9-4a57-b2cf-0c889b2cf241    46628   1524    45110   +       >utig1-67299>utig1-67301        194339  15891   59717   45496   47409   60      NM:i:1913       AS:f:33845.4    dv:f:0.040351   id:f:0.959649
c27853ae-995f-4d04-a666-307c19e7b089    53630   1523    52123   +       >utig1-67299>utig1-67301        194339  16131   67001   52974   54221   60      NM:i:1247       AS:f:45295      dv:f:0.0229985  id:f:0.977002
bdd014a7-7589-47de-b199-c80b3b68071c    64386   1551    60780   +       <utig1-67301<utig1-67299>utig1-67300    199821  134666  194404  59522   64020   60      NM:i:4498       AS:f:32272.3    dv:f:0.0702593  id:f:0.929741
871efd15-db23-4f9d-aafe-0eb0585732c5    62791   1579    60177   +       >utig1-67299>utig1-67301        194339  17256   76125   59439   63008   60      NM:i:3569       AS:f:37828.5    dv:f:0.0566436  id:f:0.943356
3e9c26b7-230a-4553-b156-ca3cdd67e3ed    109925  1615    108303  +       >utig1-67299>utig1-67301        194339  3636    110537  106154  111928  60      NM:i:5774       AS:f:71233.2    dv:f:0.0515867  id:f:0.948413
227100ac-80e6-41bc-814a-cd60fb7ee0b7    83967   1526    82457   +       <utig1-67301<utig1-67299>utig1-67300<utig1-53902        216540  127329  208580  82589   85073   60      NM:i:2484       AS:f:67387.6    dv:f:0.0291985  id:f:0.970802
c0b7fa70-7a3e-4e00-a8b7-b8c1b871d1a4    42816   1532    41316   +       <utig1-67301<utig1-67299>utig1-67300    199821  154474  194477  41300   43793   60      NM:i:2493       AS:f:26180.6    dv:f:0.0569269  id:f:0.943073
458ba829-fc38-49ec-9fb8-fa7c159b07e8    22194   1544    20694   +       >utig1-67299>utig1-67301        194339  16269   35567   21748   22521   60      NM:i:773        AS:f:17001.8    dv:f:0.0343235  id:f:0.965676
0abff07d-4cc2-47da-a683-d22c74e0cd11    28565   1528    27065   +       >utig1-67299>utig1-67301        194339  22564   48223   27980   29001   60      NM:i:1021       AS:f:21737.1    dv:f:0.0352057  id:f:0.964794
d47d0ec0-6d8d-471f-b9ea-d434b85ec3fe    80446   1557    78944   +       >utig1-67299>utig1-67301        194339  10453   88169   78661   81646   60      NM:i:2985       AS:f:60506.9    dv:f:0.0365603  id:f:0.96344
1210650d-2edf-4a99-a800-62f498cbdfe3    52547   1522    51047   +       >utig1-67299>utig1-67301        194339  10818   60711   51635   53356   60      NM:i:1721       AS:f:41063.1    dv:f:0.032255   id:f:0.967745
c6cb342b-7827-49a9-bf19-a45524ccc7d9    23513   1531    22013   +       >utig1-67299>utig1-67301        194339  25972   46607   23086   23830   60      NM:i:744        AS:f:18527      dv:f:0.0312211  id:f:0.968779
cd7a9442-0846-4268-837f-89c51260cea7    75074   1524    73563   +       <utig1-67301<utig1-67299>utig1-67300<utig1-53902        216540  131152  203384  73907   75950   60      NM:i:2043       AS:f:61432.6    dv:f:0.0268993  id:f:0.973101
978cd24f-0868-4c44-98c8-584e472e9d3d    38349   1521    36796   +       >utig1-67299>utig1-67301        194339  25166   60506   37810   38636   60      NM:i:826        AS:f:32773.8    dv:f:0.021379   id:f:0.978621
ac7c8d01-ff02-4255-85d7-17769f9de2db    53795   1555    52295   +       <utig1-67301<utig1-67299        194339  125606  176749  50935   55564   60      NM:i:4629       AS:f:22910.9    dv:f:0.0833093  id:f:0.916691
3aa40bf9-0953-47f6-b819-76fe8dc83814    105188  1556    103466  +       <utig1-53900>utig1-53898>utig1-53902<utig1-67300>utig1-67299>utig1-67301        218328  1148    102992  101279  106931  60      NM:i:5652       AS:f:67267.7    dv:f:0.0528565  id:f:0.947143
f0ef3169-e480-4743-a1c1-1105c1506233    32261   1522    30760   +       >utig1-67299>utig1-67301        194339  8965    38334   31863   32579   60      NM:i:716        AS:f:27469.4    dv:f:0.0219773  id:f:0.978023
c3192174-3ec1-4073-aa84-0804e6f22efe    62617   1523    61109   +       <utig1-67301<utig1-67299>utig1-67300<utig1-53902        216540  142405  202324  61211   63669   60      NM:i:2458       AS:f:46215.7    dv:f:0.0386059  id:f:0.961394
6e198916-4db2-42a9-b7ac-8f2f29b0a6b5    58123   1522    56615   +       <utig1-67301<utig1-67299>utig1-67300<utig1-53902        216540  158955  214240  57342   58769   60      NM:i:1427       AS:f:48589.2    dv:f:0.0242815  id:f:0.975718

with 24 entries.
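
Both counts can also be tallied in one pass. The following is a minimal sketch, not part of verkko: it assumes the GAF layout shown above (path in column 6), and the function name `count_neighbors` and its arguments are hypothetical. Unlike the `grep` pipeline, it matches node names exactly rather than as substrings, so totals could differ slightly on other data.

```shell
# count_neighbors NODE GAF [SKIP]
# For each alignment whose path (column 6) visits NODE, count the nodes that
# sit directly next to it. Alignments that visit SKIP are ignored entirely.
count_neighbors() {
  awk -v node="$1" -v skip="${3:-}" '
    {
      # Strip orientation marks: ">a<b" becomes " a b", then split on spaces.
      path = $6
      gsub(/[<>]/, " ", path)
      n = split(path, a, " ")

      # Drop the whole alignment if it passes through the excluded node.
      drop = 0
      for (i = 1; i <= n; i++) if (a[i] == skip) drop = 1
      if (drop) next

      # Tally the node before and after every occurrence of NODE.
      for (i = 1; i <= n; i++) {
        if (a[i] != node) continue
        if (i > 1) seen[a[i - 1]]++
        if (i < n) seen[a[i + 1]]++
      }
    }
    END { for (k in seen) print seen[k], k }
  ' "$2" | sort -k1,1nr -k2,2
}
```

Called as `count_neighbors utig1-67301 4-processONT/alns-ont-filter-trim.gaf utig1-56431`, it prints one line per neighbouring node with the number of supporting alignments, highest first.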

maickrau commented 2 years ago

The crosslink removal was falsely removing the within-chr17 connection. 1a68d3e fixes the false removal and keeps the within-chr17 connection, but still doesn't remove the cross-chromosome connections because both of them have higher ONT support than the within-chr17 connection.

skoren commented 2 years ago

I think keeping all the connections is correct given the read evidence here.

@mrvollger the issue in that region seems to be a chromosome rearrangement in CHM1 where chr16/17 have a translocation between them in a subset of the cells. There's good ONT evidence for both versions. If you have a chance to re-run with latest from GitHub to confirm the final graph has all 4 traversals, that'd be good.
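
One way to check which connections survive in a graph is to list the links that touch a node. This is only a sketch under stated assumptions: it expects a GFA 1 file with standard tab-separated `L` records (from, orientation, to, orientation, overlap), the function name `links_of` is hypothetical, and the thread does not say which output file holds the final graph, so the file argument is left to the reader.

```shell
# links_of NODE GFA
# Print every link (L record) in a GFA 1 file that starts or ends at NODE,
# in the form "from<orient> -> to<orient>".
links_of() {
  awk -F'\t' -v node="$1" '
    $1 == "L" && ($2 == node || $4 == node) { print $2 $3, "->", $4 $5 }
  ' "$2"
}
```

Running it on each node around the break (for example the node that utig1-67301 ends up in) shows whether both the within-chromosome and the cross-chromosome links are present.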

mrvollger commented 2 years ago

I am surprised by the cross-chromosome connections. I don't see any SDs here, so I wonder why it's happening. Maybe an SD is missing from T2T.

mrvollger commented 2 years ago

Forgot about that. Thanks! And yeah I'll rerun as soon as I get a chance!

skoren commented 2 years ago

I assume this is resolved

mrvollger commented 2 years ago

sorry, yes it is.