marbl / canu

A single molecule sequence assembler for genomes large and small.
http://canu.readthedocs.io/

Bimodal distribution of corrected read length #2220

Closed. AnthonyPiot91 closed this issue 1 year ago

AnthonyPiot91 commented 1 year ago

Hello,

I am trying to assemble a small but repetitive bacterial genome with numerous linear and circular plasmids using Oxford Nanopore long reads.

While using canu, I became concerned about the length distribution of the corrected reads. It is bimodal, with very few reads between 5,000 and 12,000 bp, which does not reflect the length distribution of the original raw reads.

I don't know what could cause this drop in the read length distribution. Is it something I should worry about? If so, what could I do to improve the assembly?
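To compare the two distributions outside of canu's report, the read lengths can be binned directly from the read files. The following is a minimal sketch, not part of canu: the function names (`read_lengths`, `length_histogram`, `print_histogram`) are hypothetical, and it assumes plain or gzipped FASTA, or strict 4-line FASTQ.

```python
import gzip
from collections import Counter


def read_lengths(lines):
    """Yield sequence lengths from an iterable of FASTA or 4-line FASTQ lines."""
    lines = iter(lines)
    first = next(lines, None)
    if first is None:
        return
    if first.startswith(">"):
        # FASTA: a sequence may span several lines until the next header.
        length = 0
        for line in lines:
            if line.startswith(">"):
                yield length
                length = 0
            else:
                length += len(line.strip())
        yield length
    else:
        # Assumption: strict 4-line FASTQ (header, sequence, '+', quality).
        # The first header is already consumed, so sequences sit at i % 4 == 0.
        for i, line in enumerate(lines):
            if i % 4 == 0 and line.strip():
                yield len(line.strip())


def length_histogram(lengths, bin_width=1000):
    """Count reads per length bin; each key is the lower edge of its bin."""
    return Counter((n // bin_width) * bin_width for n in lengths)


def print_histogram(path, bin_width=1000):
    """Print one 'start-end count' row per non-empty bin for a read file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        hist = length_histogram(read_lengths(handle), bin_width)
    for start in sorted(hist):
        print(f"{start:>8}-{start + bin_width - 1:<8} {hist[start]}")
```

Calling `print_histogram("raw_reads.fastq.gz")` and `print_histogram("corrected_reads.fasta.gz")` (both file names are placeholders) with the same `bin_width` gives two tables that can be compared row by row.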

Command used:

    canu -p $STRAIN \
      -d $OUTPUT_DIR/"$STRAIN" \
      genomeSize=1.4m \
      maxThreads=16 \
      useGrid=false \
      -nanopore $INPUT_FILE_PATH

Version: canu 2.2, run on a computing server

Here is canu's report. Thanks for your help.

[CORRECTION/READS]
--
-- In sequence store './BB16-15-2.seqStore':
--   Found 60471 reads.
--   Found 280000920 bases (200 times coverage).
--    Histogram of raw reads:
--    
--    G=280000920                        sum of  ||               length     num
--    NG         length     index       lengths  ||                range    seqs
--    ----- ------------ --------- ------------  ||  ------------------- -------
--    00010        19657      1025     28001146  ||       1000-2839        27423|---------------------------------------------------------------
--    00020        13378      2795     56004028  ||       2840-4679        13549|--------------------------------
--    00030        10246      5203     84001133  ||       4680-6519         7359|-----------------
--    00040         8154      8282    112001583  ||       6520-8359         4257|----------
--    00050         6528     12119    140001997  ||       8360-10199        2618|-------
--    00060         5212     16923    168001824  ||      10200-12039        1621|----
--    00070         4099     22984    196001178  ||      12040-13879        1090|---
--    00080         3063     30879    224001734  ||      13880-15719         707|--
--    00090         2090     41884    252001025  ||      15720-17559         465|--
--    00100         1000     60470    280000920  ||      17560-19399         316|-
--    001.000x               60471    280000920  ||      19400-21239         242|-
--                                               ||      21240-23079         194|-
--                                               ||      23080-24919         137|-
--                                               ||      24920-26759         104|-
--                                               ||      26760-28599          89|-
--                                               ||      28600-30439          79|-
--                                               ||      30440-32279          35|-
--                                               ||      32280-34119          37|-
--                                               ||      34120-35959          36|-
--                                               ||      35960-37799          20|-
--                                               ||      37800-39639          14|-
--                                               ||      39640-41479          11|-
--                                               ||      41480-43319           7|-
--                                               ||      43320-45159          11|-
--                                               ||      45160-46999          11|-
--                                               ||      47000-48839           7|-
--                                               ||      48840-50679           3|-
--                                               ||      50680-52519           5|-
--                                               ||      52520-54359           6|-
--                                               ||      54360-56199           2|-
--                                               ||      56200-58039           3|-
--                                               ||      58040-59879           1|-
--                                               ||      59880-61719           4|-
--                                               ||      61720-63559           3|-
--                                               ||      63560-65399           1|-
--                                               ||      65400-67239           1|-
--                                               ||      67240-69079           0|
--                                               ||      69080-70919           0|
--                                               ||      70920-72759           0|
--                                               ||      72760-74599           0|
--                                               ||      74600-76439           0|
--                                               ||      76440-78279           0|
--                                               ||      78280-80119           2|-
--                                               ||      80120-81959           0|
--                                               ||      81960-83799           0|
--                                               ||      83800-85639           0|
--                                               ||      85640-87479           0|
--                                               ||      87480-89319           0|
--                                               ||      89320-91159           0|
--                                               ||      91160-92999           1|-
--

[CORRECTION/MERS]
--
--  16-mers                                                                                           Fraction
--    Occurrences   NumMers                                                                         Unique Total
--       1-     1         0                                                                        0.0000 0.0000
--       2-     2   7942340 ********************************************************************** 0.4679 0.0675
--       3-     4   4541581 ****************************************                               0.6457 0.1060
--       5-     7   1850087 ****************                                                       0.7880 0.1508
--       8-    11    763899 ******                                                                 0.8608 0.1864
--      12-    16    340474 ***                                                                    0.8953 0.2117
--      17-    22    160794 *                                                                      0.9118 0.2291
--      23-    29     83316                                                                        0.9200 0.2410
--      30-    37     49154                                                                        0.9244 0.2495
--      38-    46     35876                                                                        0.9271 0.2561
--      47-    56     33724                                                                        0.9292 0.2624
--      57-    67     39970                                                                        0.9311 0.2699
--      68-    79     50938                                                                        0.9335 0.2808
--      80-    92     69375                                                                        0.9366 0.2973
--      93-   106     94457                                                                        0.9407 0.3234
--     107-   121    132079 *                                                                      0.9464 0.3647
--     122-   137    176901 *                                                                      0.9543 0.4305
--     138-   154    205111 *                                                                      0.9649 0.5298
--     155-   172    182599 *                                                                      0.9769 0.6578
--     173-   191    110267                                                                        0.9875 0.7823
--     192-   211     41114                                                                        0.9937 0.8640
--     212-   232     14696                                                                        0.9960 0.8971
--     233-   254      7614                                                                        0.9968 0.9103
--     255-   277      5770                                                                        0.9972 0.9180
--     278-   301      4951                                                                        0.9976 0.9245
--     302-   326      4713                                                                        0.9979 0.9305
--     327-   352      4423                                                                        0.9981 0.9368
--     353-   379      3626                                                                        0.9984 0.9432
--     380-   407      3505                                                                        0.9986 0.9488
--     408-   436      3402                                                                        0.9988 0.9547
--     437-   466      3061                                                                        0.9990 0.9607
--     467-   497      2988                                                                        0.9992 0.9666
--     498-   529      2903                                                                        0.9994 0.9727
--     530-   562      2408                                                                        0.9996 0.9791
--     563-   596      1911                                                                        0.9997 0.9846
--     597-   631      1142                                                                        0.9998 0.9893
--     632-   667       647                                                                        0.9999 0.9922
--     668-   704       286                                                                        0.9999 0.9939
--     705-   742       199                                                                        0.9999 0.9947
--     743-   781       206                                                                        0.9999 0.9954
--     782-   821       118                                                                        0.9999 0.9960
--
--           0 (max occurrences)
--   235311086 (total mers, non-unique)
--    16973392 (distinct mers, non-unique)
--           0 (unique mers)

[CORRECTION/LAYOUT]
--                             original      original
--                            raw reads     raw reads
--   category                w/overlaps  w/o/overlaps
--   -------------------- ------------- -------------
--   Number of Reads              44719         60768
--   Number of Bases          250024003      29865851
--   Coverage                   178.589        21.333
--   Median                        4060             0
--   Mean                          5591           491
--   N50                           7341          1942
--   Minimum                       1000             0
--   Maximum                      92979         10723
--   
--                                        --------corrected---------  ----------rescued----------
--                             evidence                     expected                     expected
--   category                     reads            raw     corrected            raw     corrected
--   -------------------- -------------  ------------- -------------  ------------- -------------
--   Number of Reads              48790           2836          2836           2536          2536
--   Number of Bases          256042073       56318038      56011425        7735802       5915188
--   Coverage                   182.887         40.227        40.008          5.526         4.225
--   Median                        3783          17316         17177           2769          1961
--   Mean                          5247          19858         19750           3050          2332
--   N50                           7195          19551         19459           3078          2549
--   Minimum                       1000          13170         13155           1309          1002
--   Maximum                      92979          79768         77964          11255         11171
--   
--                        --------uncorrected--------
--                                           expected
--   category                       raw     corrected
--   -------------------- ------------- -------------
--   Number of Reads             100115        100115
--   Number of Bases          215836014     133228205
--   Coverage                   154.169        95.163
--   Median                        1251             0
--   Mean                          2155          1330
--   N50                           5342          6622
--   Minimum                          0             0
--   Maximum                      92979         92966
--   
--   Maximum Memory          1492791932
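The layout tables above can be cross-checked with a few lines of arithmetic. This sketch uses only numbers copied from this report and the `genomeSize=1.4m` value from the command; the variable names are illustrative. It shows that the reads carried forward fall into two groups, "corrected" and "rescued", whose raw length ranges do not overlap. Reading that as the source of the two modes is an assumption, not something the report states.

```python
# genomeSize=1.4m from the canu command line.
genome_size = 1_400_000

# Values copied from [CORRECTION/READS] and [CORRECTION/LAYOUT] ("raw" columns).
raw_bases = 280_000_920
corrected = {"reads": 2836, "bases": 56_318_038, "min": 13170, "max": 79768}
rescued = {"reads": 2536, "bases": 7_735_802, "min": 1309, "max": 11255}

# Coverage is simply total bases divided by the genome size.
raw_coverage = raw_bases / genome_size
corrected_coverage = corrected["bases"] / genome_size
rescued_coverage = rescued["bases"] / genome_size

# Length interval that belongs to neither group.
gap = (rescued["max"], corrected["min"])

# Expected number of reads leaving correction.
total_reads = corrected["reads"] + rescued["reads"]

print(f"raw coverage:       {raw_coverage:.3f}x")
print(f"corrected coverage: {corrected_coverage:.3f}x")
print(f"rescued coverage:   {rescued_coverage:.3f}x")
print(f"no reads expected between {gap[0]} and {gap[1]} bp")
print(f"reads expected after correction: {total_reads}")
```

The "corrected" group alone accounts for about 40x coverage with a minimum length of 13,170 bp, while the "rescued" group tops out at 11,255 bp. That would be consistent with a correction target of roughly 40x applied to the longest reads (canu exposes this as `corOutCoverage`; the default value is worth confirming in the parameter reference).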

[TRIMMING/READS]
--
-- In sequence store './BB16-15-2.seqStore':
--   Found 5361 reads.
--   Found 62227099 bases (44.44 times coverage).
--    Histogram of corrected reads:
--    
--    G=62227099                         sum of  ||               length     num
--    NG         length     index       lengths  ||                range    seqs
--    ----- ------------ --------- ------------  ||  ------------------- -------
--    00010        34356       145      6251216  ||       1002-2313         1604|---------------------------------------------------------------
--    00020        27573       350     12447602  ||       2314-3625          575|-----------------------
--    00030        23509       597     18688208  ||       3626-4937          208|---------
--    00040        20754       879     24906982  ||       4938-6249           95|----
--    00050        18523      1197     31129187  ||       6250-7561           37|--
--    00060        16755      1551     37351710  ||       7562-8873           10|-
--    00070        15424      1938     43566975  ||       8874-10185           1|-
--    00080        14303      2358     49789046  ||      10186-11497           2|-
--    00090        13282      2809     56015614  ||      11498-12809           2|-
--    00100         1002      5360     62227099  ||      12810-14121         380|---------------
--    001.000x                5361     62227099  ||      14122-15433         513|---------------------
--                                               ||      15434-16745         381|---------------
--                                               ||      16746-18057         266|-----------
--                                               ||      18058-19369         223|---------
--                                               ||      19370-20681         179|--------
--                                               ||      20682-21993         148|------
--                                               ||      21994-23305         126|-----
--                                               ||      23306-24617         108|-----
--                                               ||      24618-25929          71|---
--                                               ||      25930-27241          64|---
--                                               ||      27242-28553          69|---
--                                               ||      28554-29865          48|--
--                                               ||      29866-31177          43|--
--                                               ||      31178-32489          23|-
--                                               ||      32490-33801          29|--
--                                               ||      33802-35113          19|-
--                                               ||      35114-36425          30|--
--                                               ||      36426-37737          13|-
--                                               ||      37738-39049          13|-
--                                               ||      39050-40361           8|-
--                                               ||      40362-41673           7|-
--                                               ||      41674-42985           6|-
--                                               ||      42986-44297           9|-
--                                               ||      44298-45609           9|-
--                                               ||      45610-46921           8|-
--                                               ||      46922-48233           4|-
--                                               ||      48234-49545           3|-
--                                               ||      49546-50857           4|-
--                                               ||      50858-52169           4|-
--                                               ||      52170-53481           2|-
--                                               ||      53482-54793           5|-
--                                               ||      54794-56105           0|
--                                               ||      56106-57417           1|-
--                                               ||      57418-58729           2|-
--                                               ||      58730-60041           0|
--                                               ||      60042-61353           1|-
--                                               ||      61354-62665           4|-
--                                               ||      62666-63977           1|-
--                                               ||      63978-65289           1|-
--                                               ||      65290-66601           2|-
--

[TRIMMING/MERS]
--
--  22-mers                                                                                           Fraction
--    Occurrences   NumMers                                                                         Unique Total
--       1-     1         0                                                                        0.0000 0.0000
--       2-     2     91243 ******************                                                     0.0629 0.0030
--       3-     4     66898 *************                                                          0.0913 0.0050
--       5-     7     57069 ***********                                                            0.1236 0.0084
--       8-    11     64328 ************                                                           0.1607 0.0145
--      12-    16     58356 ***********                                                            0.2041 0.0253
--      17-    22     57078 ***********                                                            0.2398 0.0378
--      23-    29     58918 ***********                                                            0.2801 0.0575
--      30-    37     72218 **************                                                         0.3178 0.0813
--      38-    46    169567 *********************************                                      0.3704 0.1246
--      47-    56    349915 ********************************************************************** 0.5006 0.2588
--      57-    67    306598 *************************************************************          0.7482 0.5673
--      68-    79     58583 ***********                                                            0.9384 0.8442
--      80-    92      5518 *                                                                      0.9735 0.9039
--      93-   106      2690                                                                        0.9769 0.9107
--     107-   121      2517                                                                        0.9787 0.9151
--     122-   137      5283 *                                                                      0.9804 0.9197
--     138-   154      6395 *                                                                      0.9844 0.9323
--     155-   172      4765                                                                        0.9885 0.9465
--     173-   191      5253 *                                                                      0.9920 0.9603
--     192-   211      1827                                                                        0.9953 0.9743
--     212-   232      2897                                                                        0.9966 0.9803
--     233-   254      1577                                                                        0.9986 0.9912
--     255-   277        74                                                                        0.9996 0.9966
--     278-   301        89                                                                        0.9996 0.9969
--     302-   326        55                                                                        0.9997 0.9973
--     327-   352        98                                                                        0.9998 0.9977
--     353-   379        98                                                                        0.9998 0.9982
--     380-   407        35                                                                        0.9999 0.9987
--     408-   436        95                                                                        0.9999 0.9990
--     437-   466         4                                                                        1.0000 0.9996
--     467-   497         4                                                                        1.0000 0.9996
--     498-   529        21                                                                        1.0000 0.9997
--     530-   562         3                                                                        1.0000 0.9998
--     563-   596        10                                                                        1.0000 0.9999
--     597-   631         4                                                                        1.0000 1.0000
--
--           0 (max occurrences)
--    61333983 (total mers, non-unique)
--     1450083 (distinct mers, non-unique)
--           0 (unique mers)

[TRIMMING/TRIMMING]
--  PARAMETERS:
--  ----------
--     1000    (reads trimmed below this many bases are deleted)
--   0.1200    (use overlaps at or below this fraction error)
--      500    (break region if overlap is less than this long, for 'largest covered' algorithm)
--        2    (break region if overlap coverage is less than this many reads, for 'largest covered' algorithm)
--  
--  INPUT READS:
--  -----------
--  105487 reads     62227099 bases (reads processed)
--       0 reads            0 bases (reads not processed, previously deleted)
--       0 reads            0 bases (reads not processed, in a library where trimming isn't allowed)
--  
--  OUTPUT READS:
--  ------------
--    1255 reads     21214316 bases (trimmed reads output)
--    4104 reads     40623595 bases (reads with no change, kept as is)
--  100126 reads            0 bases (reads with no overlaps, deleted)
--       2 reads         2146 bases (reads with short trimmed length, deleted)
--  
--  TRIMMING DETAILS:
--  ----------------
--     328 reads        51030 bases (bases trimmed from the 5' end of a read)
--    1001 reads       336012 bases (bases trimmed from the 3' end of a read)

[TRIMMING/SPLITTING]
--  PARAMETERS:
--  ----------
--     1000    (reads trimmed below this many bases are deleted)
--   0.1200    (use overlaps at or below this fraction error)
--  INPUT READS:
--  -----------
--    5359 reads     62224953 bases (reads processed)
--  100128 reads         2146 bases (reads not processed, previously deleted)
--       0 reads            0 bases (reads not processed, in a library where trimming isn't allowed)
--  
--  PROCESSED:
--  --------
--       0 reads            0 bases (no overlaps)
--       0 reads            0 bases (no coverage after adjusting for trimming done already)
--       0 reads            0 bases (processed for chimera)
--       0 reads            0 bases (processed for spur)
--    5359 reads     62224953 bases (processed for subreads)
--  
--  READS WITH SIGNALS:
--  ------------------
--       0 reads            0 signals (number of 5' spur signal)
--       0 reads            0 signals (number of 3' spur signal)
--       0 reads            0 signals (number of chimera signal)
--      10 reads           10 signals (number of subread signal)
--  
--  SIGNALS:
--  -------
--       0 reads            0 bases (size of 5' spur signal)
--       0 reads            0 bases (size of 3' spur signal)
--       0 reads            0 bases (size of chimera signal)
--      10 reads         3048 bases (size of subread signal)
--  
--  TRIMMING:
--  --------
--       0 reads            0 bases (trimmed from the 5' end of the read)
--      10 reads        78927 bases (trimmed from the 3' end of the read)

[UNITIGGING/READS]
--
-- In sequence store './BB16-15-2.seqStore':
--   Found 5359 reads.
--   Found 61758984 bases (44.11 times coverage).
--    Histogram of corrected-trimmed reads:
--    
--    G=61758984                         sum of  ||               length     num
--    NG         length     index       lengths  ||                range    seqs
--    ----- ------------ --------- ------------  ||  ------------------- -------
--    00010        33989       146      6200964  ||       1002-2313         1605|---------------------------------------------------------------
--    00020        27446       352     12370972  ||       2314-3625          576|-----------------------
--    00030        23326       598     18547108  ||       3626-4937          206|---------
--    00040        20593       880     24721519  ||       4938-6249           96|----
--    00050        18381      1198     30896108  ||       6250-7561           40|--
--    00060        16658      1551     37060100  ||       7562-8873           11|-
--    00070        15343      1938     43242717  ||       8874-10185           4|-
--    00080        14253      2356     49411299  ||      10186-11497           5|-
--    00090        13054      2805     55586208  ||      11498-12809           8|-
--    00100         1002      5358     61758984  ||      12810-14121         390|----------------
--    001.000x                5359     61758984  ||      14122-15433         508|--------------------
--                                               ||      15434-16745         382|---------------
--                                               ||      16746-18057         262|-----------
--                                               ||      18058-19369         226|---------
--                                               ||      19370-20681         173|-------
--                                               ||      20682-21993         145|------
--                                               ||      21994-23305         122|-----
--                                               ||      23306-24617         105|-----
--                                               ||      24618-25929          70|---
--                                               ||      25930-27241          64|---
--                                               ||      27242-28553          71|---
--                                               ||      28554-29865          49|--
--                                               ||      29866-31177          39|--
--                                               ||      31178-32489          24|-
--                                               ||      32490-33801          26|--
--                                               ||      33802-35113          20|-
--                                               ||      35114-36425          30|--
--                                               ||      36426-37737          14|-
--                                               ||      37738-39049          12|-
--                                               ||      39050-40361           7|-
--                                               ||      40362-41673           7|-
--                                               ||      41674-42985           5|-
--                                               ||      42986-44297           9|-
--                                               ||      44298-45609           9|-
--                                               ||      45610-46921           8|-
--                                               ||      46922-48233           4|-
--                                               ||      48234-49545           4|-
--                                               ||      49546-50857           3|-
--                                               ||      50858-52169           3|-
--                                               ||      52170-53481           2|-
--                                               ||      53482-54793           4|-
--                                               ||      54794-56105           0|
--                                               ||      56106-57417           1|-
--                                               ||      57418-58729           2|-
--                                               ||      58730-60041           0|
--                                               ||      60042-61353           1|-
--                                               ||      61354-62665           4|-
--                                               ||      62666-63977           1|-
--                                               ||      63978-65289           0|
--                                               ||      65290-66601           2|-
--

[UNITIGGING/MERS]
--
--  22-mers                                                                                           Fraction
--    Occurrences   NumMers                                                                         Unique Total
--       1-     1         0                                                                        0.0000 0.0000
--       2-     2     90390 ******************                                                     0.0627 0.0030
--       3-     4     64950 *************                                                          0.0905 0.0049
--       5-     7     55655 ***********                                                            0.1231 0.0084
--       8-    11     66961 *************                                                          0.1584 0.0142
--      12-    16     56661 ***********                                                            0.2039 0.0255
--      17-    22     55139 ***********                                                            0.2380 0.0376
--      23-    29     57404 ***********                                                            0.2786 0.0573
--      30-    37     74338 **************                                                         0.3151 0.0805
--      38-    46    175777 ***********************************                                    0.3695 0.1252
--      47-    56    347273 ********************************************************************** 0.5057 0.2658
--      57-    67    302468 ************************************************************           0.7512 0.5722
--      68-    79     55800 ***********                                                            0.9393 0.8456
--      80-    92      5459 *                                                                      0.9734 0.9038
--      93-   106      2654                                                                        0.9768 0.9105
--     107-   121      2537                                                                        0.9786 0.9149
--     122-   137      5947 *                                                                      0.9803 0.9195
--     138-   154      5753 *                                                                      0.9846 0.9329
--     155-   172      4812                                                                        0.9885 0.9463
--     173-   191      5206 *                                                                      0.9921 0.9605
--     192-   211      1852                                                                        0.9954 0.9746
--     212-   232      2923                                                                        0.9966 0.9805
--     233-   254      1469                                                                        0.9987 0.9916
--     255-   277        74                                                                        0.9996 0.9966
--     278-   301        89                                                                        0.9996 0.9969
--     302-   326        67                                                                        0.9997 0.9973
--     327-   352        86                                                                        0.9998 0.9977
--     353-   379        98                                                                        0.9998 0.9982
--     380-   407        35                                                                        0.9999 0.9987
--     408-   436        95                                                                        0.9999 0.9990
--     437-   466         4                                                                        1.0000 0.9996
--     467-   497         9                                                                        1.0000 0.9996
--     498-   529        18                                                                        1.0000 0.9997
--     530-   562         1                                                                        1.0000 0.9999
--     563-   596        10                                                                        1.0000 0.9999
--     597-   631         4                                                                        1.0000 1.0000
--
--           0 (max occurrences)
--    60978465 (total mers, non-unique)
--     1442018 (distinct mers, non-unique)
--           0 (unique mers)

[UNITIGGING/OVERLAPS]
--   category            reads     %          read length        feature size or coverage  analysis
--   ----------------  -------  -------  ----------------------  ------------------------  --------------------
--   middle-missing          0    0.00        0.00 +- 0.00             0.00 +- 0.00       (bad trimming)
--   middle-hump             0    0.00        0.00 +- 0.00             0.00 +- 0.00       (bad trimming)
--   no-5-prime              3    0.06    14032.33 +- 5484.00         12.00 +- 19.08      (bad trimming)
--   no-3-prime              7    0.13    21045.86 +- 13782.69       324.00 +- 466.99     (bad trimming)
--   
--   low-coverage           80    1.49     4774.94 +- 6121.65          8.22 +- 2.52       (easy to assemble, potential for lower quality consensus)
--   unique               3765   70.26    12173.06 +- 10297.54        52.63 +- 9.22       (easy to assemble, perfect, yay)
--   repeat-cont           719   13.42     4241.81 +- 4027.79        185.66 +- 37.90      (potential for consensus errors, no impact on assembly)
--   repeat-dove             3    0.06     6480.33 +- 396.65         195.30 +- 24.41      (hard to assemble, likely won't assemble correctly or even at all)
--   
--   span-repeat           385    7.18    16990.96 +- 11263.72      3688.18 +- 4315.80    (read spans a large repeat, usually easy to assemble)
--   uniq-repeat-cont      335    6.25    12856.97 +- 8410.06                             (should be uniquely placed, low potential for consensus errors, no impact on assembly)
--   uniq-repeat-dove       48    0.90    23878.71 +- 8037.93                             (will end contigs, potential to misassemble)
--   uniq-anchor            14    0.26    20851.36 +- 15323.84     12638.64 +- 12904.90   (repeat read, with unique section, probable bad read)

[UNITIGGING/ADJUSTMENT]
-- No report available.

[UNITIGGING/ERROR RATES]
--  
--  ERROR RATES
--  -----------
--                                                   --------threshold------
--  5552                         fraction error      fraction        percent
--  samples                              (1e-5)         error          error
--                   --------------------------      --------       --------
--  command line (-eg)                           ->  12000.00       12.0000%
--  command line (-ef)                           ->  -----.--      ---.----%
--  command line (-eM)                           ->  12000.00       12.0000%
--  mean + std.dev     110.07 +-  12 *   697.52  ->   8480.34        8.4803%  (enabled)
--  median + mad         0.00 +-  12 *     0.00  ->      0.00        0.0000%
--  90th percentile                              ->    166.00        0.1660%
--  
--  BEST EDGE FILTERING
--  -------------------
--  At graph threshold 12.0000%, reads:
--    available to have edges:          213
--    with at least one edge:           213
--  
--  At max threshold 12.0000%, reads:  (not computed)
--    available to have edges:            0
--    with at least one edge:             0
--  
--  At tight threshold 0.1660%, reads with:
--    both edges below error threshold:       118  (80.00% minReadsBest threshold = 170)
--    one  edge  above error threshold:        48
--    both edges above error threshold:        47
--    at least one edge:                      213
--  
--  At loose threshold 8.4803%, reads with:
--    both edges below error threshold:       199  (80.00% minReadsBest threshold = 170)
--    one  edge  above error threshold:         9
--    both edges above error threshold:         5
--    at least one edge:                      213
--  
--  
--  INITIAL EDGES
--  -------- ----------------------------------------
--      5097 reads are contained
--    100149 reads have no best edges (singleton)
--        17 reads have only one best edge (spur) 
--                 17 are mutual best
--       224 reads have two best edges 
--                 23 have one mutual best edge
--                191 have two mutual best edges
--  
--  
--  FINAL EDGES
--  -------- ----------------------------------------
--      5097 reads are contained
--    100163 reads have no best edges (singleton)
--        18 reads have only one best edge (spur) 
--                 18 are mutual best
--       209 reads have two best edges 
--                 10 have one mutual best edge
--                193 have two mutual best edges
--  
--  
--  EDGE FILTERING
--  -------- ------------------------------------------
--         0 reads are ignored
--        21 reads have a gap in overlap coverage
--         6 reads have lopsided best edges

[UNITIGGING/CONTIGS]
-- Found, in version 1, after unitig construction:
--   contigs:      13 sequences, total length 1383973 bp (including 5 repeats of total length 135256 bp).
--   bubbles:      4 sequences, total length 155365 bp.
--   unassembled:  87 sequences, total length 1547959 bp.
--
-- Contig sizes based on genome size 1.4mbp:
--
--            NG (bp)  LG (contigs)    sum (bp)
--         ----------  ------------  ----------
--     10      942904             1      942904
--     20      942904             1      942904
--     30      942904             1      942904
--     40      942904             1      942904
--     50      942904             1      942904
--     60      942904             1      942904
--     70       80829             2     1023733
--     80       48218             4     1144419
--     90       29217             8     1279348
--

[UNITIGGING/CONSENSUS]
-- Found, in version 2, after consensus generation:
--   contigs:      13 sequences, total length 1383614 bp (including 5 repeats of total length 134529 bp).
--   bubbles:      4 sequences, total length 154710 bp.
--   unassembled:  87 sequences, total length 1547959 bp.
--
-- Contig sizes based on genome size 1.4mbp:
--
--            NG (bp)  LG (contigs)    sum (bp)
--         ----------  ------------  ----------
--     10      942676             1      942676
--     20      942676             1      942676
--     30      942676             1      942676
--     40      942676             1      942676
--     50      942676             1      942676
--     60      942676             1      942676
--     70       81030             2     1023706
--     80       48226             4     1144158
--     90       29199             8     1279087
--

skoren commented 1 year ago

The distribution is fine. Canu tries to correct the longest 40x of reads for assembly. Since you have a lot of coverage (200x), it can fill that 40x using only the long reads over about 12 kb. You can see this in the correction report, where the corrected read median is expected to be 17 kb vs 3 kb for all input data. The short reads are "rescued": they are corrected when the longest 40x doesn't represent them well, so they likely come from shorter plasmids in your sample.
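To make the "longest 40x" reasoning concrete, here is a minimal sketch of the arithmetic, not canu's actual selection code. The function name `longest_reads_cutoff` and its parameters are hypothetical; it sorts read lengths from longest to shortest and reports the length at which the running total reaches the target coverage.

```python
def longest_reads_cutoff(read_lengths, genome_size, target_coverage=40):
    """Return (cutoff_length, n_reads) for the longest reads that together
    reach target_coverage * genome_size bases.

    Simplified model of "take the longest 40x"; canu's real selection
    also handles rescued short reads, which this sketch ignores.
    """
    lengths = sorted(read_lengths, reverse=True)
    target_bases = genome_size * target_coverage
    total = 0
    for n, length in enumerate(lengths, start=1):
        total += length
        if total >= target_bases:
            # This read is the shortest one needed to hit the target.
            return length, n
    # Not enough data to reach the target: every read is needed.
    return (lengths[-1] if lengths else 0), len(lengths)


if __name__ == "__main__":
    # Toy data only; real lengths would come from the raw reads.
    toy = [10, 9, 8, 7, 1, 1, 1]
    print(longest_reads_cutoff(toy, genome_size=1, target_coverage=20))
```

For this dataset the target is 1.4 Mbp x 40 = 56 Mbp. The NG 20 row of the raw read histogram in the report reaches a cumulative sum of 56004028 bases at read index 2795 with length 13378 bp, so the longest 40x consists of reads of roughly 13 kb and up, which is consistent with the gap seen below 12 kb.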

AnthonyPiot91 commented 1 year ago

Thanks for the information.

This answers my second concern. If I understand correctly, I should not worry that canu, by selecting only the longest reads, will miss small plasmids (smaller than 10 kbp), because the "rescued" reads will likely represent these sequences?

skoren commented 1 year ago

Yes, it should, though it's not always perfect. Once you have your assembly, you can re-map the raw reads to it and check for reads without a good mapping to see if anything was missed.
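One way to run that check is sketched below, assuming the raw reads were aligned to the assembly with a long-read aligner that writes PAF (for example minimap2) and that the raw read names are available separately. The function name `poorly_mapped_reads`, the file names in the comment, and the 0.8 threshold are illustrative assumptions, not canu recommendations.

```python
def poorly_mapped_reads(paf_lines, all_read_names, min_fraction=0.8):
    """Return sorted names of reads whose best single alignment covers
    less than min_fraction of the read, including reads with no alignment.

    PAF columns used: 1 = query name, 2 = query length,
    3 = query start, 4 = query end.
    """
    best = {}  # read name -> largest aligned fraction seen so far
    for line in paf_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip malformed or truncated lines
        name = fields[0]
        qlen, qstart, qend = int(fields[1]), int(fields[2]), int(fields[3])
        frac = (qend - qstart) / qlen if qlen else 0.0
        best[name] = max(best.get(name, 0.0), frac)
    return sorted(n for n in all_read_names if best.get(n, 0.0) < min_fraction)


if __name__ == "__main__":
    # Hypothetical input, e.g. produced with something like:
    #   minimap2 -x map-ont assembly.fasta raw_reads.fastq > reads.paf
    paf = [
        "r1\t1000\t0\t950\t+\tctg1\t5000\t100\t1050\t900\t950\t60",
        "r2\t1000\t0\t300\t-\tctg1\t5000\t0\t300\t280\t300\t10",
    ]
    print(poorly_mapped_reads(paf, ["r1", "r2", "r3"]))
```

Reads that come back from this filter are candidates for closer inspection. If many of them are short and overlap each other, they may come from a small plasmid missing from the assembly.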

AnthonyPiot91 commented 1 year ago

Very good, thanks a lot!