marbl / canu

A single molecule sequence assembler for genomes large and small.
http://canu.readthedocs.io/

Ran on wrong genomeSize #2274

Closed: aglendening closed this issue 7 months ago

aglendening commented 8 months ago

Hi!

I've come to realize that I ran canu with genomeSize set to more than double what I now expect for my species (set to 2.3 Gb, reality ~1 Gb). How much, if at all, would this affect the output? The metrics look normal apart from a longer-than-expected assembly size, though that may be due to heterozygosity. Is there a way to have canu "reinterpret" the results with the new genomeSize without restarting the pipeline from scratch? I tried re-running with genomeSize set to the new value, but the pipeline finished within seconds, automatically skipping Correction, Trimming, and Unitigging because it 'saw' the existing output files. Apologies if this has been answered already, but I couldn't find anything on GitHub or readthedocs.

My thanks, AMG

Canu command used (following the recommended settings for assemblies that are too large or too slow):

canu -p canu -d canu_assembly genomeSize=2.3g -nanopore ...fastq \
  gridOptions="--partition=general --qos=general --mem-per-cpu=8000m --cpus-per-task=24" \
  corMhapFilterThreshold=0.0000000002 \
  corMhapOptions="--threshold 0.80 --num-hashes 512 --num-min-matches 3 --ordered-sketch-size 1000 --ordered-kmer-size 14 --min-olap-length 2000 --repeat-idf-scale 50" \
  mhapMemory=60g mhapBlockSize=500 ovlMerDistinct=0.975

Canu 2.2, loaded via module load on a Linux HPC cluster.

brianwalenz commented 8 months ago

Minimal impact. The genome size estimate is used mainly to discard potentially useless data; for example, we only correct the longest 40x of data. Setting the genome size too high therefore made your assembly use more data than a normal run would. That limit exists mostly for compute performance, but we've seen some degradation in contig size with "too much" data. Twice the usual amount should be OK, though.
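
To make the role of genomeSize concrete, here is a simplified model of the "correct only the longest 40x" rule: the base budget is the coverage target times the genome size, and reads are taken longest-first until the budget is met. The function name `select_longest_coverage` is hypothetical and this is not canu's code; canu exposes the 40x target as its corOutCoverage parameter. Under this model, this run's ~58.2 Gb of raw data against a 2.3 Gb genome gives a 92 Gb budget, so the budget alone would exclude nothing, whereas a 1 Gb genome would cap the selection at about 40 Gb.

```python
from typing import List


def select_longest_coverage(read_lengths: List[int], genome_size: int,
                            coverage: float = 40.0) -> List[int]:
    """Return the longest reads, taken until their summed length reaches
    coverage * genome_size bases (simplified model, not canu's code)."""
    budget = coverage * genome_size          # bases allowed through
    selected: List[int] = []
    total = 0
    for length in sorted(read_lengths, reverse=True):  # longest first
        if total >= budget:
            break                            # budget met; drop the rest
        selected.append(length)
        total += length
    return selected


if __name__ == "__main__":
    # Toy data: ten reads, 100 kb in total.
    reads = [20_000, 18_000, 15_000, 12_000, 10_000,
             8_000, 7_000, 5_000, 3_000, 2_000]
    # genomeSize = 1 kb with a 40x target -> 40 kb budget: 3 reads kept.
    print(select_longest_coverage(reads, genome_size=1_000))
    # Overstating genomeSize 2.5x -> 100 kb budget: all 10 reads kept.
    print(len(select_longest_coverage(reads, genome_size=2_500)))
```

This sketch keeps the read that crosses the budget; the thread does not say how canu handles that boundary read.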

You could take the existing trimmed reads and run them through just the assembly phase (canu ... -trimmed -nanopore {assembly1}/*trimmed*fasta.gz) to see the impact. This should take (wild guess) about 40% as long as the original run.

Check the summary file to see how much data was actually used, and/or post it here. I'm not sure anything in it would indicate whether a rerun would help; it's more to see how much coverage was actually assembled.

aglendening commented 8 months ago

Do you mean canu_assembly/unitigging/canu.ovlStore.summary? If so, it's pasted below.

Your first suggestion is currently running; I'll report back on any differences when it finishes. And thank you!

category            reads     %          read length        feature size or coverage  analysis
----------------  -------  -------  ----------------------  ------------------------  --------------------
middle-missing         22    0.00    24655.64 +- 18903.14      3937.73 +- 5487.70    (bad trimming)
middle-hump            36    0.00     8483.75 +- 8166.77       1595.89 +- 3030.18    (bad trimming)
no-5-prime            316    0.01    18578.94 +- 16122.24      1928.75 +- 3617.19    (bad trimming)
no-3-prime            151    0.01    13547.35 +- 11941.56      1623.21 +- 3518.53    (bad trimming)

low-coverage         3621    0.14     5966.95 +- 5447.03          3.23 +- 1.58       (easy to assemble, potential for lower quality consensus)
unique              84178    3.16    11586.68 +- 7723.34         26.05 +- 4.62       (easy to assemble, perfect, yay)
repeat-cont       1474716   55.39    15341.48 +- 8801.64         80.41 +- 111.11     (potential for consensus errors, no impact on assembly)
repeat-dove         46302    1.74    38627.29 +- 7758.87         66.13 +- 43.24      (hard to assemble, likely won't assemble correctly or even at all)

span-repeat         97870    3.68    19093.41 +- 9376.37       7978.36 +- 8070.51    (read spans a large repeat, usually easy to assemble)
uniq-repeat-cont   464717   17.45    18678.87 +- 7702.52                             (should be uniquely placed, low potential for consensus errors, no impact on assembly)
uniq-repeat-dove    61735    2.32    35132.66 +- 8456.50                             (will end contigs, potential to misassemble)
uniq-anchor        428180   16.08    24048.91 +- 9468.36       9075.94 +- 7788.61    (repeat read, with unique section, probable bad read)

brianwalenz commented 8 months ago

Sorry, my mistake: I meant canu.report in the main directory. It starts with two read-length histograms, which also give the number of reads and bases in the data.
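
The coverage figures in canu.report are total bases divided by genomeSize (for example, 58224787463 / 2.3e9 is about 25.3x), so every coverage value in this run is relative to the 2.3 Gb setting. A small sketch that restates them against the expected ~1 Gb; the per-stage base totals are copied from the canu.report posted later in this thread, and the variable and function names are hypothetical. At 1 Gb, the same data amount to roughly 58x raw and 48x corrected-trimmed coverage.

```python
GENOME_SIZE_USED = 2.3e9      # genomeSize=2.3g from the original command
GENOME_SIZE_EXPECTED = 1.0e9  # ~1 Gb, the revised expectation

# Total bases reported in canu.report for each stage of this run.
STAGE_BASES = {
    "raw": 58_224_787_463,
    "corrected": 49_529_700_987,
    "corrected-trimmed": 48_436_196_500,
}


def coverage(total_bases: float, genome_size: float) -> float:
    """Fold coverage: total sequenced bases divided by genome size."""
    return total_bases / genome_size


if __name__ == "__main__":
    for stage, bases in STAGE_BASES.items():
        print(f"{stage:18s} {coverage(bases, GENOME_SIZE_USED):6.2f}x at 2.3 Gb"
              f"  {coverage(bases, GENOME_SIZE_EXPECTED):6.2f}x at 1 Gb")
```

The 2.3 Gb column matches the report's 25.31x, 21.53x, and 21.05x to within 0.01x.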

aglendening commented 7 months ago

No problem, and thank you! Sorry for the delay; canu.report is pasted below. The assembly-only rerun should also finish within a couple of days.

[CORRECTION/READS]
--
-- In sequence store './canu.seqStore':
--   Found 3947665 reads.
--   Found 58224787463 bases (25.31 times coverage).
--    Histogram of raw reads:
--
--    G=58224787463                      sum of  ||               length     num
--    NG         length     index       lengths  ||                range    seqs
--    ----- ------------ --------- ------------  ||  ------------------- -------
--    00010        38478    124087   5822505932  ||       1000-4350       798706|---------------------------------------------------------------
--    00020        31640    292441  11644960784  ||       4351-7701       509184|-----------------------------------------
--    00030        27406    490841  17467450380  ||       7702-11052      406037|---------------------------------
--    00040        24190    717347  23289922371  ||      11053-14403      406673|---------------------------------
--    00050        21462    973010  29112414440  ||      14404-17754      423437|----------------------------------
--    00060        18899   1261994  34934884926  ||      17755-21105      393448|--------------------------------
--    00070        16255   1593424  40757365104  ||      21106-24456      314785|-------------------------
--    00080        13080   1990076  46579841375  ||      24457-27807      227829|------------------
--    00090         8601   2527176  52402314776  ||      27808-31158      157299|-------------
--    00100         1000   3947664  58224787463  ||      31159-34509      105779|---------
--    001.000x             3947665  58224787463  ||      34510-37860       70566|------
--                                               ||      37861-41211       45723|----
--                                               ||      41212-44562       30038|---
--                                               ||      44563-47913       19429|--
--                                               ||      47914-51264       12531|-
--                                               ||      51265-54615        8424|-
--                                               ||      54616-57966        5407|-
--                                               ||      57967-61317        3655|-
--                                               ||      61318-64668        2512|-
--                                               ||      64669-68019        1722|-
--                                               ||      68020-71370        1192|-
--                                               ||      71371-74721         834|-
--                                               ||      74722-78072         604|-
--                                               ||      78073-81423         480|-
--                                               ||      81424-84774         319|-
--                                               ||      84775-88125         241|-
--                                               ||      88126-91476         196|-
--                                               ||      91477-94827         155|-
--                                               ||      94828-98178         112|-
--                                               ||      98179-101529         78|-
--                                               ||     101530-104880         59|-
--                                               ||     104881-108231         42|-
--                                               ||     108232-111582         39|-
--                                               ||     111583-114933         32|-
--                                               ||     114934-118284         23|-
--                                               ||     118285-121635         16|-
--                                               ||     121636-124986         21|-
--                                               ||     124987-128337          9|-
--                                               ||     128338-131688          3|-
--                                               ||     131689-135039          4|-
--                                               ||     135040-138390          4|-
--                                               ||     138391-141741          6|-
--                                               ||     141742-145092          2|-
--                                               ||     145093-148443          2|-
--                                               ||     148444-151794          3|-
--                                               ||     151795-155145          2|-
--                                               ||     155146-158496          0|
--                                               ||     158497-161847          0|
--                                               ||     161848-165198          1|-
--                                               ||     165199-168549          2|-
--
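
In the histogram above, each NG row gives the read length at which the longest-first running total of read lengths reaches that percentage of G; the index column counts reads from 0 (NG100 sits at index 3947664 of 3947665 reads). Here G equals the total number of bases (58224787463), so these rows behave like plain Nx values rather than being measured against genomeSize. A minimal sketch of the calculation, assuming a plain list of read lengths; the helper name `ng_values` is hypothetical:

```python
from typing import Dict, List, Sequence


def ng_values(read_lengths: List[int], genome_size: int,
              percents: Sequence[int] = (10, 20, 30, 40, 50, 60, 70, 80, 90, 100)
              ) -> Dict[int, int]:
    """Map each percent x to the read length at which the running sum of
    read lengths (longest first) first reaches x% of genome_size.

    Percentages the data never reach (total bases < x% of genome_size)
    are left out of the result.
    """
    targets = iter(sorted(percents))
    x = next(targets, None)
    result: Dict[int, int] = {}
    running = 0
    for length in sorted(read_lengths, reverse=True):
        running += length
        # One long read can satisfy several thresholds at once.
        while x is not None and running >= genome_size * x / 100:
            result[x] = length
            x = next(targets, None)
    return result


if __name__ == "__main__":
    reads = [50, 30, 10, 10]                   # toy read lengths, 100 bases total
    print(ng_values(reads, genome_size=100))   # G = total bases, as in canu.report
    print(ng_values(reads, genome_size=200))   # G larger than the data: NG60+ missing
```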

[CORRECTION/MERS]
--
--  16-mers                                                                                           Fraction
--    Occurrences   NumMers                                                                         Unique Total
--       1-     1         0                                                                        0.0000 0.0000
--       2-     2 228507300 ********************************************************************** 0.2407 0.0079
--       3-     4 196058827 ************************************************************           0.3714 0.0144
--       5-     7  92004849 ****************************                                           0.4940 0.0232
--       8-    11  34654069 **********                                                             0.5582 0.0303
--      12-    16  12728470 ***                                                                    0.5848 0.0348
--      17-    22   8408238 **                                                                     0.5956 0.0373
--      23-    29  22731028 ******                                                                 0.6051 0.0406
--      30-    37  39961278 ************                                                           0.6320 0.0528
--      38-    46  27455292 ********                                                               0.6735 0.0762
--      47-    56  27982024 ********                                                               0.7002 0.0948
--      57-    67  59456720 ******************                                                     0.7320 0.1226
--      68-    79  51936168 ***************                                                        0.7961 0.1893
--      80-    92  21972012 ******                                                                 0.8475 0.2515
--      93-   106  16054359 ****                                                                   0.8692 0.2823
--     107-   121  14680082 ****                                                                   0.8860 0.3100
--     122-   137  15708112 ****                                                                   0.9013 0.3390
--     138-   154  12585765 ***                                                                    0.9178 0.3744
--     155-   172   8723114 **                                                                     0.9307 0.4054
--     173-   191   7440327 **                                                                     0.9398 0.4298
--     192-   211   6667531 **                                                                     0.9475 0.4532
--     212-   232   5390048 *                                                                      0.9545 0.4763
--     233-   254   4450551 *                                                                      0.9601 0.4968
--     255-   277   3854983 *                                                                      0.9647 0.5154
--     278-   301   3268307 *                                                                      0.9688 0.5331
--     302-   326   2787154                                                                        0.9722 0.5494
--     327-   352   2415885                                                                        0.9751 0.5644
--     353-   379   2082227                                                                        0.9776 0.5786
--     380-   407   1812314                                                                        0.9798 0.5917
--     408-   436   1581313                                                                        0.9817 0.6040
--     437-   466   1384513                                                                        0.9834 0.6155
--     467-   497   1218270                                                                        0.9848 0.6263
--     498-   529   1074948                                                                        0.9861 0.6364
--     530-   562    954028                                                                        0.9872 0.6460
--     563-   596    849933                                                                        0.9882 0.6550
--     597-   631    758598                                                                        0.9891 0.6635
--     632-   667    677431                                                                        0.9899 0.6715
--     668-   704    609120                                                                        0.9906 0.6791
--     705-   742    551268                                                                        0.9912 0.6863
--     743-   781    501195                                                                        0.9918 0.6932
--     782-   821    460080                                                                        0.9924 0.6998
--
--           0 (max occurrences)
-- 57723434531 (total mers, non-unique)
--   949207107 (distinct mers, non-unique)
--           0 (unique mers)
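
The two "Fraction" columns read as cumulative fractions of distinct k-mers ("Unique") and of all k-mer occurrences ("Total"). For the 2-2 row, 228507300 / 949207107 is about 0.2407 and 2 x 228507300 / 57723434531 is about 0.0079, matching the table. Later rows each cover a range of occurrence counts, so their fractions cannot be reproduced exactly from the binned counts. The sketch below uses an unbinned toy histogram and is an illustration of the idea, not canu's code; the function name is hypothetical.

```python
from typing import Dict, List, Tuple


def cumulative_fractions(histogram: Dict[int, int]) -> List[Tuple[int, float, float]]:
    """Given {occurrence_count: number_of_distinct_kmers}, return
    (occurrence, cumulative fraction of distinct k-mers,
     cumulative fraction of all k-mer instances) in ascending order."""
    distinct_total = sum(histogram.values())
    instance_total = sum(occ * n for occ, n in histogram.items())
    rows: List[Tuple[int, float, float]] = []
    distinct_so_far = 0
    instances_so_far = 0
    for occ in sorted(histogram):
        distinct_so_far += histogram[occ]
        instances_so_far += occ * histogram[occ]  # each k-mer counted occ times
        rows.append((occ, distinct_so_far / distinct_total,
                     instances_so_far / instance_total))
    return rows


if __name__ == "__main__":
    toy = {2: 6, 3: 2, 10: 2}   # occurrence count -> distinct k-mers
    for occ, uniq, total in cumulative_fractions(toy):
        print(f"{occ:4d} {uniq:.4f} {total:.4f}")
```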

[CORRECTION/LAYOUT]
--                             original      original
--                            raw reads     raw reads
--   category                w/overlaps  w/o/overlaps
--   -------------------- ------------- -------------
--   Number of Reads            3315190        632475
--   Number of Bases        56634484650    1066870678
--   Coverage                    24.624         0.464
--   Median                       15723             0
--   Mean                         17083          1686
--   N50                          21808          4030
--   Minimum                       2000             0
--   Maximum                     168516         42044
--
--                                        --------corrected---------  ----------rescued----------
--                             evidence                     expected                     expected
--   category                     reads            raw     corrected            raw     corrected
--   -------------------- -------------  ------------- -------------  ------------- -------------
--   Number of Reads            3588886        2853680       2853680              0             0
--   Number of Bases        57685463931    54016628179   51241694972              0             0
--   Coverage                    25.081         23.485        22.279          0.000         0.000
--   Median                       14659          17500         17277              0             0
--   Mean                         16073          18928         17956              0             0
--   N50                          21580          22368         22805              0             0
--   Minimum                       2000           2000             1              0             0
--   Maximum                     168516         168366        152147              0             0
--
--                        --------uncorrected--------
--                                           expected
--   category                       raw     corrected
--   -------------------- ------------- -------------
--   Number of Reads            1093985       1093985
--   Number of Bases         3684727149        168472
--   Coverage                     1.602         0.000
--   Median                        3004             0
--   Mean                          3368             0
--   N50                           5670             0
--   Minimum                          0             0
--   Maximum                     168516        168472
--
--   Maximum Memory          2145207854

[TRIMMING/READS]
--
-- In sequence store './canu.seqStore':
--   Found 2697395 reads.
--   Found 49529700987 bases (21.53 times coverage).
--    Histogram of corrected reads:
--
--    G=49529700987                      sum of  ||               length     num
--    NG         length     index       lengths  ||                range    seqs
--    ----- ------------ --------- ------------  ||  ------------------- -------
--    00010        38442    107666   4953007570  ||       1000-4031       237727|------------------------------------------
--    00020        32215    249527   9905953660  ||       4032-7063       175558|--------------------------------
--    00030        28262    414196  14858932734  ||       7064-10095      182077|---------------------------------
--    00040        25255    599870  19811900234  ||      10096-13127      244924|--------------------------------------------
--    00050        22733    806707  24764854100  ||      13128-16159      325409|----------------------------------------------------------
--    00060        20441   1036484  29717840815  ||      16160-19191      356746|---------------------------------------------------------------
--    00070        18178   1293230  34670802478  ||      19192-22223      320514|---------------------------------------------------------
--    00080        15692   1585583  39623766036  ||      22224-25255      254588|---------------------------------------------
--    00090        12230   1938259  44576731594  ||      25256-28287      186996|----------------------------------
--    00100         1000   2697394  49529700987  ||      28288-31319      132907|------------------------
--    001.000x             2697395  49529700987  ||      31320-34351       92190|-----------------
--                                               ||      34352-37383       63440|------------
--                                               ||      37384-40415       42197|--------
--                                               ||      40416-43447       28023|-----
--                                               ||      43448-46479       18123|----
--                                               ||      46480-49511       11893|---
--                                               ||      49512-52543        7942|--
--                                               ||      52544-55575        5155|-
--                                               ||      55576-58607        3380|-
--                                               ||      58608-61639        2307|-
--                                               ||      61640-64671        1543|-
--                                               ||      64672-67703        1043|-
--                                               ||      67704-70735         707|-
--                                               ||      70736-73767         518|-
--                                               ||      73768-76799         357|-
--                                               ||      76800-79831         287|-
--                                               ||      79832-82863         223|-
--                                               ||      82864-85895         151|-
--                                               ||      85896-88927         101|-
--                                               ||      88928-91959          85|-
--                                               ||      91960-94991          72|-
--                                               ||      94992-98023          46|-
--                                               ||      98024-101055         42|-
--                                               ||     101056-104087         24|-
--                                               ||     104088-107119         25|-
--                                               ||     107120-110151         15|-
--                                               ||     110152-113183         13|-
--                                               ||     113184-116215          8|-
--                                               ||     116216-119247         10|-
--                                               ||     119248-122279          7|-
--                                               ||     122280-125311          7|-
--                                               ||     125312-128343          3|-
--                                               ||     128344-131375          1|-
--                                               ||     131376-134407          2|-
--                                               ||     134408-137439          3|-
--                                               ||     137440-140471          2|-
--                                               ||     140472-143503          1|-
--                                               ||     143504-146535          0|
--                                               ||     146536-149567          2|-
--                                               ||     149568-152599          0|
--                                               ||     152600-155631          1|-
--

[TRIMMING/MERS]
--
--  22-mers                                                                                           Fraction
--    Occurrences   NumMers                                                                         Unique Total
--       1-     1         0                                                                        0.0000 0.0000
--       2-     2  17777757 *******                                                                0.0241 0.0007
--       3-     4  12623526 *****                                                                  0.0346 0.0012
--       5-     7   9874910 ***                                                                    0.0462 0.0020
--       8-    11  11403894 ****                                                                   0.0584 0.0032
--      12-    16  17223813 ******                                                                 0.0741 0.0057
--      17-    22  36331429 **************                                                         0.0993 0.0114
--      23-    29  75856558 ******************************                                         0.1544 0.0286
--      30-    37  81659813 ********************************                                       0.2615 0.0722
--      38-    46  53941874 *********************                                                  0.3654 0.1253
--      47-    56 120189601 ************************************************                       0.4383 0.1723
--      57-    67 174534447 ********************************************************************** 0.6155 0.3135
--      68-    79  55708511 **********************                                                 0.8410 0.5242
--      80-    92  12321961 ****                                                                   0.9051 0.5939
--      93-   106   9453260 ***                                                                    0.9210 0.6146
--     107-   121  10195140 ****                                                                   0.9336 0.6335
--     122-   137   8215658 ***                                                                    0.9475 0.6574
--     138-   154   4604364 *                                                                      0.9582 0.6781
--     155-   172   3644958 *                                                                      0.9643 0.6914
--     173-   191   3243119 *                                                                      0.9691 0.7034
--     192-   211   2471648                                                                        0.9735 0.7153
--     212-   232   2029213                                                                        0.9768 0.7253
--     233-   254   1733577                                                                        0.9795 0.7343
--     255-   277   1441961                                                                        0.9818 0.7428
--     278-   301   1230450                                                                        0.9837 0.7506
--     302-   326   1055876                                                                        0.9854 0.7577
--     327-   352    899910                                                                        0.9868 0.7644
--     353-   379    771832                                                                        0.9880 0.7706
--     380-   407    665490                                                                        0.9891 0.7763
--     408-   436    581090                                                                        0.9900 0.7816
--     437-   466    513119                                                                        0.9907 0.7865
--     467-   497    457764                                                                        0.9914 0.7912
--     498-   529    409947                                                                        0.9921 0.7956
--     530-   562    379564                                                                        0.9926 0.7999
--     563-   596    381461                                                                        0.9931 0.8041
--     597-   631    365212                                                                        0.9936 0.8086
--     632-   667    350967                                                                        0.9941 0.8131
--     668-   704    313346                                                                        0.9946 0.8178
--     705-   742    273775                                                                        0.9950 0.8221
--     743-   781    225387                                                                        0.9954 0.8261
--     782-   821    191480                                                                        0.9957 0.8296
--
--           0 (max occurrences)
-- 49315358976 (total mers, non-unique)
--   738541145 (distinct mers, non-unique)
--           0 (unique mers)
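
As a rough, canu-independent sanity check on genomeSize, a common back-of-envelope method divides the total number of solid (non-error) k-mer instances by the depth of the main k-mer peak. Neither canu nor this thread describes this method; the helper name and the `min_depth` cutoff are assumptions. Because canu.report bins the occurrence counts, applying it to these tables would only give a coarse figure, and for a heterozygous diploid the peak used should be the full-depth (homozygous) one, since the half-depth heterozygous peak would roughly double the estimate.

```python
from typing import Dict


def genome_size_from_peak(histogram: Dict[int, int], min_depth: int = 5) -> float:
    """Rough genome size = solid k-mer instances / depth of the tallest peak.

    histogram maps occurrence count -> number of distinct k-mers with that count.
    Counts below min_depth are treated as sequencing-error k-mers and are
    excluded from both the total and the peak search (an assumption; the
    right cutoff depends on the data).
    """
    solid = {occ: n for occ, n in histogram.items() if occ >= min_depth}
    if not solid:
        raise ValueError("no k-mers at or above min_depth")
    total_instances = sum(occ * n for occ, n in solid.items())
    peak_depth = max(solid, key=solid.get)  # occurrence count with most k-mers
    return total_instances / peak_depth


if __name__ == "__main__":
    # Toy data: a 1,000-base "genome" sequenced to ~30x, plus error k-mers at depth 1.
    toy = {1: 500, 29: 300, 30: 400, 31: 300}
    print(genome_size_from_peak(toy))  # prints 1000.0
```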

[TRIMMING/TRIMMING]
--  PARAMETERS:
--  ----------
--     1000    (reads trimmed below this many bases are deleted)
--   0.1200    (use overlaps at or below this fraction error)
--      500    (break region if overlap is less than this long, for 'largest covered' algorithm)
--        2    (break region if overlap coverage is less than this many reads, for 'largest covered' algorithm)
--
--  INPUT READS:
--  -----------
--  3947665 reads  49529700987 bases (reads processed)
--       0 reads            0 bases (reads not processed, previously deleted)
--       0 reads            0 bases (reads not processed, in a library where trimming isn't allowed)
--
--  OUTPUT READS:
--  ------------
--  243378 reads   4710051310 bases (trimmed reads output)
--  2419023 reads  43733190114 bases (reads with no change, kept as is)
--  1281253 reads    412658367 bases (reads with no overlaps, deleted)
--    4011 reads     31863103 bases (reads with short trimmed length, deleted)
--
--  TRIMMING DETAILS:
--  ----------------
--  117357 reads    202199022 bases (bases trimmed from the 5' end of a read)
--  138471 reads    439739071 bases (bases trimmed from the 3' end of a read)
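
The TRIMMING numbers balance: the four output categories sum to the 3947665 input reads, and the bases missing from those categories equal the bases trimmed from the 5' and 3' ends. A small check, with figures copied from the report above; the names are hypothetical:

```python
from typing import Tuple

# Figures copied from the [TRIMMING/TRIMMING] section of canu.report.
INPUT_READS = 3_947_665
INPUT_BASES = 49_529_700_987

OUTPUT = {  # category: (reads, bases)
    "trimmed reads output": (243_378, 4_710_051_310),
    "reads with no change, kept as is": (2_419_023, 43_733_190_114),
    "reads with no overlaps, deleted": (1_281_253, 412_658_367),
    "reads with short trimmed length, deleted": (4_011, 31_863_103),
}
TRIMMED_5P_BASES = 202_199_022
TRIMMED_3P_BASES = 439_739_071


def check_trimming_balance() -> Tuple[int, int]:
    """Return (unaccounted reads, unaccounted bases) after subtracting every
    output category and the end-trimmed bases from the input totals."""
    out_reads = sum(reads for reads, _ in OUTPUT.values())
    out_bases = sum(bases for _, bases in OUTPUT.values())
    missing_reads = INPUT_READS - out_reads
    missing_bases = INPUT_BASES - out_bases - TRIMMED_5P_BASES - TRIMMED_3P_BASES
    return missing_reads, missing_bases


if __name__ == "__main__":
    print(check_trimming_balance())  # prints (0, 0)
```

The same bookkeeping links to the SPLITTING section: its 1285264 "previously deleted" reads (444521470 bases) are the 1281253 + 4011 reads (412658367 + 31863103 bases) deleted here.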

[TRIMMING/SPLITTING]
--  PARAMETERS:
--  ----------
--     1000    (reads trimmed below this many bases are deleted)
--   0.1200    (use overlaps at or below this fraction error)
--  INPUT READS:
--  -----------
--  2662401 reads  49085179517 bases (reads processed)
--  1285264 reads    444521470 bases (reads not processed, previously deleted)
--       0 reads            0 bases (reads not processed, in a library where trimming isn't allowed)
--
--  PROCESSED:
--  --------
--       0 reads            0 bases (no overlaps)
--      22 reads       253030 bases (no coverage after adjusting for trimming done already)
--       0 reads            0 bases (processed for chimera)
--       0 reads            0 bases (processed for spur)
--  2662379 reads  49084926487 bases (processed for subreads)
--
--  READS WITH SIGNALS:
--  ------------------
--       0 reads            0 signals (number of 5' spur signal)
--       0 reads            0 signals (number of 3' spur signal)
--       0 reads            0 signals (number of chimera signal)
--     411 reads          417 signals (number of subread signal)
--
--  SIGNALS:
--  -------
--       0 reads            0 bases (size of 5' spur signal)
--       0 reads            0 bases (size of 3' spur signal)
--       0 reads            0 bases (size of chimera signal)
--     417 reads        99538 bases (size of subread signal)
--
--  TRIMMING:
--  --------
--      74 reads      1227443 bases (trimmed from the 5' end of the read)
--     336 reads      5815592 bases (trimmed from the 3' end of the read)

[UNITIGGING/READS]
--
-- In sequence store './canu.seqStore':
--   Found 2662400 reads.
--   Found 48436196500 bases (21.05 times coverage).
--    Histogram of corrected-trimmed reads:
--
--    G=48436196500                      sum of  ||               length     num
--    NG         length     index       lengths  ||                range    seqs
--    ----- ------------ --------- ------------  ||  ------------------- -------
--    00010        37110    111546   4843625788  ||       1000-3124       172129|--------------------------------------------
--    00020        31447    254275   9687258805  ||       3125-5249       130039|---------------------------------
--    00030        27761    418647  14530884167  ||       5250-7374       115747|------------------------------
--    00040        24909    603095  19374487544  ||       7375-9499       123959|--------------------------------
--    00050        22486    807868  24218109824  ||       9500-11624      151427|---------------------------------------
--    00060        20265   1034757  29061734787  ||      11625-13749      191890|-------------------------------------------------
--    00070        18059   1287746  33905343249  ||      13750-15874      231631|-----------------------------------------------------------
--    00080        15614   1575286  38748959004  ||      15875-17999      250873|---------------------------------------------------------------
--    00090        12198   1921502  43592584265  ||      18000-20124      244411|--------------------------------------------------------------
--    00100         1000   2662399  48436196500  ||      20125-22249      219884|--------------------------------------------------------
--    001.000x             2662400  48436196500  ||      22250-24374      186025|-----------------------------------------------
--                                               ||      24375-26499      151400|---------------------------------------
--                                               ||      26500-28624      119490|-------------------------------
--                                               ||      28625-30749       93450|------------------------
--                                               ||      30750-32874       72055|-------------------
--                                               ||      32875-34999       55116|--------------
--                                               ||      35000-37124       41564|-----------
--                                               ||      37125-39249       30741|--------
--                                               ||      39250-41374       22788|------
--                                               ||      41375-43499       16512|-----
--                                               ||      43500-45624       11794|---
--                                               ||      45625-47749        8598|---
--                                               ||      47750-49874        6093|--
--                                               ||      49875-51999        4429|--
--                                               ||      52000-54124        3135|-
--                                               ||      54125-56249        2178|-
--                                               ||      56250-58374        1580|-
--                                               ||      58375-60499        1093|-
--                                               ||      60500-62624         801|-
--                                               ||      62625-64749         482|-
--                                               ||      64750-66874         376|-
--                                               ||      66875-68999         231|-
--                                               ||      69000-71124         171|-
--                                               ||      71125-73249         115|-
--                                               ||      73250-75374          73|-
--                                               ||      75375-77499          44|-
--                                               ||      77500-79624          30|-
--                                               ||      79625-81749          20|-
--                                               ||      81750-83874           7|-
--                                               ||      83875-85999           4|-
--                                               ||      86000-88124           4|-
--                                               ||      88125-90249           1|-
--                                               ||      90250-92374           0|
--                                               ||      92375-94499           4|-
--                                               ||      94500-96624           1|-
--                                               ||      96625-98749           2|-
--                                               ||      98750-100874          2|-
--                                               ||     100875-102999          0|
--                                               ||     103000-105124          0|
--                                               ||     105125-107249          1|-
--

[UNITIGGING/MERS]
--
--  22-mers                                                                                           Fraction
--    Occurrences   NumMers                                                                         Unique Total
--       1-     1         0                                                                        0.0000 0.0000
--       2-     2  17099195 *******                                                                0.0232 0.0007
--       3-     4  12407998 *****                                                                  0.0335 0.0012
--       5-     7   9906923 ****                                                                   0.0451 0.0020
--       8-    11  11517892 ****                                                                   0.0574 0.0033
--      12-    16  17605862 *******                                                                0.0732 0.0058
--      17-    22  37533899 ***************                                                        0.0991 0.0118
--      23-    29  77657644 *******************************                                        0.1563 0.0300
--      30-    37  80852429 *********************************                                      0.2656 0.0753
--      38-    46  54548407 **********************                                                 0.3683 0.1288
--      47-    56 128271955 ****************************************************                   0.4432 0.1782
--      57-    67 170549849 ********************************************************************** 0.6318 0.3314
--      68-    79  49091804 ********************                                                   0.8501 0.5393
--      80-    92  11901871 ****                                                                   0.9064 0.6017
--      93-   106   9420495 ***                                                                    0.9220 0.6223
--     107-   121  10382489 ****                                                                   0.9346 0.6416
--     122-   137   7790837 ***                                                                    0.9487 0.6664
--     138-   154   4485134 *                                                                      0.9588 0.6864
--     155-   172   3623794 *                                                                      0.9648 0.6997
--     173-   191   3215084 *                                                                      0.9696 0.7120
--     192-   211   2439163 *                                                                      0.9739 0.7239
--     212-   232   2030577                                                                        0.9772 0.7340
--     233-   254   1753275                                                                        0.9799 0.7433
--     255-   277   1454511                                                                        0.9823 0.7521
--     278-   301   1252115                                                                        0.9842 0.7600
--     302-   326   1096770                                                                        0.9859 0.7675
--     327-   352    925288                                                                        0.9874 0.7746
--     353-   379    792225                                                                        0.9886 0.7811
--     380-   407    693668                                                                        0.9897 0.7870
--     408-   436    609646                                                                        0.9906 0.7927
--     437-   466    520267                                                                        0.9915 0.7980
--     467-   497    455410                                                                        0.9922 0.8028
--     498-   529    398835                                                                        0.9928 0.8074
--     530-   562    350756                                                                        0.9933 0.8116
--     563-   596    315210                                                                        0.9938 0.8155
--     597-   631    282125                                                                        0.9942 0.8193
--     632-   667    258207                                                                        0.9946 0.8229
--     668-   704    231713                                                                        0.9949 0.8264
--     705-   742    210182                                                                        0.9953 0.8297
--     743-   781    195072                                                                        0.9955 0.8328
--     782-   821    177353                                                                        0.9958 0.8359
--
--           0 (max occurrences)
-- 48250496218 (total mers, non-unique)
--   737224022 (distinct mers, non-unique)
--           0 (unique mers)

[UNITIGGING/OVERLAPS]
--   category            reads     %          read length        feature size or coverage  analysis
--   ----------------  -------  -------  ----------------------  ------------------------  --------------------
--   middle-missing         22    0.00    24655.64 +- 18903.14      3937.73 +- 5487.70    (bad trimming)
--   middle-hump            36    0.00     8483.75 +- 8166.77       1595.89 +- 3030.18    (bad trimming)
--   no-5-prime            316    0.01    18578.94 +- 16122.24      1928.75 +- 3617.19    (bad trimming)
--   no-3-prime            151    0.01    13547.35 +- 11941.56      1623.21 +- 3518.53    (bad trimming)
--
--   low-coverage         3621    0.14     5966.95 +- 5447.03          3.23 +- 1.58       (easy to assemble, potential for lower quality consensus)
--   unique              84178    3.16    11586.68 +- 7723.34         26.05 +- 4.62       (easy to assemble, perfect, yay)
--   repeat-cont       1474716   55.39    15341.48 +- 8801.64         80.41 +- 111.11     (potential for consensus errors, no impact on assembly)
--   repeat-dove         46302    1.74    38627.29 +- 7758.87         66.13 +- 43.24      (hard to assemble, likely won't assemble correctly or even at all)
--
--   span-repeat         97870    3.68    19093.41 +- 9376.37       7978.36 +- 8070.51    (read spans a large repeat, usually easy to assemble)
--   uniq-repeat-cont   464717   17.45    18678.87 +- 7702.52                             (should be uniquely placed, low potential for consensus errors, no impact on assembly)
--   uniq-repeat-dove    61735    2.32    35132.66 +- 8456.50                             (will end contigs, potential to misassemble)
--   uniq-anchor        428180   16.08    24048.91 +- 9468.36       9075.94 +- 7788.61    (repeat read, with unique section, probable bad read)

[UNITIGGING/ADJUSTMENT]
-- No report available.

[UNITIGGING/ERROR RATES]
--
--  ERROR RATES
--  -----------
--                                                   --------threshold------
--  2851801                      fraction error      fraction        percent
--  samples                              (1e-5)         error          error
--                   --------------------------      --------       --------
--  command line (-eg)                           ->  12000.00       12.0000%
--  command line (-ef)                           ->  -----.--      ---.----%
--  command line (-eM)                           ->  12000.00       12.0000%
--  mean + std.dev     163.11 +-  12 *   826.22  ->  10077.79       10.0778%  (enabled)
--  median + mad         0.00 +-  12 *     0.00  ->      0.00        0.0000%
--  90th percentile                              ->    225.00        0.2250%
--
--  BEST EDGE FILTERING
--  -------------------
--  At graph threshold 12.0000%, reads:
--    available to have edges:       191922
--    with at least one edge:        190872
--
--  At max threshold 12.0000%, reads:  (not computed)
--    available to have edges:            0
--    with at least one edge:             0
--
--  At tight threshold 0.2250%, reads with:
--    both edges below error threshold:     97856  (80.00% minReadsBest threshold = 152697)
--    one  edge  above error threshold:     47866
--    both edges above error threshold:     45150
--    at least one edge:                   190872
--
--  At loose threshold 10.0778%, reads with:
--    both edges below error threshold:    186525  (80.00% minReadsBest threshold = 152697)
--    one  edge  above error threshold:      4275
--    both edges above error threshold:        72
--    at least one edge:                   190872
--
--
--  INITIAL EDGES
--  -------- ----------------------------------------
--   2456502 reads are contained
--   1298030 reads have no best edges (singleton)
--       576 reads have only one best edge (spur)
--                428 are mutual best
--    192557 reads have two best edges
--              16504 have one mutual best edge
--             175206 have two mutual best edges
--
--
--  FINAL EDGES
--  -------- ----------------------------------------
--   2456502 reads are contained
--   1298705 reads have no best edges (singleton)
--       786 reads have only one best edge (spur)
--                641 are mutual best
--    191672 reads have two best edges
--              16233 have one mutual best edge
--             174685 have two mutual best edges
--
--
--  EDGE FILTERING
--  -------- ------------------------------------------
--         0 reads are ignored
--     11960 reads have a gap in overlap coverage
--       481 reads have lopsided best edges

[UNITIGGING/CONTIGS]
-- Found, in version 1, after unitig construction:
--   contigs:      6568 sequences, total length 1334410489 bp (including 1430 repeats of total length 65021591 bp).
--   bubbles:      2869 sequences, total length 215893986 bp.
--   unassembled:  30982 sequences, total length 445297704 bp.
--
-- Contig sizes based on genome size 2.3gbp:
--
--            NG (bp)  LG (contigs)    sum (bp)
--         ----------  ------------  ----------
--     10     1103901           147   230914462
--     20      662889           418   460228246
--     30      378352           878   690037030
--     40      186041          1767   920185437
--     50       93467          3560  1150092374
--

[UNITIGGING/CONSENSUS]
-- Found, in version 2, after consensus generation:
--   contigs:      6568 sequences, total length 1328299669 bp (including 1430 repeats of total length 64555993 bp).
--   bubbles:      2869 sequences, total length 215361682 bp.
--   unassembled:  30982 sequences, total length 445297704 bp.
--
-- Contig sizes based on genome size 2.3gbp:
--
--            NG (bp)  LG (contigs)    sum (bp)
--         ----------  ------------  ----------
--     10     1097884           148   230753569
--     20      655509           422   460246964
--     30      372778           889   690332572
--     40      182290          1793   920181490
--     50       91936          3619  1150083022
--
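The NG/LG rows above divide by genomeSize, so you can recompute them for a ~1 Gbp genome from the contig lengths alone, without re-running canu. Below is a minimal Python sketch. The helper names are hypothetical. It assumes the contigs are in canu's usual `<prefix>.contigs.fasta` output, which here would presumably be `canu_assembly/canu.contigs.fasta`, given `-d canu_assembly -p canu`.

```python
from typing import Iterable, List, Optional, Tuple


def fasta_lengths(lines: Iterable[str]) -> List[int]:
    """Return the length of every record in a FASTA stream (multi-line records allowed)."""
    lengths: List[int] = []
    current: Optional[int] = None  # None until the first '>' header is seen
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def ng_lg(lengths: Iterable[int], genome_size: float,
          fraction: float = 0.5) -> Optional[Tuple[int, int]]:
    """NGx length and LGx contig count.

    Walk contigs longest-first until their summed length reaches
    fraction * genome_size. Returns None if the assembly never gets there.
    """
    target = fraction * genome_size
    running = 0
    for count, length in enumerate(sorted(lengths, reverse=True), start=1):
        running += length
        if running >= target:
            return length, count
    return None
```

To use it, call `ng_lg(fasta_lengths(open(path)), 1.0e9)`, which returns NG50 and LG50 against 1 Gbp. The FASTA should contain the same sequences the log table counts (6568 contigs here), so comparing record counts is a quick sanity check. You can also get a rough answer from the consensus table by hand. 500 Mbp falls between the NG20 row (460 Mbp summed) and the NG30 row (690 Mbp summed), so NG50 against 1 Gbp should land somewhere between about 373 kbp and 656 kbp.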
skoren commented 7 months ago

Given that the input coverage was below 40x, none of the data would have been removed, so I don't think the genome size affected your run. The histogram plots consistently show that the coverage is about double what is expected, and the final assembly is about 1.3 Gbp with another 200 Mbp of bubble sequences (alternate haplotype). So I wouldn't try re-running with the corrected genome size.
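You can check the coverage arithmetic directly from the [UNITIGGING/READS] block. The sketch below assumes the 40x figure is the "longest 40x" cap mentioned earlier in the thread. The raw input total isn't in this excerpt, so the corrected-trimmed base count stands in for it as a rough lower bound on the raw total.

```python
TRIMMED_BASES = 48_436_196_500  # "Found 48436196500 bases" under [UNITIGGING/READS]
READ_FOLD_LIMIT = 40  # assumption: the "longest 40x" cap quoted earlier in this thread


def coverage(bases: int, genome_size: float) -> float:
    """Fold coverage: total bases divided by the assumed genome size."""
    return bases / genome_size


def base_budget(genome_size: float, fold: int = READ_FOLD_LIMIT) -> float:
    """Number of bases a fold-coverage cap allows at a given genome size."""
    return fold * genome_size


for label, size in (("genomeSize=2.3g", 2.3e9), ("genomeSize=1g", 1.0e9)):
    cov = coverage(TRIMMED_BASES, size)
    over = TRIMMED_BASES > base_budget(size)
    print(f"{label}: {cov:.2f}x, exceeds {READ_FOLD_LIMIT}x budget: {over}")
```

At 2.3 Gbp this gives about 21x (the log reports 21.05x), well under a 40x cap. At 1 Gbp the same bases come to about 48x, which is above it.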

However, looking at the raw data plot, the input data looks quite good. There is a clear peak at about 57-67x coverage even before correction. If you are going to run anything, I'd suggest trying the uncorrected ONT assembly parameters from here: https://canu.readthedocs.io/en/latest/quick-start.html#assembling-with-multiple-technologies-and-multiple-files (see the uncorrected ONT assembly section).
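The raw-data plot mentioned here isn't part of this excerpt. However, the corrected-trimmed [UNITIGGING/MERS] histogram in the report also peaks in the 57-67 bin. The sketch below finds that peak programmatically using a simple heuristic of my own, not canu's method. Because canu's bins widen as occurrence grows, it first divides each count by its bin width. It then skips the low-occurrence error tail by walking forward while density keeps falling, and takes the densest remaining bin. Only the first 17 rows of the table are transcribed.

```python
# (low, high, distinct 22-mers) for the first rows of [UNITIGGING/MERS] above
HISTOGRAM = [
    (1, 1, 0),
    (2, 2, 17099195),
    (3, 4, 12407998),
    (5, 7, 9906923),
    (8, 11, 11517892),
    (12, 16, 17605862),
    (17, 22, 37533899),
    (23, 29, 77657644),
    (30, 37, 80852429),
    (38, 46, 54548407),
    (47, 56, 128271955),
    (57, 67, 170549849),
    (68, 79, 49091804),
    (80, 92, 11901871),
    (93, 106, 9420495),
    (107, 121, 10382489),
    (122, 137, 7790837),
]


def peak_bin(histogram):
    """Return (low, high) of the densest bin after the error tail.

    Counts are divided by bin width because the bins are not equally wide.
    The error tail is skipped by walking forward while density keeps falling.
    """
    density = [(lo, hi, n / (hi - lo + 1)) for lo, hi, n in histogram if n > 0]
    i = 0
    while i + 1 < len(density) and density[i + 1][2] <= density[i][2]:
        i += 1
    lo, hi, _ = max(density[i:], key=lambda row: row[2])
    return lo, hi


print(peak_bin(HISTOGRAM))  # (57, 67)
```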

skoren commented 7 months ago

Idle