Closed aglendening closed 11 months ago
Minimal impact. The genome size estimate is primarily used to discard potentially useless data; for example, we'll only correct the longest 40x of data. So by setting the genome size too high, your assembly used "too much" data (relative to a normal run). The limit is mostly there for compute performance, but we've seen some degradation of contig size with "too much" data; twice the usual amount should be OK though.
You could take the existing trimmed reads and run them through just the assembly phase (canu ... -trimmed -nanopore {assembly1}/*trimmed*fasta.gz) to see the impact. This should take (wild guess) about 40% as long as the original run.
Check the summary file to see how much data was actually used and/or post it. I'm not sure there is anything in there that would indicate a rerun would help; it's more just to see how much coverage was actually assembled.
summary as in canu_assembly/unitigging/canu.ovlStore.summary? In that case, pasted below.
Your first suggestion is running currently, I can update as to whether there are any differences when it finishes. And thank you!
category reads % read length feature size or coverage analysis
---------------- ------- ------- ---------------------- ------------------------ --------------------
middle-missing 22 0.00 24655.64 +- 18903.14 3937.73 +- 5487.70 (bad trimming)
middle-hump 36 0.00 8483.75 +- 8166.77 1595.89 +- 3030.18 (bad trimming)
no-5-prime 316 0.01 18578.94 +- 16122.24 1928.75 +- 3617.19 (bad trimming)
no-3-prime 151 0.01 13547.35 +- 11941.56 1623.21 +- 3518.53 (bad trimming)
low-coverage 3621 0.14 5966.95 +- 5447.03 3.23 +- 1.58 (easy to assemble, potential for lower quality consensus)
unique 84178 3.16 11586.68 +- 7723.34 26.05 +- 4.62 (easy to assemble, perfect, yay)
repeat-cont 1474716 55.39 15341.48 +- 8801.64 80.41 +- 111.11 (potential for consensus errors, no impact on assembly)
repeat-dove 46302 1.74 38627.29 +- 7758.87 66.13 +- 43.24 (hard to assemble, likely won't assemble correctly or even at all)
span-repeat 97870 3.68 19093.41 +- 9376.37 7978.36 +- 8070.51 (read spans a large repeat, usually easy to assemble)
uniq-repeat-cont 464717 17.45 18678.87 +- 7702.52 (should be uniquely placed, low potential for consensus errors, no impact on assembly)
uniq-repeat-dove 61735 2.32 35132.66 +- 8456.50 (will end contigs, potential to misassemble)
uniq-anchor 428180 16.08 24048.91 +- 9468.36 9075.94 +- 7788.61 (repeat read, with unique section, probable bad read)
Sorry, my mistake. I was after the canu.report in the main directory. It will have two read-length histograms at the start; these will also tell the number of reads and bases in the data.
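For anyone who wants those totals without scrolling through the histograms, here is a minimal Python sketch that pulls the "Found N bases (X times coverage)" line out of each section of a canu.report. The function name and regular expressions are my own and assume the report layout looks like the one pasted in this thread; this is not part of Canu itself.

```python
import re

# Matches lines such as:
#   --   Found 58224787463 bases (25.31 times coverage).
BASES_RE = re.compile(r"Found\s+(\d+)\s+bases\s+\(([\d.]+)\s+times coverage\)")
# Matches section headers such as: [CORRECTION/READS]
SECTION_RE = re.compile(r"^\[([A-Z/ ]+)\]$")


def coverage_by_section(report_text):
    """Return {section name: (bases, coverage)} for sections that report totals."""
    results = {}
    section = None
    for line in report_text.splitlines():
        header = SECTION_RE.match(line.strip())
        if header:
            section = header.group(1)
            continue
        found = BASES_RE.search(line)
        if found and section is not None:
            results[section] = (int(found.group(1)), float(found.group(2)))
    return results


# Excerpt copied from the report in this thread.
sample = """[CORRECTION/READS]
--
-- In sequence store './canu.seqStore':
--   Found 3947665 reads.
--   Found 58224787463 bases (25.31 times coverage).
[TRIMMING/READS]
--   Found 2697395 reads.
--   Found 49529700987 bases (21.53 times coverage).
"""

for name, (bases, cov) in coverage_by_section(sample).items():
    print(f"{name}: {bases} bases, {cov}x")
```

To run it on a real report, read the file first, e.g. `coverage_by_section(open("canu_assembly/canu.report").read())`.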
No problem, and thank you! Sorry for the delay, canu.report printed below. The assembly-only re-run should also be done within a couple of days.
[CORRECTION/READS]
--
-- In sequence store './canu.seqStore':
-- Found 3947665 reads.
-- Found 58224787463 bases (25.31 times coverage).
-- Histogram of raw reads:
--
-- G=58224787463 sum of || length num
-- NG length index lengths || range seqs
-- ----- ------------ --------- ------------ || ------------------- -------
-- 00010 38478 124087 5822505932 || 1000-4350 798706|---------------------------------------------------------------
-- 00020 31640 292441 11644960784 || 4351-7701 509184|-----------------------------------------
-- 00030 27406 490841 17467450380 || 7702-11052 406037|---------------------------------
-- 00040 24190 717347 23289922371 || 11053-14403 406673|---------------------------------
-- 00050 21462 973010 29112414440 || 14404-17754 423437|----------------------------------
-- 00060 18899 1261994 34934884926 || 17755-21105 393448|--------------------------------
-- 00070 16255 1593424 40757365104 || 21106-24456 314785|-------------------------
-- 00080 13080 1990076 46579841375 || 24457-27807 227829|------------------
-- 00090 8601 2527176 52402314776 || 27808-31158 157299|-------------
-- 00100 1000 3947664 58224787463 || 31159-34509 105779|---------
-- 001.000x 3947665 58224787463 || 34510-37860 70566|------
-- || 37861-41211 45723|----
-- || 41212-44562 30038|---
-- || 44563-47913 19429|--
-- || 47914-51264 12531|-
-- || 51265-54615 8424|-
-- || 54616-57966 5407|-
-- || 57967-61317 3655|-
-- || 61318-64668 2512|-
-- || 64669-68019 1722|-
-- || 68020-71370 1192|-
-- || 71371-74721 834|-
-- || 74722-78072 604|-
-- || 78073-81423 480|-
-- || 81424-84774 319|-
-- || 84775-88125 241|-
-- || 88126-91476 196|-
-- || 91477-94827 155|-
-- || 94828-98178 112|-
-- || 98179-101529 78|-
-- || 101530-104880 59|-
-- || 104881-108231 42|-
-- || 108232-111582 39|-
-- || 111583-114933 32|-
-- || 114934-118284 23|-
-- || 118285-121635 16|-
-- || 121636-124986 21|-
-- || 124987-128337 9|-
-- || 128338-131688 3|-
-- || 131689-135039 4|-
-- || 135040-138390 4|-
-- || 138391-141741 6|-
-- || 141742-145092 2|-
-- || 145093-148443 2|-
-- || 148444-151794 3|-
-- || 151795-155145 2|-
-- || 155146-158496 0|
-- || 158497-161847 0|
-- || 161848-165198 1|-
-- || 165199-168549 2|-
--
[CORRECTION/MERS]
--
-- 16-mers Fraction
-- Occurrences NumMers Unique Total
-- 1- 1 0 0.0000 0.0000
-- 2- 2 228507300 ********************************************************************** 0.2407 0.0079
-- 3- 4 196058827 ************************************************************ 0.3714 0.0144
-- 5- 7 92004849 **************************** 0.4940 0.0232
-- 8- 11 34654069 ********** 0.5582 0.0303
-- 12- 16 12728470 *** 0.5848 0.0348
-- 17- 22 8408238 ** 0.5956 0.0373
-- 23- 29 22731028 ****** 0.6051 0.0406
-- 30- 37 39961278 ************ 0.6320 0.0528
-- 38- 46 27455292 ******** 0.6735 0.0762
-- 47- 56 27982024 ******** 0.7002 0.0948
-- 57- 67 59456720 ****************** 0.7320 0.1226
-- 68- 79 51936168 *************** 0.7961 0.1893
-- 80- 92 21972012 ****** 0.8475 0.2515
-- 93- 106 16054359 **** 0.8692 0.2823
-- 107- 121 14680082 **** 0.8860 0.3100
-- 122- 137 15708112 **** 0.9013 0.3390
-- 138- 154 12585765 *** 0.9178 0.3744
-- 155- 172 8723114 ** 0.9307 0.4054
-- 173- 191 7440327 ** 0.9398 0.4298
-- 192- 211 6667531 ** 0.9475 0.4532
-- 212- 232 5390048 * 0.9545 0.4763
-- 233- 254 4450551 * 0.9601 0.4968
-- 255- 277 3854983 * 0.9647 0.5154
-- 278- 301 3268307 * 0.9688 0.5331
-- 302- 326 2787154 0.9722 0.5494
-- 327- 352 2415885 0.9751 0.5644
-- 353- 379 2082227 0.9776 0.5786
-- 380- 407 1812314 0.9798 0.5917
-- 408- 436 1581313 0.9817 0.6040
-- 437- 466 1384513 0.9834 0.6155
-- 467- 497 1218270 0.9848 0.6263
-- 498- 529 1074948 0.9861 0.6364
-- 530- 562 954028 0.9872 0.6460
-- 563- 596 849933 0.9882 0.6550
-- 597- 631 758598 0.9891 0.6635
-- 632- 667 677431 0.9899 0.6715
-- 668- 704 609120 0.9906 0.6791
-- 705- 742 551268 0.9912 0.6863
-- 743- 781 501195 0.9918 0.6932
-- 782- 821 460080 0.9924 0.6998
--
-- 0 (max occurrences)
-- 57723434531 (total mers, non-unique)
-- 949207107 (distinct mers, non-unique)
-- 0 (unique mers)
[CORRECTION/LAYOUT]
-- original original
-- raw reads raw reads
-- category w/overlaps w/o/overlaps
-- -------------------- ------------- -------------
-- Number of Reads 3315190 632475
-- Number of Bases 56634484650 1066870678
-- Coverage 24.624 0.464
-- Median 15723 0
-- Mean 17083 1686
-- N50 21808 4030
-- Minimum 2000 0
-- Maximum 168516 42044
--
-- --------corrected--------- ----------rescued----------
-- evidence expected expected
-- category reads raw corrected raw corrected
-- -------------------- ------------- ------------- ------------- ------------- -------------
-- Number of Reads 3588886 2853680 2853680 0 0
-- Number of Bases 57685463931 54016628179 51241694972 0 0
-- Coverage 25.081 23.485 22.279 0.000 0.000
-- Median 14659 17500 17277 0 0
-- Mean 16073 18928 17956 0 0
-- N50 21580 22368 22805 0 0
-- Minimum 2000 2000 1 0 0
-- Maximum 168516 168366 152147 0 0
--
-- --------uncorrected--------
-- expected
-- category raw corrected
-- -------------------- ------------- -------------
-- Number of Reads 1093985 1093985
-- Number of Bases 3684727149 168472
-- Coverage 1.602 0.000
-- Median 3004 0
-- Mean 3368 0
-- N50 5670 0
-- Minimum 0 0
-- Maximum 168516 168472
--
-- Maximum Memory 2145207854
[TRIMMING/READS]
--
-- In sequence store './canu.seqStore':
-- Found 2697395 reads.
-- Found 49529700987 bases (21.53 times coverage).
-- Histogram of corrected reads:
--
-- G=49529700987 sum of || length num
-- NG length index lengths || range seqs
-- ----- ------------ --------- ------------ || ------------------- -------
-- 00010 38442 107666 4953007570 || 1000-4031 237727|------------------------------------------
-- 00020 32215 249527 9905953660 || 4032-7063 175558|--------------------------------
-- 00030 28262 414196 14858932734 || 7064-10095 182077|---------------------------------
-- 00040 25255 599870 19811900234 || 10096-13127 244924|--------------------------------------------
-- 00050 22733 806707 24764854100 || 13128-16159 325409|----------------------------------------------------------
-- 00060 20441 1036484 29717840815 || 16160-19191 356746|---------------------------------------------------------------
-- 00070 18178 1293230 34670802478 || 19192-22223 320514|---------------------------------------------------------
-- 00080 15692 1585583 39623766036 || 22224-25255 254588|---------------------------------------------
-- 00090 12230 1938259 44576731594 || 25256-28287 186996|----------------------------------
-- 00100 1000 2697394 49529700987 || 28288-31319 132907|------------------------
-- 001.000x 2697395 49529700987 || 31320-34351 92190|-----------------
-- || 34352-37383 63440|------------
-- || 37384-40415 42197|--------
-- || 40416-43447 28023|-----
-- || 43448-46479 18123|----
-- || 46480-49511 11893|---
-- || 49512-52543 7942|--
-- || 52544-55575 5155|-
-- || 55576-58607 3380|-
-- || 58608-61639 2307|-
-- || 61640-64671 1543|-
-- || 64672-67703 1043|-
-- || 67704-70735 707|-
-- || 70736-73767 518|-
-- || 73768-76799 357|-
-- || 76800-79831 287|-
-- || 79832-82863 223|-
-- || 82864-85895 151|-
-- || 85896-88927 101|-
-- || 88928-91959 85|-
-- || 91960-94991 72|-
-- || 94992-98023 46|-
-- || 98024-101055 42|-
-- || 101056-104087 24|-
-- || 104088-107119 25|-
-- || 107120-110151 15|-
-- || 110152-113183 13|-
-- || 113184-116215 8|-
-- || 116216-119247 10|-
-- || 119248-122279 7|-
-- || 122280-125311 7|-
-- || 125312-128343 3|-
-- || 128344-131375 1|-
-- || 131376-134407 2|-
-- || 134408-137439 3|-
-- || 137440-140471 2|-
-- || 140472-143503 1|-
-- || 143504-146535 0|
-- || 146536-149567 2|-
-- || 149568-152599 0|
-- || 152600-155631 1|-
--
[TRIMMING/MERS]
--
-- 22-mers Fraction
-- Occurrences NumMers Unique Total
-- 1- 1 0 0.0000 0.0000
-- 2- 2 17777757 ******* 0.0241 0.0007
-- 3- 4 12623526 ***** 0.0346 0.0012
-- 5- 7 9874910 *** 0.0462 0.0020
-- 8- 11 11403894 **** 0.0584 0.0032
-- 12- 16 17223813 ****** 0.0741 0.0057
-- 17- 22 36331429 ************** 0.0993 0.0114
-- 23- 29 75856558 ****************************** 0.1544 0.0286
-- 30- 37 81659813 ******************************** 0.2615 0.0722
-- 38- 46 53941874 ********************* 0.3654 0.1253
-- 47- 56 120189601 ************************************************ 0.4383 0.1723
-- 57- 67 174534447 ********************************************************************** 0.6155 0.3135
-- 68- 79 55708511 ********************** 0.8410 0.5242
-- 80- 92 12321961 **** 0.9051 0.5939
-- 93- 106 9453260 *** 0.9210 0.6146
-- 107- 121 10195140 **** 0.9336 0.6335
-- 122- 137 8215658 *** 0.9475 0.6574
-- 138- 154 4604364 * 0.9582 0.6781
-- 155- 172 3644958 * 0.9643 0.6914
-- 173- 191 3243119 * 0.9691 0.7034
-- 192- 211 2471648 0.9735 0.7153
-- 212- 232 2029213 0.9768 0.7253
-- 233- 254 1733577 0.9795 0.7343
-- 255- 277 1441961 0.9818 0.7428
-- 278- 301 1230450 0.9837 0.7506
-- 302- 326 1055876 0.9854 0.7577
-- 327- 352 899910 0.9868 0.7644
-- 353- 379 771832 0.9880 0.7706
-- 380- 407 665490 0.9891 0.7763
-- 408- 436 581090 0.9900 0.7816
-- 437- 466 513119 0.9907 0.7865
-- 467- 497 457764 0.9914 0.7912
-- 498- 529 409947 0.9921 0.7956
-- 530- 562 379564 0.9926 0.7999
-- 563- 596 381461 0.9931 0.8041
-- 597- 631 365212 0.9936 0.8086
-- 632- 667 350967 0.9941 0.8131
-- 668- 704 313346 0.9946 0.8178
-- 705- 742 273775 0.9950 0.8221
-- 743- 781 225387 0.9954 0.8261
-- 782- 821 191480 0.9957 0.8296
--
-- 0 (max occurrences)
-- 49315358976 (total mers, non-unique)
-- 738541145 (distinct mers, non-unique)
-- 0 (unique mers)
[TRIMMING/TRIMMING]
-- PARAMETERS:
-- ----------
-- 1000 (reads trimmed below this many bases are deleted)
-- 0.1200 (use overlaps at or below this fraction error)
-- 500 (break region if overlap is less than this long, for 'largest covered' algorithm)
-- 2 (break region if overlap coverage is less than this many reads, for 'largest covered' algorithm)
--
-- INPUT READS:
-- -----------
-- 3947665 reads 49529700987 bases (reads processed)
-- 0 reads 0 bases (reads not processed, previously deleted)
-- 0 reads 0 bases (reads not processed, in a library where trimming isn't allowed)
--
-- OUTPUT READS:
-- ------------
-- 243378 reads 4710051310 bases (trimmed reads output)
-- 2419023 reads 43733190114 bases (reads with no change, kept as is)
-- 1281253 reads 412658367 bases (reads with no overlaps, deleted)
-- 4011 reads 31863103 bases (reads with short trimmed length, deleted)
--
-- TRIMMING DETAILS:
-- ----------------
-- 117357 reads 202199022 bases (bases trimmed from the 5' end of a read)
-- 138471 reads 439739071 bases (bases trimmed from the 3' end of a read)
[TRIMMING/SPLITTING]
-- PARAMETERS:
-- ----------
-- 1000 (reads trimmed below this many bases are deleted)
-- 0.1200 (use overlaps at or below this fraction error)
-- INPUT READS:
-- -----------
-- 2662401 reads 49085179517 bases (reads processed)
-- 1285264 reads 444521470 bases (reads not processed, previously deleted)
-- 0 reads 0 bases (reads not processed, in a library where trimming isn't allowed)
--
-- PROCESSED:
-- --------
-- 0 reads 0 bases (no overlaps)
-- 22 reads 253030 bases (no coverage after adjusting for trimming done already)
-- 0 reads 0 bases (processed for chimera)
-- 0 reads 0 bases (processed for spur)
-- 2662379 reads 49084926487 bases (processed for subreads)
--
-- READS WITH SIGNALS:
-- ------------------
-- 0 reads 0 signals (number of 5' spur signal)
-- 0 reads 0 signals (number of 3' spur signal)
-- 0 reads 0 signals (number of chimera signal)
-- 411 reads 417 signals (number of subread signal)
--
-- SIGNALS:
-- -------
-- 0 reads 0 bases (size of 5' spur signal)
-- 0 reads 0 bases (size of 3' spur signal)
-- 0 reads 0 bases (size of chimera signal)
-- 417 reads 99538 bases (size of subread signal)
--
-- TRIMMING:
-- --------
-- 74 reads 1227443 bases (trimmed from the 5' end of the read)
-- 336 reads 5815592 bases (trimmed from the 3' end of the read)
[UNITIGGING/READS]
--
-- In sequence store './canu.seqStore':
-- Found 2662400 reads.
-- Found 48436196500 bases (21.05 times coverage).
-- Histogram of corrected-trimmed reads:
--
-- G=48436196500 sum of || length num
-- NG length index lengths || range seqs
-- ----- ------------ --------- ------------ || ------------------- -------
-- 00010 37110 111546 4843625788 || 1000-3124 172129|--------------------------------------------
-- 00020 31447 254275 9687258805 || 3125-5249 130039|---------------------------------
-- 00030 27761 418647 14530884167 || 5250-7374 115747|------------------------------
-- 00040 24909 603095 19374487544 || 7375-9499 123959|--------------------------------
-- 00050 22486 807868 24218109824 || 9500-11624 151427|---------------------------------------
-- 00060 20265 1034757 29061734787 || 11625-13749 191890|-------------------------------------------------
-- 00070 18059 1287746 33905343249 || 13750-15874 231631|-----------------------------------------------------------
-- 00080 15614 1575286 38748959004 || 15875-17999 250873|---------------------------------------------------------------
-- 00090 12198 1921502 43592584265 || 18000-20124 244411|--------------------------------------------------------------
-- 00100 1000 2662399 48436196500 || 20125-22249 219884|--------------------------------------------------------
-- 001.000x 2662400 48436196500 || 22250-24374 186025|-----------------------------------------------
-- || 24375-26499 151400|---------------------------------------
-- || 26500-28624 119490|-------------------------------
-- || 28625-30749 93450|------------------------
-- || 30750-32874 72055|-------------------
-- || 32875-34999 55116|--------------
-- || 35000-37124 41564|-----------
-- || 37125-39249 30741|--------
-- || 39250-41374 22788|------
-- || 41375-43499 16512|-----
-- || 43500-45624 11794|---
-- || 45625-47749 8598|---
-- || 47750-49874 6093|--
-- || 49875-51999 4429|--
-- || 52000-54124 3135|-
-- || 54125-56249 2178|-
-- || 56250-58374 1580|-
-- || 58375-60499 1093|-
-- || 60500-62624 801|-
-- || 62625-64749 482|-
-- || 64750-66874 376|-
-- || 66875-68999 231|-
-- || 69000-71124 171|-
-- || 71125-73249 115|-
-- || 73250-75374 73|-
-- || 75375-77499 44|-
-- || 77500-79624 30|-
-- || 79625-81749 20|-
-- || 81750-83874 7|-
-- || 83875-85999 4|-
-- || 86000-88124 4|-
-- || 88125-90249 1|-
-- || 90250-92374 0|
-- || 92375-94499 4|-
-- || 94500-96624 1|-
-- || 96625-98749 2|-
-- || 98750-100874 2|-
-- || 100875-102999 0|
-- || 103000-105124 0|
-- || 105125-107249 1|-
--
[UNITIGGING/MERS]
--
-- 22-mers Fraction
-- Occurrences NumMers Unique Total
-- 1- 1 0 0.0000 0.0000
-- 2- 2 17099195 ******* 0.0232 0.0007
-- 3- 4 12407998 ***** 0.0335 0.0012
-- 5- 7 9906923 **** 0.0451 0.0020
-- 8- 11 11517892 **** 0.0574 0.0033
-- 12- 16 17605862 ******* 0.0732 0.0058
-- 17- 22 37533899 *************** 0.0991 0.0118
-- 23- 29 77657644 ******************************* 0.1563 0.0300
-- 30- 37 80852429 ********************************* 0.2656 0.0753
-- 38- 46 54548407 ********************** 0.3683 0.1288
-- 47- 56 128271955 **************************************************** 0.4432 0.1782
-- 57- 67 170549849 ********************************************************************** 0.6318 0.3314
-- 68- 79 49091804 ******************** 0.8501 0.5393
-- 80- 92 11901871 **** 0.9064 0.6017
-- 93- 106 9420495 *** 0.9220 0.6223
-- 107- 121 10382489 **** 0.9346 0.6416
-- 122- 137 7790837 *** 0.9487 0.6664
-- 138- 154 4485134 * 0.9588 0.6864
-- 155- 172 3623794 * 0.9648 0.6997
-- 173- 191 3215084 * 0.9696 0.7120
-- 192- 211 2439163 * 0.9739 0.7239
-- 212- 232 2030577 0.9772 0.7340
-- 233- 254 1753275 0.9799 0.7433
-- 255- 277 1454511 0.9823 0.7521
-- 278- 301 1252115 0.9842 0.7600
-- 302- 326 1096770 0.9859 0.7675
-- 327- 352 925288 0.9874 0.7746
-- 353- 379 792225 0.9886 0.7811
-- 380- 407 693668 0.9897 0.7870
-- 408- 436 609646 0.9906 0.7927
-- 437- 466 520267 0.9915 0.7980
-- 467- 497 455410 0.9922 0.8028
-- 498- 529 398835 0.9928 0.8074
-- 530- 562 350756 0.9933 0.8116
-- 563- 596 315210 0.9938 0.8155
-- 597- 631 282125 0.9942 0.8193
-- 632- 667 258207 0.9946 0.8229
-- 668- 704 231713 0.9949 0.8264
-- 705- 742 210182 0.9953 0.8297
-- 743- 781 195072 0.9955 0.8328
-- 782- 821 177353 0.9958 0.8359
--
-- 0 (max occurrences)
-- 48250496218 (total mers, non-unique)
-- 737224022 (distinct mers, non-unique)
-- 0 (unique mers)
[UNITIGGING/OVERLAPS]
-- category reads % read length feature size or coverage analysis
-- ---------------- ------- ------- ---------------------- ------------------------ --------------------
-- middle-missing 22 0.00 24655.64 +- 18903.14 3937.73 +- 5487.70 (bad trimming)
-- middle-hump 36 0.00 8483.75 +- 8166.77 1595.89 +- 3030.18 (bad trimming)
-- no-5-prime 316 0.01 18578.94 +- 16122.24 1928.75 +- 3617.19 (bad trimming)
-- no-3-prime 151 0.01 13547.35 +- 11941.56 1623.21 +- 3518.53 (bad trimming)
--
-- low-coverage 3621 0.14 5966.95 +- 5447.03 3.23 +- 1.58 (easy to assemble, potential for lower quality consensus)
-- unique 84178 3.16 11586.68 +- 7723.34 26.05 +- 4.62 (easy to assemble, perfect, yay)
-- repeat-cont 1474716 55.39 15341.48 +- 8801.64 80.41 +- 111.11 (potential for consensus errors, no impact on assembly)
-- repeat-dove 46302 1.74 38627.29 +- 7758.87 66.13 +- 43.24 (hard to assemble, likely won't assemble correctly or even at all)
--
-- span-repeat 97870 3.68 19093.41 +- 9376.37 7978.36 +- 8070.51 (read spans a large repeat, usually easy to assemble)
-- uniq-repeat-cont 464717 17.45 18678.87 +- 7702.52 (should be uniquely placed, low potential for consensus errors, no impact on assembly)
-- uniq-repeat-dove 61735 2.32 35132.66 +- 8456.50 (will end contigs, potential to misassemble)
-- uniq-anchor 428180 16.08 24048.91 +- 9468.36 9075.94 +- 7788.61 (repeat read, with unique section, probable bad read)
[UNITIGGING/ADJUSTMENT]
-- No report available.
[UNITIGGING/ERROR RATES]
--
-- ERROR RATES
-- -----------
-- --------threshold------
-- 2851801 fraction error fraction percent
-- samples (1e-5) error error
-- -------------------------- -------- --------
-- command line (-eg) -> 12000.00 12.0000%
-- command line (-ef) -> -----.-- ---.----%
-- command line (-eM) -> 12000.00 12.0000%
-- mean + std.dev 163.11 +- 12 * 826.22 -> 10077.79 10.0778% (enabled)
-- median + mad 0.00 +- 12 * 0.00 -> 0.00 0.0000%
-- 90th percentile -> 225.00 0.2250%
--
-- BEST EDGE FILTERING
-- -------------------
-- At graph threshold 12.0000%, reads:
-- available to have edges: 191922
-- with at least one edge: 190872
--
-- At max threshold 12.0000%, reads: (not computed)
-- available to have edges: 0
-- with at least one edge: 0
--
-- At tight threshold 0.2250%, reads with:
-- both edges below error threshold: 97856 (80.00% minReadsBest threshold = 152697)
-- one edge above error threshold: 47866
-- both edges above error threshold: 45150
-- at least one edge: 190872
--
-- At loose threshold 10.0778%, reads with:
-- both edges below error threshold: 186525 (80.00% minReadsBest threshold = 152697)
-- one edge above error threshold: 4275
-- both edges above error threshold: 72
-- at least one edge: 190872
--
--
-- INITIAL EDGES
-- -------- ----------------------------------------
-- 2456502 reads are contained
-- 1298030 reads have no best edges (singleton)
-- 576 reads have only one best edge (spur)
-- 428 are mutual best
-- 192557 reads have two best edges
-- 16504 have one mutual best edge
-- 175206 have two mutual best edges
--
--
-- FINAL EDGES
-- -------- ----------------------------------------
-- 2456502 reads are contained
-- 1298705 reads have no best edges (singleton)
-- 786 reads have only one best edge (spur)
-- 641 are mutual best
-- 191672 reads have two best edges
-- 16233 have one mutual best edge
-- 174685 have two mutual best edges
--
--
-- EDGE FILTERING
-- -------- ------------------------------------------
-- 0 reads are ignored
-- 11960 reads have a gap in overlap coverage
-- 481 reads have lopsided best edges
[UNITIGGING/CONTIGS]
-- Found, in version 1, after unitig construction:
-- contigs: 6568 sequences, total length 1334410489 bp (including 1430 repeats of total length 65021591 bp).
-- bubbles: 2869 sequences, total length 215893986 bp.
-- unassembled: 30982 sequences, total length 445297704 bp.
--
-- Contig sizes based on genome size 2.3gbp:
--
-- NG (bp) LG (contigs) sum (bp)
-- ---------- ------------ ----------
-- 10 1103901 147 230914462
-- 20 662889 418 460228246
-- 30 378352 878 690037030
-- 40 186041 1767 920185437
-- 50 93467 3560 1150092374
--
[UNITIGGING/CONSENSUS]
-- Found, in version 2, after consensus generation:
-- contigs: 6568 sequences, total length 1328299669 bp (including 1430 repeats of total length 64555993 bp).
-- bubbles: 2869 sequences, total length 215361682 bp.
-- unassembled: 30982 sequences, total length 445297704 bp.
--
-- Contig sizes based on genome size 2.3gbp:
--
-- NG (bp) LG (contigs) sum (bp)
-- ---------- ------------ ----------
-- 10 1097884 148 230753569
-- 20 655509 422 460246964
-- 30 372778 889 690332572
-- 40 182290 1793 920181490
-- 50 91936 3619 1150083022
--
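A note on reading the NG table above: it is computed against the genome size given on the command line (2.3 Gbp), which is why the NG50 row sums to about 1.15 Gbp. Against a genome size of about 1 Gbp, the 50% target would be about 0.5 Gbp, which falls between the NG20 and NG30 rows (sums of about 460 Mbp and 690 Mbp), so the NG50 would land somewhere between 372778 and 655509 bp. Below is a minimal sketch of that calculation; the function is my own, the contig lengths are made up for illustration, and Canu's exact tie-breaking may differ.

```python
def ng(contig_lengths, genome_size, percent):
    """Length of the contig at which the running sum of lengths (longest first)
    reaches percent% of genome_size; None if the target is never reached."""
    target = genome_size * percent / 100
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running >= target:
            return length
    return None


# Hypothetical contig lengths, for illustration only (not from this assembly).
toy = [500, 400, 300, 200, 100, 50, 50]

print(ng(toy, 3000, 50))  # target 1500: 500+400+300+200+100 reaches it -> 100
print(ng(toy, 1500, 50))  # target 750: 500+400 reaches it -> 400
```

The same contigs give a larger NG50 when measured against a smaller genome size, so this number can be recomputed from the final contig lengths without rerunning the assembler.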
Given that the input coverage was below 40x, none of the data would have been removed, so I don't think the genome size affected your run. You can see the histogram plots consistently show the coverage is about double what is expected, and the final assembly is about 1.3 Gbp with another 200 Mbp of bubble sequences (alt haplotype). So I wouldn't try re-running with the corrected genome size.
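The arithmetic behind "below 40x" can be sketched as follows. The base count is taken from the [CORRECTION/READS] section of the report; the 40x cutoff is the figure quoted earlier in this thread; the ~1 Gbp value is the poster's own rough expectation. The helper functions are illustrative only and are not Canu code.

```python
RAW_BASES = 58_224_787_463  # "Found ... bases" in [CORRECTION/READS]
CUTOFF_X = 40               # the "longest 40x" quoted earlier in the thread


def coverage(bases, genome_size):
    """Coverage as total bases divided by the genome size estimate."""
    return bases / genome_size


def bases_kept(bases, genome_size, cutoff_x=CUTOFF_X):
    """Upper bound on the bases retained if only cutoff_x coverage is kept."""
    return min(bases, cutoff_x * genome_size)


for label, size in [("genomeSize=2.3g (as run)", 2_300_000_000),
                    ("~1 Gbp (expected)", 1_000_000_000)]:
    cov = coverage(RAW_BASES, size)
    kept = bases_kept(RAW_BASES, size)
    status = "all data kept" if kept == RAW_BASES else f"capped at {kept} bases"
    print(f"{label}: {cov:.2f}x raw, {status}")
```

At 2.3 Gbp the raw data comes to roughly 25x, under the cutoff, so nothing was discarded in this run. The same data measured against about 1 Gbp is roughly 58x, so a run at the smaller size would start applying the cutoff.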
However, looking at the raw data plot, the input data looks quite good. There is a clear peak at about 57-67x coverage even before correction. If you are going to run anything, I'd suggest trying the uncorrected ONT assembly parameters from here: https://canu.readthedocs.io/en/latest/quick-start.html#assembling-with-multiple-technologies-and-multiple-files (see the uncorrected ONT assembly section).
Hi!
I've come to realize that I ran canu with genomeSize set at double what I'm now expecting from my species (set 2.3Gb, reality ~1Gb). How much, if at all, would this affect the output? Metrics look normal, other than longer than expected assembly size - though this may be heterozygosity. Is there a way to have canu "reinterpret" the results with the new genomeSize without simply restarting the pipeline from scratch? I tried to re-run with genomeSize set to the new value, but the pipeline finished after seconds, automatically skipping Correction, Trimming, and Unitigging after 'seeing' the files already there. Apologies if this has been answered but I couldn't find anything in github or readthedocs.
My thanks, AMG
Canu command used (using the assembly too large/slow recommended configs):
canu -p canu -d canu_assembly genomeSize=2.3g -nanopore ...fastq gridOptions="--partition=general --qos=general --mem-per-cpu=8000m --cpus-per-task=24" corMhapFilterThreshold=0.0000000002 corMhapOptions="--threshold 0.80 --num-hashes 512 --num-min-matches 3 --ordered-sketch-size 1000 --ordered-kmer-size 14 --min-olap-length 2000 --repeat-idf-scale 50" mhapMemory=60g mhapBlockSize=500 ovlMerDistinct=0.975
Canu 2.2, via module load on a linux HPC cluster