pjm43 closed this issue 5 years ago
Your corrected-read median rate is about what we normally see for nanopore data, so I'm not sure multiple rounds would help. Instead, I would suggest using the latest nanopore flip-flop basecaller, which seems to boost accuracy, at least on human/mammal data.
What does the histogram distribution look like? Is there a clear peak? Does it look like your plant is heterozygous? If it does, you can also add the heterozygous parameters from the FAQ.
Hmm, I'll look into the flip-flop basecaller - we've been using guppy. Here's the full .report with the histograms - it should be a fairly homozygous species (inbreeder) - do you see any signs of heterozygosity?
[CORRECTION/READS]
--
-- In sequence store './canu1.8_Cwatsonii.seqStore':
-- Found 3674618 reads.
-- Found 45504226620 bases (91 times coverage).
--
-- Read length histogram (one '*' equals 19168.95 reads):
-- 0 4999 362527 ******************
-- 5000 9999 1062734 *******************************************************
-- 10000 14999 1341827 **********************************************************************
-- 15000 19999 544266 ****************************
-- 20000 24999 191256 *********
-- 25000 29999 80462 ****
-- 30000 34999 39756 **
-- 35000 39999 21384 *
-- 40000 44999 11966
-- 45000 49999 7350
-- 50000 54999 4339
-- 55000 59999 2745
-- 60000 64999 1623
-- 65000 69999 961
-- 70000 74999 549
-- 75000 79999 334
-- 80000 84999 214
-- 85000 89999 124
-- 90000 94999 68
-- 95000 99999 46
-- 100000 104999 21
-- 105000 109999 20
-- 110000 114999 11
-- 115000 119999 10
-- 120000 124999 5
-- 125000 129999 7
-- 130000 134999 2
-- 135000 139999 3
-- 140000 144999 3
-- 145000 149999 0
-- 150000 154999 0
-- 155000 159999 2
-- 160000 164999 1
-- 165000 169999 0
-- 170000 174999 1
-- 175000 179999 0
-- 180000 184999 0
-- 185000 189999 0
-- 190000 194999 1
[CORRECTION/MERS]
--
-- 16-mers Fraction
-- Occurrences NumMers Unique Total
-- 1- 1 0 0.0000 0.0000
-- 2- 2 196614414 **************************************** 0.1082 0.0087
-- 3- 4 331108347 ******************************************************************** 0.2060 0.0205
-- 5- 7 336886262 ********************************************************************** 0.3625 0.0485
-- 8- 11 264871097 ******************************************************* 0.5206 0.0923
-- 12- 16 185326314 ************************************** 0.6473 0.1448
-- 17- 22 124637569 ************************* 0.7381 0.1986
-- 23- 29 84745998 ***************** 0.8006 0.2494
-- 30- 37 61508571 ************ 0.8440 0.2960
-- 38- 46 49068501 ********** 0.8763 0.3404
-- 47- 56 40076047 ******** 0.9024 0.3852
-- 57- 67 30856753 ****** 0.9237 0.4300
-- 68- 79 22869582 **** 0.9401 0.4712
-- 80- 92 17209377 *** 0.9523 0.5075
-- 93- 106 13282535 ** 0.9616 0.5396
-- 107- 121 10346232 ** 0.9687 0.5683
-- 122- 137 8092640 * 0.9743 0.5940
-- 138- 154 6395924 * 0.9786 0.6168
-- 155- 172 5106579 * 0.9821 0.6372
-- 173- 191 4114303 0.9849 0.6554
-- 192- 211 3346296 0.9871 0.6718
-- 212- 232 2751581 0.9889 0.6865
-- 233- 254 2278472 0.9904 0.6999
-- 255- 277 1906224 0.9917 0.7121
-- 278- 301 1608964 0.9927 0.7232
-- 302- 326 1362693 0.9936 0.7334
-- 327- 352 1162661 0.9943 0.7428
-- 353- 379 993926 0.9950 0.7515
-- 380- 407 853895 0.9955 0.7595
-- 408- 436 734321 0.9960 0.7669
-- 437- 466 635831 0.9964 0.7737
-- 467- 497 551923 0.9967 0.7800
-- 498- 529 483288 0.9970 0.7858
-- 530- 562 425269 0.9973 0.7913
-- 563- 596 375077 0.9975 0.7964
-- 597- 631 333745 0.9977 0.8012
-- 632- 667 296306 0.9979 0.8057
-- 668- 704 265592 0.9981 0.8100
-- 705- 742 237097 0.9982 0.8140
-- 743- 781 214416 0.9984 0.8178
-- 782- 821 192922 0.9985 0.8214
--
-- 0 (max occurrences)
-- 45254819515 (total mers, non-unique)
-- 1816718316 (distinct mers, non-unique)
-- 0 (unique mers)
[CORRECTION/LAYOUT]
-- original original
-- raw reads raw reads
-- category w/overlaps w/o/overlaps
-- -------------------- ------------- -------------
-- Number of Reads 3496253 178365
-- Number of Bases 44676732654 807353130
-- Coverage 89.353 1.615
-- Median 11490 3643
-- Mean 12778 4526
-- N50 13888 5119
-- Minimum 2000 0
-- Maximum 190190 45290
--
-- --------corrected--------- ----------rescued----------
-- evidence expected expected
-- category reads raw corrected raw corrected
-- -------------------- ------------- ------------- ------------- ------------- -------------
-- Number of Reads 3620404 972932 972932 43539 43539
-- Number of Bases 45005553344 20180544737 20000007715 384907673 376913877
-- Coverage 90.011 40.361 40.000 0.770 0.754
-- Median 11271 18229 18081 9028 8924
-- Mean 12431 20741 20556 8840 8656
-- N50 13819 19835 19629 9657 9584
-- Minimum 2000 14420 14419 2001 1001
-- Maximum 190190 173186 156754 50846 14419
--
-- --------uncorrected--------
-- expected
-- category raw corrected
-- -------------------- ------------- -------------
-- Number of Reads 2658147 2658147
-- Number of Bases 24918633374 21056287888
-- Coverage 49.837 42.113
-- Median 9797 9433
-- Mean 9374 7921
-- N50 10822 11041
-- Minimum 0 0
-- Maximum 190190 188706
--
-- Maximum Memory 2315925278
[TRIMMING/READS]
--
-- In sequence store './canu1.8_Cwatsonii.seqStore':
-- Found 1016421 reads.
-- Found 19952833652 bases (39.9 times coverage).
--
-- Read length histogram (one '*' equals 7199.87 reads):
-- 0 4999 5131
-- 5000 9999 28382 ***
-- 10000 14999 153023 *********************
-- 15000 19999 503991 **********************************************************************
-- 20000 24999 172908 ************************
-- 25000 29999 72263 **********
-- 30000 34999 35337 ****
-- 35000 39999 19035 **
-- 40000 44999 10590 *
-- 45000 49999 6430
-- 50000 54999 3737
-- 55000 59999 2340
-- 60000 64999 1384
-- 65000 69999 803
-- 70000 74999 450
-- 75000 79999 260
-- 80000 84999 165
-- 85000 89999 73
-- 90000 94999 44
-- 95000 99999 29
-- 100000 104999 16
-- 105000 109999 10
-- 110000 114999 5
-- 115000 119999 5
-- 120000 124999 2
-- 125000 129999 5
-- 130000 134999 0
-- 135000 139999 0
-- 140000 144999 2
-- 145000 149999 0
-- 150000 154999 0
-- 155000 159999 1
[TRIMMING/MERS]
--
-- 22-mers Fraction
-- Occurrences NumMers Unique Total
-- 1- 1 0 0.0000 0.0000
-- 2- 2 305927822 ********************************************************************** 0.3211 0.0352
-- 3- 4 193748129 ******************************************** 0.4517 0.0567
-- 5- 7 101345153 *********************** 0.5718 0.0857
-- 8- 11 59785945 ************* 0.6511 0.1154
-- 12- 16 41872965 ********* 0.7043 0.1455
-- 17- 22 37736130 ******** 0.7446 0.1783
-- 23- 29 56739955 ************ 0.7842 0.2229
-- 30- 37 80370823 ****************** 0.8475 0.3175
-- 38- 46 43273859 ********* 0.9295 0.4719
-- 47- 56 9114318 ** 0.9684 0.5613
-- 57- 67 5212685 * 0.9767 0.5848
-- 68- 79 4145726 0.9821 0.6034
-- 80- 92 2415494 0.9862 0.6202
-- 93- 106 1876864 0.9887 0.6318
-- 107- 121 1382352 0.9906 0.6424
-- 122- 137 1068747 0.9920 0.6513
-- 138- 154 867780 0.9931 0.6592
-- 155- 172 730396 0.9940 0.6664
-- 173- 191 627446 0.9948 0.6732
-- 192- 211 537982 0.9954 0.6797
-- 212- 232 435028 0.9960 0.6859
-- 233- 254 352033 0.9964 0.6914
-- 255- 277 295788 0.9968 0.6963
-- 278- 301 252535 0.9971 0.7008
-- 302- 326 218702 0.9974 0.7050
-- 327- 352 191725 0.9976 0.7090
-- 353- 379 169811 0.9978 0.7127
-- 380- 407 149809 0.9980 0.7162
-- 408- 436 133504 0.9981 0.7196
-- 437- 466 118422 0.9983 0.7229
-- 467- 497 105796 0.9984 0.7259
-- 498- 529 94626 0.9985 0.7289
-- 530- 562 84859 0.9986 0.7316
-- 563- 596 76531 0.9987 0.7343
-- 597- 631 70189 0.9988 0.7369
-- 632- 667 63708 0.9988 0.7393
-- 668- 704 58743 0.9989 0.7417
-- 705- 742 53039 0.9990 0.7440
-- 743- 781 48739 0.9990 0.7462
-- 782- 821 45176 0.9991 0.7484
--
-- 0 (max occurrences)
-- 17362747871 (total mers, non-unique)
-- 952639470 (distinct mers, non-unique)
-- 0 (unique mers)
[TRIMMING/TRIMMING]
-- PARAMETERS:
-- ----------
-- 1000 (reads trimmed below this many bases are deleted)
-- 0.1200 (use overlaps at or below this fraction error)
-- 1 (break region if overlap is less than this long, for 'largest covered' algorithm)
-- 1 (break region if overlap coverage is less than this many read, for 'largest covered' algorithm)
--
-- INPUT READS:
-- -----------
-- 3674618 reads 19952833652 bases (reads processed)
-- 0 reads 0 bases (reads not processed, previously deleted)
-- 0 reads 0 bases (reads not processed, in a library where trimming isn't allowed)
--
-- OUTPUT READS:
-- ------------
-- 826131 reads 14459470441 bases (trimmed reads output)
-- 189483 reads 3376897607 bases (reads with no change, kept as is)
-- 2658532 reads 1691483 bases (reads with no overlaps, deleted)
-- 472 reads 3250716 bases (reads with short trimmed length, deleted)
--
-- TRIMMING DETAILS:
-- ----------------
-- 516307 reads 1044594022 bases (bases trimmed from the 5' end of a read)
-- 701024 reads 1066929383 bases (bases trimmed from the 3' end of a read)
[TRIMMING/SPLITTING]
-- PARAMETERS:
-- ----------
-- 1000 (reads trimmed below this many bases are deleted)
-- 0.1200 (use overlaps at or below this fraction error)
-- INPUT READS:
-- -----------
-- 1015614 reads 19947891453 bases (reads processed)
-- 2659004 reads 4942199 bases (reads not processed, previously deleted)
-- 0 reads 0 bases (reads not processed, in a library where trimming isn't allowed)
--
-- PROCESSED:
-- --------
-- 0 reads 0 bases (no overlaps)
-- 77 reads 1168549 bases (no coverage after adjusting for trimming done already)
-- 0 reads 0 bases (processed for chimera)
-- 0 reads 0 bases (processed for spur)
-- 1015537 reads 19946722904 bases (processed for subreads)
--
-- READS WITH SIGNALS:
-- ------------------
-- 0 reads 0 signals (number of 5' spur signal)
-- 0 reads 0 signals (number of 3' spur signal)
-- 0 reads 0 signals (number of chimera signal)
-- 995 reads 1004 signals (number of subread signal)
--
-- SIGNALS:
-- -------
-- 0 reads 0 bases (size of 5' spur signal)
-- 0 reads 0 bases (size of 3' spur signal)
-- 0 reads 0 bases (size of chimera signal)
-- 1004 reads 404327 bases (size of subread signal)
--
-- TRIMMING:
-- --------
-- 369 reads 2803988 bases (trimmed from the 5' end of the read)
-- 627 reads 5902955 bases (trimmed from the 3' end of the read)
[UNITIGGING/READS]
--
-- In sequence store './canu1.8_Cwatsonii.seqStore':
-- Found 1015613 reads.
-- Found 17827660343 bases (35.65 times coverage).
--
-- Read length histogram (one '*' equals 6129.84 reads):
-- 0 4999 13776 **
-- 5000 9999 87295 **************
-- 10000 14999 244114 ***************************************
-- 15000 19999 429089 **********************************************************************
-- 20000 24999 133921 *********************
-- 25000 29999 51809 ********
-- 30000 34999 24981 ****
-- 35000 39999 13230 **
-- 40000 44999 7209 *
-- 45000 49999 4397
-- 50000 54999 2466
-- 55000 59999 1485
-- 60000 64999 856
-- 65000 69999 478
-- 70000 74999 261
-- 75000 79999 117
-- 80000 84999 78
-- 85000 89999 23
-- 90000 94999 16
-- 95000 99999 10
-- 100000 104999 0
-- 105000 109999 1
-- 110000 114999 0
-- 115000 119999 1
[UNITIGGING/MERS]
--
-- 22-mers Fraction
-- Occurrences NumMers Unique Total
-- 1- 1 0 0.0000 0.0000
-- 2- 2 255642300 ********************************************************************** 0.3042 0.0323
-- 3- 4 165368556 ********************************************* 0.4301 0.0524
-- 5- 7 88346759 ************************ 0.5475 0.0797
-- 8- 11 53455256 ************** 0.6265 0.1084
-- 12- 16 38888289 ********** 0.6808 0.1382
-- 17- 22 38023505 ********** 0.7237 0.1720
-- 23- 29 60413087 **************** 0.7698 0.2224
-- 30- 37 77033475 ********************* 0.8457 0.3321
-- 38- 46 35563108 ********* 0.9331 0.4909
-- 47- 56 7577945 ** 0.9688 0.5701
-- 57- 67 4837427 * 0.9768 0.5921
-- 68- 79 3509373 0.9825 0.6109
-- 80- 92 2107339 0.9865 0.6264
-- 93- 106 1633577 0.9889 0.6376
-- 107- 121 1188442 0.9908 0.6477
-- 122- 137 930874 0.9922 0.6561
-- 138- 154 761609 0.9932 0.6636
-- 155- 172 645629 0.9941 0.6706
-- 173- 191 558844 0.9949 0.6772
-- 192- 211 456453 0.9956 0.6836
-- 212- 232 366801 0.9961 0.6893
-- 233- 254 297116 0.9965 0.6944
-- 255- 277 250906 0.9969 0.6989
-- 278- 301 214810 0.9972 0.7031
-- 302- 326 186711 0.9974 0.7070
-- 327- 352 163903 0.9976 0.7107
-- 353- 379 146136 0.9978 0.7142
-- 380- 407 128174 0.9980 0.7176
-- 408- 436 113587 0.9982 0.7208
-- 437- 466 100354 0.9983 0.7238
-- 467- 497 90092 0.9984 0.7266
-- 498- 529 80509 0.9985 0.7294
-- 530- 562 71583 0.9986 0.7320
-- 563- 596 65735 0.9987 0.7344
-- 597- 631 59515 0.9988 0.7368
-- 632- 667 54607 0.9989 0.7391
-- 668- 704 49469 0.9989 0.7414
-- 705- 742 46013 0.9990 0.7435
-- 743- 781 41882 0.9990 0.7456
-- 782- 821 38405 0.9991 0.7476
--
-- 0 (max occurrences)
-- 15825236058 (total mers, non-unique)
-- 840241886 (distinct mers, non-unique)
-- 0 (unique mers)
[UNITIGGING/OVERLAPS]
-- category reads % read length feature size or coverage analysis
-- ---------------- ------- ------- ---------------------- ------------------------ --------------------
-- middle-missing 762 0.08 16633.16 +- 8429.90 1607.93 +- 1964.57 (bad trimming)
-- middle-hump 93 0.01 6588.52 +- 3791.93 778.74 +- 1293.99 (bad trimming)
-- no-5-prime 1884 0.19 13081.59 +- 6847.66 505.78 +- 1137.99 (bad trimming)
-- no-3-prime 1565 0.15 12971.92 +- 7071.60 451.00 +- 993.82 (bad trimming)
--
-- low-coverage 7035 0.69 7652.26 +- 4165.76 6.28 +- 2.63 (easy to assemble, potential for lower quality consensus)
-- unique 606789 59.75 18072.09 +- 6696.33 33.00 +- 6.95 (easy to assemble, perfect, yay)
-- repeat-cont 181053 17.83 16226.92 +- 7507.82 3272.53 +- 1699.60 (potential for consensus errors, no impact on assembly)
-- repeat-dove 520 0.05 29894.39 +- 17714.73 2066.97 +- 2409.62 (hard to assemble, likely won't assemble correctly or even at all)
--
-- span-repeat 39045 3.84 20998.46 +- 10307.43 5816.93 +- 4979.84 (read spans a large repeat, usually easy to assemble)
-- uniq-repeat-cont 141553 13.94 15337.16 +- 5384.26 (should be uniquely placed, low potential for consensus errors, no impact on assembly)
-- uniq-repeat-dove 27997 2.76 24095.18 +- 10456.11 (will end contigs, potential to misassemble)
-- uniq-anchor 7162 0.71 18174.99 +- 7184.36 10099.35 +- 7193.20 (repeat read, with unique section, probable bad read)
[UNITIGGING/ADJUSTMENT]
-- No report available.
[UNITIGGING/CONTIGS]
-- Found, in version 1, after unitig construction:
-- contigs: 3517 sequences, total length 536255101 bp (including 377 repeats of total length 8742209 bp).
-- bubbles: 0 sequences, total length 0 bp.
-- unassembled: 126153 sequences, total length 1987992988 bp.
--
-- Contig sizes based on genome size 500mbp:
--
-- NG (bp) LG (contigs) sum (bp)
-- ---------- ------------ ----------
-- 10 2814540 13 51607288
-- 20 1700537 38 101106989
-- 30 1141962 74 150565492
-- 40 817197 126 200006038
-- 50 600580 199 250519651
-- 60 442265 296 300268184
-- 70 298594 432 350257657
-- 80 186223 644 400069451
-- 90 106562 995 450035317
-- 100 40228 1753 500007493
--
[UNITIGGING/CONSENSUS]
-- Found, in version 2, after consensus generation:
-- contigs: 3517 sequences, total length 537158622 bp (including 377 repeats of total length 8726072 bp).
-- bubbles: 0 sequences, total length 0 bp.
-- unassembled: 126153 sequences, total length 1987986220 bp.
--
-- Contig sizes based on genome size 500mbp:
--
-- NG (bp) LG (contigs) sum (bp)
-- ---------- ------------ ----------
-- 10 2824535 13 51752597
-- 20 1705398 38 101395186
-- 30 1145103 74 150963902
-- 40 819608 126 200520991
-- 50 602301 198 250535231
-- 60 445368 294 300094489
-- 70 302763 429 350153922
-- 80 188856 640 400187312
-- 90 107907 987 450086403
-- 100 41262 1730 500015335
--
It looks pretty homozygous from the plots so no need to include the het parameters.
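For anyone who wants to check this kind of call numerically rather than by eye, here is a minimal Python sketch. It assumes the row layout of the MERS histograms in the report above; the function names `parse_bins` and `coverage_peaks` are made up for illustration and are not part of Canu. Because the bins widen with occurrence count, it divides each count by the bin width before looking for local maxima past the low-count error tail. A single peak near the read coverage would be consistent with a homozygous genome, while a second peak at roughly half that depth would hint at heterozygosity.

```python
import re

# Matches histogram rows such as "-- 30- 37 80370823 ***** 0.8475 0.3175".
ROW = re.compile(r"--\s*(\d+)-\s*(\d+)\s+(\d+)")

# Rows copied from the TRIMMING/MERS section of the report (bars omitted).
EXCERPT = """\
-- 5- 7 101345153 0.5718 0.0857
-- 8- 11 59785945 0.6511 0.1154
-- 12- 16 41872965 0.7043 0.1455
-- 17- 22 37736130 0.7446 0.1783
-- 23- 29 56739955 0.7842 0.2229
-- 30- 37 80370823 0.8475 0.3175
-- 38- 46 43273859 0.9295 0.4719
-- 47- 56 9114318 0.9684 0.5613
-- 57- 67 5212685 0.9767 0.5848
"""


def parse_bins(report_text):
    """Return a (low, high, count) tuple for every histogram row found."""
    bins = []
    for line in report_text.splitlines():
        match = ROW.match(line.strip())
        if match:
            bins.append(tuple(int(group) for group in match.groups()))
    return bins


def coverage_peaks(bins):
    """Return (low, high) for bins that are local density maxima after the first trough."""
    # Normalise by bin width so wide bins are not favoured.
    density = [count / (high - low + 1) for low, high, count in bins]
    # The first local minimum marks the end of the low-count error tail.
    start = next(
        (i for i in range(1, len(density) - 1)
         if density[i] < density[i - 1] and density[i] <= density[i + 1]),
        None,
    )
    if start is None:
        return []
    return [
        bins[i][:2]
        for i in range(start + 1, len(density) - 1)
        if density[i] > density[i - 1] and density[i] >= density[i + 1]
    ]


if __name__ == "__main__":
    print(coverage_peaks(parse_bins(EXCERPT)))
```

On the excerpt this finds one peak, in the 30-37 bin. The coarse binning in the report limits how much such a check can resolve, so it is a sanity check rather than a substitute for a proper k-mer spectrum.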
I'm working on a plant species with a moderately high repeat fraction (60%). I have approximately 90X coverage with Nanopore R9 1D reads. My initial assembly produced an N50 of ~600 Kb, which I know is not bad, but I'm looking for ways I might improve it. Other posts suggest a couple of rounds of correction (with the parameters -correct corMhapSensitivity=normal corOutCoverage=80), followed by assembly with correctedErrorRate=0.15 ovlMerDistinct=0.975. Would that be appropriate for this assembly? I've pasted in my original command, the bottom portion of the final report, and the unitigging/4-unitigger/001thr000.num000.log. Let me know if you need to see anything else.
Thanks for your help!
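As a side note on where the coverage and N50 figures come from, here is a small Python sketch. The helper names `coverage` and `ng_stats` are hypothetical, and the contig lengths in the usage line are toy values, since the real contig lengths are not in this thread. Coverage is total bases divided by genome size. The report's NG column is computed against the stated genome size (500 Mbp), not the assembly length, so it is strictly an NG50, not an N50.

```python
def coverage(total_bases, genome_size):
    """Fold coverage: sequenced bases divided by the assumed genome size."""
    return total_bases / genome_size


def ng_stats(contig_lengths, genome_size, fraction=0.5):
    """Return (NGx length, LGx contig count), or None if the contigs fall short.

    Contigs are taken longest first until their summed length reaches
    `fraction` of the genome size; the last contig taken gives NGx.
    """
    target = fraction * genome_size
    running = 0
    for n, length in enumerate(sorted(contig_lengths, reverse=True), start=1):
        running += length
        if running >= target:
            return length, n
    return None


if __name__ == "__main__":
    GENOME_SIZE = 500_000_000  # genomeSize=500m from the original command
    # Base counts taken from the READS sections of the report.
    for label, bases in [("raw", 45_504_226_620),
                         ("corrected", 19_952_833_652),
                         ("trimmed", 17_827_660_343)]:
        print(label, round(coverage(bases, GENOME_SIZE), 2))
    # Toy contig lengths, for illustration only.
    print(ng_stats([50, 40, 30, 20, 10], genome_size=200))
```

The three coverage values agree with the 91, 39.9 and 35.65 times coverage lines printed in the report.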
My original command:
canu -d canu_v1.8_Cwatsonii -p canu1.8_Cwatsonii genomeSize=500m maxMemory=500g maxThreads=24 corMhapSensitivity=normal corOutCoverage=40 \
  merylMemory=500g merylThreads=24 ovlThreads=24 obtovlMemory=5g ovlMerThreshold=500 \
  -nanopore-raw Final_watsonii_trimmed.q8_l2000.porechop.fq.gz
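To make the staged run asked about above concrete, here is a hedged Python sketch that only builds and prints the two command lines; it does not run Canu. The parameter values are the ones quoted in the question. The corrected-reads path (`<dir>/<prefix>.correctedReads.fasta.gz`) and the reuse of the same `-d` directory for both stages are assumptions about Canu 1.8 output naming, and the helper names are hypothetical. It shows a single correction pass followed by trimming and assembly, not multiple rounds.

```python
import shlex

# Settings shared by both stages, taken from the original command.
COMMON = [
    "-d", "canu_v1.8_Cwatsonii",
    "-p", "canu1.8_Cwatsonii",
    "genomeSize=500m",
    "maxMemory=500g",
    "maxThreads=24",
]


def correction_cmd(raw_reads, out_coverage=80):
    """Argument list for a correction-only run with the parameters from the question."""
    return [
        "canu", "-correct", *COMMON,
        "corMhapSensitivity=normal",
        f"corOutCoverage={out_coverage}",
        "-nanopore-raw", raw_reads,
    ]


def assembly_cmd(corrected_reads):
    """Argument list for trimming and assembly of already-corrected reads."""
    return [
        "canu", "-trim-assemble", *COMMON,
        "correctedErrorRate=0.15",
        "ovlMerDistinct=0.975",
        "-nanopore-corrected", corrected_reads,
    ]


if __name__ == "__main__":
    raw = "Final_watsonii_trimmed.q8_l2000.porechop.fq.gz"
    # Assumed location of the correction output; check the run directory.
    corrected = "canu_v1.8_Cwatsonii/canu1.8_Cwatsonii.correctedReads.fasta.gz"
    print(shlex.join(correction_cmd(raw)))
    print(shlex.join(assembly_cmd(corrected)))
```

Printing the commands first makes it easy to review the parameters before committing cluster time. Whether the extra correction coverage actually helps is a separate question, which the reply in this thread addresses.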
report: