SarahSaadain closed this issue 11 months ago.
The overlap step can be slow. It looks like the step is running and making progress: it was running 8 jobs at a time and has reached job 41, the last one. The FAQ lists parameters that can potentially make this faster after correction, specifically ovlMerDistinct=0.975. However, this requires a restart, so I would probably just wait for the jobs to complete. You could run with the --fast option, but that typically produces a less contiguous assembly.
Thank you for your fast reply!
According to the 1-overlapper folder, the last output was created on October 3rd (overlap 41), so it seems to have been stuck for 5 weeks since then.
I came across the FAQ entry that recommends tweaking these parameters for very repetitive or large genomes: corMhapFilterThreshold=0.0000000002 corMhapOptions="--threshold 0.80 --num-hashes 512 --num-min-matches 3 --ordered-sketch-size 1000 --ordered-kmer-size 14 --min-olap-length 2000 --repeat-idf-scale 50" mhapMemory=60g mhapBlockSize=500 ovlMerDistinct=0.975.
I want to use this assembly to find transposable elements and piRNA clusters, so I do not want to exclude too many of the repetitive regions, and I was not sure how stringent these settings would be.
Since you already have corrected reads, only the last parameter matters. It filters which k-mers are allowed to seed an overlap: the more repetitive ones are not allowed to seed. Any repeat that still contains unique k-mers would still be part of an overlap. However, as I said, this parameter change requires starting over from scratch.
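As a rough illustration of what ovlMerDistinct=0.975 means: the MERS histograms in a canu report (such as the one posted later in this thread) list, for each k-mer count range, what appears to be the cumulative fraction of distinct k-mers in the "Unique" column. The sketch below, a hypothetical helper named merdistinct_bucket, prints the first count range where that fraction reaches the chosen value; k-mers occurring more often than that would be the ones kept from seeding overlaps. It works on bucketed counts, so it only approximates canu's own cutoff, and it assumes the report line layout shown in this thread.

```sh
# Hypothetical helper: given a target fraction (e.g. 0.975), print the first k-mer
# count range in a canu report MERS histogram whose cumulative "Unique" fraction
# reaches it. Works on bucketed counts, so it only approximates canu's cutoff.
merdistinct_bucket() {
  awk -v target="$1" '
    # Histogram rows look like: -- 57- 67 3978010 ***** 0.9585 0.8484
    $1 == "--" && $2 ~ /^[0-9]+-$/ {
      frac = $(NF - 1)          # cumulative fraction of distinct k-mers
      if (frac + 0 >= target + 0) {
        low = $2
        sub(/-$/, "", low)      # drop the trailing dash from the range start
        print low "-" $3
        exit
      }
    }'
}
```

A report contains several MERS sections, so extract the one you care about first, for example (assuming canu wrote the report as derecta.report in the run directory): `sed -n '/^\[TRIMMING\/MERS\]/,/^\[TRIMMING\/TRIMMING\]/p' derecta.report | merdistinct_bucket 0.975`. For the TRIMMING/MERS table in this thread that gives 68-79, since the fraction is 0.9585 at 57-67 and 0.9840 at 68-79.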
What does top show on the system the jobs are running on? Are they still getting CPU time and doing work?
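A non-interactive way to answer this on Linux is to read the kernel's /proc entries for the overlapInCore process: its total thread count and how many of those threads are in the running (R) state right now. The helper name thread_summary is hypothetical, and the result is a single instantaneous snapshot, not an average like top's %CPU column, so it is worth repeating a few times.

```sh
# Snapshot of one process's threads, read from Linux /proc (no extra tools needed).
# thread_summary is a hypothetical helper name; pass the PID shown by top.
thread_summary() {
  pid=$1
  # Total number of threads the process owns.
  total=$(awk '/^Threads:/ { print $2 }' "/proc/$pid/status")
  # Threads whose state is R (running or runnable) at this instant.
  running=$(cat /proc/"$pid"/task/*/status 2>/dev/null |
    awk '/^State:/ && $2 == "R" { n++ } END { print n + 0 }')
  echo "pid=$pid threads=$total running=$running"
}
```

For example, `thread_summary 50365` for the overlapInCore PID shown in the top output below. For a live per-thread view, `top -H -p <PID>` lists each thread as its own row.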
Thank you again for your reply!
This is what top is showing me:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
50365 vetlinu+ 20 0 53.0g 45.5g 216 R 199.3 9.0 225840:30 overlapInCore
When I run htop, I see that 8 threads are assigned but only two are running (see attached screenshot).
Initially it used all 40 cores I assigned, but ever since it entered the 'obtovl' step (at the beginning of October), it has only used 1-2 cores. Is this normal?
We do sometimes see straggler reads that take longer to compute overlaps for (either because they are noisier or because more simple-sequence k-mers seed hits), which is why only a couple of cores are in use. The other subsets of reads have completed.
The default error rate for nanopore data (12%) is probably too high for modern datasets. I'd suggest killing the one job that didn't finish and editing the overlap shell script to drop the error rate from 0.12 to 0.085 or even 0.065, to see whether the job completes faster (re-run it as overlap.sh 41). The other options I mentioned above would require a restart, but you could launch one of those in parallel to see if it completes faster than this run. I'd drop the 12% there as well, using correctedErrorRate=0.085.
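A minimal sketch of the edit, assuming overlap.sh passes the error rate as `--maxerate <value>` (the option name mentioned later in this thread) with the option and its value on one line. The helper set_maxerate is hypothetical; it keeps a .bak copy of the script and prints the edited line so you can check it before re-running the job.

```sh
# Hypothetical helper: replace the number after "--maxerate" in an overlap.sh script.
# Assumes the option and its value sit on one line, e.g. "  --maxerate  0.12 \".
set_maxerate() {
  script=$1
  new=$2
  # -i.bak keeps the original as <script>.bak; \1 re-inserts "--maxerate" and its spacing.
  # sed back-references are single digits, so "\1" directly followed by "0.085" is safe.
  sed -i.bak -E "s/(--maxerate[[:space:]]+)[0-9.]+/\\1$new/" "$script"
  # Show the edited line(s) so the new value can be checked before re-running the job.
  grep -n -- '--maxerate' "$script"
}
```

Run it from the directory that holds the overlap script, e.g. `set_maxerate overlap.sh 0.085`, and compare against overlap.sh.bak if anything looks wrong.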
Thank you for your reply!
I checked the overlap.out files, and it is actually job 1 that is stuck, not 41; overlaps 2-41 are all fine. To drop the error rate, do I change --maxerate in overlap.sh to 0.065?
Since your last reply, I started running the same trimming script with your recommended ovlMerDistinct=0.975 on another computer, this time using 150x coverage. I have now double-checked, and of all 41 overlap jobs it is again overlap 1 that is stuck, while all the others completed.
So, to double-check: I should kill the one job that is stuck (overlap 1) and change --maxerate in overlap.sh to 0.065? Can I then just run overlap.sh again, and will it automatically continue where it left off? Can I do this while the main script I submitted is still running?
Yes, that is correct. Kill both the main script and the stalled overlap.sh job, then make the edit as you described. Re-run overlap.sh 1 and then re-run the main script; once it sees that all the output files are present, it should resume with the step after overlapping.
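The recovery sequence can be sketched as a small shell function, assuming a local (non-grid) run in which the main canu process and the stalled overlapInCore job have already been stopped with `kill <PID>`. The function name rerun_then_resume and the trimming/1-overlapper path in the usage example are assumptions; check where overlap.sh actually lives in your own run directory. Running the overlap script with `sh` is also an assumption.

```sh
# Hypothetical helper: re-run one overlap job, then relaunch canu only if it succeeded.
# Assumes a local run and that canu and the stalled job were already stopped (kill <PID>).
rerun_then_resume() {
  ovl_dir=$1   # directory containing overlap.sh
  job=$2       # job number to re-run, e.g. 1
  shift 2      # the remaining arguments are the original canu command line
  if ! (cd "$ovl_dir" && sh ./overlap.sh "$job"); then
    echo "overlap job $job failed; not relaunching canu" >&2
    return 1
  fi
  # canu should see that every overlap output is present and move on to the next step.
  "$@"
}
```

For the trimming run in this thread, that could look like `rerun_then_resume results/canu_correct/trimming/1-overlapper 1 ./softwares/canu-2.2/bin/canu -trim corThreads=40 -p derecta -d results/canu_correct genomeSize=145m -corrected -nanopore results/canu_correct/derecta.correctedReads.fasta.gz`. Because the job can run for days, starting it under nohup or inside a terminal multiplexer keeps it alive if you log out.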
Thank you very much! Both of my trimming runs have finally finished: the one where I only used ovlMerDistinct=0.975 took 10 days, while the one where I used ovlMerDistinct=0.975 plus a maxerate of 0.065 took only a couple of days.
Is there any additional setting I can use to speed up the assembly step? I am using a correctedErrorRate of 0.14256 (the recommended rate for Nanopore is 0.144, and I decreased it by 1% because I have high coverage). Should I use ovlMerDistinct=0.975 again?
Best regards
Yes, include ovlMerDistinct. I think you can probably drop the ONT error rate quite a bit from the 14% default. You could try 0.065, since that was fast, and see what the assembly looks like. Post the report when it's done, as it will show what error rates most reads actually had. If 0.065 turns out to be too low, you can always re-run with 0.105 or similar, but go with the faster option first.
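To try one rate first and another later (or both in parallel), it helps to give each run its own -d directory. The sketch below is adapted from the assembly command posted later in this thread; the wrapper name assemble_with_rate, the CANU and READS variables, and the results/canu_asm_<rate> directory naming are hypothetical.

```sh
# Hypothetical wrapper around the assembly command posted in this thread.
# CANU and READS can be overridden from the environment.
CANU=${CANU:-./softwares/canu-2.2/bin/canu}
READS=${READS:-results/canu_correct/derecta.trimmedReads.fasta.gz}

assemble_with_rate() {
  rate=$1
  # One -d directory per rate, so runs with different rates never share files.
  "$CANU" ovlMerDistinct=0.975 correctedErrorRate="$rate" \
    -p derecta -d "results/canu_asm_$rate" genomeSize=145m \
    -trimmed -corrected -nanopore "$READS"
}
```

For example, `assemble_with_rate 0.085` writes to results/canu_asm_0.085, and a later `assemble_with_rate 0.105` would not touch it. Setting `CANU=echo` first prints the command instead of running it.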
Thank you for your reply! The assembly with ovlMerDistinct=0.975 and an error rate of 0.144 has now been running for 2.5 weeks, and again it seems to be overlap 1 that takes the longest; all the others finished a while ago. I have now started the assembly on a different computer with your recommended error rate of 0.065 and hope it will be a bit faster.
Sorry, I made a mistake in my previous message: the trimming run that finished quickly used correctedErrorRate=0.085, not 0.065. Therefore, in the assembly I am running now, I also set correctedErrorRate=0.085. I hope my logic makes sense. Or should I decrease it further to 0.065, even though the trimming used 0.085?
The assembly should use an error rate higher than or equal to the trimming rate, so 0.085 is good. You don't want the assembly rate to be lower than the trimming rate: trimming may have left errors up to that rate in the reads, and those reads would then have no overlaps in the assembly step.
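That rule of thumb is easy to encode as a pre-flight check before launching a run. check_error_rates is a hypothetical helper; awk does the comparison because POSIX sh has no floating-point arithmetic.

```sh
# Hypothetical pre-flight check: the assembly correctedErrorRate should be at least
# the rate used for trimming. Returns non-zero (and warns) if it is lower.
check_error_rates() {
  trim=$1
  asm=$2
  # awk exits 0 when the condition holds, so it can drive the if directly.
  if awk -v t="$trim" -v a="$asm" 'BEGIN { exit !(a + 0 >= t + 0) }'; then
    echo "ok: assembly rate $asm >= trimming rate $trim"
  else
    echo "warning: assembly rate $asm is below trimming rate $trim" >&2
    return 1
  fi
}
```

With the values discussed here, `check_error_rates 0.085 0.085` passes, while `check_error_rates 0.085 0.065` warns.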
Okay, thank you for clarifying!
The assembly with the "faster" settings has finished. Here are the parameters I used:
./softwares/canu-2.2/bin/canu corThreads=30 ovlMerDistinct=0.975 correctedErrorRate=0.085 -p derecta -d results/canu_correct genomeSize=145m -trimmed -corrected -nanopore results/canu_correct/derecta.trimmedReads.fasta.gz
And this is the report:
[CORRECTION/READS]
--
-- In sequence store './derecta.seqStore':
-- Found 136072 reads.
-- Found 14500076338 bases (100 times coverage).
-- Histogram of raw reads:
--
-- G=14500076338 sum of || length num
-- NG length index lengths || range seqs
-- ----- ------------ --------- ------------ || ------------------- -------
-- 00010 157050 7580 1450108360 || 80368-101874 78319|---------------------------------------------------------------
-- 00020 133286 17681 2900113590 || 101875-123381 32213|--------------------------
-- 00030 119805 29192 4350052527 || 123382-144888 13792|------------
-- 00040 110219 41833 5800051868 || 144889-166395 6239|------
-- 00050 102934 55462 7250077163 || 166396-187902 2823|---
-- 00060 97046 69981 8700137391 || 187903-209409 1302|--
-- 00070 92021 85335 10150125992 || 209410-230916 632|-
-- 00080 87665 101486 11600104333 || 230917-252423 288|-
-- 00090 83842 118403 13050151732 || 252424-273930 160|-
-- 00100 80368 136071 14500076338 || 273931-295437 103|-
-- 001.000x 136072 14500076338 || 295438-316944 45|-
-- || 316945-338451 27|-
-- || 338452-359958 23|-
-- || 359959-381465 20|-
-- || 381466-402972 16|-
-- || 402973-424479 8|-
-- || 424480-445986 8|-
-- || 445987-467493 11|-
-- || 467494-489000 5|-
-- || 489001-510507 8|-
-- || 510508-532014 4|-
-- || 532015-553521 5|-
-- || 553522-575028 1|-
-- || 575029-596535 2|-
-- || 596536-618042 4|-
-- || 618043-639549 2|-
-- || 639550-661056 1|-
-- || 661057-682563 1|-
-- || 682564-704070 1|-
-- || 704071-725577 0|
-- || 725578-747084 0|
-- || 747085-768591 1|-
-- || 768592-790098 0|
-- || 790099-811605 1|-
-- || 811606-833112 0|
-- || 833113-854619 0|
-- || 854620-876126 0|
-- || 876127-897633 0|
-- || 897634-919140 1|-
-- || 919141-940647 1|-
-- || 940648-962154 1|-
-- || 962155-983661 1|-
-- || 983662-1005168 0|
-- || 1005169-1026675 1|-
-- || 1026676-1048182 0|
-- || 1048183-1069689 0|
-- || 1069690-1091196 1|-
-- || 1091197-1112703 0|
-- || 1112704-1134210 0|
-- || 1134211-1155717 1|-
--
[CORRECTION/MERS]
--
-- 16-mers Fraction
-- Occurrences NumMers Unique Total
-- 1- 1 0 0.0000 0.0000
-- 2- 2 248674277 ********************************************************************** 0.3958 0.0356
-- 3- 4 180564055 ************************************************** 0.5860 0.0613
-- 5- 7 64529184 ****************** 0.7363 0.0907
-- 8- 11 18784914 ***** 0.7980 0.1093
-- 12- 16 5889744 * 0.8188 0.1187
-- 17- 22 2152641 0.8261 0.1235
-- 23- 29 921796 0.8289 0.1261
-- 30- 37 489651 0.8302 0.1276
-- 38- 46 485163 0.8309 0.1287
-- 47- 56 921941 0.8317 0.1303
-- 57- 67 3335924 0.8333 0.1341
-- 68- 79 15334749 **** 0.8394 0.1517
-- 80- 92 36193813 ********** 0.8664 0.2435
-- 93- 106 29894155 ******** 0.9250 0.4736
-- 107- 121 8472079 ** 0.9698 0.6741
-- 122- 137 1363515 0.9818 0.7347
-- 138- 154 748845 0.9837 0.7460
-- 155- 172 1684975 0.9850 0.7541
-- 173- 191 2598153 0.9877 0.7749
-- 192- 211 1781528 0.9919 0.8089
-- 212- 232 664419 0.9946 0.8335
-- 233- 254 354558 0.9956 0.8435
-- 255- 277 399351 0.9962 0.8496
-- 278- 301 396207 0.9968 0.8573
-- 302- 326 257783 0.9974 0.8654
-- 327- 352 167917 0.9978 0.8711
-- 353- 379 149104 0.9981 0.8751
-- 380- 407 133662 0.9983 0.8790
-- 408- 436 102005 0.9985 0.8828
-- 437- 466 82033 0.9987 0.8858
-- 467- 497 72164 0.9988 0.8885
-- 498- 529 62695 0.9989 0.8909
-- 530- 562 53929 0.9990 0.8932
-- 563- 596 45293 0.9991 0.8953
-- 597- 631 40167 0.9992 0.8972
-- 632- 667 34691 0.9993 0.8990
-- 668- 704 30711 0.9993 0.9006
-- 705- 742 27504 0.9994 0.9021
-- 743- 781 24457 0.9994 0.9035
-- 782- 821 22014 0.9994 0.9048
--
-- 0 (max occurrences)
-- 13964914381 (total mers, non-unique)
-- 628272341 (distinct mers, non-unique)
-- 0 (unique mers)
[CORRECTION/LAYOUT]
-- original original
-- raw reads raw reads
-- category w/overlaps w/o/overlaps
-- -------------------- ------------- -------------
-- Number of Reads 129367 6705
-- Number of Bases 13853602641 391543023
-- Coverage 95.542 2.700
-- Median 98261 82964
-- Mean 107087 58395
-- N50 103582 88996
-- Minimum 80368 0
-- Maximum 1155702 214316
-- -- --------corrected--------- ----------rescued----------
-- evidence expected expected
-- category reads raw corrected raw corrected
-- -------------------- ------------- ------------- ------------- ------------- -------------
-- Number of Reads 132593 42515 42515 243 243
-- Number of Bases 14147354079 5825528856 5800049114 22067145 21636739
-- Coverage 97.568 40.176 40.000 0.152 0.149
-- Median 97867 127214 126763 89567 88391
-- Mean 106697 137022 136423 90811 89040
-- N50 103108 131958 131320 90405 89331
-- Minimum 80368 108170 108167 80401 45033
-- Maximum 1155702 1006825 984031 110773 107537
--
-- --------uncorrected--------
-- expected
-- category raw corrected
-- -------------------- ------------- -------------
-- Number of Reads 93314 93314
-- Number of Bases 8397549663 7608139687
-- Coverage 57.914 52.470
-- Median 90450 88991
-- Mean 89992 81532
-- N50 91868 91490
-- Minimum 0 0
-- Maximum 1155702 988212
--
-- Maximum Memory 10924732928
[TRIMMING/READS]
--
-- In sequence store './derecta.seqStore':
-- Found 41759 reads.
-- Found 5642164200 bases (38.91 times coverage).
-- Histogram of corrected reads:
--
-- G=5642164200 sum of || length num
-- NG length index lengths || range seqs
-- ----- ------------ --------- ------------ || ------------------- -------
-- 00010 184517 2617 564358383 || 17666-29203 2|-
-- 00020 161088 5914 1128567575 || 29204-40741 3|-
-- 00030 147797 9582 1692708198 || 40742-52279 4|-
-- 00040 138161 13534 2256893764 || 52280-63817 8|-
-- 00050 130888 17734 2821097746 || 63818-75355 26|-
-- 00060 125002 22148 3385338086 || 75356-86893 164|-
-- 00070 119925 26758 3949542316 || 86894-98431 172|-
-- 00080 115485 31554 4513737864 || 98432-109969 2611|-------------
-- 00090 111565 36526 5078016566 || 109970-121507 13546|---------------------------------------------------------------
-- 00100 17666 41758 5642164200 || 121508-133045 8893|------------------------------------------
-- 001.000x 41759 5642164200 || 133046-144583 5542|--------------------------
-- || 144584-156121 3758|------------------
-- || 156122-167659 2346|-----------
-- || 167660-179197 1531|--------
-- || 179198-190735 1047|-----
-- || 190736-202273 666|----
-- || 202274-213811 464|---
-- || 213812-225349 307|--
-- || 225350-236887 223|--
-- || 236888-248425 119|-
-- || 248426-259963 84|-
-- || 259964-271501 73|-
-- || 271502-283039 62|-
-- || 283040-294577 27|-
-- || 294578-306115 19|-
-- || 306116-317653 15|-
-- || 317654-329191 12|-
-- || 329192-340729 5|-
-- || 340730-352267 6|-
-- || 352268-363805 4|-
-- || 363806-375343 2|-
-- || 375344-386881 2|-
-- || 386882-398419 2|-
-- || 398420-409957 1|-
-- || 409958-421495 1|-
-- || 421496-433033 1|-
-- || 433034-444571 0|
-- || 444572-456109 1|-
-- || 456110-467647 3|-
-- || 467648-479185 1|-
-- || 479186-490723 1|-
-- || 490724-502261 2|-
-- || 502262-513799 0|
-- || 513800-525337 1|-
-- || 525338-536875 1|-
-- || 536876-548413 0|
-- || 548414-559951 0|
-- || 559952-571489 0|
-- || 571490-583027 0|
-- || 583028-594565 1|-
--
[TRIMMING/MERS]
--
-- 22-mers Fraction
-- Occurrences NumMers Unique Total
-- 1- 1 0 0.0000 0.0000
-- 2- 2 456343 0.0038 0.0002
-- 3- 4 275194 0.0053 0.0003
-- 5- 7 147403 0.0066 0.0004
-- 8- 11 120886 0.0076 0.0005
-- 12- 16 229767 0.0085 0.0007
-- 17- 22 897183 * 0.0109 0.0015
-- 23- 29 4318648 ***** 0.0199 0.0055
-- 30- 37 24298136 ******************************** 0.0651 0.0323
-- 38- 46 52706201 ********************************************************************** 0.3000 0.2083
-- 47- 56 30421717 **************************************** 0.7391 0.6094
-- 57- 67 3978010 ***** 0.9585 0.8484
-- 68- 79 403722 0.9840 0.8812
-- 80- 92 373254 0.9872 0.8863
-- 93- 106 272704 0.9903 0.8919
-- 107- 121 166496 0.9925 0.8966
-- 122- 137 114305 0.9938 0.8998
-- 138- 154 92775 0.9947 0.9025
-- 155- 172 64583 0.9955 0.9048
-- 173- 191 58223 0.9960 0.9066
-- 192- 211 44976 0.9965 0.9085
-- 212- 232 39615 0.9969 0.9101
-- 233- 254 31118 0.9972 0.9116
-- 255- 277 24492 0.9974 0.9130
-- 278- 301 25322 0.9977 0.9141
-- 302- 326 19437 0.9979 0.9154
-- 327- 352 17457 0.9980 0.9165
-- 353- 379 14889 0.9982 0.9176
-- 380- 407 14506 0.9983 0.9185
-- 408- 436 13595 0.9984 0.9195
-- 437- 466 12514 0.9985 0.9206
-- 467- 497 11049 0.9986 0.9216
-- 498- 529 9196 0.9987 0.9225
-- 530- 562 8445 0.9988 0.9233
-- 563- 596 7640 0.9989 0.9241
-- 597- 631 7528 0.9989 0.9249
-- 632- 667 7037 0.9990 0.9258
-- 668- 704 5830 0.9991 0.9266
-- 705- 742 5500 0.9991 0.9273
-- 743- 781 4914 0.9991 0.9280
-- 782- 821 4341 0.9992 0.9286
--
-- 0 (max occurrences)
-- 5631711220 (total mers, non-unique)
-- 119818115 (distinct mers, non-unique)
-- 0 (unique mers)
[TRIMMING/TRIMMING]
-- PARAMETERS:
-- ----------
-- 1000 (reads trimmed below this many bases are deleted)
-- 0.0850 (use overlaps at or below this fraction error)
-- 500 (break region if overlap is less than this long, for 'largest covered' algorithm)
-- 2 (break region if overlap coverage is less than this many reads, for 'largest covered' algorithm)
--
-- INPUT READS:
-- -----------
-- 136072 reads 5642164200 bases (reads processed)
-- 0 reads 0 bases (reads not processed, previously deleted)
-- 0 reads 0 bases (reads not processed, in a library where trimming isn't allowed)
--
-- OUTPUT READS:
-- ------------
-- 8376 reads 1039795585 bases (trimmed reads output)
-- 33295 reads 4505945499 bases (reads with no change, kept as is)
-- 94346 reads 3400096 bases (reads with no overlaps, deleted)
-- 55 reads 7723074 bases (reads with short trimmed length, deleted)
--
-- TRIMMING DETAILS:
-- ----------------
-- 6010 reads 40651339 bases (bases trimmed from the 5' end of a read)
-- 3032 reads 44648607 bases (bases trimmed from the 3' end of a read)
[TRIMMING/SPLITTING]
-- PARAMETERS:
-- ----------
-- 1000 (reads trimmed below this many bases are deleted)
-- 0.0850 (use overlaps at or below this fraction error)
-- INPUT READS:
-- -----------
-- 41671 reads 5631041030 bases (reads processed)
-- 94401 reads 11123170 bases (reads not processed, previously deleted)
-- 0 reads 0 bases (reads not processed, in a library where trimming isn't allowed)
--
-- PROCESSED:
-- --------
-- 0 reads 0 bases (no overlaps)
-- 25 reads 5229328 bases (no coverage after adjusting for trimming done already)
-- 0 reads 0 bases (processed for chimera)
-- 0 reads 0 bases (processed for spur)
-- 41646 reads 5625811702 bases (processed for subreads)
--
-- READS WITH SIGNALS:
-- ------------------
-- 0 reads 0 signals (number of 5' spur signal)
-- 0 reads 0 signals (number of 3' spur signal)
-- 0 reads 0 signals (number of chimera signal)
-- 1 reads 1 signals (number of subread signal)
--
-- SIGNALS:
-- -------
-- 0 reads 0 bases (size of 5' spur signal)
-- 0 reads 0 bases (size of 3' spur signal)
-- 0 reads 0 bases (size of chimera signal)
-- 1 reads 155 bases (size of subread signal)
--
-- TRIMMING:
-- --------
-- 0 reads 0 bases (trimmed from the 5' end of the read)
-- 1 reads 1760 bases (trimmed from the 3' end of the read)
[UNITIGGING/READS]
--
-- In sequence store './derecta.seqStore':
-- Found 41671 reads.
-- Found 5545739324 bases (38.24 times coverage).
-- Histogram of corrected-trimmed reads:
--
-- G=5545739324 sum of || length num
-- NG length index lengths || range seqs
-- ----- ------------ --------- ------------ || ------------------- -------
-- 00010 182283 2615 554587431 || 1010-12685 115|-
-- 00020 159463 5891 1109288324 || 12686-24361 24|-
-- 00030 146650 9526 1663764666 || 24362-36037 22|-
-- 00040 137179 13440 2218383078 || 36038-47713 40|-
-- 00050 129997 17596 2772969955 || 47714-59389 90|-
-- 00060 124231 21962 3327548448 || 59390-71065 142|-
-- 00070 119223 26520 3882129588 || 71066-82741 248|--
-- 00080 114866 31260 4436592446 || 82742-94417 271|--
-- 00090 111037 36172 4991236172 || 94418-106093 488|---
-- 00100 1010 41670 5545739324 || 106094-117769 12210|---------------------------------------------------------------
-- 001.000x 41671 5545739324 || 117770-129445 10040|----------------------------------------------------
-- || 129446-141121 6276|---------------------------------
-- || 141122-152797 4128|----------------------
-- || 152798-164473 2654|--------------
-- || 164474-176149 1703|---------
-- || 176150-187825 1092|------
-- || 187826-199501 693|----
-- || 199502-211177 478|---
-- || 211178-222853 325|--
-- || 222854-234529 215|--
-- || 234530-246205 118|-
-- || 246206-257881 85|-
-- || 257882-269557 66|-
-- || 269558-281233 57|-
-- || 281234-292909 26|-
-- || 292910-304585 14|-
-- || 304586-316261 11|-
-- || 316262-327937 12|-
-- || 327938-339613 5|-
-- || 339614-351289 4|-
-- || 351290-362965 5|-
-- || 362966-374641 1|-
-- || 374642-386317 1|-
-- || 386318-397993 2|-
-- || 397994-409669 1|-
-- || 409670-421345 1|-
-- || 421346-433021 1|-
-- || 433022-444697 0|
-- || 444698-456373 1|-
-- || 456374-468049 2|-
-- || 468050-479725 1|-
-- || 479726-491401 0|
-- || 491402-503077 0|
-- || 503078-514753 0|
-- || 514754-526429 1|-
-- || 526430-538105 1|-
-- || 538106-549781 0|
-- || 549782-561457 0|
-- || 561458-573133 0|
-- || 573134-584809 1|-
--
[UNITIGGING/MERS]
--
-- 22-mers Fraction
-- Occurrences NumMers Unique Total
-- 1- 1 0 0.0000 0.0000
-- 2- 2 230020 0.0019 0.0001
-- 3- 6 205675 0.0027 0.0002
-- 7- 13 167111 0.0039 0.0003
-- 14- 23 1384258 * 0.0055 0.0007
-- 24- 36 24757538 ********************* 0.0198 0.0072
-- 37- 52 80039921 ********************************************************************** 0.2653 0.1837
-- 53- 71 10721401 ********* 0.9138 0.8033
-- 72- 93 647215 0.9853 0.8913
-- 94- 118 381191 0.9907 0.9009
-- 119- 146 185148 0.9937 0.9078
-- 147- 177 118536 0.9953 0.9122
-- 178- 211 84356 0.9962 0.9156
-- 212- 248 61750 0.9969 0.9185
-- 249- 288 42108 0.9975 0.9210
-- 289- 331 35388 0.9978 0.9231
-- 332- 377 27072 0.9981 0.9250
-- 378- 426 24351 0.9983 0.9267
-- 427- 478 21344 0.9985 0.9285
-- 479- 533 15938 0.9987 0.9302
-- 534- 591 13556 0.9988 0.9316
-- 592- 652 12501 0.9990 0.9330
-- 653- 716 10399 0.9991 0.9344
-- 717- 783 8003 0.9991 0.9357
-- 784- 853 6913 0.9992 0.9368
-- 854- 926 5093 0.9993 0.9378
-- 927- 1002 4856 0.9993 0.9386
-- 1003- 1081 4190 0.9994 0.9394
-- 1082- 1163 4269 0.9994 0.9402
-- 1164- 1248 6393 0.9994 0.9411
-- 1249- 1336 6056 0.9995 0.9425
-- 1337- 1427 3186 0.9995 0.9439
-- 1428- 1521 2560 0.9996 0.9447
-- 1522- 1618 2451 0.9996 0.9454
-- 1619- 1718 2933 0.9996 0.9461
-- 1719- 1821 3652 0.9996 0.9469
-- 1822- 1927 3181 0.9997 0.9481
-- 1928- 2036 2479 0.9997 0.9492
-- 2037- 2148 4172 0.9997 0.9501
-- 2149- 2263 1654 0.9997 0.9517
-- 2264- 2381 904 0.9997 0.9523
--
-- 0 (max occurrences)
-- 5541777172 (total mers, non-unique)
-- 119288981 (distinct mers, non-unique)
-- 0 (unique mers)
[UNITIGGING/OVERLAPS]
-- category reads % read length feature size or coverage analysis
-- ---------------- ------- ------- ---------------------- ------------------------ --------------------
-- middle-missing 72 0.17 125575.38 +- 88642.67 29363.92 +- 34573.11 (bad trimming)
-- middle-hump 3 0.01 108004.00 +- 51863.79 44073.00 +- 50111.01 (bad trimming)
-- no-5-prime 35 0.08 97546.09 +- 46218.33 23718.60 +- 22400.18 (bad trimming)
-- no-3-prime 33 0.08 111794.48 +- 48361.57 14413.91 +- 18512.81 (bad trimming)
--
-- low-coverage 625 1.50 98408.61 +- 41776.90 5.47 +- 2.36 (easy to assemble, potential for lower quality consensus)
-- unique 39195 94.06 133703.26 +- 28519.46 41.99 +- 7.50 (easy to assemble, perfect, yay)
-- repeat-cont 26 0.06 94521.31 +- 34629.93 107.50 +- 23.75 (potential for consensus errors, no impact on assembly)
-- repeat-dove 0 0.00 0.00 +- 0.00 0.00 +- 0.00 (hard to assemble, likely won't assemble correctly or even at all)
--
-- span-repeat 1018 2.44 138097.88 +- 37519.36 28652.41 +- 28424.31 (read spans a large repeat, usually easy to assemble)
-- uniq-repeat-cont 485 1.16 117335.64 +- 35797.47 (should be uniquely placed, low potential for consensus errors, no impact on assembly)
-- uniq-repeat-dove 164 0.39 162290.30 +- 51546.67 (will end contigs, potential to misassemble)
-- uniq-anchor 3 0.01 91306.00 +- 36220.45 24663.67 +- 9528.51 (repeat read, with unique section, probable bad read)
[UNITIGGING/ADJUSTMENT]
-- No report available.
[UNITIGGING/ERROR RATES]
--
-- ERROR RATES
-- -----------
-- --------threshold------
-- 46544 fraction error fraction percent
-- samples (1e-5) error error
-- -------------------------- -------- --------
-- command line (-eg) -> 8500.00 8.5000%
-- command line (-ef) -> -----.-- ---.----%
-- command line (-eM) -> 8500.00 8.5000%
-- mean + std.dev 40.94 +- 12 * 359.06 -> 4349.69 4.3497% (enabled)
-- median + mad 0.00 +- 12 * 0.00 -> 0.00 0.0000%
-- 90th percentile -> 4.00 0.0040%
--
-- BEST EDGE FILTERING
-- -------------------
-- At graph threshold 8.5000%, reads:
-- available to have edges: 5034
-- with at least one edge: 5002
--
-- At max threshold 8.5000%, reads: (not computed)
-- available to have edges: 0
-- with at least one edge: 0
--
-- At tight threshold 0.0040%, reads with:
-- both edges below error threshold: 3190 (80.00% minReadsBest threshold = 4001)
-- one edge above error threshold: 857
-- both edges above error threshold: 955
-- at least one edge: 5002
--
-- At loose threshold 4.3497%, reads with:
-- both edges below error threshold: 4933 (80.00% minReadsBest threshold = 4001)
-- one edge above error threshold: 63
-- both edges above error threshold: 6
-- at least one edge: 5002
--
--
-- INITIAL EDGES
-- -------- ----------------------------------------
-- 36410 reads are contained
-- 94731 reads have no best edges (singleton)
-- 91 reads have only one best edge (spur)
-- 70 are mutual best
-- 4840 reads have two best edges
-- 110 have one mutual best edge
-- 4695 have two mutual best edges
--
--
-- FINAL EDGES
-- -------- ----------------------------------------
-- 36410 reads are contained
-- 94786 reads have no best edges (singleton)
-- 95 reads have only one best edge (spur)
-- 83 are mutual best
-- 4781 reads have two best edges
-- 83 have one mutual best edge
-- 4676 have two mutual best edges
--
--
-- EDGE FILTERING
-- -------- ------------------------------------------
-- 0 reads are ignored
-- 265 reads have a gap in overlap coverage
-- 27 reads have lopsided best edges
[UNITIGGING/CONTIGS]
-- Found, in version 1, after unitig construction:
-- contigs: 42 sequences, total length 143364405 bp (including 10 repeats of total length 3621731 bp).
-- bubbles: 8 sequences, total length 2671930 bp.
-- unassembled: 625 sequences, total length 61476931 bp.
--
-- Contig sizes based on genome size 145mbp:
--
-- NG (bp) LG (contigs) sum (bp)
-- ---------- ------------ ----------
-- 10 30632353 1 30632353
-- 20 30632353 1 30632353
-- 30 26935139 2 57567492
-- 40 26349955 3 83917447
-- 50 26349955 3 83917447
-- 60 21998810 4 105916257
-- 70 21998810 4 105916257
-- 80 20055920 5 125972177
-- 90 1617177 7 131561269
--
[UNITIGGING/CONSENSUS]
-- Found, in version 2, after consensus generation:
-- contigs: 42 sequences, total length 143304869 bp (including 10 repeats of total length 3615982 bp).
-- bubbles: 8 sequences, total length 2664384 bp.
-- unassembled: 625 sequences, total length 61476908 bp.
--
-- Contig sizes based on genome size 145mbp:
--
-- NG (bp) LG (contigs) sum (bp)
-- ---------- ------------ ----------
-- 10 30626116 1 30626116
-- 20 30626116 1 30626116
-- 30 26933743 2 57559859
-- 40 26337489 3 83897348
-- 50 26337489 3 83897348
-- 60 21994750 4 105892098
-- 70 21994750 4 105892098
-- 80 20037162 5 125929260
-- 90 1614787 7 131512927
--
Based on this report, the error rate of 0.085 is sufficient, as the assembly actually ends up using only about 4.3% error (the enabled "mean + std.dev" threshold in the ERROR RATES section is 4.3497%). The contig sizes look close to the chromosome arm lengths in this genome. I'm going to close this issue, since you've been able to get an assembly that looks to be of reasonable quality.
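The "mean + std.dev" row in the ERROR RATES section can be recomputed by hand: values are in units of 1e-5 fraction error, and the threshold is the mean plus 12 standard deviations. threshold_percent is a hypothetical helper. Because it starts from the rounded mean and standard deviation printed in the report, it gives 4349.66 (x 1e-5) rather than the 4349.69 canu printed, which still rounds to the same 4.3497%.

```sh
# Hypothetical helper: recompute the "mean + std.dev" threshold from a canu report.
# Arguments are in units of 1e-5 fraction error; the multiplier defaults to 12.
threshold_percent() {
  # (mean + k * sd) * 1e-5 as a fraction equals (mean + k * sd) / 1000 as a percentage.
  awk -v mean="$1" -v sd="$2" -v k="${3:-12}" \
    'BEGIN { printf "%.4f\n", (mean + k * sd) / 1000 }'
}

threshold_percent 40.94 359.06   # prints 4.3497
```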
Dear Canu team,
I have ultra-long-read sequencing data (PromethION flow cell from Oxford Nanopore) of Drosophila erecta (genome size 145 Mb). Basecalling was done using Guppy version 6.5.7. Before running canu, I created a FASTQ file containing 100x coverage of the longest reads. All my analyses are run on a Linux computer with 60 cores.
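One way to pick "the longest reads up to 100x coverage" is to sort read lengths in decreasing order and stop once the running total reaches coverage x genome size (here 100 x 145,000,000 = 14.5 Gb; the raw-read section of the report above shows 14,500,076,338 bases). The sketch below only computes the length cutoff; it is not necessarily how the reads in this thread were selected, and the function name longest_reads_cutoff is hypothetical. It expects an uncompressed four-line FASTQ on standard input.

```sh
# Hypothetical helper: print the read-length cutoff such that reads at least this
# long add up to at least COVERAGE x GENOME_SIZE bases (4-line FASTQ on stdin).
longest_reads_cutoff() {
  genome_size=$1   # in bases, e.g. 145000000
  coverage=$2      # e.g. 100
  # The sequence is the 2nd line of every 4-line FASTQ record.
  awk 'NR % 4 == 2 { print length($0) }' |
    sort -nr |
    awk -v g="$genome_size" -v c="$coverage" '
      BEGIN { target = g * c }
      { sum += $1 }
      sum >= target { print $1; found = 1; exit }
      END {
        if (!found) { print "not enough bases for the requested coverage" > "/dev/stderr"; exit 1 }
      }'
}
```

For example, `gzip -dc reads.fastq.gz | longest_reads_cutoff 145000000 100` (the file name is a placeholder). Keeping every read at or above the printed length can slightly overshoot the target when several reads share the cutoff length.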
I am using canu 2.2 to create an assembly; the correction step (nanopore correction) took a few days but worked.
For the trimming I used the following command:
./softwares/canu-2.2/bin/canu -trim corThreads=40 -p derecta -d results/canu_correct genomeSize=145m -corrected -nanopore results/canu_correct/derecta.correctedReads.fasta.gz
At first it seemed to make some progress, but it has been stuck at the 'obtovl' step for more than a month now (I started it on September 25).
Here is my logfile:
Thank you for your help!