Canu 2.2 stuck at trimming step

SarahSaadain commented 1 year ago

Dear Canu team,

I have ultra-long read sequencing data (PromethION flow cell from Oxford Nanopore) of Drosophila erecta (genome size 145 Mb). Basecalling was done using Guppy version 6.5.7. Prior to using Canu I created a FASTQ file containing 100x coverage of the longest reads. All my analyses are done on a Linux computer with 60 cores.
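For readers who want to reproduce the subsetting step: 100x coverage of a 145 Mb genome is 14.5 Gb, which is close to the 14500076338 raw bases Canu reports in the log further down. The sketch below shows one possible way to keep the longest reads up to a base budget. It is not necessarily how the file in this thread was produced. It assumes a four-line-per-record FASTQ with no tab characters in the headers, and `select_longest` and the file names are hypothetical.

```shell
# Keep the longest reads from a FASTQ until a base budget is reached.
# Assumes 4 lines per record and no tab characters in headers.
# select_longest is a hypothetical helper, not part of Canu.
select_longest() {
    fastq=$1
    target_bases=$2
    # Join each record onto one line, prefix it with the read length,
    # sort longest first, then emit records while under the budget.
    paste - - - - < "$fastq" \
        | awk -F'\t' '{ print length($2) "\t" $0 }' \
        | sort -k1,1nr \
        | awk -F'\t' -v target="$target_bases" '
            sum < target { sum += $1; print $2; print $3; print $4; print $5 }'
}

# 100x coverage of a 145 Mb genome, expressed in bases.
genome_size=145000000
coverage=100
target=$((genome_size * coverage))
echo "target bases: $target"

# Usage (file names are placeholders):
# select_longest all_reads.fastq "$target" > longest_100x.fastq
```

The read that crosses the budget is still included, so the output slightly overshoots the target rather than falling short of it.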

I am using Canu 2.2 to create an assembly. The correction step (Nanopore correction) took a few days but worked.

For the trimming I used the following command:

./softwares/canu-2.2/bin/canu -trim corThreads=40 -p derecta -d results/canu_correct genomeSize=145m -corrected -nanopore results/canu_correct/derecta.correctedReads.fasta.gz

At first it seemed to make some progress, but it has been stuck at the 'obtovl' step for more than a month now (I started on September 25).

Here is my logfile:

Mon Sep 25 10:43:36 CEST 2023
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
        LANGUAGE = (unset),
        LC_ALL = (unset),
        LC_CTYPE = "UTF-8",
        LANG = "en_US.UTF-8"
    are supported and installed on your system.
perl: warning: Falling back to a fallback locale ("en_US.UTF-8").
-- canu 2.2
--
-- CITATIONS
--
-- For 'standard' assemblies of PacBio or Nanopore reads:
--   Koren S, Walenz BP, Berlin K, Miller JR, Phillippy AM.
--   Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation.
--   Genome Res. 2017 May;27(5):722-736.
--   http://doi.org/10.1101/gr.215087.116
-- 
-- Read and contig alignments during correction and consensus use:
--   Šošic M, Šikic M.
--   Edlib: a C/C ++ library for fast, exact sequence alignment using edit distance.
--   Bioinformatics. 2017 May 1;33(9):1394-1395.
--   http://doi.org/10.1093/bioinformatics/btw753
-- 
-- Overlaps are generated using:
--   Berlin K, et al.
--   Assembling large genomes with single-molecule sequencing and locality-sensitive hashing.
--   Nat Biotechnol. 2015 Jun;33(6):623-30.
--   http://doi.org/10.1038/nbt.3238
-- 
--   Myers EW, et al.
--   A Whole-Genome Assembly of Drosophila.
--   Science. 2000 Mar 24;287(5461):2196-204.
--   http://doi.org/10.1126/science.287.5461.2196
-- 
-- Corrected read consensus sequences are generated using an algorithm derived from FALCON-sense:
--   Chin CS, et al.
--   Phased diploid genome assembly with single-molecule real-time sequencing.
--   Nat Methods. 2016 Dec;13(12):1050-1054.
--   http://doi.org/10.1038/nmeth.4035
-- 
-- Contig consensus sequences are generated using an algorithm derived from pbdagcon:
--   Chin CS, et al.
--   Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data.
--   Nat Methods. 2013 Jun;10(6):563-9
--   http://doi.org/10.1038/nmeth.2474
-- 
-- CONFIGURE CANU
--
-- Detected Java(TM) Runtime Environment '20-internal' (from '/home/vetlinux04/.miniconda3/lib/jvm/bin/java') without -d64 support.
--
-- WARNING:
-- WARNING:  Failed to run gnuplot using command 'gnuplot'.
-- WARNING:  Plots will be disabled.
-- WARNING:
--
--
-- Detected 64 CPUs and 504 gigabytes of memory on the local machine.
--
-- Local machine mode enabled; grid support not detected or not allowed.
--
--                                (tag)Concurrency
--                         (tag)Threads          |
--                (tag)Memory         |          |
--        (tag)             |         |          |       total usage      algorithm
--        -------  ----------  --------   --------  --------------------  -----------------------------
-- Local: meryl     24.000 GB    8 CPUs x   8 jobs   192.000 GB  64 CPUs  (k-mer counting)
-- Local: hap       12.000 GB   16 CPUs x   4 jobs    48.000 GB  64 CPUs  (read-to-haplotype assignment)
-- Local: cormhap   13.000 GB   16 CPUs x   4 jobs    52.000 GB  64 CPUs  (overlap detection with mhap)
-- Local: obtovl     8.000 GB    8 CPUs x   8 jobs    64.000 GB  64 CPUs  (overlap detection)
-- Local: utgovl     8.000 GB    8 CPUs x   8 jobs    64.000 GB  64 CPUs  (overlap detection)
-- Local: cor        -.--- GB   40 CPUs x   - jobs     -.--- GB   - CPUs  (read correction)
-- Local: ovb        4.000 GB    1 CPU  x  64 jobs   256.000 GB  64 CPUs  (overlap store bucketizer)
-- Local: ovs        8.000 GB    1 CPU  x  63 jobs   504.000 GB  63 CPUs  (overlap store sorting)
-- Local: red       16.000 GB    4 CPUs x  16 jobs   256.000 GB  64 CPUs  (read error detection)
-- Local: oea        8.000 GB    1 CPU  x  63 jobs   504.000 GB  63 CPUs  (overlap error adjustment)
-- Local: bat       64.000 GB    8 CPUs x   1 job     64.000 GB   8 CPUs  (contig construction with bogart)
-- Local: cns        -.--- GB    8 CPUs x   - jobs     -.--- GB   - CPUs  (consensus)
--
-- Found Nanopore reads in 'derecta.seqStore':
--   Libraries:
--     Nanopore:              1
--   Reads:
--     Raw:                   14500076338
--     Corrected:             5642164200
--
--
-- Generating assembly 'derecta' in '/home/vetlinux04/Sarah/results/canu_correct':
--   genomeSize:
--     145000000
--
--   Overlap Generation Limits:
--     corOvlErrorRate 0.3200 ( 32.00%)
--     obtOvlErrorRate 0.1200 ( 12.00%)
--     utgOvlErrorRate 0.1200 ( 12.00%)
--
--   Overlap Processing Limits:
--     corErrorRate    0.3000 ( 30.00%)
--     obtErrorRate    0.1200 ( 12.00%)
--     utgErrorRate    0.1200 ( 12.00%)
--     cnsErrorRate    0.2000 ( 20.00%)
--
--   Stages to run:
--     only trim corrected reads.
--
--
-- Correction skipped; not enabled.
--
-- BEGIN TRIMMING
----------------------------------------
-- Starting command on Mon Sep 25 10:43:36 2023 with 6284.611 GB free disk space

    cd trimming/0-mercounts
    ./meryl-configure.sh \
    > ./meryl-configure.err 2>&1

-- Finished on Mon Sep 25 10:43:36 2023 (like a bat out of hell) with 6284.61 GB free disk space
----------------------------------------
--  segments   memory batches
--  -------- -------- -------
--        01 18.43 GB       2
--        02  9.46 GB       2
--        04  5.00 GB       2
--        06  3.54 GB       2
--        08  2.59 GB       2
--        12  1.87 GB       2
--        16  1.40 GB       2
--        20  1.12 GB       2
--        24  0.94 GB       2
--        32  0.70 GB       2
--        40  0.57 GB       2
--        48  0.47 GB       2
--        56  0.41 GB       2
--
--  For 41759 reads with 5642164200 bases, limit to 56 batches.
--  Will count kmers using 01 jobs, each using 20 GB and 8 threads.
--
-- Finished stage 'merylConfigure', reset canuIteration.
--
-- Running jobs.  First attempt out of 2.
----------------------------------------
-- Starting 'meryl' concurrent execution on Mon Sep 25 10:43:36 2023 with 6284.61 GB free disk space (1 processes; 8 concurrently)

    cd trimming/0-mercounts
    ./meryl-count.sh 1 > ./meryl-count.000001.out 2>&1

-- Finished on Mon Sep 25 10:53:53 2023 (617 seconds) with 6283.821 GB free disk space
----------------------------------------
-- Found 1 Kmer counting (meryl) outputs.
-- Finished stage 'obt-merylCountCheck', reset canuIteration.
--
-- Running jobs.  First attempt out of 2.
----------------------------------------
-- Starting 'meryl' concurrent execution on Mon Sep 25 10:53:53 2023 with 6283.821 GB free disk space (1 processes; 8 concurrently)

    cd trimming/0-mercounts
    ./meryl-process.sh 1 > ./meryl-process.000001.out 2>&1

-- Finished on Mon Sep 25 10:53:57 2023 (4 seconds) with 6283.892 GB free disk space
----------------------------------------
-- Meryl finished successfully.  Kmer frequency histogram:
--
-- WARNING: gnuplot failed.
--
----------------------------------------
--
--  22-mers                                                                                           Fraction
--    Occurrences   NumMers                                                                         Unique Total
--       1-     1         0                                                                        0.0000 0.0000
--       2-     2    456343                                                                        0.0038 0.0002
--       3-     4    275194                                                                        0.0053 0.0003
--       5-     7    147403                                                                        0.0066 0.0004
--       8-    11    120886                                                                        0.0076 0.0005
--      12-    16    229767                                                                        0.0085 0.0007
--      17-    22    897183 *                                                                      0.0109 0.0015
--      23-    29   4318648 *****                                                                  0.0199 0.0055
--      30-    37  24298136 ********************************                                       0.0651 0.0323
--      38-    46  52706201 ********************************************************************** 0.3000 0.2083
--      47-    56  30421717 ****************************************                               0.7391 0.6094
--      57-    67   3978010 *****                                                                  0.9585 0.8484
--      68-    79    403722                                                                        0.9840 0.8812
--      80-    92    373254                                                                        0.9872 0.8863
--      93-   106    272704                                                                        0.9903 0.8919
--     107-   121    166496                                                                        0.9925 0.8966
--     122-   137    114305                                                                        0.9938 0.8998
--     138-   154     92775                                                                        0.9947 0.9025
--     155-   172     64583                                                                        0.9955 0.9048
--     173-   191     58223                                                                        0.9960 0.9066
--     192-   211     44976                                                                        0.9965 0.9085
--     212-   232     39615                                                                        0.9969 0.9101
--     233-   254     31118                                                                        0.9972 0.9116
--     255-   277     24492                                                                        0.9974 0.9130
--     278-   301     25322                                                                        0.9977 0.9141
--     302-   326     19437                                                                        0.9979 0.9154
--     327-   352     17457                                                                        0.9980 0.9165
--     353-   379     14889                                                                        0.9982 0.9176
--     380-   407     14506                                                                        0.9983 0.9185
--     408-   436     13595                                                                        0.9984 0.9195
--     437-   466     12514                                                                        0.9985 0.9206
--     467-   497     11049                                                                        0.9986 0.9216
--     498-   529      9196                                                                        0.9987 0.9225
--     530-   562      8445                                                                        0.9988 0.9233
--     563-   596      7640                                                                        0.9989 0.9241
--     597-   631      7528                                                                        0.9989 0.9249
--     632-   667      7037                                                                        0.9990 0.9258
--     668-   704      5830                                                                        0.9991 0.9266
--     705-   742      5500                                                                        0.9991 0.9273
--     743-   781      4914                                                                        0.9991 0.9280
--     782-   821      4341                                                                        0.9992 0.9286
--
--           0 (max occurrences)
--  5631711220 (total mers, non-unique)
--   119818115 (distinct mers, non-unique)
--           0 (unique mers)
-- Finished stage 'meryl-process', reset canuIteration.
--
-- Removing meryl database 'trimming/0-mercounts/derecta.ms22'.
--
-- OVERLAPPER (normal) (trimming) erate=0.12
--
----------------------------------------
-- Starting command on Mon Sep 25 10:53:57 2023 with 6284.604 GB free disk space

    cd trimming/1-overlapper
       /home/vetlinux04/Sarah/softwares/canu-2.2/bin/overlapInCorePartition \
     -S  ../../derecta.seqStore \
     -hl 160000000 \
     -rl 5000000000 \
     -ol 500 \
     -o  ./derecta.partition \
    > ./derecta.partition.err 2>&1

-- Finished on Mon Sep 25 10:53:57 2023 (fast as lightning) with 6284.604 GB free disk space
----------------------------------------
--
-- Configured 41 overlapInCore jobs.
-- Finished stage 'obt-overlapConfigure', reset canuIteration.
--
-- Running jobs.  First attempt out of 2.
----------------------------------------
-- Starting 'obtovl' concurrent execution on Mon Sep 25 10:53:57 2023 with 6284.604 GB free disk space (41 processes; 8 concurrently)

    cd trimming/1-overlapper
    ./overlap.sh 1 > ./overlap.000001.out 2>&1
    ./overlap.sh 2 > ./overlap.000002.out 2>&1
    ./overlap.sh 3 > ./overlap.000003.out 2>&1
    ./overlap.sh 4 > ./overlap.000004.out 2>&1
    ./overlap.sh 5 > ./overlap.000005.out 2>&1
    ./overlap.sh 6 > ./overlap.000006.out 2>&1
    ./overlap.sh 7 > ./overlap.000007.out 2>&1
    ./overlap.sh 8 > ./overlap.000008.out 2>&1
    ./overlap.sh 9 > ./overlap.000009.out 2>&1
    ./overlap.sh 10 > ./overlap.000010.out 2>&1
    ./overlap.sh 11 > ./overlap.000011.out 2>&1
    ./overlap.sh 12 > ./overlap.000012.out 2>&1
    ./overlap.sh 13 > ./overlap.000013.out 2>&1
    ./overlap.sh 14 > ./overlap.000014.out 2>&1
    ./overlap.sh 15 > ./overlap.000015.out 2>&1
    ./overlap.sh 16 > ./overlap.000016.out 2>&1
    ./overlap.sh 17 > ./overlap.000017.out 2>&1
    ./overlap.sh 18 > ./overlap.000018.out 2>&1
    ./overlap.sh 19 > ./overlap.000019.out 2>&1
    ./overlap.sh 20 > ./overlap.000020.out 2>&1
    ./overlap.sh 21 > ./overlap.000021.out 2>&1
    ./overlap.sh 22 > ./overlap.000022.out 2>&1
    ./overlap.sh 23 > ./overlap.000023.out 2>&1
    ./overlap.sh 24 > ./overlap.000024.out 2>&1
    ./overlap.sh 25 > ./overlap.000025.out 2>&1
    ./overlap.sh 26 > ./overlap.000026.out 2>&1
    ./overlap.sh 27 > ./overlap.000027.out 2>&1
    ./overlap.sh 28 > ./overlap.000028.out 2>&1
    ./overlap.sh 29 > ./overlap.000029.out 2>&1
    ./overlap.sh 30 > ./overlap.000030.out 2>&1
    ./overlap.sh 31 > ./overlap.000031.out 2>&1
    ./overlap.sh 32 > ./overlap.000032.out 2>&1
    ./overlap.sh 33 > ./overlap.000033.out 2>&1
    ./overlap.sh 34 > ./overlap.000034.out 2>&1
    ./overlap.sh 35 > ./overlap.000035.out 2>&1
    ./overlap.sh 36 > ./overlap.000036.out 2>&1
    ./overlap.sh 37 > ./overlap.000037.out 2>&1
    ./overlap.sh 38 > ./overlap.000038.out 2>&1
    ./overlap.sh 39 > ./overlap.000039.out 2>&1
    ./overlap.sh 40 > ./overlap.000040.out 2>&1
    ./overlap.sh 41 > ./overlap.000041.out 2>&1

Thank you for your help!

skoren commented 1 year ago

The overlap step can be slow. It looks like the step is running and making progress: it was running 8 jobs at a time and is up to 41, the last job. The FAQ has parameters that can potentially make this faster after correction, specifically ovlMerDistinct=0.975. However, this requires a restart, so I would probably just wait for the jobs to complete. You could run with the --fast option, but that typically produces a less continuous assembly.

SarahSaadain commented 1 year ago

Thank you for your fast reply! The last output that was created (according to the 1-overlapper folder) was on October 3rd, and that was overlap 41, so it seems to have been stuck for 5 weeks now. I've come across the FAQ, where you recommend parameters that can be tweaked for a very repetitive or large genome: corMhapFilterThreshold=0.0000000002 corMhapOptions="--threshold 0.80 --num-hashes 512 --num-min-matches 3 --ordered-sketch-size 1000 --ordered-kmer-size 14 --min-olap-length 2000 --repeat-idf-scale 50" mhapMemory=60g mhapBlockSize=500 ovlMerDistinct=0.975. The genome I want to assemble will be used to find transposable elements and piRNA clusters, so I do not want to exclude too much of the repetitive regions, and I was not sure how stringent these settings would be.

skoren commented 1 year ago

Since you already have corrected reads, only the last parameter would matter. It filters which k-mers are allowed to seed an overlap: the more repetitive ones are not allowed to seed. Any repeats that still have unique k-mers within them would still be part of an overlap. However, like I said, this parameter change would require starting over from scratch.
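One way to picture a setting like ovlMerDistinct=0.975: rank the distinct k-mers by how often they occur and let only the least frequent 97.5% of them seed overlaps. The sketch below computes the corresponding occurrence cutoff from a two-column histogram (occurrences, number of distinct k-mers). It illustrates the idea only and is not Canu's actual implementation. `seed_cutoff` is a hypothetical helper and the histogram numbers are invented.

```shell
# Find the smallest occurrence count c such that at least `fraction`
# of all distinct k-mers occur c times or fewer.
# Input: lines of "<occurrences> <number of distinct k-mers>", sorted
# by occurrences ascending. seed_cutoff is a hypothetical helper.
seed_cutoff() {
    fraction=$1
    awk -v frac="$fraction" '
        { occ[NR] = $1; n[NR] = $2; total += $2 }
        END {
            for (i = 1; i <= NR; i++) {
                cum += n[i]
                if (cum >= frac * total) { print occ[i]; exit }
            }
        }'
}

# Toy histogram (invented numbers): most k-mers are rare, a few are
# highly repeated.
printf '%s\n' '1 900' '2 50' '10 30' '500 15' '9000 5' | seed_cutoff 0.975
```

In this toy case the cutoff is 10, so k-mers seen more than 10 times would not seed overlaps. A repeat copy that still contains rarer k-mers could still be found through them.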

What does top show for the system the jobs are running on? Are they still getting CPU time and doing work?

SarahSaadain commented 1 year ago

Thank you again for your reply!

This is what top is showing me:

   PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                                                  
  50365 vetlinu+  20   0   53.0g  45.5g    216 R 199.3   9.0 225840:30 overlapInCore

When I run htop I see that 8 threads are assigned but only two are running (see screenshot attached).

Initially it was using all 40 cores I assigned, but ever since it entered the 'obtovl' step (at the beginning of October) it has only been using 1-2 cores. Is this normal?

skoren commented 1 year ago

We do sometimes see that there are straggler reads which take longer to compute overlaps (either because they have more noise or more simple sequence k-mers seeding hits) which is why it's only using a couple of cores. The other subsets of reads have completed.

The default error rate for nanopore data of 12% is probably too high for modern datasets. I'd suggest killing the one job that didn't finish and editing the overlapping shell script to drop the error rate from 0.12 to 0.085 or even 0.065 to see if the job completes quicker (re-run it as overlap.sh 41). The other options I mentioned above would require a restart but you could launch one of those in parallel to see if it completes faster than this run. I'd drop the 12% there as well using correctedErrorRate=0.085.

SarahSaadain commented 1 year ago

Thank you for your reply!

I checked the overlap.out files and it is actually not job 41 that is stuck but job 1, so overlaps 2-41 are all fine. To drop the error rate, do I change --maxerate in overlap.sh to 0.065?

Since your last reply I started running the same trimming script with your recommended ovlMerDistinct=0.975 on another computer, this time using 150x coverage. I double-checked it just now, and of all 41 overlap jobs it is again overlap 1 that is stuck, while all the others worked.

So to double-check: I should kill the one job that is stuck (overlap 1) and change --maxerate in overlap.sh to 0.065? Can I then just run overlap.sh again, and will it automatically continue where it was before, while the main script I submitted is still running?

skoren commented 1 year ago

Yes, that is correct, kill both the main script and the overlap.sh job that is stalled. Then edit as you said. Re-run overlap.sh 1 and then re-run the main script. It should resume the next step after overlapping once it sees all the output files are present.
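The steps above can be sketched as follows. The sketch assumes overlap.sh passes the error rate as `--maxerate <value>` separated by spaces. The flag name comes from this thread, but the exact layout of the script is an assumption, so inspect the file before editing it. `set_maxerate` is a hypothetical helper, and it keeps the original script as a `.bak` copy.

```shell
# Replace the value after --maxerate in a script, keeping a .bak copy.
# set_maxerate is a hypothetical helper, not part of Canu.
set_maxerate() {
    script=$1
    rate=$2
    sed -E -i.bak "s/--maxerate +[0-9.]+/--maxerate ${rate}/" "$script"
}

# Order of operations described in the thread (run by hand):
#   1. stop the main canu process and the stalled overlapInCore job
#   2. cd <assembly dir>/trimming/1-overlapper
#   3. set_maxerate overlap.sh 0.085   (or 0.065, as suggested above)
#   4. ./overlap.sh 1 > ./overlap.000001.out 2>&1
#   5. re-run the original canu command so it resumes after overlapping
```

Comparing overlap.sh with overlap.sh.bak afterwards is a quick way to confirm that only the error rate changed.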

SarahSaadain commented 1 year ago

Thank you very much! Both my trimming runs have finally finished: the one where I only used ovlMerDistinct=0.975 took 10 days, and the other run, where I used ovlMerDistinct=0.975 and a maxerate of 0.065, only took a couple of days.

Is there any additional setting I can include to speed up the assembly step? I am using a correctedErrorRate of 0.14256 (because the recommended rate for Nanopore is 0.144 and I decreased it by 1% because I use high coverage). Should I use ovlMerDistinct=0.975 again?

Best regards

skoren commented 1 year ago

Yes, include ovlMerDistinct. I think you can probably drop the ONT error rate quite a bit from the 14% default. You could try 0.065, since that was fast, and see what the assembly looks like. Post the report when it's done, as that will have info on what the error rates for most reads looked like. If that ends up being too low, you could always re-run with 0.105 or similar, but go with the faster option first.

SarahSaadain commented 12 months ago

Thank you for your reply! The assembly with ovlMerDistinct=0.975 and an error rate of 0.144 has been running for 2.5 weeks now, and it seems it is again overlap 1 that takes the longest; all the others finished a while ago. I have now started the assembly on a different computer with your recommended error rate of 0.065 and hope it will be a bit faster.

SarahSaadain commented 12 months ago

Sorry, I made a mistake in my previous message: the trimming script that finished fast had correctedErrorRate=0.085, not 0.065. Therefore, in the assembly script I am running now I also used correctedErrorRate=0.085. I hope my logic makes sense. Or should I decrease it to 0.065, even though the trimming used 0.085?

skoren commented 12 months ago

The assembly should use a higher or equal error rate to trimming so 0.085 is good. You don't want to lower the assembly rate vs trimming since the trimming could have left errors at up to that rate in the reads which would then have no overlaps in the assembly step.
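The rule above can be turned into a small guard to run before launching the assembly. This is only a sketch: `check_rates` is a hypothetical helper, and the comparison goes through awk because shell arithmetic is integer-only.

```shell
# Succeeds (exit status 0) when the assembly error rate is greater than
# or equal to the trimming error rate. check_rates is a hypothetical helper.
check_rates() {
    trim_rate=$1
    asm_rate=$2
    awk -v t="$trim_rate" -v a="$asm_rate" 'BEGIN { exit !(a + 0 >= t + 0) }'
}

if check_rates 0.085 0.085; then
    echo "ok: assembly rate is not below the trimming rate"
fi
if ! check_rates 0.085 0.065; then
    echo "warning: assembly rate is below the trimming rate"
fi
```

With trimming at 0.085, an assembly rate of 0.085 passes and 0.065 triggers the warning, which matches the advice given here.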

SarahSaadain commented 12 months ago

okay, thank you for clarifying!

SarahSaadain commented 11 months ago

The assembly with the "faster" settings has finished. Here is the command that I used:

./softwares/canu-2.2/bin/canu corThreads=30 ovlMerDistinct=0.975 correctedErrorRate=0.085 -p derecta -d results/canu_correct genomeSize=145m -trimmed -corrected -nanopore results/canu_correct/derecta.trimmedReads.fasta.gz

And this is the report:

[CORRECTION/READS]
--
-- In sequence store './derecta.seqStore':
--   Found 136072 reads.
--   Found 14500076338 bases (100 times coverage).
--    Histogram of raw reads:
--    
--    G=14500076338                      sum of  ||               length     num
--    NG         length     index       lengths  ||                range    seqs
--    ----- ------------ --------- ------------  ||  ------------------- -------
--    00010       157050      7580   1450108360  ||      80368-101874      78319|---------------------------------------------------------------
--    00020       133286     17681   2900113590  ||     101875-123381      32213|--------------------------
--    00030       119805     29192   4350052527  ||     123382-144888      13792|------------
--    00040       110219     41833   5800051868  ||     144889-166395       6239|------
--    00050       102934     55462   7250077163  ||     166396-187902       2823|---
--    00060        97046     69981   8700137391  ||     187903-209409       1302|--
--    00070        92021     85335  10150125992  ||     209410-230916        632|-
--    00080        87665    101486  11600104333  ||     230917-252423        288|-
--    00090        83842    118403  13050151732  ||     252424-273930        160|-
--    00100        80368    136071  14500076338  ||     273931-295437        103|-
--    001.000x              136072  14500076338  ||     295438-316944         45|-
--                                               ||     316945-338451         27|-
--                                               ||     338452-359958         23|-
--                                               ||     359959-381465         20|-
--                                               ||     381466-402972         16|-
--                                               ||     402973-424479          8|-
--                                               ||     424480-445986          8|-
--                                               ||     445987-467493         11|-
--                                               ||     467494-489000          5|-
--                                               ||     489001-510507          8|-
--                                               ||     510508-532014          4|-
--                                               ||     532015-553521          5|-
--                                               ||     553522-575028          1|-
--                                               ||     575029-596535          2|-
--                                               ||     596536-618042          4|-
--                                               ||     618043-639549          2|-
--                                               ||     639550-661056          1|-
--                                               ||     661057-682563          1|-
--                                               ||     682564-704070          1|-
--                                               ||     704071-725577          0|
--                                               ||     725578-747084          0|
--                                               ||     747085-768591          1|-
--                                               ||     768592-790098          0|
--                                               ||     790099-811605          1|-
--                                               ||     811606-833112          0|
--                                               ||     833113-854619          0|
--                                               ||     854620-876126          0|
--                                               ||     876127-897633          0|
--                                               ||     897634-919140          1|-
--                                               ||     919141-940647          1|-
--                                               ||     940648-962154          1|-
--                                               ||     962155-983661          1|-
--                                               ||     983662-1005168         0|
--                                               ||    1005169-1026675         1|-
--                                               ||    1026676-1048182         0|
--                                               ||    1048183-1069689         0|
--                                               ||    1069690-1091196         1|-
--                                               ||    1091197-1112703         0|
--                                               ||    1112704-1134210         0|
--                                               ||    1134211-1155717         1|-
--
[CORRECTION/MERS]
--
--  16-mers                                                                                           Fraction
--    Occurrences   NumMers                                                                         Unique Total
--       1-     1         0                                                                        0.0000 0.0000
--       2-     2 248674277 ********************************************************************** 0.3958 0.0356
--       3-     4 180564055 **************************************************                     0.5860 0.0613
--       5-     7  64529184 ******************                                                     0.7363 0.0907
--       8-    11  18784914 *****                                                                  0.7980 0.1093
--      12-    16   5889744 *                                                                      0.8188 0.1187
--      17-    22   2152641                                                                        0.8261 0.1235
--      23-    29    921796                                                                        0.8289 0.1261
--      30-    37    489651                                                                        0.8302 0.1276
--      38-    46    485163                                                                        0.8309 0.1287
--      47-    56    921941                                                                        0.8317 0.1303
--      57-    67   3335924                                                                        0.8333 0.1341
--      68-    79  15334749 ****                                                                   0.8394 0.1517
--      80-    92  36193813 **********                                                             0.8664 0.2435
--      93-   106  29894155 ********                                                               0.9250 0.4736
--     107-   121   8472079 **                                                                     0.9698 0.6741
--     122-   137   1363515                                                                        0.9818 0.7347
--     138-   154    748845                                                                        0.9837 0.7460
--     155-   172   1684975                                                                        0.9850 0.7541
--     173-   191   2598153                                                                        0.9877 0.7749
--     192-   211   1781528                                                                        0.9919 0.8089
--     212-   232    664419                                                                        0.9946 0.8335
--     233-   254    354558                                                                        0.9956 0.8435
--     255-   277    399351                                                                        0.9962 0.8496
--     278-   301    396207                                                                        0.9968 0.8573
--     302-   326    257783                                                                        0.9974 0.8654
--     327-   352    167917                                                                        0.9978 0.8711
--     353-   379    149104                                                                        0.9981 0.8751
--     380-   407    133662                                                                        0.9983 0.8790
--     408-   436    102005                                                                        0.9985 0.8828
--     437-   466     82033                                                                        0.9987 0.8858
--     467-   497     72164                                                                        0.9988 0.8885
--     498-   529     62695                                                                        0.9989 0.8909
--     530-   562     53929                                                                        0.9990 0.8932
--     563-   596     45293                                                                        0.9991 0.8953
--     597-   631     40167                                                                        0.9992 0.8972
--     632-   667     34691                                                                        0.9993 0.8990
--     668-   704     30711                                                                        0.9993 0.9006
--     705-   742     27504                                                                        0.9994 0.9021
--     743-   781     24457                                                                        0.9994 0.9035
--     782-   821     22014                                                                        0.9994 0.9048
--
--           0 (max occurrences)
-- 13964914381 (total mers, non-unique)
--   628272341 (distinct mers, non-unique)
--           0 (unique mers)

[CORRECTION/LAYOUT]
--                             original      original
--                            raw reads     raw reads
--   category                w/overlaps  w/o/overlaps
--   -------------------- ------------- -------------
--   Number of Reads             129367          6705
--   Number of Bases        13853602641     391543023
--   Coverage                    95.542         2.700
--   Median                       98261         82964
--   Mean                        107087         58395
--   N50                         103582         88996
--   Minimum                      80368             0
--   Maximum                    1155702        214316
--   --                                        --------corrected---------  ----------rescued----------
--                             evidence                     expected                     expected
--   category                     reads            raw     corrected            raw     corrected
--   -------------------- -------------  ------------- -------------  ------------- -------------
--   Number of Reads             132593          42515         42515            243           243
--   Number of Bases        14147354079     5825528856    5800049114       22067145      21636739
--   Coverage                    97.568         40.176        40.000          0.152         0.149
--   Median                       97867         127214        126763          89567         88391
--   Mean                        106697         137022        136423          90811         89040
--   N50                         103108         131958        131320          90405         89331
--   Minimum                      80368         108170        108167          80401         45033
--   Maximum                    1155702        1006825        984031         110773        107537
--   
--                        --------uncorrected--------
--                                           expected
--   category                       raw     corrected
--   -------------------- ------------- -------------
--   Number of Reads              93314         93314
--   Number of Bases         8397549663    7608139687
--   Coverage                    57.914        52.470
--   Median                       90450         88991
--   Mean                         89992         81532
--   N50                          91868         91490
--   Minimum                          0             0
--   Maximum                    1155702        988212
--   
--   Maximum Memory         10924732928

[TRIMMING/READS]
--
-- In sequence store './derecta.seqStore':
--   Found 41759 reads.
--   Found 5642164200 bases (38.91 times coverage).
--    Histogram of corrected reads:
--    
--    G=5642164200                       sum of  ||               length     num
--    NG         length     index       lengths  ||                range    seqs
--    ----- ------------ --------- ------------  ||  ------------------- -------
--    00010       184517      2617    564358383  ||      17666-29203           2|-
--    00020       161088      5914   1128567575  ||      29204-40741           3|-
--    00030       147797      9582   1692708198  ||      40742-52279           4|-
--    00040       138161     13534   2256893764  ||      52280-63817           8|-
--    00050       130888     17734   2821097746  ||      63818-75355          26|-
--    00060       125002     22148   3385338086  ||      75356-86893         164|-
--    00070       119925     26758   3949542316  ||      86894-98431         172|-
--    00080       115485     31554   4513737864  ||      98432-109969       2611|-------------
--    00090       111565     36526   5078016566  ||     109970-121507      13546|---------------------------------------------------------------
--    00100        17666     41758   5642164200  ||     121508-133045       8893|------------------------------------------
--    001.000x               41759   5642164200  ||     133046-144583       5542|--------------------------
--                                               ||     144584-156121       3758|------------------
--                                               ||     156122-167659       2346|-----------
--                                               ||     167660-179197       1531|--------
--                                               ||     179198-190735       1047|-----
--                                               ||     190736-202273        666|----
--                                               ||     202274-213811        464|---
--                                               ||     213812-225349        307|--
--                                               ||     225350-236887        223|--
--                                               ||     236888-248425        119|-
--                                               ||     248426-259963         84|-
--                                               ||     259964-271501         73|-
--                                               ||     271502-283039         62|-
--                                               ||     283040-294577         27|-
--                                               ||     294578-306115         19|-
--                                               ||     306116-317653         15|-
--                                               ||     317654-329191         12|-
--                                               ||     329192-340729          5|-
--                                               ||     340730-352267          6|-
--                                               ||     352268-363805          4|-
--                                               ||     363806-375343          2|-
--                                               ||     375344-386881          2|-
--                                               ||     386882-398419          2|-
--                                               ||     398420-409957          1|-
--                                               ||     409958-421495          1|-
--                                               ||     421496-433033          1|-
--                                               ||     433034-444571          0|
--                                               ||     444572-456109          1|-
--                                               ||     456110-467647          3|-
--                                               ||     467648-479185          1|-
--                                               ||     479186-490723          1|-
--                                               ||     490724-502261          2|-
--                                               ||     502262-513799          0|
--                                               ||     513800-525337          1|-
--                                               ||     525338-536875          1|-
--                                               ||     536876-548413          0|
--                                               ||     548414-559951          0|
--                                               ||     559952-571489          0|
--                                               ||     571490-583027          0|
--                                               ||     583028-594565          1|-
--

[TRIMMING/MERS]
--
--  22-mers                                                                                           Fraction
--    Occurrences   NumMers                                                                         Unique Total
--       1-     1         0                                                                        0.0000 0.0000
--       2-     2    456343                                                                        0.0038 0.0002
--       3-     4    275194                                                                        0.0053 0.0003
--       5-     7    147403                                                                        0.0066 0.0004
--       8-    11    120886                                                                        0.0076 0.0005
--      12-    16    229767                                                                        0.0085 0.0007
--      17-    22    897183 *                                                                      0.0109 0.0015
--      23-    29   4318648 *****                                                                  0.0199 0.0055
--      30-    37  24298136 ********************************                                       0.0651 0.0323
--      38-    46  52706201 ********************************************************************** 0.3000 0.2083
--      47-    56  30421717 ****************************************                               0.7391 0.6094
--      57-    67   3978010 *****                                                                  0.9585 0.8484
--      68-    79    403722                                                                        0.9840 0.8812
--      80-    92    373254                                                                        0.9872 0.8863
--      93-   106    272704                                                                        0.9903 0.8919
--     107-   121    166496                                                                        0.9925 0.8966
--     122-   137    114305                                                                        0.9938 0.8998
--     138-   154     92775                                                                        0.9947 0.9025
--     155-   172     64583                                                                        0.9955 0.9048
--     173-   191     58223                                                                        0.9960 0.9066
--     192-   211     44976                                                                        0.9965 0.9085
--     212-   232     39615                                                                        0.9969 0.9101
--     233-   254     31118                                                                        0.9972 0.9116
--     255-   277     24492                                                                        0.9974 0.9130
--     278-   301     25322                                                                        0.9977 0.9141
--     302-   326     19437                                                                        0.9979 0.9154
--     327-   352     17457                                                                        0.9980 0.9165
--     353-   379     14889                                                                        0.9982 0.9176
--     380-   407     14506                                                                        0.9983 0.9185
--     408-   436     13595                                                                        0.9984 0.9195
--     437-   466     12514                                                                        0.9985 0.9206
--     467-   497     11049                                                                        0.9986 0.9216
--     498-   529      9196                                                                        0.9987 0.9225
--     530-   562      8445                                                                        0.9988 0.9233
--     563-   596      7640                                                                        0.9989 0.9241
--     597-   631      7528                                                                        0.9989 0.9249
--     632-   667      7037                                                                        0.9990 0.9258
--     668-   704      5830                                                                        0.9991 0.9266
--     705-   742      5500                                                                        0.9991 0.9273
--     743-   781      4914                                                                        0.9991 0.9280
--     782-   821      4341                                                                        0.9992 0.9286
--
--           0 (max occurrences)
--  5631711220 (total mers, non-unique)
--   119818115 (distinct mers, non-unique)
--           0 (unique mers)

[TRIMMING/TRIMMING]
--  PARAMETERS:
--  ----------
--     1000    (reads trimmed below this many bases are deleted)
--   0.0850    (use overlaps at or below this fraction error)
--      500    (break region if overlap is less than this long, for 'largest covered' algorithm)
--        2    (break region if overlap coverage is less than this many reads, for 'largest covered' algorithm)
--  
--  INPUT READS:
--  -----------
--  136072 reads   5642164200 bases (reads processed)
--       0 reads            0 bases (reads not processed, previously deleted)
--       0 reads            0 bases (reads not processed, in a library where trimming isn't allowed)
--  
--  OUTPUT READS:
--  ------------
--    8376 reads   1039795585 bases (trimmed reads output)
--   33295 reads   4505945499 bases (reads with no change, kept as is)
--   94346 reads      3400096 bases (reads with no overlaps, deleted)
--      55 reads      7723074 bases (reads with short trimmed length, deleted)
--  
--  TRIMMING DETAILS:
--  ----------------
--    6010 reads     40651339 bases (bases trimmed from the 5' end of a read)
--    3032 reads     44648607 bases (bases trimmed from the 3' end of a read)

[TRIMMING/SPLITTING]
--  PARAMETERS:
--  ----------
--     1000    (reads trimmed below this many bases are deleted)
--   0.0850    (use overlaps at or below this fraction error)
--  INPUT READS:
--  -----------
--   41671 reads   5631041030 bases (reads processed)
--   94401 reads     11123170 bases (reads not processed, previously deleted)
--       0 reads            0 bases (reads not processed, in a library where trimming isn't allowed)
--  
--  PROCESSED:
--  --------
--       0 reads            0 bases (no overlaps)
--      25 reads      5229328 bases (no coverage after adjusting for trimming done already)
--       0 reads            0 bases (processed for chimera)
--       0 reads            0 bases (processed for spur)
--   41646 reads   5625811702 bases (processed for subreads)
--  
--  READS WITH SIGNALS:
--  ------------------
--       0 reads            0 signals (number of 5' spur signal)
--       0 reads            0 signals (number of 3' spur signal)
--       0 reads            0 signals (number of chimera signal)
--       1 reads            1 signals (number of subread signal)
--  
--  SIGNALS:
--  -------
--       0 reads            0 bases (size of 5' spur signal)
--       0 reads            0 bases (size of 3' spur signal)
--       0 reads            0 bases (size of chimera signal)
--       1 reads          155 bases (size of subread signal)
--  
--  TRIMMING:
--  --------
--       0 reads            0 bases (trimmed from the 5' end of the read)
--       1 reads         1760 bases (trimmed from the 3' end of the read)

[UNITIGGING/READS]
--
-- In sequence store './derecta.seqStore':
--   Found 41671 reads.
--   Found 5545739324 bases (38.24 times coverage).
--    Histogram of corrected-trimmed reads:
--    
--    G=5545739324                       sum of  ||               length     num
--    NG         length     index       lengths  ||                range    seqs
--    ----- ------------ --------- ------------  ||  ------------------- -------
--    00010       182283      2615    554587431  ||       1010-12685         115|-
--    00020       159463      5891   1109288324  ||      12686-24361          24|-
--    00030       146650      9526   1663764666  ||      24362-36037          22|-
--    00040       137179     13440   2218383078  ||      36038-47713          40|-
--    00050       129997     17596   2772969955  ||      47714-59389          90|-
--    00060       124231     21962   3327548448  ||      59390-71065         142|-
--    00070       119223     26520   3882129588  ||      71066-82741         248|--
--    00080       114866     31260   4436592446  ||      82742-94417         271|--
--    00090       111037     36172   4991236172  ||      94418-106093        488|---
--    00100         1010     41670   5545739324  ||     106094-117769      12210|---------------------------------------------------------------
--    001.000x               41671   5545739324  ||     117770-129445      10040|----------------------------------------------------
--                                               ||     129446-141121       6276|---------------------------------
--                                               ||     141122-152797       4128|----------------------
--                                               ||     152798-164473       2654|--------------
--                                               ||     164474-176149       1703|---------
--                                               ||     176150-187825       1092|------
--                                               ||     187826-199501        693|----
--                                               ||     199502-211177        478|---
--                                               ||     211178-222853        325|--
--                                               ||     222854-234529        215|--
--                                               ||     234530-246205        118|-
--                                               ||     246206-257881         85|-
--                                               ||     257882-269557         66|-
--                                               ||     269558-281233         57|-
--                                               ||     281234-292909         26|-
--                                               ||     292910-304585         14|-
--                                               ||     304586-316261         11|-
--                                               ||     316262-327937         12|-
--                                               ||     327938-339613          5|-
--                                               ||     339614-351289          4|-
--                                               ||     351290-362965          5|-
--                                               ||     362966-374641          1|-
--                                               ||     374642-386317          1|-
--                                               ||     386318-397993          2|-
--                                               ||     397994-409669          1|-
--                                               ||     409670-421345          1|-
--                                               ||     421346-433021          1|-
--                                               ||     433022-444697          0|
--                                               ||     444698-456373          1|-
--                                               ||     456374-468049          2|-
--                                               ||     468050-479725          1|-
--                                               ||     479726-491401          0|
--                                               ||     491402-503077          0|
--                                               ||     503078-514753          0|
--                                               ||     514754-526429          1|-
--                                               ||     526430-538105          1|-
--                                               ||     538106-549781          0|
--                                               ||     549782-561457          0|
--                                               ||     561458-573133          0|
--                                               ||     573134-584809          1|-
--

[UNITIGGING/MERS]
--
--  22-mers                                                                                           Fraction
--    Occurrences   NumMers                                                                         Unique Total
--       1-     1         0                                                                        0.0000 0.0000
--       2-     2    230020                                                                        0.0019 0.0001
--       3-     6    205675                                                                        0.0027 0.0002
--       7-    13    167111                                                                        0.0039 0.0003
--      14-    23   1384258 *                                                                      0.0055 0.0007
--      24-    36  24757538 *********************                                                  0.0198 0.0072
--      37-    52  80039921 ********************************************************************** 0.2653 0.1837
--      53-    71  10721401 *********                                                              0.9138 0.8033
--      72-    93    647215                                                                        0.9853 0.8913
--      94-   118    381191                                                                        0.9907 0.9009
--     119-   146    185148                                                                        0.9937 0.9078
--     147-   177    118536                                                                        0.9953 0.9122
--     178-   211     84356                                                                        0.9962 0.9156
--     212-   248     61750                                                                        0.9969 0.9185
--     249-   288     42108                                                                        0.9975 0.9210
--     289-   331     35388                                                                        0.9978 0.9231
--     332-   377     27072                                                                        0.9981 0.9250
--     378-   426     24351                                                                        0.9983 0.9267
--     427-   478     21344                                                                        0.9985 0.9285
--     479-   533     15938                                                                        0.9987 0.9302
--     534-   591     13556                                                                        0.9988 0.9316
--     592-   652     12501                                                                        0.9990 0.9330
--     653-   716     10399                                                                        0.9991 0.9344
--     717-   783      8003                                                                        0.9991 0.9357
--     784-   853      6913                                                                        0.9992 0.9368
--     854-   926      5093                                                                        0.9993 0.9378
--     927-  1002      4856                                                                        0.9993 0.9386
--    1003-  1081      4190                                                                        0.9994 0.9394
--    1082-  1163      4269                                                                        0.9994 0.9402
--    1164-  1248      6393                                                                        0.9994 0.9411
--    1249-  1336      6056                                                                        0.9995 0.9425
--    1337-  1427      3186                                                                        0.9995 0.9439
--    1428-  1521      2560                                                                        0.9996 0.9447
--    1522-  1618      2451                                                                        0.9996 0.9454
--    1619-  1718      2933                                                                        0.9996 0.9461
--    1719-  1821      3652                                                                        0.9996 0.9469
--    1822-  1927      3181                                                                        0.9997 0.9481
--    1928-  2036      2479                                                                        0.9997 0.9492
--    2037-  2148      4172                                                                        0.9997 0.9501
--    2149-  2263      1654                                                                        0.9997 0.9517
--    2264-  2381       904                                                                        0.9997 0.9523
--
--           0 (max occurrences)
--  5541777172 (total mers, non-unique)
--   119288981 (distinct mers, non-unique)
--           0 (unique mers)

[UNITIGGING/OVERLAPS]
--   category            reads     %          read length        feature size or coverage  analysis
--   ----------------  -------  -------  ----------------------  ------------------------  --------------------
--   middle-missing         72    0.17   125575.38 +- 88642.67     29363.92 +- 34573.11   (bad trimming)
--   middle-hump             3    0.01   108004.00 +- 51863.79     44073.00 +- 50111.01   (bad trimming)
--   no-5-prime             35    0.08    97546.09 +- 46218.33     23718.60 +- 22400.18   (bad trimming)
--   no-3-prime             33    0.08   111794.48 +- 48361.57     14413.91 +- 18512.81   (bad trimming)
--   
--   low-coverage          625    1.50    98408.61 +- 41776.90         5.47 +- 2.36       (easy to assemble, potential for lower quality consensus)
--   unique              39195   94.06   133703.26 +- 28519.46        41.99 +- 7.50       (easy to assemble, perfect, yay)
--   repeat-cont            26    0.06    94521.31 +- 34629.93       107.50 +- 23.75      (potential for consensus errors, no impact on assembly)
--   repeat-dove             0    0.00        0.00 +- 0.00             0.00 +- 0.00       (hard to assemble, likely won't assemble correctly or even at all)
--   
--   span-repeat          1018    2.44   138097.88 +- 37519.36     28652.41 +- 28424.31   (read spans a large repeat, usually easy to assemble)
--   uniq-repeat-cont      485    1.16   117335.64 +- 35797.47                            (should be uniquely placed, low potential for consensus errors, no impact on assembly)
--   uniq-repeat-dove      164    0.39   162290.30 +- 51546.67                            (will end contigs, potential to misassemble)
--   uniq-anchor             3    0.01    91306.00 +- 36220.45     24663.67 +- 9528.51    (repeat read, with unique section, probable bad read)

[UNITIGGING/ADJUSTMENT]
-- No report available.
[UNITIGGING/ERROR RATES]
--  
--  ERROR RATES
--  -----------
--                                                   --------threshold------
--  46544                        fraction error      fraction        percent
--  samples                              (1e-5)         error          error
--                   --------------------------      --------       --------
--  command line (-eg)                           ->   8500.00        8.5000%
--  command line (-ef)                           ->  -----.--      ---.----%
--  command line (-eM)                           ->   8500.00        8.5000%
--  mean + std.dev      40.94 +-  12 *   359.06  ->   4349.69        4.3497%  (enabled)
--  median + mad         0.00 +-  12 *     0.00  ->      0.00        0.0000%
--  90th percentile                              ->      4.00        0.0040%
--  
--  BEST EDGE FILTERING
--  -------------------
--  At graph threshold 8.5000%, reads:
--    available to have edges:         5034
--    with at least one edge:          5002
--  
--  At max threshold 8.5000%, reads:  (not computed)
--    available to have edges:            0
--    with at least one edge:             0
--  
--  At tight threshold 0.0040%, reads with:
--    both edges below error threshold:      3190  (80.00% minReadsBest threshold = 4001)
--    one  edge  above error threshold:       857
--    both edges above error threshold:       955
--    at least one edge:                     5002
--  
--  At loose threshold 4.3497%, reads with:
--    both edges below error threshold:      4933  (80.00% minReadsBest threshold = 4001)
--    one  edge  above error threshold:        63
--    both edges above error threshold:         6
--    at least one edge:                     5002
--  
--  
--  INITIAL EDGES
--  -------- ----------------------------------------
--     36410 reads are contained
--     94731 reads have no best edges (singleton)
--        91 reads have only one best edge (spur) 
--                 70 are mutual best
--      4840 reads have two best edges 
--                110 have one mutual best edge
--               4695 have two mutual best edges
--  
--  
--  FINAL EDGES
--  -------- ----------------------------------------
--     36410 reads are contained
--     94786 reads have no best edges (singleton)
--        95 reads have only one best edge (spur) 
--                 83 are mutual best
--      4781 reads have two best edges 
--                 83 have one mutual best edge
--               4676 have two mutual best edges
--  
--  
--  EDGE FILTERING
--  -------- ------------------------------------------
--         0 reads are ignored
--       265 reads have a gap in overlap coverage
--        27 reads have lopsided best edges

[UNITIGGING/CONTIGS]
-- Found, in version 1, after unitig construction:
--   contigs:      42 sequences, total length 143364405 bp (including 10 repeats of total length 3621731 bp).
--   bubbles:      8 sequences, total length 2671930 bp.
--   unassembled:  625 sequences, total length 61476931 bp.
--
-- Contig sizes based on genome size 145mbp:
--
--            NG (bp)  LG (contigs)    sum (bp)
--         ----------  ------------  ----------
--     10    30632353             1    30632353
--     20    30632353             1    30632353
--     30    26935139             2    57567492
--     40    26349955             3    83917447
--     50    26349955             3    83917447
--     60    21998810             4   105916257
--     70    21998810             4   105916257
--     80    20055920             5   125972177
--     90     1617177             7   131561269
--

[UNITIGGING/CONSENSUS]
-- Found, in version 2, after consensus generation:
--   contigs:      42 sequences, total length 143304869 bp (including 10 repeats of total length 3615982 bp).
--   bubbles:      8 sequences, total length 2664384 bp.
--   unassembled:  625 sequences, total length 61476908 bp.
--
-- Contig sizes based on genome size 145mbp:
--
--            NG (bp)  LG (contigs)    sum (bp)
--         ----------  ------------  ----------
--     10    30626116             1    30626116
--     20    30626116             1    30626116
--     30    26933743             2    57559859
--     40    26337489             3    83897348
--     50    26337489             3    83897348
--     60    21994750             4   105892098
--     70    21994750             4   105892098
--     80    20037162             5   125929260
--     90     1614787             7   131512927
--


skoren commented 11 months ago

Based on this report, the error rate of 0.085 is sufficient, since the assembly actually ends up using about 5% error (the loose threshold in the report is 4.3497%). The contig sizes look close to chromosome arm lengths in this genome. I'm going to close this issue since you've been able to get an assembly that looks to be of reasonable quality.
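
For anyone reading along, the "about 5%" figure can be roughly reproduced from the `mean + std.dev` row in the [UNITIGGING/ERROR RATES] section of the report. The sketch below is plain arithmetic on the values printed there, not Canu code. It assumes that the row means "mean plus 12 standard deviations, in units of 1e-5 fraction error", which is only a reading of the report layout. The function and variable names are made up for illustration.

```python
# Values copied from the "mean + std.dev" row of the report
# (units of 1e-5 fraction error, as labelled in the column header).
MEAN_ERR = 40.94
STDDEV_ERR = 359.06
N_DEVIATIONS = 12          # multiplier printed in the same row
COMMAND_LINE_MAX = 8.5     # percent, from the "command line (-eg)" row


def loose_threshold_percent(mean, stddev, n_dev):
    """Assumed reading of the report: mean + n_dev * stddev, converted to percent.

    Inputs are in units of 1e-5 fraction error, so multiplying by 1e-5
    gives a fraction and by 100 gives a percentage.
    """
    return (mean + n_dev * stddev) * 1e-5 * 100


loose = loose_threshold_percent(MEAN_ERR, STDDEV_ERR, N_DEVIATIONS)

# The computed threshold lands well under the 8.5% allowed on the command line,
# which is why the configured error rate is not the limiting factor here.
print(f"loose threshold: {loose:.2f}%  (command-line max: {COMMAND_LINE_MAX}%)")
print("headroom below command-line max:", loose < COMMAND_LINE_MAX)
```

The result differs from the 4.3497% printed in the report only in the last digits, presumably because the report rounds the mean and standard deviation before displaying them.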

marbl / canu

Canu 2.2 stuck at trimming step #2275