ONT targeted long-read sequencing trimmed data error correction issue

swcho-HYBigLab commented 4 months ago

Hello, I am a BI graduate student in South Korea working with ONT long-read sequencing data for the first time. I am working on a project to align and analyze targeted long-read sequencing data, and I plan to use Canu to error-correct the trimmed FASTQ.

The run itself progressed without any major problem, but the size of the corrected-reads FASTA changed significantly (trimmed: 32,365,275 reads -> corrected: 889,131 reads). [Screenshot 2024-07-12 164934]

Compared with the trimmed input, the number of reads after correction is greatly reduced. I would like to fix this and keep as many reads as possible.
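One way to reproduce the before/after comparison is to count reads and bases in both files with the same tool. The sketch below is a plain shell/awk helper and is not part of Canu. It assumes 4-line FASTQ records and FASTA headers that start with `>`. The function name `count_reads` is hypothetical.

```shell
# Hypothetical helper: count reads and bases in a FASTQ or FASTA file,
# gzipped or not. Assumes 4-line FASTQ records and '>'-prefixed FASTA headers.
count_reads() {
  local f=$1
  # gzip -dcf decompresses .gz input and passes plain files through unchanged
  gzip -dcf -- "$f" | awk -v name="$f" '
    # Decide the format from the first character of the first line
    NR == 1 { fmt = (substr($0, 1, 1) == "@") ? "fastq" : "fasta" }
    # FASTQ: the sequence is line 2 of every 4-line record
    fmt == "fastq" && NR % 4 == 2 { n++; b += length($0) }
    # FASTA: count headers as reads, sum all other lines as bases
    fmt == "fasta" && /^>/ { n++; next }
    fmt == "fasta" { b += length($0) }
    END { printf "%s reads=%d bases=%d\n", name, n, b }
  '
}
```

Usage, with placeholder file names: `count_reads trimmed_Q7.fastq.gz` and `count_reads S23-10097_targeted_long_read_ERBB2.trimmed_Q7.correctedReads.fasta.gz`. The input name is hypothetical. Comparing bases alongside reads shows whether the lost reads also carried much of the sequence.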

The targeted long-read sequencing data were generated with a 4.1 Mb panel, and the command I used is as follows:

## Fasta error correction
canu -correct \
    -p ${PREFIX} \
    -d ${WORKING_DIR}/canu_results_trimmed_Q7_4.1m \
    genomeSize=4.1m \
    minReadLength=200 \
    minOverlapLength=100 \
    corMinCoverage=0 \
    corOutCoverage=all \
    MhapSensitivity=low \
    rawErrorRate=0.500 \
    correctedErrorRate=0.12 \
    -nanopore ${SAMPLE} \
    maxThreads=40 \
    minMemory=100 \
    maxMemory=200 \
    useGrid=false

Thank you

skoren commented 4 months ago

What's the version of canu you're using? Can you post the full log (the report) from the run?

swcho-HYBigLab commented 4 months ago

My version is 2.2, and this is my log:

2024-07-11 14:59:40 - Starting Error correction
-- canu 2.2
--
-- CITATIONS
--
-- For 'standard' assemblies of PacBio or Nanopore reads:
--   Koren S, Walenz BP, Berlin K, Miller JR, Phillippy AM.
--   Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation.
--   Genome Res. 2017 May;27(5):722-736.
--   http://doi.org/10.1101/gr.215087.116
-- 
-- Read and contig alignments during correction and consensus use:
--   Šošić M, Šikić M.
--   Edlib: a C/C++ library for fast, exact sequence alignment using edit distance.
--   Bioinformatics. 2017 May 1;33(9):1394-1395.
--   http://doi.org/10.1093/bioinformatics/btw753
-- 
-- Overlaps are generated using:
--   Berlin K, et al.
--   Assembling large genomes with single-molecule sequencing and locality-sensitive hashing.
--   Nat Biotechnol. 2015 Jun;33(6):623-30.
--   http://doi.org/10.1038/nbt.3238
-- 
--   Myers EW, et al.
--   A Whole-Genome Assembly of Drosophila.
--   Science. 2000 Mar 24;287(5461):2196-204.
--   http://doi.org/10.1126/science.287.5461.2196
-- 
-- Corrected read consensus sequences are generated using an algorithm derived from FALCON-sense:
--   Chin CS, et al.
--   Phased diploid genome assembly with single-molecule real-time sequencing.
--   Nat Methods. 2016 Dec;13(12):1050-1054.
--   http://doi.org/10.1038/nmeth.4035
-- 
-- Contig consensus sequences are generated using an algorithm derived from pbdagcon:
--   Chin CS, et al.
--   Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data.
--   Nat Methods. 2013 Jun;10(6):563-9
--   http://doi.org/10.1038/nmeth.2474
-- 
-- CONFIGURE CANU
--
-- Detected Java(TM) Runtime Environment '13.0.1' (from '/share/apps/programs/java/jdk-13.0.1/bin/java') without -d64 support.
--
-- WARNING:
-- WARNING:  Failed to run gnuplot using command 'gnuplot'.
-- WARNING:  Plots will be disabled.
-- WARNING:
--
--
-- Detected 40 CPUs and 661 gigabytes of memory on the local machine.
--
-- Detected PBSPro '14.1.0' with 'pbsnodes' binary in /opt/pbs/bin/pbsnodes.
--          PBSPro disabled by useGrid=false
--
-- Local machine mode enabled; grid support not detected or not allowed.
--
-- Job limits:
--     40 gigabytes memory  (maxMemory option).
--     40 CPUs              (maxThreads option).
--
--                                (tag)Concurrency
--                         (tag)Threads          |
--                (tag)Memory         |          |
--        (tag)             |         |          |       total usage      algorithm
--        -------  ----------  --------   --------  --------------------  -----------------------------
-- Local: meryl     20.000 GB    4 CPUs x   2 jobs    40.000 GB   8 CPUs  (k-mer counting)
-- Local: hap       20.000 GB    4 CPUs x   2 jobs    40.000 GB   8 CPUs  (read-to-haplotype assignment)
-- Local: cormhap   20.000 GB   16 CPUs x   2 jobs    40.000 GB  32 CPUs  (overlap detection with mhap)
-- Local: obtovl    20.000 GB    8 CPUs x   2 jobs    40.000 GB  16 CPUs  (overlap detection)
-- Local: utgovl    20.000 GB    8 CPUs x   2 jobs    40.000 GB  16 CPUs  (overlap detection)
-- Local: cor       20.000 GB    4 CPUs x   2 jobs    40.000 GB   8 CPUs  (read correction)
-- Local: ovb       20.000 GB    1 CPU  x   2 jobs    40.000 GB   2 CPUs  (overlap store bucketizer)
-- Local: ovs       20.000 GB    1 CPU  x   2 jobs    40.000 GB   2 CPUs  (overlap store sorting)
-- Local: red       20.000 GB    4 CPUs x   2 jobs    40.000 GB   8 CPUs  (read error detection)
-- Local: oea       20.000 GB    1 CPU  x   2 jobs    40.000 GB   2 CPUs  (overlap error adjustment)
-- Local: bat       20.000 GB    4 CPUs x   1 job     20.000 GB   4 CPUs  (contig construction with bogart)
-- Local: cns       20.000 GB    4 CPUs x   2 jobs    40.000 GB   8 CPUs  (consensus)
--
-- Found Nanopore reads in 'S23-10097_targeted_long_read_ERBB2.trimmed_Q7.seqStore':
--   Libraries:
--     Nanopore:              1
--   Reads:
--     Raw:                   820000922
--
--
-- Generating assembly 'S23-10097_targeted_long_read_ERBB2.trimmed_Q7' in '/lustre/export/home/swcho/1_ecDNA/3_data_for_WGS/nanopore_targeted/canu_results_trimmed_Q7':
--   genomeSize:
--     4100000
--
--   Overlap Generation Limits:
--     corOvlErrorRate 0.5000 ( 50.00%)
--     obtOvlErrorRate 0.1200 ( 12.00%)
--     utgOvlErrorRate 0.1200 ( 12.00%)
--
--   Overlap Processing Limits:
--     corErrorRate    0.5000 ( 50.00%)
--     obtErrorRate    0.1200 ( 12.00%)
--     utgErrorRate    0.1200 ( 12.00%)
--     cnsErrorRate    0.1200 ( 12.00%)
--
--   Stages to run:
--     only correct raw reads.
--
--
-- BEGIN CORRECTION
--
-- Correction jobs estimated to need at most 1.659 GB for computation.
-- Correction jobs will request 20 GB each.
--
-- Local: cor       20.000 GB    4 CPUs x   2 jobs    40.000 GB   8 CPUs  (read correction)
--
--
-- Configuring correction jobs:
--   Reads estimated to need at most 1.659 GB for computation.
--   Jobs will request 20 GB each.
----------------------------------------
-- Starting command on Thu Jul 11 14:59:40 2024 with 182482.537 GB free disk space

    cd correction/2-correction
    ./correctReadsPartition.sh \
    > ./correctReadsPartition.err 2>&1

-- Finished on Thu Jul 11 15:01:21 2024 (101 seconds) with 182451.866 GB free disk space
----------------------------------------
-- Finished stage 'cor-generateCorrectedReadsConfigure', reset canuIteration.
--
-- Running jobs.  First attempt out of 2.
----------------------------------------
-- Starting 'cor' concurrent execution on Thu Jul 11 15:01:21 2024 with 182451.866 GB free disk space (64 processes; 2 concurrently)

    cd correction/2-correction
    ./correctReads.sh 1 > ./correctReads.000001.out 2>&1
    ./correctReads.sh 2 > ./correctReads.000002.out 2>&1
    ./correctReads.sh 3 > ./correctReads.000003.out 2>&1
    ./correctReads.sh 4 > ./correctReads.000004.out 2>&1
    ./correctReads.sh 5 > ./correctReads.000005.out 2>&1
    ./correctReads.sh 6 > ./correctReads.000006.out 2>&1
    ./correctReads.sh 7 > ./correctReads.000007.out 2>&1
    ./correctReads.sh 8 > ./correctReads.000008.out 2>&1
    ./correctReads.sh 9 > ./correctReads.000009.out 2>&1
    ./correctReads.sh 10 > ./correctReads.000010.out 2>&1
    ./correctReads.sh 11 > ./correctReads.000011.out 2>&1
    ./correctReads.sh 12 > ./correctReads.000012.out 2>&1
    ./correctReads.sh 13 > ./correctReads.000013.out 2>&1
    ./correctReads.sh 14 > ./correctReads.000014.out 2>&1
    ./correctReads.sh 15 > ./correctReads.000015.out 2>&1
    ./correctReads.sh 16 > ./correctReads.000016.out 2>&1
    ./correctReads.sh 17 > ./correctReads.000017.out 2>&1
    ./correctReads.sh 18 > ./correctReads.000018.out 2>&1
    ./correctReads.sh 19 > ./correctReads.000019.out 2>&1
    ./correctReads.sh 20 > ./correctReads.000020.out 2>&1
    ./correctReads.sh 21 > ./correctReads.000021.out 2>&1
    ./correctReads.sh 22 > ./correctReads.000022.out 2>&1
    ./correctReads.sh 23 > ./correctReads.000023.out 2>&1
    ./correctReads.sh 24 > ./correctReads.000024.out 2>&1
    ./correctReads.sh 25 > ./correctReads.000025.out 2>&1
    ./correctReads.sh 26 > ./correctReads.000026.out 2>&1
    ./correctReads.sh 27 > ./correctReads.000027.out 2>&1
    ./correctReads.sh 28 > ./correctReads.000028.out 2>&1
    ./correctReads.sh 29 > ./correctReads.000029.out 2>&1
    ./correctReads.sh 30 > ./correctReads.000030.out 2>&1
    ./correctReads.sh 31 > ./correctReads.000031.out 2>&1
    ./correctReads.sh 32 > ./correctReads.000032.out 2>&1
    ./correctReads.sh 33 > ./correctReads.000033.out 2>&1
    ./correctReads.sh 34 > ./correctReads.000034.out 2>&1
    ./correctReads.sh 35 > ./correctReads.000035.out 2>&1
    ./correctReads.sh 36 > ./correctReads.000036.out 2>&1
    ./correctReads.sh 37 > ./correctReads.000037.out 2>&1
    ./correctReads.sh 38 > ./correctReads.000038.out 2>&1
    ./correctReads.sh 39 > ./correctReads.000039.out 2>&1
    ./correctReads.sh 40 > ./correctReads.000040.out 2>&1
    ./correctReads.sh 41 > ./correctReads.000041.out 2>&1
    ./correctReads.sh 42 > ./correctReads.000042.out 2>&1
    ./correctReads.sh 43 > ./correctReads.000043.out 2>&1
    ./correctReads.sh 44 > ./correctReads.000044.out 2>&1
    ./correctReads.sh 45 > ./correctReads.000045.out 2>&1
    ./correctReads.sh 46 > ./correctReads.000046.out 2>&1
    ./correctReads.sh 47 > ./correctReads.000047.out 2>&1
    ./correctReads.sh 48 > ./correctReads.000048.out 2>&1
    ./correctReads.sh 49 > ./correctReads.000049.out 2>&1
    ./correctReads.sh 50 > ./correctReads.000050.out 2>&1
    ./correctReads.sh 51 > ./correctReads.000051.out 2>&1
    ./correctReads.sh 52 > ./correctReads.000052.out 2>&1
    ./correctReads.sh 53 > ./correctReads.000053.out 2>&1
    ./correctReads.sh 54 > ./correctReads.000054.out 2>&1
    ./correctReads.sh 55 > ./correctReads.000055.out 2>&1
    ./correctReads.sh 56 > ./correctReads.000056.out 2>&1
    ./correctReads.sh 57 > ./correctReads.000057.out 2>&1
    ./correctReads.sh 58 > ./correctReads.000058.out 2>&1
    ./correctReads.sh 59 > ./correctReads.000059.out 2>&1
    ./correctReads.sh 60 > ./correctReads.000060.out 2>&1
    ./correctReads.sh 61 > ./correctReads.000061.out 2>&1
    ./correctReads.sh 62 > ./correctReads.000062.out 2>&1
    ./correctReads.sh 63 > ./correctReads.000063.out 2>&1
    ./correctReads.sh 64 > ./correctReads.000064.out 2>&1

-- Finished on Fri Jul 12 07:30:39 2024 (59358 seconds, at least I didn't crash) with 175671.08 GB free disk space
----------------------------------------
-- Found 64 read correction output files.
-- Finished stage 'cor-generateCorrectedReadsCheck', reset canuIteration.
-- Found 64 read correction output files.
-- Finished stage 'cor-generateCorrectedReadsCheck', reset canuIteration.
--
-- Loading corrected reads into corStore and seqStore.
----------------------------------------
-- Starting command on Fri Jul 12 07:30:39 2024 with 175671.08 GB free disk space

    cd correction
    /lustre/export/home/swcho/4_DNA_analysis_practice/tools/canu-2.2/bin/loadCorrectedReads \
      -S ../S23-10097_targeted_long_read_ERBB2.trimmed_Q7.seqStore \
      -C ./S23-10097_targeted_long_read_ERBB2.trimmed_Q7.corStore \
      -L ./2-correction/corjob.files \
    >  ./S23-10097_targeted_long_read_ERBB2.trimmed_Q7.loadCorrectedReads.log \
    2> ./S23-10097_targeted_long_read_ERBB2.trimmed_Q7.loadCorrectedReads.err

-- Finished on Fri Jul 12 07:34:07 2024 (208 seconds) with 175656.176 GB free disk space
----------------------------------------
--
-- In sequence store './S23-10097_targeted_long_read_ERBB2.trimmed_Q7.seqStore':
--   Found 889131 reads.
--   Found 694634880 bases (169.42 times coverage).
--    Histogram of corrected reads:
--    
--    G=694634880                        sum of  ||               length     num
--    NG         length     index       lengths  ||                range    seqs
--    ----- ------------ --------- ------------  ||  ------------------- -------
--    00010         2873     20246     69465471  ||        200-1261       797156|---------------------------------------------------------------
--    00020         1383     63135    138928297  ||       1262-2323        70311|------
--    00030         1183    117776    208391117  ||       2324-3385         2984|-
--    00040         1041    180390    277854107  ||       3386-4447        18597|--
--    00050          920    251393    347318104  ||       4448-5509           57|-
--    00060          801    332206    416781093  ||       5510-6571            5|-
--    00070          680    426248    486244899  ||       6572-7633           15|-
--    00080          568    538095    555708422  ||       7634-8695            1|-
--    00090          435    676722    625171663  ||       8696-9757            1|-
--    00100          200    889130    694634880  ||       9758-10819           1|-
--    001.000x              889131    694634880  ||      10820-11881           0|
--                                               ||      11882-12943           0|
--                                               ||      12944-14005           0|
--                                               ||      14006-15067           0|
--                                               ||      15068-16129           0|
--                                               ||      16130-17191           0|
--                                               ||      17192-18253           0|
--                                               ||      18254-19315           0|
--                                               ||      19316-20377           0|
--                                               ||      20378-21439           0|
--                                               ||      21440-22501           0|
--                                               ||      22502-23563           0|
--                                               ||      23564-24625           0|
--                                               ||      24626-25687           0|
--                                               ||      25688-26749           0|
--                                               ||      26750-27811           1|-
--                                               ||      27812-28873           0|
--                                               ||      28874-29935           0|
--                                               ||      29936-30997           0|
--                                               ||      30998-32059           0|
--                                               ||      32060-33121           0|
--                                               ||      33122-34183           0|
--                                               ||      34184-35245           1|-
--                                               ||      35246-36307           0|
--                                               ||      36308-37369           0|
--                                               ||      37370-38431           0|
--                                               ||      38432-39493           0|
--                                               ||      39494-40555           0|
--                                               ||      40556-41617           0|
--                                               ||      41618-42679           0|
--                                               ||      42680-43741           0|
--                                               ||      43742-44803           0|
--                                               ||      44804-45865           0|
--                                               ||      45866-46927           0|
--                                               ||      46928-47989           0|
--                                               ||      47990-49051           0|
--                                               ||      49052-50113           0|
--                                               ||      50114-51175           0|
--                                               ||      51176-52237           0|
--                                               ||      52238-53299           1|-
--    
--
-- Purging correctReads output after loading into stores.
-- Purged 64 .cns outputs.
-- Purged 128 .out job log outputs.
--
-- No corrected reads generated, overlaps used for correction saved.
-- Finished stage 'cor-loadCorrectedReads', reset canuIteration.
----------------------------------------
-- Starting command on Fri Jul 12 07:34:32 2024 with 175704.653 GB free disk space

    cd .
    /lustre/export/home/swcho/4_DNA_analysis_practice/tools/canu-2.2/bin/sqStoreDumpFASTQ \
      -corrected \
      -S ./S23-10097_targeted_long_read_ERBB2.trimmed_Q7.seqStore \
      -o ./S23-10097_targeted_long_read_ERBB2.trimmed_Q7.correctedReads.gz \
      -fasta \
      -nolibname \
    > S23-10097_targeted_long_read_ERBB2.trimmed_Q7.correctedReads.fasta.err 2>&1

-- Finished on Fri Jul 12 07:34:59 2024 (27 seconds) with 175724.334 GB free disk space
----------------------------------------
--
-- Corrected reads saved in 'S23-10097_targeted_long_read_ERBB2.trimmed_Q7.correctedReads.fasta.gz'.
-- Finished stage 'cor-dumpCorrectedReads', reset canuIteration.
--
-- Trimming skipped; not enabled.
--
-- Unitigging skipped; not enabled.
--
-- Bye.

skoren commented 4 months ago

That looks like a restarted run, so it doesn't have all the information on the previous steps. There should be a report file with the full information; can you share that? The reads are pretty short, so you probably don't want to reduce the MHAP sensitivity. I think you want MhapSensitivity=high.
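Here is a sketch of a rerun with this suggestion. It assumes the `PREFIX`, `WORKING_DIR` and `SAMPLE` variables from the original post. Every option is copied unchanged from that command except `MhapSensitivity`. The output directory name is hypothetical. It is deliberately new, because pointing `-d` at the existing directory makes Canu resume the earlier run instead of starting over. `rerun_correction` and `DRY_RUN` are illustrative names, not Canu features.

```shell
# Hypothetical wrapper: rerun Canu correction with MhapSensitivity=high
# in a fresh directory (reusing the old -d directory would resume the old run).
# Set DRY_RUN=1 to print the command line instead of running canu.
rerun_correction() {
  local out_dir="${WORKING_DIR:?set WORKING_DIR}/canu_results_trimmed_Q7_4.1m_mhap_high"
  local args=(
    -correct
    -p "${PREFIX:?set PREFIX}"
    -d "$out_dir"
    genomeSize=4.1m
    minReadLength=200
    minOverlapLength=100
    corMinCoverage=0
    corOutCoverage=all
    MhapSensitivity=high      # was: low in the original command
    rawErrorRate=0.500
    correctedErrorRate=0.12
    maxThreads=40
    minMemory=100
    maxMemory=200
    useGrid=false
    -nanopore "${SAMPLE:?set SAMPLE}"
  )
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    printf '%s\n' canu "${args[@]}"   # one argument per line, for inspection
  else
    canu "${args[@]}"
  fi
}
```

Running `DRY_RUN=1 rerun_correction` first is a cheap way to check the arguments before committing to a run that took more than 16 hours last time.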

swcho-HYBigLab commented 4 months ago

Sorry, the only report I have in my canu_results folder is this prefix.report:


[CORRECTION/READS]
--
-- In sequence store './S23-10097_targeted_long_read_ERBB2.trimmed_Q7.seqStore':
--   Found 921717 reads.
--   Found 820000922 bases (200 times coverage).
--    Histogram of raw reads:
--    
--    G=820000922                        sum of  ||               length     num
--    NG         length     index       lengths  ||                range    seqs
--    ----- ------------ --------- ------------  ||  ------------------- -------
--    00010         2008     25667     82000684  ||        200-2994       901655|---------------------------------------------------------------
--    00020         1411     77345    164000530  ||       2995-5789        20037|--
--    00030         1232    139864    246000829  ||       5790-8584           19|-
--    00040         1108    210150    328001135  ||       8585-11379           2|-
--    00050          999    288134    410000887  ||      11380-14174           0|
--    00060          900    374599    492000658  ||      14175-16969           0|
--    00070          794    471471    574001082  ||      16970-19764           0|
--    00080          676    583092    656000937  ||      19765-22559           0|
--    00090          536    718330    738001185  ||      22560-25354           0|
--    00100          200    921716    820000922  ||      25355-28149           1|-
--    001.000x              921717    820000922  ||      28150-30944           0|
--                                               ||      30945-33739           0|
--                                               ||      33740-36534           1|-
--                                               ||      36535-39329           0|
--                                               ||      39330-42124           0|
--                                               ||      42125-44919           0|
--                                               ||      44920-47714           0|
--                                               ||      47715-50509           0|
--                                               ||      50510-53304           0|
--                                               ||      53305-56099           1|-
--                                               ||      56100-58894           0|
--                                               ||      58895-61689           0|
--                                               ||      61690-64484           0|
--                                               ||      64485-67279           0|
--                                               ||      67280-70074           0|
--                                               ||      70075-72869           0|
--                                               ||      72870-75664           0|
--                                               ||      75665-78459           0|
--                                               ||      78460-81254           0|
--                                               ||      81255-84049           0|
--                                               ||      84050-86844           0|
--                                               ||      86845-89639           0|
--                                               ||      89640-92434           0|
--                                               ||      92435-95229           0|
--                                               ||      95230-98024           0|
--                                               ||      98025-100819          0|
--                                               ||     100820-103614          0|
--                                               ||     103615-106409          0|
--                                               ||     106410-109204          0|
--                                               ||     109205-111999          0|
--                                               ||     112000-114794          0|
--                                               ||     114795-117589          0|
--                                               ||     117590-120384          0|
--                                               ||     120385-123179          0|
--                                               ||     123180-125974          0|
--                                               ||     125975-128769          0|
--                                               ||     128770-131564          0|
--                                               ||     131565-134359          0|
--                                               ||     134360-137154          0|
--                                               ||     137155-139949          1|-
--
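The coverage Canu prints is total bases divided by `genomeSize` (4.1m = 4,100,000 bp here). Recomputing it from the numbers in this report shows that the raw reads in the store come to exactly 200x. That value is also the default of Canu's `maxInputCoverage` parameter in Canu 2.x, which downsamples input above that coverage. Whether this cap explains the drop from ~32 M trimmed reads is an inference, not something the log states. The helper name `coverage` is hypothetical.

```shell
# Hypothetical helper: coverage = total bases / genome size, as in Canu's report.
coverage() {
  awk -v bases="$1" -v g="$2" 'BEGIN { printf "%.2f\n", bases / g }'
}

coverage 820000922 4100000   # raw reads in the seqStore   -> 200.00
coverage 694634880 4100000   # corrected reads             -> 169.42
```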

[CORRECTION/MERS]
--
--  16-mers                                                                                           Fraction
--    Occurrences   NumMers                                                                         Unique Total
--       1-     1         0                                                                        0.0000 0.0000
--       2-     2  32955094 ********************************************************************** 0.5035 0.0986
--       3-     4  18758270 ***************************************                                0.6980 0.1557
--       5-     7   6386875 *************                                                          0.8394 0.2158
--       8-    11   2408429 *****                                                                  0.9010 0.2564
--      12-    16   1242505 **                                                                     0.9296 0.2850
--      17-    22    796618 *                                                                      0.9460 0.3088
--      23-    29    577780 *                                                                      0.9572 0.3309
--      30-    37    443972                                                                        0.9655 0.3527
--      38-    46    354538                                                                        0.9720 0.3745
--      47-    56    279734                                                                        0.9772 0.3963
--      57-    67    225712                                                                        0.9813 0.4174
--      68-    79    177945                                                                        0.9847 0.4380
--      80-    92    142449                                                                        0.9873 0.4573
--      93-   106    113894                                                                        0.9895 0.4753
--     107-   121     95023                                                                        0.9912 0.4920
--     122-   137     79417                                                                        0.9926 0.5080
--     138-   154     64942                                                                        0.9938 0.5233
--     155-   172     51230                                                                        0.9948 0.5373
--     173-   191     41661                                                                        0.9955 0.5496
--     192-   211     34539                                                                        0.9962 0.5609
--     212-   232     28216                                                                        0.9967 0.5712
--     233-   254     22850                                                                        0.9971 0.5804
--     255-   277     18388                                                                        0.9975 0.5887
--     278-   301     15720                                                                        0.9977 0.5959
--     302-   326     13521                                                                        0.9980 0.6027
--     327-   352     11734                                                                        0.9982 0.6090
--     353-   379      9174                                                                        0.9984 0.6149
--     380-   407      7964                                                                        0.9985 0.6199
--     408-   436      6851                                                                        0.9986 0.6246
--     437-   466      5915                                                                        0.9987 0.6289
--     467-   497      5382                                                                        0.9988 0.6329
--     498-   529      4683                                                                        0.9989 0.6367
--     530-   562      4214                                                                        0.9990 0.6403
--     563-   596      3874                                                                        0.9990 0.6437
--     597-   631      3439                                                                        0.9991 0.6471
--     632-   667      3043                                                                        0.9991 0.6502
--     668-   704      2824                                                                        0.9992 0.6532
--     705-   742      2492                                                                        0.9992 0.6561
--     743-   781      2298                                                                        0.9993 0.6588
--     782-   821      2121                                                                        0.9993 0.6614
--
--           0 (max occurrences)
--   668734055 (total mers, non-unique)
--    65449213 (distinct mers, non-unique)
--           0 (unique mers)

[CORRECTION/LAYOUT]
--                             original      original
--                            raw reads     raw reads
--   category                w/overlaps  w/o/overlaps
--   -------------------- ------------- -------------
--   Number of Reads             900598      31168169
--   Number of Bases          807155152       2035357
--   Coverage                   196.867         0.496
--   Median                         812             0
--   Mean                           896             0
--   N50                           1004           678
--   Minimum                        200             0
--   Maximum                     139903          2054
--   
--                                        --------corrected---------  ----------rescued----------
--                             evidence                     expected                     expected
--   category                     reads            raw     corrected            raw     corrected
--   -------------------- -------------  ------------- -------------  ------------- -------------
--   Number of Reads             904089         900597        900597              0             0
--   Number of Bases          809155029      807015249     783595700              0             0
--   Coverage                   197.355        196.833       191.121          0.000         0.000
--   Median                         811            812           789              0             0
--   Mean                           894            896           870              0             0
--   N50                           1003           1004           993              0             0
--   Minimum                        200            200             4              0             0
--   Maximum                     139903          53354         53353              0             0
--   
--                        --------uncorrected--------
--                                           expected
--   category                       raw     corrected
--   -------------------- ------------- -------------
--   Number of Reads           31168170      31168170
--   Number of Bases            2175260        133073
--   Coverage                     0.531         0.032
--   Median                           0             0
--   Mean                             0             0
--   N50                            716             0
--   Minimum                          0             0
--   Maximum                     139903        133073
--   
--   Maximum Memory          1782103604
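The LAYOUT table can be checked the same way, using only its own numbers. Reads with overlaps are under 3% of all reads. The ~31 M reads without overlaps average well under one base each, so with `minReadLength=200` nearly all of them must be recorded with length 0. That fits most trimmed reads never entering correction, rather than being discarded by it. This reading of the table is an inference. `layout_summary` is a hypothetical name.

```shell
# Hypothetical helper: summarize the CORRECTION/LAYOUT numbers copied from the report.
layout_summary() {
  awk 'BEGIN {
    with_ovl = 900598; without_ovl = 31168169; bases_without = 2035357
    printf "fraction with overlaps: %.4f\n", with_ovl / (with_ovl + without_ovl)
    printf "mean bases per read w/o overlaps: %.3f\n", bases_without / without_ovl
  }'
}

layout_summary
# fraction with overlaps: 0.0281
# mean bases per read w/o overlaps: 0.065
```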

[TRIMMING/READS]
--
-- In sequence store './S23-10097_targeted_long_read_ERBB2.trimmed_Q7.seqStore':
--   Found 889131 reads.
--   Found 694634880 bases (169.42 times coverage).
--    Histogram of corrected reads:
--    
--    G=694634880                        sum of  ||               length     num
--    NG         length     index       lengths  ||                range    seqs
--    ----- ------------ --------- ------------  ||  ------------------- -------
--    00010         2873     20246     69465471  ||        200-1261       797156|---------------------------------------------------------------
--    00020         1383     63135    138928297  ||       1262-2323        70311|------
--    00030         1183    117776    208391117  ||       2324-3385         2984|-
--    00040         1041    180390    277854107  ||       3386-4447        18597|--
--    00050          920    251393    347318104  ||       4448-5509           57|-
--    00060          801    332206    416781093  ||       5510-6571            5|-
--    00070          680    426248    486244899  ||       6572-7633           15|-
--    00080          568    538095    555708422  ||       7634-8695            1|-
--    00090          435    676722    625171663  ||       8696-9757            1|-
--    00100          200    889130    694634880  ||       9758-10819           1|-
--    001.000x              889131    694634880  ||      10820-11881           0|
--                                               ||      11882-12943           0|
--                                               ||      12944-14005           0|
--                                               ||      14006-15067           0|
--                                               ||      15068-16129           0|
--                                               ||      16130-17191           0|
--                                               ||      17192-18253           0|
--                                               ||      18254-19315           0|
--                                               ||      19316-20377           0|
--                                               ||      20378-21439           0|
--                                               ||      21440-22501           0|
--                                               ||      22502-23563           0|
--                                               ||      23564-24625           0|
--                                               ||      24626-25687           0|
--                                               ||      25688-26749           0|
--                                               ||      26750-27811           1|-
--                                               ||      27812-28873           0|
--                                               ||      28874-29935           0|
--                                               ||      29936-30997           0|
--                                               ||      30998-32059           0|
--                                               ||      32060-33121           0|
--                                               ||      33122-34183           0|
--                                               ||      34184-35245           1|-
--                                               ||      35246-36307           0|
--                                               ||      36308-37369           0|
--                                               ||      37370-38431           0|
--                                               ||      38432-39493           0|
--                                               ||      39494-40555           0|
--                                               ||      40556-41617           0|
--                                               ||      41618-42679           0|
--                                               ||      42680-43741           0|
--                                               ||      43742-44803           0|
--                                               ||      44804-45865           0|
--                                               ||      45866-46927           0|
--                                               ||      46928-47989           0|
--                                               ||      47990-49051           0|
--                                               ||      49052-50113           0|
--                                               ||      50114-51175           0|
--                                               ||      51176-52237           0|
--                                               ||      52238-53299           1|-
--

skoren commented 4 months ago

Ah, you're hitting the max input coverage, which defaults to 200x, so the 170x of corrected data makes sense given that. Add maxInputCoverage=10000 (I don't recall whether it also accepts "all"; you could try it). The first read-length report should then show a coverage much higher than 200x.
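As a rough check of the numbers in the log: a coverage cap corresponds to a base budget of maxInputCoverage * genomeSize. The corrected store's coverage is its base count divided by genomeSize. The bash sketch below is a back-of-the-envelope illustration, not canu code, and the helper names `coverage` and `base_budget` are hypothetical. It reproduces the 169.42x in the report and shows how much room maxInputCoverage=10000 leaves for a 4.1 Mb panel. It does not model which reads canu keeps when the cap is reached.

```shell
#!/usr/bin/env bash
# Back-of-the-envelope arithmetic only (assumption: a coverage cap of C means
# roughly C * genomeSize input bases). canu's read selection is not modeled.

# Convert a number of bases into fold coverage of the given genome size.
coverage() {
  local bases=$1 genome_size=$2
  awk -v b="$bases" -v g="$genome_size" 'BEGIN { printf "%.2f\n", b / g }'
}

# Maximum number of input bases allowed by a given coverage cap.
base_budget() {
  local max_cov=$1 genome_size=$2
  echo $(( max_cov * genome_size ))
}

GENOME_SIZE=4100000   # genomeSize=4.1m from the original command

base_budget 200 "$GENOME_SIZE"      # default cap            -> 820000000
base_budget 10000 "$GENOME_SIZE"    # maxInputCoverage=10000 -> 41000000000
coverage 694634880 "$GENOME_SIZE"   # corrected bases in log -> 169.42
```

With the default cap, roughly 820 Mb of input is the ceiling. The 694,634,880 corrected bases (169.42x) sit below it, which is consistent with the explanation above. Raising the cap to 10000 lifts the ceiling to about 41 Gb.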

swcho-HYBigLab commented 4 months ago

Thank you! So you mean there's no problem with genomeSize=4.1m? I thought the issue was related to how the reads were counted, since the documentation says genomeSize affects corOutCoverage and MhapSensitivity.

As you suggested, I will rerun it with maxInputCoverage=10000, and I will also set MhapSensitivity=high.

Thanks again.

skoren commented 4 months ago

The genome size doesn't matter much: it affects those two settings plus maxInputCoverage, but not the correction itself. So if you explicitly set the sensitivity and coverages, the genome size essentially doesn't matter.
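For reference, the rerun discussed in this thread could look like the sketch below. It reuses the original command's variables and options unchanged. It adds only maxInputCoverage=10000 (the suggested value) and MhapSensitivity=high (the poster's plan), so the coverage-related settings no longer depend on genomeSize. The arguments are collected in a bash array so that each option can carry its own comment, which backslash-continued lines cannot. The output directory name is hypothetical. Using a fresh directory is an assumption meant to keep canu from resuming from the earlier run's files.

```shell
#!/usr/bin/env bash
# Sketch of the rerun. PREFIX, WORKING_DIR and SAMPLE are the variables from the
# original command; the "_maxcov10000" directory suffix is a hypothetical name.
canu_args=(
  -correct
  -p "${PREFIX}"
  -d "${WORKING_DIR}/canu_results_trimmed_Q7_4.1m_maxcov10000"
  genomeSize=4.1m
  minReadLength=200
  minOverlapLength=100
  corMinCoverage=0
  corOutCoverage=all
  maxInputCoverage=10000   # raise the default 200x input cap (value from the thread)
  MhapSensitivity=high     # set explicitly instead of letting genomeSize decide
  rawErrorRate=0.500
  correctedErrorRate=0.12
  -nanopore "${SAMPLE}"
  maxThreads=40
  minMemory=100
  maxMemory=200
  useGrid=false
)

# Print the full command for review; remove "echo" to actually launch canu.
echo canu "${canu_args[@]}"
```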

swcho-HYBigLab commented 4 months ago

Thank you for your comment! It was a great help for my research.

marbl / canu

ONT targeted long-read sequencing trimmed data error correction issue #2328