marbl / canu

A single molecule sequence assembler for genomes large and small.
http://canu.readthedocs.io/

No corrected reads generated #2076

Closed by Fika182 2 years ago

Fika182 commented 2 years ago

Hi! I am trying to correct my raw Nanopore data using Canu 2.2, running on Ubuntu 20.04.2 LTS.

I ran this command: canu -correct -d run2 -p trial genomeSize=1k -nanopore-raw minion_fastq/pass/fastq_runid_8837c4037eaaa1531a8e3e52636c309148e6df3e_0_0.fastq

In the CONFIGURE CANU section, the rows for read correction (cor) and consensus (cns) show no memory or CPU allocated on the local machine.

-- CONFIGURE CANU
--
-- Detected Java(TM) Runtime Environment '11.0.9.1-internal' (from '/home/medion/miniconda3/envs/canu/bin/java') without -d64 support.
-- Detected gnuplot version '5.4 patchlevel 1   ' (from 'gnuplot') and image format 'png'.
--
-- Detected 16 CPUs and 16 gigabytes of memory on the local machine.
--
-- Local machine mode enabled; grid support not detected or not allowed.
--
--                                (tag)Concurrency
--                         (tag)Threads          |
--                (tag)Memory         |          |
--        (tag)             |         |          |       total usage      algorithm
--        -------  ----------  --------   --------  --------------------  -----------------------------
-- Local: meryl      4.000 GB    4 CPUs x   4 jobs    16.000 GB  16 CPUs  (k-mer counting)
-- Local: hap        4.000 GB    4 CPUs x   4 jobs    16.000 GB  16 CPUs  (read-to-haplotype assignment)
-- Local: cormhap    6.000 GB   16 CPUs x   1 job      6.000 GB  16 CPUs  (overlap detection with mhap)
-- Local: obtovl     4.000 GB    8 CPUs x   2 jobs     8.000 GB  16 CPUs  (overlap detection)
-- Local: utgovl     4.000 GB    8 CPUs x   2 jobs     8.000 GB  16 CPUs  (overlap detection)
-- Local: cor        -.--- GB    4 CPUs x   - jobs     -.--- GB   - CPUs  (read correction)
-- Local: ovb        4.000 GB    1 CPU  x   4 jobs    16.000 GB   4 CPUs  (overlap store bucketizer)
-- Local: ovs        8.000 GB    1 CPU  x   2 jobs    16.000 GB   2 CPUs  (overlap store sorting)
-- Local: red        8.000 GB    4 CPUs x   2 jobs    16.000 GB   8 CPUs  (read error detection)
-- Local: oea        8.000 GB    1 CPU  x   2 jobs    16.000 GB   2 CPUs  (overlap error adjustment)
-- Local: bat       16.000 GB    4 CPUs x   1 job     16.000 GB   4 CPUs  (contig construction with bogart)
-- Local: cns        -.--- GB    4 CPUs x   - jobs     -.--- GB   - CPUs  (consensus)
--
-- Found untrimmed raw Nanopore reads in the input files.

I have no idea what is causing this. I would be glad to hear any suggestions. Thanks.

skoren commented 2 years ago

I can't help without the output or error from the run. Please post the full stdout/stderr output from the canu run.

Fika182 commented 2 years ago

I'm sorry, this is all new to me. Here is the output that followed.

-- Generating assembly 'trial' in '/mnt/d/tutorial_amplicon_analysis/run2':
--   genomeSize:
--     1000
--
--   Overlap Generation Limits:
--     corOvlErrorRate 0.3200 ( 32.00%)
--     obtOvlErrorRate 0.1200 ( 12.00%)
--     utgOvlErrorRate 0.1200 ( 12.00%)
--
--   Overlap Processing Limits:
--     corErrorRate    0.3000 ( 30.00%)
--     obtErrorRate    0.1200 ( 12.00%)
--     utgErrorRate    0.1200 ( 12.00%)
--     cnsErrorRate    0.2000 ( 20.00%)
--
--   Stages to run:
--     only correct raw reads.
--
--
-- BEGIN CORRECTION
----------------------------------------
-- Starting command on Sat Jan  8 09:22:31 2022 with 610.042 GB free disk space

    cd .
    ./trial.seqStore.sh \
    > ./trial.seqStore.err 2>&1

-- Finished on Sat Jan  8 09:22:32 2022 (one second) with 610.042 GB free disk space
----------------------------------------
--
-- In sequence store './trial.seqStore':
--   Found 11 reads.
--   Found 229954 bases (229.95 times coverage).
--    Histogram of raw reads:
--
--    G=229954                           sum of  ||               length     num
--    NG         length     index       lengths  ||                range    seqs
--    ----- ------------ --------- ------------  ||  ------------------- -------
--    00010        46185         0        46185  ||       2711-3580            1|---------------------------------------------------------------
--    00020        46185         0        46185  ||       3581-4450            1|---------------------------------------------------------------
--    00030        34794         1        80979  ||       4451-5320            0|
--    00040        33815         2       114794  ||       5321-6190            0|
--    00050        31522         3       146316  ||       6191-7060            0|
--    00060        31522         3       146316  ||       7061-7930            0|
--    00070        22274         4       168590  ||       7931-8800            0|
--    00080        19915         5       188505  ||       8801-9670            1|---------------------------------------------------------------
--    00090        11364         7       213958  ||       9671-10540           0|
--    00100         2711        10       229954  ||      10541-11410           1|---------------------------------------------------------------
--    001.000x                  11       229954  ||      11411-12280           0|
--                                               ||      12281-13150           0|
--                                               ||      13151-14020           0|
--                                               ||      14021-14890           1|---------------------------------------------------------------
--                                               ||      14891-15760           0|
--                                               ||      15761-16630           0|
--                                               ||      16631-17500           0|
--                                               ||      17501-18370           0|
--                                               ||      18371-19240           0|
--                                               ||      19241-20110           1|---------------------------------------------------------------
--                                               ||      20111-20980           0|
--                                               ||      20981-21850           0|
--                                               ||      21851-22720           1|---------------------------------------------------------------
--                                               ||      22721-23590           0|
--                                               ||      23591-24460           0|
--                                               ||      24461-25330           0|
--                                               ||      25331-26200           0|
--                                               ||      26201-27070           0|
--                                               ||      27071-27940           0|
--                                               ||      27941-28810           0|
--                                               ||      28811-29680           0|
--                                               ||      29681-30550           0|
--                                               ||      30551-31420           0|
--                                               ||      31421-32290           1|---------------------------------------------------------------
--                                               ||      32291-33160           0|
--                                               ||      33161-34030           1|---------------------------------------------------------------
--                                               ||      34031-34900           1|---------------------------------------------------------------
--                                               ||      34901-35770           0|
--                                               ||      35771-36640           0|
--                                               ||      36641-37510           0|
--                                               ||      37511-38380           0|
--                                               ||      38381-39250           0|
--                                               ||      39251-40120           0|
--                                               ||      40121-40990           0|
--                                               ||      40991-41860           0|
--                                               ||      41861-42730           0|
--                                               ||      42731-43600           0|
--                                               ||      43601-44470           0|
--                                               ||      44471-45340           0|
--                                               ||      45341-46210           1|---------------------------------------------------------------
--
----------------------------------------
-- Starting command on Sat Jan  8 09:22:32 2022 with 610.042 GB free disk space

    cd correction/0-mercounts
    ./meryl-configure.sh \
    > ./meryl-configure.err 2>&1

-- Finished on Sat Jan  8 09:22:33 2022 (one second) with 610.042 GB free disk space
----------------------------------------
--  segments   memory batches
--  -------- -------- -------
--        01  0.01 GB       2
--
--  For 11 reads with 229954 bases, limit to 1 batch.
--  Will count kmers using 01 jobs, each using 2 GB and 4 threads.
--
-- Finished stage 'merylConfigure', reset canuIteration.
--
-- Running jobs.  First attempt out of 2.
----------------------------------------
-- Starting 'meryl' concurrent execution on Sat Jan  8 09:22:33 2022 with 610.042 GB free disk space (1 processes; 4 concurrently)

    cd correction/0-mercounts
    ./meryl-count.sh 1 > ./meryl-count.000001.out 2>&1

-- Finished on Sat Jan  8 09:22:34 2022 (one second) with 610.04 GB free disk space
----------------------------------------
--
-- Kmer counting (meryl-count) jobs failed, retry.
--   job trial.01.meryl FAILED.
--
--
-- Running jobs.  Second attempt out of 2.
----------------------------------------
-- Starting 'meryl' concurrent execution on Sat Jan  8 09:22:34 2022 with 610.04 GB free disk space (1 processes; 4 concurrently)

    cd correction/0-mercounts
    ./meryl-count.sh 1 > ./meryl-count.000001.out 2>&1

-- Finished on Sat Jan  8 09:22:35 2022 (one second) with 610.04 GB free disk space
----------------------------------------
-- Found 1 Kmer counting (meryl) outputs.
-- Finished stage 'cor-merylCountCheck', reset canuIteration.
--
-- Running jobs.  First attempt out of 2.
----------------------------------------
-- Starting 'meryl' concurrent execution on Sat Jan  8 09:22:35 2022 with 610.04 GB free disk space (1 processes; 4 concurrently)

    cd correction/0-mercounts
    ./meryl-process.sh 1 > ./meryl-process.000001.out 2>&1

-- Finished on Sat Jan  8 09:22:36 2022 (one second) with 610.042 GB free disk space
----------------------------------------
-- Meryl finished successfully.  Kmer frequency histogram:
--
--  16-mers                                                                                           Fraction
--    Occurrences   NumMers                                                                         Unique Total
--       1-     1         0                                                                        0.0000 0.0000
--       2-     2      1433 ********************************************************************** 0.6909 0.3971
--       3-     6       602 *****************************                                          0.8655 0.5475
--       7-    13        37 *                                                                      0.9846 0.7023
--      14-    23         1                                                                        0.9995 0.7448
--      24-    36         0                                                                        0.0000 0.0000
--      37-    52         0                                                                        0.0000 0.0000
--      53-    71         0                                                                        0.0000 0.0000
--      72-    93         0                                                                        0.0000 0.0000
--      94-   118         0                                                                        0.0000 0.0000
--     119-   146         0                                                                        0.0000 0.0000
--     147-   177         0                                                                        0.0000 0.0000
--     178-   211         0                                                                        0.0000 0.0000
--     212-   248         0                                                                        0.0000 0.0000
--     249-   288         0                                                                        0.0000 0.0000
--     289-   331         0                                                                        0.0000 0.0000
--     332-   377         0                                                                        0.0000 0.0000
--     378-   426         0                                                                        0.0000 0.0000
--     427-   478         0                                                                        0.0000 0.0000
--     479-   533         0                                                                        0.0000 0.0000
--     534-   591         0                                                                        0.0000 0.0000
--     592-   652         0                                                                        0.0000 0.0000
--     653-   716         0                                                                        0.0000 0.0000
--     717-   783         0                                                                        0.0000 0.0000
--     784-   853         0                                                                        0.0000 0.0000
--     854-   926         0                                                                        0.0000 0.0000
--     927-  1002         0                                                                        0.0000 0.0000
--    1003-  1081         0                                                                        0.0000 0.0000
--    1082-  1163         0                                                                        0.0000 0.0000
--    1164-  1248         0                                                                        0.0000 0.0000
--    1249-  1336         0                                                                        0.0000 0.0000
--    1337-  1427         0                                                                        0.0000 0.0000
--    1428-  1521         0                                                                        0.0000 0.0000
--    1522-  1618         0                                                                        0.0000 0.0000
--    1619-  1718         0                                                                        0.0000 0.0000
--    1719-  1821         0                                                                        0.0000 0.0000
--    1822-  1927         1                                                                        1.0000 1.0000
--
--           0 (max occurrences)
--        7218 (total mers, non-unique)
--        2074 (distinct mers, non-unique)
--           0 (unique mers)
-- Finished stage 'meryl-process', reset canuIteration.
--
-- Removing meryl database 'correction/0-mercounts/trial.ms16'.
--
-- OVERLAPPER (mhap) (correction)
--
-- Set corMhapSensitivity=low based on read coverage of 229.95.
--
-- PARAMETERS: hashes=256, minMatches=3, threshold=0.8
--
-- Given 5.4 GB, can fit 16200 reads per block.
-- For 2 blocks, set stride to 2 blocks.
-- Logging partitioning to 'correction/1-overlapper/partitioning.log'.
-- Configured 1 mhap precompute jobs.
-- Configured 1 mhap overlap jobs.
-- Finished stage 'cor-mhapConfigure', reset canuIteration.
--
-- Running jobs.  First attempt out of 2.
----------------------------------------
-- Starting 'cormhap' concurrent execution on Sat Jan  8 09:22:37 2022 with 610.042 GB free disk space (1 processes; 1 concurrently)

    cd correction/1-overlapper
    ./precompute.sh 1 > ./precompute.000001.out 2>&1

-- Finished on Sat Jan  8 09:22:38 2022 (one second) with 610.042 GB free disk space
----------------------------------------
-- All 1 mhap precompute jobs finished successfully.
-- Finished stage 'cor-mhapPrecomputeCheck', reset canuIteration.
--
-- Running jobs.  First attempt out of 2.
----------------------------------------
-- Starting 'cormhap' concurrent execution on Sat Jan  8 09:22:38 2022 with 610.042 GB free disk space (1 processes; 1 concurrently)

    cd correction/1-overlapper
    ./mhap.sh 1 > ./mhap.000001.out 2>&1

-- Finished on Sat Jan  8 09:22:39 2022 (one second) with 610.042 GB free disk space
----------------------------------------
-- Found 1 mhap overlap output files.
-- Finished stage 'cor-mhapCheck', reset canuIteration.
----------------------------------------
-- Starting command on Sat Jan  8 09:22:39 2022 with 610.042 GB free disk space

    cd correction
    /home/Users/miniconda3/envs/canu/bin/ovStoreConfig \
     -S ../trial.seqStore \
     -M 4-8 \
     -L ./1-overlapper/ovljob.files \
     -create ./trial.ovlStore.config \
     > ./trial.ovlStore.config.txt \
    2> ./trial.ovlStore.config.err

-- Finished on Sat Jan  8 09:22:39 2022 (fast as lightning) with 610.042 GB free disk space
----------------------------------------
--
-- Creating overlap store correction/trial.ovlStore using:
--      1 bucket
--      1 slice
--        using at most 1 GB memory each
--
-- Running jobs.  First attempt out of 2.
----------------------------------------
-- Starting 'ovS' concurrent execution on Sat Jan  8 09:22:39 2022 with 610.042 GB free disk space (1 processes; 2 concurrently)

    cd correction
    ./trial.ovlStore.sh 1 > ./trial.ovlStore.000001.out 2>&1

-- Finished on Sat Jan  8 09:22:39 2022 (furiously fast) with 610.042 GB free disk space
----------------------------------------
-- Checking store.
----------------------------------------
-- Starting command on Sat Jan  8 09:22:40 2022 with 610.042 GB free disk space

    cd correction
    /home/Users/miniconda3/envs/canu/bin/ovStoreDump \
     -S ../trial.seqStore \
     -O  ./trial.ovlStore \
     -counts \
     > ./trial.ovlStore/counts.dat 2> ./trial.ovlStore/counts.err

-- Finished on Sat Jan  8 09:22:40 2022 (one second) with 610.042 GB free disk space
----------------------------------------
--
-- Overlap store 'correction/trial.ovlStore' successfully constructed.
-- Found 0 overlaps for 0 reads; 72 reads have no overlaps.
--
--
-- Purged 0 GB in 3 overlap output files.
-- Finished stage 'cor-createOverlapStore', reset canuIteration.
-- Set corMinCoverage=4 based on read coverage of 229.95.
-- Computing correction layouts.
--   Local  filter coverage   80
--   Global filter coverage   40
----------------------------------------
-- Starting command on Sat Jan  8 09:22:40 2022 with 610.042 GB free disk space

    cd correction
    /home/Users/miniconda3/envs/canu/bin/generateCorrectionLayouts \
      -S ../trial.seqStore \
      -O  ./trial.ovlStore \
      -C  ./trial.corStore.WORKING \
      -eC 80 \
      -xC 40 \
    > ./trial.corStore.err 2>&1

-- Finished on Sat Jan  8 09:22:40 2022 (like a bat out of hell) with 610.042 GB free disk space
----------------------------------------
-- Finished stage 'cor-buildCorrectionLayoutsConfigure', reset canuIteration.
-- Computing correction layouts.
----------------------------------------
-- Starting command on Sat Jan  8 09:22:40 2022 with 610.042 GB free disk space

    cd correction/2-correction
    /home/Users/miniconda3/envs/canu/bin/filterCorrectionLayouts \
      -S  ../../trial.seqStore \
      -C     ../trial.corStore \
      -R      ./trial.readsToCorrect.WORKING \
      -cc 4 \
      -cl 1000 \
      -g  1000 \
      -c  40 \
    > ./trial.readsToCorrect.err 2>&1

-- Finished on Sat Jan  8 09:22:40 2022 (in the blink of an eye) with 610.042 GB free disk space
----------------------------------------
--                             original      original
--                            raw reads     raw reads
--   category                w/overlaps  w/o/overlaps
--   -------------------- ------------- -------------
--   Number of Reads                  0            72
--   Number of Bases                  0             0
--   Coverage                     0.000         0.000
--   Median                           0             0
--   Mean                             0             0
--   N50                              0             0
--   Minimum                          0             0
--   Maximum                          0             0
--
--                                        --------corrected---------  ----------rescued----------
--                             evidence                     expected                     expected
--   category                     reads            raw     corrected            raw     corrected
--   -------------------- -------------  ------------- -------------  ------------- -------------
--   Number of Reads                  0              0             0              0             0
--   Number of Bases                  0              0             0              0             0
--   Coverage                     0.000          0.000         0.000          0.000         0.000
--   Median                           0              0             0              0             0
--   Mean                             0              0             0              0             0
--   N50                              0              0             0              0             0
--   Minimum                          0              0             0              0             0
--   Maximum                          0              0             0              0             0
--
--                        --------uncorrected--------
--                                           expected
--   category                       raw     corrected
--   -------------------- ------------- -------------
--   Number of Reads                 72            72
--   Number of Bases                  0             0
--   Coverage                     0.000         0.000
--   Median                           0             0
--   Mean                             0             0
--   N50                              0             0
--   Minimum                          0             0
--   Maximum                          0             0
--
--   Maximum Memory                   0
-- Finished stage 'cor-filterCorrectionLayouts', reset canuIteration.
--
-- Correction jobs estimated to need at most 0 GB for computation.
-- Correction jobs will request 6 GB each.
--
-- Local: cor        6.000 GB    4 CPUs x   2 jobs    12.000 GB   8 CPUs  (read correction)
--
--
-- Configuring correction jobs:
--   Reads estimated to need at most 0 GB for computation.
--   Jobs will request 6 GB each.
----------------------------------------
-- Starting command on Sat Jan  8 09:22:40 2022 with 610.042 GB free disk space

    cd correction/2-correction
    ./correctReadsPartition.sh \
    > ./correctReadsPartition.err 2>&1

-- Finished on Sat Jan  8 09:22:40 2022 (fast as lightning) with 610.042 GB free disk space
----------------------------------------
-- Finished stage 'cor-generateCorrectedReadsConfigure', reset canuIteration.
--
-- Running jobs.  First attempt out of 2.
----------------------------------------
-- Starting 'cor' concurrent execution on Sat Jan  8 09:22:40 2022 with 610.042 GB free disk space (1 processes; 2 concurrently)

    cd correction/2-correction
    ./correctReads.sh 1 > ./correctReads.000001.out 2>&1

-- Finished on Sat Jan  8 09:22:41 2022 (one second) with 610.042 GB free disk space
----------------------------------------
-- Found 1 read correction output files.
-- Finished stage 'cor-generateCorrectedReadsCheck', reset canuIteration.
-- Found 1 read correction output files.
-- Finished stage 'cor-generateCorrectedReadsCheck', reset canuIteration.
--
-- Loading corrected reads into corStore and seqStore.
----------------------------------------
-- Starting command on Sat Jan  8 09:22:41 2022 with 610.042 GB free disk space

    cd correction
    /home/Users/miniconda3/envs/canu/bin/loadCorrectedReads \
      -S ../trial.seqStore \
      -C ./trial.corStore \
      -L ./2-correction/corjob.files \
    >  ./trial.loadCorrectedReads.log \
    2> ./trial.loadCorrectedReads.err

-- Finished on Sat Jan  8 09:22:41 2022 (in the blink of an eye) with 610.042 GB free disk space
----------------------------------------
--
-- No corrected reads generated; correctReads output saved.
--
-- Purging overlaps used for correction.
-- Finished stage 'cor-loadCorrectedReads', reset canuIteration.
----------------------------------------
-- Starting command on Sat Jan  8 09:22:41 2022 with 610.042 GB free disk space

    cd .
    /home/Users/miniconda3/envs/canu/bin/sqStoreDumpFASTQ \
      -corrected \
      -S ./trial.seqStore \
      -o ./trial.correctedReads.gz \
      -fasta \
      -nolibname \
    > trial.correctedReads.fasta.err 2>&1

-- Finished on Sat Jan  8 09:22:41 2022 (furiously fast) with 610.042 GB free disk space
----------------------------------------
--
-- Corrected reads saved in 'trial.correctedReads.fasta.gz'.
-- Finished stage 'cor-dumpCorrectedReads', reset canuIteration.
--
-- Trimming skipped; not enabled.
--
-- Unitigging skipped; not enabled.
--
-- Bye.

BTW, thanks for such a quick response.

skoren commented 2 years ago

It looks like there are no overlaps in your dataset. Given that you specified a genome size of 1k and your reads are all 3kb+, almost all of them get filtered for too much coverage. Is the genome really 1kb here?

Fika182 commented 2 years ago

My best guess for this dataset is around 1kb. I also tried a run with genomeSize=3k and the result is the same as before (no corrected reads generated). Do I need to change something in the filtering settings?

skoren commented 2 years ago

I don't think either 1 or 3kb can be accurate given your data. The reads left in the input are between 3 and 46kb, which would imply that a single read covers your genome more than 10 times, and that doesn't make sense. If the target is really 3kb then there must be other contamination in the data. I'd set the genome size much larger, maybe 3-4mb, to keep more of your reads. See also the FAQ at https://canu.readthedocs.io/en/latest/faq.html#my-asm-contigs-fasta-is-empty-why; your target may end up in the unassembled sequences, depending on its coverage.
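To make the coverage argument concrete, here is a rough Python sketch of the arithmetic. It is not part of Canu. The total base count (229954) and the longest read length (46185) are taken from the log above; the `coverage` helper and the list of candidate genome sizes are purely illustrative, and plain division is assumed to match how the log arrives at "229.95 times coverage".

```python
# Illustrative arithmetic only; not Canu code.
# Numbers below are copied from the posted log.
TOTAL_BASES = 229954    # "Found 229954 bases"
LONGEST_READ = 46185    # longest read in the histogram


def coverage(bases: int, genome_size: int) -> float:
    """Coverage implied by `bases` of sequence over a genome of `genome_size` bp."""
    return bases / genome_size


# Candidate genomeSize values discussed in the thread (1k, 3k, and roughly 3mb).
for genome_size in (1_000, 3_000, 3_000_000):
    total_cov = coverage(TOTAL_BASES, genome_size)
    single_read_cov = coverage(LONGEST_READ, genome_size)
    print(
        f"genomeSize={genome_size:>9}: "
        f"all reads {total_cov:.2f}x, longest read alone {single_read_cov:.2f}x"
    )
```

With genomeSize=1k the longest read alone works out to more than 40x, which is the "Global filter coverage" value printed in the log; with a genome size in the megabase range the same reads amount to a small fraction of 1x.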

Fika182 commented 2 years ago

Oh okay, thanks for your suggestion.

I also have another problem when assembling the data. Should I post it here or in a new issue?

skoren commented 2 years ago

If it's the same data and run, just with a changed genome size you can post it here.

Fika182 commented 2 years ago

I used data that was already corrected and approved, and it works! I agree that something is wrong with my data, so the app cannot process it. Thank you so much for your help.

Fika182 commented 2 years ago

Dear @skoren,

Do you have any tips for avoiding this kind of problem? I keep running into it even though I followed your suggestion to set a larger genome size and set corOutCoverage as described at https://canu.readthedocs.io/en/latest/faq.html#my-asm-contigs-fasta-is-empty-why

Thanks for your advice.

skoren commented 2 years ago

The issue before was that there were no overlaps and all the reads were larger than the genome size. Is it still the same? Post the log file of the run and the parameters you're using.

Fika182 commented 2 years ago

[Uploading Result.zip…]()

I have attached a zip of the results I could obtain. I ran this experiment with the following command:

canu -correct -p test -d try_2 genomeSize=0.2m -nanopore-raw readSamplingCoverage=100 ../pass/*.fastq

FYI: I am trying to analyze a 1200 bp amplicon.

skoren commented 2 years ago

The results.zip link doesn't seem to work. Have you filtered out reads longer than your amplicon before launching Canu? If you haven't, you likely have the same issue as before: just a few of those reads provide 100x+ coverage. I'd filter out any read over 1.5kb and/or increase the genome size significantly, to something like 5mb or 10mb.
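One way to do the length filtering is sketched below in plain Python. This is only an illustration, not a Canu tool: it assumes uncompressed FASTQ with exactly four lines per record, and the function names and the file names in the usage note are hypothetical. The 1500 bp cutoff comes from the 1.5kb suggestion above.

```python
from itertools import islice
from typing import Iterable, Iterator, List

MAX_LEN = 1500  # cutoff in bases, from the 1.5kb suggestion in the thread


def filter_fastq(lines: Iterable[str], max_len: int = MAX_LEN) -> Iterator[str]:
    """Yield the lines of every 4-line FASTQ record whose sequence is <= max_len bases.

    Assumes strict 4-line records (header, sequence, '+', qualities).
    """
    it = iter(lines)
    while True:
        record: List[str] = list(islice(it, 4))
        if len(record) < 4:
            return  # end of input (or a truncated trailing record, which is dropped)
        sequence = record[1].rstrip("\r\n")
        if len(sequence) <= max_len:
            yield from record


def filter_fastq_file(in_path: str, out_path: str, max_len: int = MAX_LEN) -> None:
    """Copy records with sequences <= max_len bases from in_path to out_path."""
    with open(in_path) as src, open(out_path, "w") as dst:
        dst.writelines(filter_fastq(src, max_len))
```

For example, `filter_fastq_file("reads.fastq", "reads.le1500.fastq")` (both names are placeholders) would write only the reads of 1500 bp or shorter, and the filtered file could then be passed to Canu in place of the original input.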