marbl / canu

A single molecule sequence assembler for genomes large and small.
http://canu.readthedocs.io/

Running out of disk space when assembling short sequence #1569

Closed. HSA191109 closed this issue 4 years ago.

HSA191109 commented 4 years ago

Hi, these are our first experiments with Nanopore sequencing and we are just practicing with different types of samples from our department. I tried to assemble a PCR-generated 8 kb fragment (library prepared with the LSK-109 kit). According to FastQC the data contain over-represented sequences. Canu 1.9 ran out of disk space in the correction step with the default canu -correct command, so I added purgeOverlaps=aggressive and got the same error message. The same thing happened with the parameter set given at https://canu.readthedocs.io/en/latest/faq.html#my-assembly-is-running-out-of-space-is-too-slow combined with purgeOverlaps=aggressive, and again after restricting the run to maxMemory=200 GB and maxThreads=50, also combined with purgeOverlaps=aggressive. Is there a special parameter set for assembling a short sequence with Canu, or should I use another program in this case? As in the ONT Lambda experiment, I set the genome size to 4m. Thank you very much! Best regards, Katta

canu -correct -d /home/kabru/data/191026_TCL2MMT2/191026_TCL2MMT2_CanuCorrect -p  191026_TCL2MMT2_CanuAsm_1_4m genomeSize=4000000 purgeOverlaps=aggressive useGrid=false maxThreads=50 maxMemory=200 -nanopore-raw /home/kabru/data/191026_TCL2MMT2_PassMerged.fastq
-- Canu 1.9
--

--
-- CONFIGURE CANU
--
-- Detected Java(TM) Runtime Environment '1.8.0_232' (from 'java') with -d64 support.
-- Detected gnuplot version '4.6 patchlevel 2   ' (from 'gnuplot') and image format 'png'.
-- Detected 72 CPUs and 376 gigabytes of memory.
-- Limited to 200 gigabytes from maxMemory option.
-- Limited to 50 CPUs from maxThreads option.
-- No grid engine detected, grid disabled.
----                            (tag)Concurrency
--                     (tag)Threads          |
--            (tag)Memory         |          |
--        (tag)         |         |          |     total usage     algorithm
--        -------  ------  --------   --------  -----------------  -----------------------------
-- Local: meryl      8 GB    2 CPUs x  25 jobs   200 GB   50 CPUs  (k-mer counting)
-- Local: hap        8 GB    2 CPUs x  25 jobs   200 GB   50 CPUs  (read-to-haplotype assignment)
-- Local: cormhap    6 GB   10 CPUs x   5 jobs    30 GB   50 CPUs  (overlap detection with mhap)
-- Local: obtovl     4 GB    5 CPUs x  10 jobs    40 GB   50 CPUs  (overlap detection)
-- Local: utgovl     4 GB    5 CPUs x  10 jobs    40 GB   50 CPUs  (overlap detection)
-- Local: cor        8 GB    4 CPUs x  12 jobs    96 GB   48 CPUs  (read correction)
-- Local: ovb        4 GB    1 CPU  x  50 jobs   200 GB   50 CPUs  (overlap store bucketizer)
-- Local: ovs        8 GB    1 CPU  x  25 jobs   200 GB   25 CPUs  (overlap store sorting)
-- Local: red        8 GB    2 CPUs x  25 jobs   200 GB   50 CPUs  (read error detection)
-- Local: oea        8 GB    1 CPU  x  25 jobs   200 GB   25 CPUs  (overlap error adjustment)
-- Local: bat       16 GB    4 CPUs x   1 job     16 GB    4 CPUs  (contig construction with bogart)
-- Local: cns      --- GB    4 CPUs x --- jobs   --- GB  --- CPUs  (consensus)
-- Local: gfa       16 GB    4 CPUs x   1 job     16 GB    4 CPUs  (GFA alignment and processing)
--
-- Found Nanopore uncorrected reads in the input files.
--
-- Generating assembly '191026_TCL2MMT2_CanuAsm_1_4m' in '/home/kabru/data/191026_TCL2MMT2/191026_TCL2MMT2_CanuCorrect'
--
-- Parameters:
--
--  genomeSize        4000000
--
--  Overlap Generation Limits:
--    corOvlErrorRate 0.3200 ( 32.00%)
--    obtOvlErrorRate 0.1200 ( 12.00%)
--    utgOvlErrorRate 0.1200 ( 12.00%)
--
--  Overlap Processing Limits:
--    corErrorRate    0.5000 ( 50.00%)
--    obtErrorRate    0.1200 ( 12.00%)
--    utgErrorRate    0.1200 ( 12.00%)
--    cnsErrorRate    0.2000 ( 20.00%)
--
--
-- BEGIN CORRECTION
--
----------------------------------------
-- Starting command on Tue Nov 19 22:15:30 2019 with 738.666 GB free disk space

    cd .
    /home/kabru/canu-1.9/Linux-amd64/bin/sqStoreCreate \
      -o ./191026_TCL2MMT2_CanuAsm_1_4m.seqStore.BUILDING \
      -minlength 1000 \
      ./191026_TCL2MMT2_CanuAsm_1_4m.seqStore.ssi \
    > ./191026_TCL2MMT2_CanuAsm_1_4m.seqStore.BUILDING.err 2>&1

-- Finished on Tue Nov 19 22:15:44 2019 (14 seconds) with 738.132 GB free disk space
----------------------------------------
--
-- In sequence store './191026_TCL2MMT2_CanuAsm_1_4m.seqStore':
--   Found 373689 reads.
--   Found 1341523983 bases (335.38 times coverage).
--
--   Read length histogram (one '*' equals 1671.7 reads):
--     1000   1999 117019 **********************************************************************
--     2000   2999  81687 ************************************************
--     3000   3999  51913 *******************************
--     4000   4999  34547 ********************
--     5000   5999  23407 **************
--     6000   6999  16253 *********
--     7000   7999  11322 ******
--     8000   8999  36457 *********************
--     9000   9999    404
--    10000  10999    227
--    11000  11999    158
--    12000  12999     92
--    13000  13999     58
--    14000  14999     47
--    15000  15999     32
--    16000  16999     58
--    17000  17999      4
--    18000  18999      2
--    19000  19999      0
--    20000  20999      1
--    21000  21999      0
--    22000  22999      1
-- New report created.
----------------------------------------
-- Starting command on Tue Nov 19 22:15:47 2019 with 738.13 GB free disk space

    cd correction/0-mercounts
    ./meryl-configure.sh \
    > ./meryl-configure.err 2>&1

-- Finished on Tue Nov 19 22:15:47 2019 (lickety-split) with 738.13 GB free disk space
----------------------------------------
--  segments   memory batches
--  -------- -------- -------
--        01  3.00 GB       1
--        02  1.50 GB       1
--        04  0.88 GB       1
--        06  0.62 GB       1
--        08  0.44 GB       1
--        12  0.31 GB       1
--
--  For 373689 reads with 1341523983 bases, limit to 13 batches.
--  Will count kmers using 01 jobs, each using 5 GB and 2 threads.
--
-- Report changed.
-- Finished stage 'merylConfigure', reset canuIteration.
-- No change in report.
--
-- Running jobs.  First attempt out of 2.
----------------------------------------
-- Starting 'meryl' concurrent execution on Tue Nov 19 22:15:47 2019 with 738.13 GB free disk space (1 processes; 25 concurrently)

    cd correction/0-mercounts
    ./meryl-count.sh 1 > ./meryl-count.000001.out 2>&1

-- Finished on Tue Nov 19 22:17:52 2019 (125 seconds) with 737.551 GB free disk space
----------------------------------------
-- Found 1 Kmer counting (meryl) outputs.
-- No change in report.
-- Finished stage 'cor-merylCountCheck', reset canuIteration.
-- No change in report.
--
-- Running jobs.  First attempt out of 2.
----------------------------------------
-- Starting 'meryl' concurrent execution on Tue Nov 19 22:17:52 2019 with 737.551 GB free disk space (1 processes; 25 concurrently)

    cd correction/0-mercounts
    ./meryl-process.sh 1 > ./meryl-process.000001.out 2>&1

-- Finished on Tue Nov 19 22:18:10 2019 (18 seconds) with 737.847 GB free disk space
----------------------------------------
-- Meryl finished successfully.  Kmer frequency histogram:
--
--  16-mers                                                                                           Fraction
--    Occurrences   NumMers                                                                         Unique Total
--       1-     1         0                                                                        0.0000 0.0000
--       2-     2  14788298 ********************************************************************** 0.4471 0.0239
--       3-     4   8131681 **************************************                                 0.6084 0.0368
--       5-     7   3811712 ******************                                                     0.7456 0.0528
--       8-    11   2004861 *********                                                              0.8286 0.0679
--      12-    16   1174417 *****                                                                  0.8781 0.0814
--      17-    22    743460 ***                                                                    0.9091 0.0936
--      23-    29    500775 **                                                                     0.9295 0.1046
--      30-    37    353249 *                                                                      0.9436 0.1147
--      38-    46    259920 *                                                                      0.9537 0.1239
--      47-    56    196350                                                                        0.9612 0.1324
--      57-    67    151582                                                                        0.9669 0.1404
--      68-    79    120396                                                                        0.9714 0.1479
--      80-    92     97427                                                                        0.9749 0.1549
--      93-   106     80411                                                                        0.9778 0.1616
--     107-   121     67697                                                                        0.9802 0.1679
--     122-   137     56220                                                                        0.9822 0.1741
--     138-   154     47394                                                                        0.9839 0.1799
--     155-   172     41229                                                                        0.9853 0.1855
--     173-   191     36657                                                                        0.9865 0.1908
--     192-   211     32136                                                                        0.9876 0.1962
--     212-   232     28914                                                                        0.9886 0.2014
--     233-   254     26186                                                                        0.9894 0.2065
--     255-   277     24023                                                                        0.9902 0.2117
--     278-   301     21938                                                                        0.9909 0.2168
--     302-   326     19839                                                                        0.9916 0.2219
--     327-   352     18260                                                                        0.9922 0.2269
--     353-   379     16414                                                                        0.9928 0.2319
--     380-   407     15073                                                                        0.9932 0.2367
--     408-   436     13858                                                                        0.9937 0.2415
--     437-   466     12445                                                                        0.9941 0.2462
--     467-   497     11332                                                                        0.9945 0.2507
--     498-   529     10184                                                                        0.9948 0.2551
--     530-   562      9232                                                                        0.9951 0.2593
--     563-   596      8390                                                                        0.9954 0.2634
--     597-   631      7733                                                                        0.9957 0.2673
--     632-   667      7164                                                                        0.9959 0.2711
--     668-   704      6543                                                                        0.9961 0.2748
--     705-   742      6099                                                                        0.9963 0.2785
--     743-   781      5666                                                                        0.9965 0.2820
--     782-   821      5310                                                                        0.9967 0.2855
--
--           0 (max occurrences)
--  1240073231 (total mers, non-unique)
--    33075355 (distinct mers, non-unique)
--           0 (unique mers)
-- Report changed.
-- Finished stage 'meryl-process', reset canuIteration.
--
-- Removing meryl database 'correction/0-mercounts/191026_TCL2MMT2_CanuAsm_1_4m.ms16'.
--
-- OVERLAPPER (mhap) (correction)
--
-- Set corMhapSensitivity=low based on read coverage of 335.
--
-- PARAMETERS: hashes=256, minMatches=3, threshold=0.8
--
-- Given 5.4 GB, can fit 16200 reads per block.
-- For 25 blocks, set stride to 6 blocks.
-- Logging partitioning to 'correction/1-overlapper/partitioning.log'.
-- Configured 24 mhap precompute jobs.
-- Configured 57 mhap overlap jobs.
-- No change in report.
-- Finished stage 'cor-mhapConfigure', reset canuIteration.
-- No change in report.
-- Running jobs.  First attempt out of 2.
----------------------------------------
-- Starting 'cormhap' concurrent execution on Tue Nov 19 22:18:11 2019 with 738.126 GB free disk space (24 processes; 5 concurrently)

    cd correction/1-overlapper
    ./precompute.sh 1 > ./precompute.000001.out 2>&1
    ./precompute.sh 2 > ./precompute.000002.out 2>&1
    ./precompute.sh 3 > ./precompute.000003.out 2>&1
    ./precompute.sh 4 > ./precompute.000004.out 2>&1
    ./precompute.sh 5 > ./precompute.000005.out 2>&1
    ./precompute.sh 6 > ./precompute.000006.out 2>&1
    ./precompute.sh 7 > ./precompute.000007.out 2>&1
    ./precompute.sh 8 > ./precompute.000008.out 2>&1
    ./precompute.sh 9 > ./precompute.000009.out 2>&1
    ./precompute.sh 10 > ./precompute.000010.out 2>&1
    ./precompute.sh 11 > ./precompute.000011.out 2>&1
    ./precompute.sh 12 > ./precompute.000012.out 2>&1
    ./precompute.sh 13 > ./precompute.000013.out 2>&1
    ./precompute.sh 14 > ./precompute.000014.out 2>&1
    ./precompute.sh 15 > ./precompute.000015.out 2>&1
    ./precompute.sh 16 > ./precompute.000016.out 2>&1
    ./precompute.sh 17 > ./precompute.000017.out 2>&1
    ./precompute.sh 18 > ./precompute.000018.out 2>&1
    ./precompute.sh 19 > ./precompute.000019.out 2>&1
    ./precompute.sh 20 > ./precompute.000020.out 2>&1
    ./precompute.sh 21 > ./precompute.000021.out 2>&1
    ./precompute.sh 22 > ./precompute.000022.out 2>&1
    ./precompute.sh 23 > ./precompute.000023.out 2>&1
    ./precompute.sh 24 > ./precompute.000024.out 2>&1

-- Finished on Tue Nov 19 22:24:03 2019 (352 seconds) with 731.813 GB free disk space
----------------------------------------
-- All 24 mhap precompute jobs finished successfully.
-- No change in report.
-- Finished stage 'cor-mhapPrecomputeCheck', reset canuIteration.
-- No change in report.
--
-- Running jobs.  First attempt out of 2.
----------------------------------------
-- Starting 'cormhap' concurrent execution on Tue Nov 19 22:24:03 2019 with 731.813 GB free disk space (57 processes; 5 concurrently)

    cd correction/1-overlapper
    ./mhap.sh 1 > ./mhap.000001.out 2>&1
    ./mhap.sh 2 > ./mhap.000002.out 2>&1
    ./mhap.sh 3 > ./mhap.000003.out 2>&1
    ./mhap.sh 4 > ./mhap.000004.out 2>&1
    ./mhap.sh 5 > ./mhap.000005.out 2>&1
    ./mhap.sh 6 > ./mhap.000006.out 2>&1
    ./mhap.sh 7 > ./mhap.000007.out 2>&1
    ./mhap.sh 8 > ./mhap.000008.out 2>&1
    ./mhap.sh 9 > ./mhap.000009.out 2>&1
    ./mhap.sh 10 > ./mhap.000010.out 2>&1
    ./mhap.sh 11 > ./mhap.000011.out 2>&1
    ./mhap.sh 12 > ./mhap.000012.out 2>&1
    ./mhap.sh 13 > ./mhap.000013.out 2>&1
    ./mhap.sh 14 > ./mhap.000014.out 2>&1
    ./mhap.sh 15 > ./mhap.000015.out 2>&1
    ./mhap.sh 16 > ./mhap.000016.out 2>&1
    ./mhap.sh 17 > ./mhap.000017.out 2>&1
    ./mhap.sh 18 > ./mhap.000018.out 2>&1
    ./mhap.sh 19 > ./mhap.000019.out 2>&1
    ./mhap.sh 20 > ./mhap.000020.out 2>&1
    ./mhap.sh 21 > ./mhap.000021.out 2>&1
    ./mhap.sh 22 > ./mhap.000022.out 2>&1
    ./mhap.sh 23 > ./mhap.000023.out 2>&1
    ./mhap.sh 24 > ./mhap.000024.out 2>&1
    ./mhap.sh 25 > ./mhap.000025.out 2>&1
    ./mhap.sh 26 > ./mhap.000026.out 2>&1
    ./mhap.sh 27 > ./mhap.000027.out 2>&1
    ./mhap.sh 28 > ./mhap.000028.out 2>&1
    ./mhap.sh 29 > ./mhap.000029.out 2>&1
    ./mhap.sh 30 > ./mhap.000030.out 2>&1
    ./mhap.sh 31 > ./mhap.000031.out 2>&1
    ./mhap.sh 32 > ./mhap.000032.out 2>&1
    ./mhap.sh 33 > ./mhap.000033.out 2>&1
    ./mhap.sh 34 > ./mhap.000034.out 2>&1
    ./mhap.sh 35 > ./mhap.000035.out 2>&1
    ./mhap.sh 36 > ./mhap.000036.out 2>&1
    ./mhap.sh 37 > ./mhap.000037.out 2>&1
    ./mhap.sh 38 > ./mhap.000038.out 2>&1
    ./mhap.sh 39 > ./mhap.000039.out 2>&1
    ./mhap.sh 40 > ./mhap.000040.out 2>&1
    ./mhap.sh 41 > ./mhap.000041.out 2>&1
    ./mhap.sh 42 > ./mhap.000042.out 2>&1
    ./mhap.sh 43 > ./mhap.000043.out 2>&1
    ./mhap.sh 44 > ./mhap.000044.out 2>&1
    ./mhap.sh 45 > ./mhap.000045.out 2>&1
    ./mhap.sh 46 > ./mhap.000046.out 2>&1
    ./mhap.sh 47 > ./mhap.000047.out 2>&1
    ./mhap.sh 48 > ./mhap.000048.out 2>&1
    ./mhap.sh 49 > ./mhap.000049.out 2>&1
    ./mhap.sh 50 > ./mhap.000050.out 2>&1
    ./mhap.sh 51 > ./mhap.000051.out 2>&1
    ./mhap.sh 52 > ./mhap.000052.out 2>&1
    ./mhap.sh 53 > ./mhap.000053.out 2>&1
    ./mhap.sh 54 > ./mhap.000054.out 2>&1
    ./mhap.sh 55 > ./mhap.000055.out 2>&1
    ./mhap.sh 56 > ./mhap.000056.out 2>&1
    ./mhap.sh 57 > ./mhap.000057.out 2>&1

-- Finished on Wed Nov 20 23:45:17 2019 (91274 seconds, no bitcoins found either) with 0 GB free disk space  !!! WARNING !!!
----------------------------------------
--
-- Mhap overlap jobs failed, retry.
--   job correction/1-overlapper/results/000036.ovb FAILED.
--   job correction/1-overlapper/results/000037.ovb FAILED.
--   job correction/1-overlapper/results/000039.ovb FAILED.
--   job correction/1-overlapper/results/000040.ovb FAILED.
--   job correction/1-overlapper/results/000041.ovb FAILED.
--   job correction/1-overlapper/results/000042.ovb FAILED.
--   job correction/1-overlapper/results/000043.ovb FAILED.
--   job correction/1-overlapper/results/000044.ovb FAILED.
--   job correction/1-overlapper/results/000045.ovb FAILED.
--   job correction/1-overlapper/results/000046.ovb FAILED.
--   job correction/1-overlapper/results/000047.ovb FAILED.
--   job correction/1-overlapper/results/000048.ovb FAILED.
--   job correction/1-overlapper/results/000049.ovb FAILED.
--   job correction/1-overlapper/results/000050.ovb FAILED.
--   job correction/1-overlapper/results/000051.ovb FAILED.
--   job correction/1-overlapper/results/000052.ovb FAILED.
--   job correction/1-overlapper/results/000053.ovb FAILED.
--   job correction/1-overlapper/results/000054.ovb FAILED.
--   job correction/1-overlapper/results/000055.ovb FAILED.
--   job correction/1-overlapper/results/000056.ovb FAILED.
--   job correction/1-overlapper/results/000057.ovb FAILED.
--
-- Report changed.
--
-- Running jobs.  Second attempt out of 2.
----------------------------------------
-- Starting 'cormhap' concurrent execution on Wed Nov 20 23:45:17 2019 with 0 GB free disk space (21 processes; 5 concurrently)

    cd correction/1-overlapper
    ./mhap.sh 36 > ./mhap.000036.out 2>&1
    ./mhap.sh 37 > ./mhap.000037.out 2>&1
    ./mhap.sh 39 > ./mhap.000039.out 2>&1
    ./mhap.sh 40 > ./mhap.000040.out 2>&1
    ./mhap.sh 41 > ./mhap.000041.out 2>&1
    ./mhap.sh 42 > ./mhap.000042.out 2>&1
    ./mhap.sh 43 > ./mhap.000043.out 2>&1
    ./mhap.sh 44 > ./mhap.000044.out 2>&1
    ./mhap.sh 45 > ./mhap.000045.out 2>&1
    ./mhap.sh 46 > ./mhap.000046.out 2>&1
    ./mhap.sh 47 > ./mhap.000047.out 2>&1
    ./mhap.sh 48 > ./mhap.000048.out 2>&1
    ./mhap.sh 49 > ./mhap.000049.out 2>&1
    ./mhap.sh 50 > ./mhap.000050.out 2>&1
    ./mhap.sh 51 > ./mhap.000051.out 2>&1
    ./mhap.sh 52 > ./mhap.000052.out 2>&1
    ./mhap.sh 53 > ./mhap.000053.out 2>&1
    ./mhap.sh 54 > ./mhap.000054.out 2>&1
    ./mhap.sh 55 > ./mhap.000055.out 2>&1
    ./mhap.sh 56 > ./mhap.000056.out 2>&1
    ./mhap.sh 57 > ./mhap.000057.out 2>&1

-- Finished on Wed Nov 20 23:57:02 2019 (705 seconds) with 0 GB free disk space  !!! WARNING !!!
----------------------------------------
--
-- Mhap overlap jobs failed, tried 2 times, giving up.
--   job correction/1-overlapper/results/000036.ovb FAILED.
--   job correction/1-overlapper/results/000037.ovb FAILED.
--   job correction/1-overlapper/results/000039.ovb FAILED.
--   job correction/1-overlapper/results/000040.ovb FAILED.
--   job correction/1-overlapper/results/000041.ovb FAILED.
--   job correction/1-overlapper/results/000042.ovb FAILED.
--   job correction/1-overlapper/results/000043.ovb FAILED.
--   job correction/1-overlapper/results/000044.ovb FAILED.
--   job correction/1-overlapper/results/000045.ovb FAILED.
--   job correction/1-overlapper/results/000046.ovb FAILED.
--   job correction/1-overlapper/results/000047.ovb FAILED.
--   job correction/1-overlapper/results/000048.ovb FAILED.
--   job correction/1-overlapper/results/000049.ovb FAILED.
--   job correction/1-overlapper/results/000050.ovb FAILED.
--   job correction/1-overlapper/results/000051.ovb FAILED.
--   job correction/1-overlapper/results/000052.ovb FAILED.
--   job correction/1-overlapper/results/000053.ovb FAILED.
--   job correction/1-overlapper/results/000054.ovb FAILED.
--   job correction/1-overlapper/results/000055.ovb FAILED.
--   job correction/1-overlapper/results/000056.ovb FAILED.
--   job correction/1-overlapper/results/000057.ovb FAILED.
--

ABORT:
ABORT: Canu 1.9
ABORT: Don't panic, but a mostly harmless error occurred and Canu stopped.
ABORT: Try restarting.  If that doesn't work, ask for help.
ABORT:
skoren commented 4 years ago

Is all of this data for an 8 kb product? In that case you have extremely high coverage. Take a look at the readSamplingCoverage and readSamplingBias options to downsample your data; you probably want at most 100x.
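
For reference, a downsampled correction run with those options might look roughly like the sketch below. The directory, prefix, read file, and genomeSize value are placeholders, and the sampling values simply echo the suggestion above; treat it as an illustration, not a recommended recipe.

    canu -correct \
      -d /path/to/downsampled-correction -p sample_asm \
      genomeSize=8k \
      readSamplingCoverage=100 readSamplingBias=1.5 \
      purgeOverlaps=aggressive useGrid=false \
      -nanopore-raw /path/to/reads.fastq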

HSA191109 commented 4 years ago

Yes, all the data is for an 8 kb product. It is a first test run with this type of sample before trying to multiplex them. Yesterday I repeated the -correct step with readSamplingCoverage=100 readSamplingBias=1.5 purgeOverlaps=aggressive. It finished very quickly without running out of space, and -trim-assemble is currently in progress with the same parameters. I was not sure whether to keep genomeSize=4m or switch to genomeSize=8k, so I stayed with genomeSize=4m for this first trial; I hope it works. Thank you very much for the advice.
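
For context, a -trim-assemble invocation on those corrected reads would look roughly like the sketch below; the corrected-read file name assumes Canu 1.9's default <prefix>.correctedReads.fasta.gz output and the paths are placeholders, so adjust to the actual run.

    canu -trim-assemble \
      -d /path/to/trim-assemble -p sample_asm \
      genomeSize=8k \
      purgeOverlaps=aggressive useGrid=false \
      -nanopore-corrected /path/to/sample_asm.correctedReads.fasta.gz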

lmolokin commented 4 years ago

@HSA191109 I would think your genomeSize should be 8k since that is the size of your amplicon.

brianwalenz commented 4 years ago

Genome size really only affects the amount of corrected reads generated. Canu will generate 40 * genomeSize bases of corrected reads.
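
To put numbers on that rule (40 * genomeSize, using the values from this thread):

    genomeSize=8k  ->  40 *     8,000 =     320,000 bases (~320 kbp) of corrected reads
    genomeSize=4m  ->  40 * 4,000,000 = 160,000,000 bases (~160 Mbp) of corrected reads

For an 8 kb amplicon, 320 kbp is roughly 40x of the target, whereas 160 Mbp would be about 20,000x, which is why setting genomeSize to the amplicon size keeps the corrected output proportionate.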