marbl / canu

A single molecule sequence assembler for genomes large and small.
http://canu.readthedocs.io/

Not enough memory to load the minimum number of overlaps; increase -M #1093

Closed the-tech-mode closed 5 years ago

the-tech-mode commented 6 years ago

Hello,

Let me briefly explain my issue; I hope we can find a solution.

I sequenced short tandem repeats (VNTRs) using the Nanopore Native Barcoding Kit (multiplexing). Each library contains only small PCR amplicons (somewhere between 200 and 1000 bp), but each library contains amplicons from 24 different loci (a kind of second multiplexing that I plan to demultiplex using their primers as a second barcode). Some of these loci are very similar in the VNTR region (up to 100% identity over a 50 bp fragment).

I understand that, since I'm not handling a whole genome, I don't really need an assembly, but I'm interested in the corrected-reads FASTA file.

I successfully ran this on an iMac (i5, 32 GB RAM, 2 TB hard disk).

Here is the issue: it fails when I try to run it on an Ubuntu mini-server we have in the lab

(Memory: 125.8 GiB; Processor: Intel® Xeon® CPU E5-2697 v3 @ 2.60 GHz × 16; OS type: 64-bit; Disk: 4.3 TB, 180 GB free)

using this command:

'/home/luna2/canu/Linux-amd64/bin/canu' -d '/home/luna2/Desktop/Barcode04/parameter01' -p BC04 -genomeSize=25k 'gnuplot=/usr/bin/gnuplot' 'minReadLength=200' 'minOverlapLength=150' 'stopOnReadQuality=False' -nanopore-raw '/home/luna2/Desktop/BC04.fastq'

I get the following output:

- Finished on Thu Sep 20 06:17:41 2018 (like a bat out of hell) with 166.037 GB free disk space
----------------------------------------
--
-- Bogart failed, retry
--
--
-- Running jobs.  Second attempt out of 2.
----------------------------------------
-- Starting 'bat' concurrent execution on Thu Sep 20 06:17:41 2018 with 166.037 GB free disk space (1 processes; 1 concurrently)

    cd unitigging/4-unitigger
    ./unitigger.sh 1 > ./unitigger.000001.out 2>&1

-- Finished on Thu Sep 20 06:17:41 2018 (lickety-split) with 166.037 GB free disk space
----------------------------------------
--
-- Bogart failed, tried 2 times, giving up.
--

ABORT:
ABORT: Canu snapshot v1.7 +368 changes (r9060 99162f51f40a61de2043591da8858fa75e3b0e9e)
ABORT: Don't panic, but a mostly harmless error occurred and Canu stopped.
ABORT: Try restarting.  If that doesn't work, ask for help.
ABORT:
ABORT: Disk space available:  166.037 GB
ABORT:
ABORT: Last 50 lines of the relevant log file (unitigging/4-unitigger/unitigger.err):
ABORT:
ABORT:   
ABORT:   Lengths:
ABORT:     Minimum read          0 bases
ABORT:     Minimum overlap       150 bases
ABORT:   
ABORT:   Overlap Error Rates:
ABORT:     Graph                 0.144 (14.400%)
ABORT:     Max                   0.144 (14.400%)
ABORT:   
ABORT:   Deviations:
ABORT:     Graph                 6.000
ABORT:     Bubble                6.000
ABORT:     Repeat                3.000
ABORT:   
ABORT:   Edge Confusion:
ABORT:     Absolute              2100
ABORT:     Percent               200.0000
ABORT:   
ABORT:   Unitig Construction:
ABORT:     Minimum intersection  500 bases
ABORT:     Maxiumum placements   2 positions
ABORT:   
ABORT:   Debugging Enabled:
ABORT:     (none)
ABORT:   
ABORT:   ==> LOADING AND FILTERING OVERLAPS.
ABORT:   
ABORT:   ReadInfo()-- Using 216353 reads, no minimum read length used.
ABORT:   
ABORT:   OverlapCache()-- limited to 16384MB memory (user supplied).
ABORT:   
ABORT:   OverlapCache()--       1MB for read data.
ABORT:   OverlapCache()--       8MB for best edges.
ABORT:   OverlapCache()--      21MB for tigs.
ABORT:   OverlapCache()--       5MB for tigs - read layouts.
ABORT:   OverlapCache()--       8MB for tigs - error profiles.
ABORT:   OverlapCache()--    4096MB for tigs - error profile overlaps.
ABORT:   OverlapCache()--       0MB for other processes.
ABORT:   OverlapCache()-- ---------
ABORT:   OverlapCache()--    4145MB for data structures (sum of above).
ABORT:   OverlapCache()-- ---------
ABORT:   OverlapCache()--       4MB for overlap store structure.
ABORT:   OverlapCache()--   12234MB for overlap data.
ABORT:   OverlapCache()-- ---------
ABORT:   OverlapCache()--   16384MB allowed.
ABORT:   OverlapCache()--
ABORT:   OverlapCache()-- Retain at least 4731 overlaps/read, based on 2365.71x coverage.
ABORT:   OverlapCache()-- Initial guess at 3705 overlaps/read.
ABORT:   OverlapCache()--
ABORT:   OverlapCache()-- Not enough memory to load the minimum number of overlaps; increase -M.

What do you recommend I do? I would like to see if Canu can assemble 24 (or nearly 24) "contigs". Right now I have corrected reads but no contigs folder.

Thanks so much; I hope this explanation was not too confusing.

I attached the report as well.

Regards,
Hector Guzman
Mahidol University, Thailand

BC04.report.zip

brianwalenz commented 6 years ago

Increasing the memory allowed for this step to 32 GB or even 48 GB should work. Add batMemory=32 to your command.

The issue is that it thinks you've got 2365x read coverage, and it then wants to load 4731 overlaps for each read (times 216000 reads = a lot of memory). You might get some contigs out, but they will most certainly be redundant.
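As a rough sanity check of that arithmetic, the numbers from the unitigger.err log above can be multiplied out. The ~16 bytes/overlap figure is inferred from the log's "12234MB for overlap data" at the initial guess of 3705 overlaps/read; it is an estimate, not an official constant.

```python
# Back-of-the-envelope memory estimate for bogart's overlap cache,
# using numbers taken from the unitigger.err log above.
reads = 216353           # "ReadInfo()-- Using 216353 reads"
min_overlaps = 4731      # "Retain at least 4731 overlaps/read"
bytes_per_overlap = 16   # inferred: 12234 MB / (216353 * 3705) is roughly 16 B

needed = reads * min_overlaps * bytes_per_overlap
print(f"~{needed / 2**30:.1f} GiB needed for overlap data alone")
# This exceeds the ~11.9 GiB (12234 MB) left for overlap data under the
# 16384 MB limit, hence "Not enough memory to load the minimum number
# of overlaps; increase -M."
```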

You can try using the read sampling options (https://canu.readthedocs.io/en/latest/parameter-reference.html#readsamplingcoverage) to downsample coverage to 50-100x.
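Putting both suggestions together, a re-run might look like the sketch below. It reuses the poster's original paths and parameters; batMemory=32 and readSamplingCoverage=100 follow the advice above, and the exact values may need tuning.

```shell
# Re-run with more memory for the bogart step and optional read downsampling.
# batMemory is in gigabytes; readSamplingCoverage caps the coverage used.
/home/luna2/canu/Linux-amd64/bin/canu \
  -d /home/luna2/Desktop/Barcode04/parameter01 \
  -p BC04 \
  genomeSize=25k \
  minReadLength=200 \
  minOverlapLength=150 \
  stopOnReadQuality=False \
  batMemory=32 \
  readSamplingCoverage=100 \
  -nanopore-raw /home/luna2/Desktop/BC04.fastq
```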

skoren commented 5 years ago

Idle, closing; this is a duplicate of #1021 as well. Given that you don't really need an assembly, you should already have corrected reads before the bogart step; run Canu with the -correct option to skip everything except correction.
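A correction-only run along those lines (again a sketch reusing the poster's paths; the output directory name is illustrative) would be:

```shell
# Stop after the correction stage; the main output is the
# corrected-reads FASTA (BC04.correctedReads.fasta.gz in the -d directory).
/home/luna2/canu/Linux-amd64/bin/canu -correct \
  -d /home/luna2/Desktop/Barcode04/correct-only \
  -p BC04 \
  genomeSize=25k \
  -nanopore-raw /home/luna2/Desktop/BC04.fastq
```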