marbl / canu

A single molecule sequence assembler for genomes large and small.
http://canu.readthedocs.io/

Canu fails at correction #1438

Closed (tayabsoomro closed this issue 5 years ago)

tayabsoomro commented 5 years ago

Hi, I am running canu on my raw nanopore reads generated through MinION.

Here is the command I am running:

canu --assemble -p pb3 -d pb3-ont corOutCoverage=500 \
corMinCoverage=0 corMhapSensitivity=high  genomeSize=25m \
-nanopore-raw $NANOPORE_RAW \
gridEngineMemoryOption="-l h_vmem=MEMORY"

I am using canu version 1.8

canu                      1.8             pl526h470a237_0    bioconda

I am running the software on an SGE grid; here is the configuration:

-- CONFIGURE CANU
--
-- Detected Java(TM) Runtime Environment '11.0.1' (from '/.../miniconda3/bin/java') without -d64 support.
-- Detected gnuplot version '5.2 patchlevel 7   ' (from 'gnuplot') and image format 'png'.
-- Detected 80 CPUs and 1008 gigabytes of memory.
-- Detected Sun Grid Engine in '/opt/gridengine/default'.
-- Detected Grid Engine environment 'smp'.
-- User supplied Grid Engine consumable '-l h_vmem=MEMORY'.
--
-- Found   8 hosts with  80 cores and 1007 GB memory under Sun Grid Engine control.
--
--                     (tag)Threads
--            (tag)Memory         |
--        (tag)         |         |  algorithm
--        -------  ------  --------  -----------------------------
-- Grid:  meryl     12 GB    4 CPUs  (k-mer counting)
-- Grid:  hap        8 GB    4 CPUs  (read-to-haplotype assignment)
-- Grid:  cormhap    6 GB   16 CPUs  (overlap detection with mhap)
-- Grid:  obtovl     4 GB    8 CPUs  (overlap detection)
-- Grid:  utgovl     4 GB    8 CPUs  (overlap detection)
-- Grid:  ovb        4 GB    1 CPU   (overlap store bucketizer)
-- Grid:  ovs        8 GB    1 CPU   (overlap store sorting)
-- Grid:  red        8 GB    4 CPUs  (read error detection)
-- Grid:  oea        4 GB    1 CPU   (overlap error adjustment)
-- Grid:  bat       16 GB    4 CPUs  (contig construction with bogart)
-- Grid:  gfa        8 GB    4 CPUs  (GFA alignment and processing)

I get the following output in canu.out

.
.
.

-- BEGIN CORRECTION
--
--
-- Mhap precompute jobs failed, tried 2 times, giving up.
--   job correction/1-overlapper/blocks/000001.dat FAILED.
--   job correction/1-overlapper/blocks/000002.dat FAILED.
--   job correction/1-overlapper/blocks/000003.dat FAILED.
--   job correction/1-overlapper/blocks/000004.dat FAILED.
--   job correction/1-overlapper/blocks/000005.dat FAILED.
--   job correction/1-overlapper/blocks/000006.dat FAILED.
--   job correction/1-overlapper/blocks/000007.dat FAILED.
--   job correction/1-overlapper/blocks/000008.dat FAILED.

.
.
.

Any hints on why the correction might fail?

skoren commented 5 years ago

Based on the output, I don't think you actually have canu 1.8. The conda canu packages are broken, and conda is not recommended for installing canu. Older versions of Canu didn't support JVMs newer than 8, which is what you have. I expect that if you download the 1.8 pre-compiled release from the release page, it will work with your JVM.
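
For reference, a minimal install sketch (the asset name and URL are assumed; check the GitHub releases page for the exact file):

wget https://github.com/marbl/canu/releases/download/v1.8/canu-1.8.Linux-amd64.tar.xz
tar -xJf canu-1.8.Linux-amd64.tar.xz            # unpacks to canu-1.8/Linux-amd64/
export PATH=$PWD/canu-1.8/Linux-amd64/bin:$PATH
canu --version                                  # should report Canu 1.8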

tayabsoomro commented 5 years ago

Okay, so I scrapped the previous canu and installed it from source as mentioned in the instructions. It shows the same message in canu.out, where the correction FAILED.

Here is the configuration of canu:

-- CONFIGURE CANU
--
-- Detected Java(TM) Runtime Environment '11.0.1' (from '/.../miniconda3/bin/java') without -d64 support.
-- Detected gnuplot version '5.2 patchlevel 7   ' (from 'gnuplot') and image format 'png'.
-- Detected 80 CPUs and 1008 gigabytes of memory.
-- Detected Sun Grid Engine in '/opt/gridengine/default'.
-- Detected Grid Engine environment 'smp'.
-- User supplied Grid Engine consumable '-l h_vmem=MEMORY'.
--
-- Found   8 hosts with  80 cores and 1007 GB memory under Sun Grid Engine control.
--
--                     (tag)Threads
--            (tag)Memory         |
--        (tag)         |         |  algorithm
--        -------  ------  --------  -----------------------------
-- Grid:  meryl     12 GB    4 CPUs  (k-mer counting)
-- Grid:  hap        8 GB    4 CPUs  (read-to-haplotype assignment)
-- Grid:  cormhap    6 GB   16 CPUs  (overlap detection with mhap)
-- Grid:  obtovl     4 GB    8 CPUs  (overlap detection)
-- Grid:  utgovl     4 GB    8 CPUs  (overlap detection)
-- Grid:  ovb        4 GB    1 CPU   (overlap store bucketizer)
-- Grid:  ovs        8 GB    1 CPU   (overlap store sorting)
-- Grid:  red        8 GB    4 CPUs  (read error detection)
-- Grid:  oea        4 GB    1 CPU   (overlap error adjustment)
-- Grid:  bat       16 GB    4 CPUs  (contig construction with bogart)
-- Grid:  gfa        8 GB    4 CPUs  (GFA alignment and processing)
skoren commented 5 years ago

Did you install the 1.9 branch from source? You shouldn't be installing the tip from source; it has lots of untested changes. I suggested using a release to make it easier to install.

There's no error in the log you posted; is it truncated? You also have to remove all previous output and run from a new assembly folder, since the new version is not backwards compatible.
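
For example (directory name taken from the command above):

rm -rf pb3-ont     # remove the old assembly folder, then re-run canu with -d pb3-ont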

tayabsoomro commented 5 years ago

I did download it using a release:

(base) [soomrot@biocluster pb3-ont]$ canu --version
Canu 1.8

I have truncated most of the output because it shows the same FAILED message for many lines. I did run canu in a fresh folder after the new installation.

skoren commented 5 years ago

Without an error message there is no way to say why your assembly isn't running. Include at least one of the repeated messages.

Where did you install the downloaded release (full path)? What does which canu return? What is the full command you're using?

tayabsoomro commented 5 years ago

I did include the error message in my first message; here it is again. It is in the canu.out file:

.
.
.

-- BEGIN CORRECTION
--
--
-- Mhap precompute jobs failed, tried 2 times, giving up.
--   job correction/1-overlapper/blocks/000001.dat FAILED.
--   job correction/1-overlapper/blocks/000002.dat FAILED.
--   job correction/1-overlapper/blocks/000003.dat FAILED.
--   job correction/1-overlapper/blocks/000004.dat FAILED.
--   job correction/1-overlapper/blocks/000005.dat FAILED.
--   job correction/1-overlapper/blocks/000006.dat FAILED.
--   job correction/1-overlapper/blocks/000007.dat FAILED.
--   job correction/1-overlapper/blocks/000008.dat FAILED.

.
.
.

I installed it in my home directory.

(base) [soomrot@biocluster pb3-ont]$ which canu
~/software/canu-1.8/Linux-amd64/bin/canu
skoren commented 5 years ago

Ah, OK, sorry, I misread your second comment, which said it was the same error. What is the error log for one of the failing jobs (correction/1-overlapper/precompute*out)?
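
A quick way to pull one of them (the exact file name depends on the task number; 113 below is only an example):

ls correction/1-overlapper/precompute*.out
tail -n 50 correction/1-overlapper/precompute.113.out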

tayabsoomro commented 5 years ago
Found perl:
   /.../soomrot/miniconda3/bin/perl
   This is perl 5, version 26, subversion 2 (v5.26.2) built for x86_64-linux-thread-multi

Found java:
   /.../soomrot/miniconda3/bin/java
   openjdk version "11.0.1" 2018-10-16 LTS

Found canu:
   /.../soomrot/software/canu-1.8/Linux-amd64/bin/canu
   Canu 1.8

Running job 113 based on SGE_TASK_ID=113 and offset=0.
Dumping reads from 1814401 to 1830600 (inclusive).

Starting mhap precompute.

Error occurred during initialization of VM
Could not reserve enough space for 5662720KB object heap
Mhap failed.

Found perl:
   /.../soomrot/miniconda3/bin/perl
   This is perl 5, version 26, subversion 2 (v5.26.2) built for x86_64-linux-thread-multi

Found java:
   /.../soomrot/miniconda3/bin/java
   openjdk version "11.0.1" 2018-10-16 LTS

Found canu:
   /.../soomrot/software/canu-1.8/Linux-amd64/bin/canu
   Canu 1.8

Running job 113 based on SGE_TASK_ID=113 and offset=0.
Dumping reads from 1814401 to 1830600 (inclusive).

Starting mhap precompute.

Error occurred during initialization of VM
Could not reserve enough space for 5662720KB object heap
Mhap failed.

Oh! It looks like it is running out of memory? How do I allot more memory? I was under the impression that canu detects the available resources by itself and manages them accordingly.

skoren commented 5 years ago

Canu does configure memory and threads, but it is possible the memory parameter you've told it to use isn't correct. Typically h_vmem is not a consumable resource and is not requested on a per-core basis, which is what canu expects (see #1016). What was the canu grid request for this job (in correction/1-overlapper/*.jobSubmit*sh)? Check your grid memory options for a consumable resource and post the results here (qconf -sc | grep MEMORY); you want something like this:

#name               shortcut   type        relop requestable consumable default  urgency 
#----------------------------------------------------------------------------------------
mem_free            mf         MEMORY      <=    YES         YES        0        0
tayabsoomro commented 5 years ago

Here is the result of the above command:

(base) [soomrot@biocomp-0-3 Canu_run_7]$ qconf -sc|grep MEMORY
h_core              h_core     MEMORY    <=    YES         NO         0        0
h_data              h_data     MEMORY    <=    YES         NO         0        0
h_fsize             h_fsize    MEMORY    <=    YES         NO         0        0
h_rss               h_rss      MEMORY    <=    YES         NO         0        0
h_stack             h_stack    MEMORY    <=    YES         NO         0        0
h_vmem              h_vmem     MEMORY    <=    YES         NO         0        0
mem_free            mf         MEMORY    <=    YES         NO         0        0
mem_total           mt         MEMORY    <=    YES         NO         0        0
mem_used            mu         MEMORY    >=    YES         NO         0        0
s_core              s_core     MEMORY    <=    YES         NO         0        0
s_data              s_data     MEMORY    <=    YES         NO         0        0
s_fsize             s_fsize    MEMORY    <=    YES         NO         0        0
s_rss               s_rss      MEMORY    <=    YES         NO         0        0
s_stack             s_stack    MEMORY    <=    YES         NO         0        0
s_vmem              s_vmem     MEMORY    <=    YES         NO         0        0
swap_free           sf         MEMORY    <=    YES         NO         0        0
swap_rate           sr         MEMORY    >=    YES         NO         0        0
swap_rsvd           srsv       MEMORY    >=    YES         NO         0        0
swap_total          st         MEMORY    <=    YES         NO         0        0
swap_used           su         MEMORY    >=    YES         NO         0        0
virtual_free        vf         MEMORY    <=    YES         NO         0        0
virtual_total       vt         MEMORY    <=    YES         NO         0        0
virtual_used        vu         MEMORY    >=    YES         NO         0        0

Which one of these are consumable?

And here is the content of *.jobSubmit*sh file:

qsub \
  -l h_vmem=384m -pe smp 16 -j y -o precompute.\$TASK_ID.out \
  -cwd -N "cormhap_pb3" \
  -t 1-174 \
  ./precompute.sh 0 \
> ./precompute.jobSubmit-01.out 2>&1
skoren commented 5 years ago

None of them are consumable; that's the consumable column, the second of the YES/NO columns. This means your grid really has no way to reserve memory; essentially, the only way to reserve memory on your cluster is via cores. Since your machines have 60gb/core, that should be OK for canu (it won't over-schedule your memory) for the full config you posted.

The second issue with non-consumable memory is that it's requested per job, not per core as canu expects. There are two ways around this: first, the 1.9 tip (which you have to build from source) has an option, gridEngineMemoryPerJob, which you'd want to set to true. Alternatively, you'd have to change the code in Linux-amd64/lib/site_perl/canu/Execution.pm to remove lines 890-892:

 890     if (uc(getGlobal("gridEngine")) eq "SGE") {
 891         $m /= $t;
 892     }
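
With the first route, a sketch of how the option might be passed (assuming the 1.9 tip built from source, where gridEngineMemoryPerJob exists; the rest mirrors the original command):

canu -p pb3 -d pb3-ont corOutCoverage=500 \
  corMinCoverage=0 corMhapSensitivity=high genomeSize=25m \
  -nanopore-raw $NANOPORE_RAW \
  gridEngineMemoryOption="-l h_vmem=MEMORY" \
  gridEngineMemoryPerJob=true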
tayabsoomro commented 5 years ago

None of them are consumable; that's the consumable column, the second of the YES/NO columns. This means your grid really has no way to reserve memory; essentially, the only way to reserve memory on your cluster is via cores. Since your machines have 60gb/core, that should be OK for canu (it won't over-schedule your memory) for the full config you posted.

So the following configuration should be okay for my purposes, as long as I either get the 1.9 tip or remove the lines from the Execution.pm file mentioned above?

canu --assemble -p pb3 -d pb3-ont corOutCoverage=500 \
corMinCoverage=0 corMhapSensitivity=high  genomeSize=25m \
-nanopore-raw $NANOPORE_RAW \
gridEngineMemoryOption="-l h_vmem=MEMORY"
skoren commented 5 years ago

Yes, you can look at the resource usage from canu:

-- Grid:  meryl     12 GB    4 CPUs  (k-mer counting)
-- Grid:  hap        8 GB    4 CPUs  (read-to-haplotype assignment)
-- Grid:  cormhap    6 GB   16 CPUs  (overlap detection with mhap)
-- Grid:  obtovl     4 GB    8 CPUs  (overlap detection)
-- Grid:  utgovl     4 GB    8 CPUs  (overlap detection)
-- Grid:  ovb        4 GB    1 CPU   (overlap store bucketizer)
-- Grid:  ovs        8 GB    1 CPU   (overlap store sorting)
-- Grid:  red        8 GB    4 CPUs  (read error detection)
-- Grid:  oea        4 GB    1 CPU   (overlap error adjustment)
-- Grid:  bat       16 GB    4 CPUs  (contig construction with bogart)
-- Grid:  gfa        8 GB    4 CPUs  (GFA alignment and processing)

As long as none of the per-core memory usages exceed your node's memory per core, you should be OK.

tayabsoomro commented 5 years ago

Should I still give my command this option: gridEngineMemoryOption="-l h_vmem=MEMORY"?

Also, how do I find out how much memory per core my system has?

skoren commented 5 years ago

Yes, as long as you make the changes I suggested, you can include the memory option. And good thing you asked, because I misread your message and over-estimated your memory. Divide memory by cores:

-- Found   8 hosts with  80 cores and 1007 GB memory under Sun Grid Engine control.

So 1000/80 = 12.5GB/core, meaning if I reserve 1 core, I'm implicitly reserving 12.5GB of memory. For example, only 5 of the cormhap 6gb/16-core jobs can fit on your node, since 80/16 = 5. That means they'll be able to use 1000gb between them but will only be using 30.
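
For reference, one way to get those numbers (commands assumed to be available on the cluster; divide total memory by cores per host):

qhost                                                                   # per-host NCPU and MEMTOT under SGE
echo "scale=1; $(free -g | awk '/^Mem:/{print $2}') / $(nproc)" | bc    # approximate GB/core on the current node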

IyadKandalaft commented 5 years ago

Typically h_vmem is not a consumable resource and not requested on a per-core basis which is what canu expects

In SGE, h_vmem is requested on a per-slot basis by default but can be modified to a per-job basis by the admin. Some HPCs equate slots and cores and some don't, which makes it tricky.

As far as I can tell, the command used to execute CANU did not tell SGE how many slots (or cores) to reserve. gridEngineThreadsOption should be added as follows: gridEngineThreadsOption="-pe smp THREADS" gridEngineMemoryOption="-l h_vmem=MEMORY"

A job with -pe smp 8 or -pe orte 8 with an h_vmem=2G resource has a process limit of

Max address space         17179869184          17179869184          bytes

which is exactly 16GB given that h_vmem is configured to multiply by the number of slots.

skoren commented 5 years ago

This is not correct across different SGE configurations, as you said. I expect that on your cluster h_vmem is consumable, which is why it's requesting memory per slot. Have you checked what your qconf -sc reports?

I have not seen non-consumable resources be requested per slot, but you can make consumable resources requested per job rather than per slot. Your command wouldn't help the original poster since his h_vmem appears to be per job (this also matches our cluster config; we have another consumable memory resource, mem_free, which is per slot). Canu 1.9 does check whether your resource is per job or per slot and lets the user force one or the other if it can't be auto-detected.

IyadKandalaft commented 5 years ago

@skoren , we're talking about the same HPC as Tayab.

h_vmem              h_vmem     MEMORY    <=    YES         NO         0        0

Requestable = YES, Consumable = NO

If CONSUMABLE is YES, then the value of "Max address space" is the requested h_vmem * NSLOTS. If CONSUMABLE is JOB, then the value of "Max address space" is the requested h_vmem.

The SGE manual on complex.5 (https://arc.liv.ac.uk/SGE/htmlman/manuals.html) does not explicitly state the value for "CONSUMABLE=NO". But I validated it to be the same as when it's set to YES.

A consumable defined by 'y' is a per-slot consumable, which means the limit is multiplied by the number of slots being used by the job before being applied. In case of 'j' the consumable is a per-job consumable. This resource is debited as requested (without multiplication) from the allocated master queue. The resource need not be available for the slave task queues.

skoren commented 5 years ago

"The SGE manual on complex.5 (https://arc.liv.ac.uk/SGE/htmlman/manuals.html) does not explicitly state the value for "CONSUMABLE=NO". But I validated it to be the same as when it's set to YES."

I don't think this is correct; if it were, the jobs should have had no issues allocating the JVM. They requested 384m * 16 = 6144m and only tried to create a JVM with 6gb. I've confirmed that the JVM gets killed when trying to use the memory with h_vmem; here is our config:

h_vmem              h_vmem     MEMORY      <=    YES         NO         0        0
mem_free            mf         MEMORY      <=    YES         YES        0        0

When I run:

qlogin -pe thread 16 -l h_vmem=1g
% java -Xmx1g -version
java version "1.8.0_66"
Java(TM) SE Runtime Environment (build 1.8.0_66-b17)
Java HotSpot(TM) 64-Bit Server VM (build 25.66-b17, mixed mode)

% java -Xmx16g -version
Error occurred during initialization of VM
Could not reserve enough space for 16777216KB object heap

% java -Xmx15g -version
Error occurred during initialization of VM
Could not reserve enough space for 15728640KB object heap

% java -Xmx13g -version
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (malloc) failed to allocate 1048576 bytes for AllocateHeap
# An error report file with more information is saved as:

Why would a 13gb allocation fail when I supposedly have 16gb requested? The effective limit is clearly much less than threads * memory.

Versus:

qlogin -pe thread 16 -l mem_free=1g
% java -Xmx1g -version
java version "1.8.0_66"
Java(TM) SE Runtime Environment (build 1.8.0_66-b17)
Java HotSpot(TM) 64-Bit Server VM (build 25.66-b17, mixed mode)
% java -Xmx16g -version
java version "1.8.0_66"
Java(TM) SE Runtime Environment (build 1.8.0_66-b17)
Java HotSpot(TM) 64-Bit Server VM (build 25.66-b17, mixed mode)
% java -Xmx15g -version
java version "1.8.0_66"
Java(TM) SE Runtime Environment (build 1.8.0_66-b17)
Java HotSpot(TM) 64-Bit Server VM (build 25.66-b17, mixed mode)
% java -Xmx13g -version
java version "1.8.0_66"
Java(TM) SE Runtime Environment (build 1.8.0_66-b17)
Java HotSpot(TM) 64-Bit Server VM (build 25.66-b17, mixed mode)

So yes, non-consumable h_vmem is per job, not per slot, which caused the original error and which is fixed by my suggestions.

tayabsoomro commented 5 years ago

... Alternatively, you'd have to change the code in Linux-amd64/lib/site_perl/canu/Execution.pm to remove lines 890-892:

 890     if (uc(getGlobal("gridEngine")) eq "SGE") {
 891         $m /= $t;
 892     }

I made the changes described above and I re-ran the canu program with following command:

canu -p ... -d ... genomeSize=25m -nanopore-raw $NANOPORE_RAW \
gridEngineMemoryOption="-l h_vmem=MEMORY"

From the output I see that the "contig construction with bogart" algorithm uses 16 GB on 4 threads, which is under the 12.5 GB/core limit, right?

--                     (tag)Threads
--            (tag)Memory         |
--        (tag)         |         |  algorithm
--        -------  ------  --------  -----------------------------
-- Grid:  bat       16 GB    4 CPUs  (contig construction with bogart)

It looks like canu is having issues at this step (bogart), as you can see in the full output of the canu.out file:

Found perl:
   /.../miniconda3/bin/perl
   This is perl 5, version 26, subversion 2 (v5.26.2) built for x86_64-linux-thread-multi

Found java:
   /.../miniconda3/bin/java
   Error occurred during initialization of VM

Found canu:
   /.../canu-1.8/Linux-amd64/bin/canu
   Canu 1.8

-- Canu 1.8
--
-- CITATIONS
--
-- Koren S, Walenz BP, Berlin K, Miller JR, Phillippy AM.
-- Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation.
-- Genome Res. 2017 May;27(5):722-736.
-- http://doi.org/10.1101/gr.215087.116
--
-- Koren S, Rhie A, Walenz BP, Dilthey AT, Bickhart DM, Kingan SB, Hiendleder S, Williams JL, Smith TPL, Phillippy AM.
-- De novo assembly of haplotype-resolved genomes with trio binning.
-- Nat Biotechnol. 2018
-- https//doi.org/10.1038/nbt.4277
--
-- Read and contig alignments during correction, consensus and GFA building use:
--   Šošic M, Šikic M.
--   Edlib: a C/C ++ library for fast, exact sequence alignment using edit distance.
--   Bioinformatics. 2017 May 1;33(9):1394-1395.
--   http://doi.org/10.1093/bioinformatics/btw753
--
-- Overlaps are generated using:
--   Berlin K, et al.
--   Assembling large genomes with single-molecule sequencing and locality-sensitive hashing.
--   Nat Biotechnol. 2015 Jun;33(6):623-30.
--   http://doi.org/10.1038/nbt.3238
--
--   Myers EW, et al.
--   A Whole-Genome Assembly of Drosophila.
--   Science. 2000 Mar 24;287(5461):2196-204.
--   http://doi.org/10.1126/science.287.5461.2196
--
-- Corrected read consensus sequences are generated using an algorithm derived from FALCON-sense:
--   Chin CS, et al.
--   Phased diploid genome assembly with single-molecule real-time sequencing.
--   Nat Methods. 2016 Dec;13(12):1050-1054.
--   http://doi.org/10.1038/nmeth.4035
--
-- Contig consensus sequences are generated using an algorithm derived from pbdagcon:
--   Chin CS, et al.
--   Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data.
--   Nat Methods. 2013 Jun;10(6):563-9
--   http://doi.org/10.1038/nmeth.2474
--
-- CONFIGURE CANU
--
-- Detected Java(TM) Runtime Environment '11.0.1' (from '/.../miniconda3/bin/java') without -d64 support.
-- Detected gnuplot version '5.2 patchlevel 7   ' (from 'gnuplot') and image format 'png'.
-- Detected 80 CPUs and 1008 gigabytes of memory.
-- Detected Sun Grid Engine in '/opt/gridengine/default'.
-- Detected Grid Engine environment 'smp'.
-- User supplied Grid Engine consumable '-l h_vmem=MEMORY'.
--
-- Found   8 hosts with  80 cores and 1007 GB memory under Sun Grid Engine control.
--
--                     (tag)Threads
--            (tag)Memory         |
--        (tag)         |         |  algorithm
--        -------  ------  --------  -----------------------------
-- Grid:  meryl     12 GB    4 CPUs  (k-mer counting)
-- Grid:  hap        8 GB    4 CPUs  (read-to-haplotype assignment)
-- Grid:  cormhap    6 GB   16 CPUs  (overlap detection with mhap)
-- Grid:  obtovl     4 GB    8 CPUs  (overlap detection)
-- Grid:  utgovl     4 GB    8 CPUs  (overlap detection)
-- Grid:  ovb        4 GB    1 CPU   (overlap store bucketizer)
-- Grid:  ovs        8 GB    1 CPU   (overlap store sorting)
-- Grid:  red        8 GB    4 CPUs  (read error detection)
-- Grid:  oea        4 GB    1 CPU   (overlap error adjustment)
-- Grid:  bat       16 GB    4 CPUs  (contig construction with bogart)
-- Grid:  gfa        8 GB    4 CPUs  (GFA alignment and processing)
--
-- In 'pb3.seqStore', found Nanopore reads:
--   Raw:        2814274
--   Corrected:  873725
--   Trimmed:    863940
--
-- Generating assembly '...' in '/.../...'
--
-- Parameters:
--
--  genomeSize        25000000
--
--  Overlap Generation Limits:
--    corOvlErrorRate 0.3200 ( 32.00%)
--    obtOvlErrorRate 0.1200 ( 12.00%)
--    utgOvlErrorRate 0.1200 ( 12.00%)
--
--  Overlap Processing Limits:
--    corErrorRate    0.5000 ( 50.00%)
--    obtErrorRate    0.1200 ( 12.00%)
--    utgErrorRate    0.1200 ( 12.00%)
--    cnsErrorRate    0.2000 ( 20.00%)
--
--
-- BEGIN ASSEMBLY
--
--
-- Bogart failed, tried 2 times, giving up.
--

ABORT:
ABORT: Canu 1.8
ABORT: Don't panic, but a mostly harmless error occurred and Canu stopped.
ABORT: Try restarting.  If that doesn't work, ask for help.
ABORT:
ABORT: Disk space available:  428481.528 GB
ABORT:
ABORT: Last 50 lines of the relevant log file (unitigging/4-unitigger/unitigger.err):
ABORT:
ABORT:
ABORT:   Lengths:
ABORT:     Minimum read          0 bases
ABORT:     Minimum overlap       500 bases
ABORT:
ABORT:   Overlap Error Rates:
ABORT:     Graph                 0.120 (12.000%)
ABORT:     Max                   0.120 (12.000%)
ABORT:
ABORT:   Deviations:
ABORT:     Graph                 6.000
ABORT:     Bubble                6.000
ABORT:     Repeat                3.000
ABORT:
ABORT:   Edge Confusion:
ABORT:     Absolute              2100
ABORT:     Percent               200.0000
ABORT:
ABORT:   Unitig Construction:
ABORT:     Minimum intersection  500 bases
ABORT:     Maxiumum placements   2 positions
ABORT:
ABORT:   Debugging Enabled:
ABORT:     (none)
ABORT:
ABORT:   ==> LOADING AND FILTERING OVERLAPS.
ABORT:
ABORT:   ReadInfo()-- Using 2814274 reads, no minimum read length used.
ABORT:
ABORT:   OverlapCache()-- limited to 16384MB memory (user supplied).
ABORT:
ABORT:   OverlapCache()--      21MB for read data.
ABORT:   OverlapCache()--     107MB for best edges.
ABORT:   OverlapCache()--     279MB for tigs.
ABORT:   OverlapCache()--      75MB for tigs - read layouts.
ABORT:   OverlapCache()--     107MB for tigs - error profiles.
ABORT:   OverlapCache()--    4096MB for tigs - error profile overlaps.
ABORT:   OverlapCache()--       0MB for other processes.
ABORT:   OverlapCache()-- ---------
ABORT:   OverlapCache()--    4740MB for data structures (sum of above).
ABORT:   OverlapCache()-- ---------
ABORT:   OverlapCache()--      53MB for overlap store structure.
ABORT:   OverlapCache()--   11590MB for overlap data.
ABORT:   OverlapCache()-- ---------
ABORT:   OverlapCache()--   16384MB allowed.
ABORT:   OverlapCache()--
ABORT:   OverlapCache()-- Retain at least 559 overlaps/read, based on 279.58x coverage.
ABORT:   OverlapCache()-- Initial guess at 269 overlaps/read.
ABORT:   OverlapCache()--
ABORT:   OverlapCache()-- Not enough memory to load the minimum number of overlaps; increase -M.
brianwalenz commented 5 years ago

There's only one bogart job, so the memory/cpu limit is ok.

The defaults are set assuming ~60x coverage, and this job just runs out of memory with your 280x coverage. Increase batMemory to 32g or 48g. You can increase threads (batThreads) too if you want to keep the same memory/cpu as before; it won't really make much difference in run time.
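
A sketch of the re-run with those options (batMemory and batThreads are the options named above; the values are only an example, and the rest mirrors the earlier command):

canu -p pb3 -d pb3-ont genomeSize=25m \
  -nanopore-raw $NANOPORE_RAW \
  gridEngineMemoryOption="-l h_vmem=MEMORY" \
  batMemory=32g batThreads=8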

IyadKandalaft commented 5 years ago

Just because it can't allocate heap doesn't mean that h_vmem isn't being multiplied by the number of slots. There's more at play: h_vmem is killing the JVM because the JVM is trying to allocate more memory than you anticipate.

Here's an example where h_vmem is CONSUMABLE = NO

$ qconf -sc | grep h_vmem
h_vmem              h_vmem     MEMORY    <=    YES         NO         0        0

I requested 8 slots with an h_vmem of 1G per slot

$ qlogin -pe smp 8 -l h_vmem=1G

Here is the address space limit on the parent process showing exactly 1GB * 8 slots = 8GB

$ cat /proc/137895/limits | grep "address space"
Max address space         8589934592           8589934592           bytes  

Here's an example of stress-ng using 7.5GB of vmem:

$ ~/stress-ng-0.10.01/stress-ng --vm 1 --vm-bytes 7.5G
$ ps -o vsz,rss,comm
    VSZ     RSS COMMAND
 115560    2276 bash
  86368    5632 stress-ng
  86372    2696 stress-ng-vm
7426404 7342736 stress-ng-vm

And here's java failing to allocate just 6GB of heap:

$ java -Xmx6g -version
Error occurred during initialization of VM
Unable to allocate 196608KB bitmaps for parallel garbage collection for the requested 6291456KB heap.
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.

That's because h_vmem looks at the total virtual memory allocated by the parent process and all its children. Hence, you can't just treat the java heap setting as the total virtual memory.

So the question is: what other memory is java allocating, and how do we fix it?

So there's GC memory and Metaspace (there are potentially others; I'm not a java expert).

It seems that java is creating 53 GC threads by default on these machines.

Reducing it to 1 or 2 GC threads with a max heap of 6400MB works:

$ java -Xmx6400m -XX:ParallelGCThreads=1 -version
openjdk version "1.8.0_181"
OpenJDK Runtime Environment (build 1.8.0_181-b13)
OpenJDK 64-Bit Server VM (build 25.181-b13, mixed mode)

$ java -Xmx6400m -XX:ParallelGCThreads=2 -version
openjdk version "1.8.0_181"
OpenJDK Runtime Environment (build 1.8.0_181-b13)
OpenJDK 64-Bit Server VM (build 25.181-b13, mixed mode)

I needed to drop the heap by 100M for the third:

$ java -Xmx6300m -XX:ParallelGCThreads=3 -version
openjdk version "1.8.0_181"
OpenJDK Runtime Environment (build 1.8.0_181-b13)
OpenJDK 64-Bit Server VM (build 25.181-b13, mixed mode)

It turns out that HeapSizePerGCThread = 87241520 adds an additional ~83MB per GC thread.

There's also the per-thread stack size, which consumes 1M per thread; that accounts for the last of it.
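
(For illustration only; -Xss is a standard JVM flag, not something set anywhere in this thread: the per-thread stack can be capped to shave off some of that overhead.)

$ java -Xmx6400m -XX:ParallelGCThreads=2 -Xss512k -version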

Ok, but that still doesn't explain the other 1GB of memory that it can't allocate to the heap. I tried reducing MaxMetaspaceSize to 64M, but that didn't seem to do anything. jcmd shows that Class reserves 1GB:

Class (reserved=1070652KB, committed=22844KB)
                        (classes #3072)
                        (malloc=3644KB #4053) 
                        (mmap: reserved=1067008KB, committed=19200KB) 

Basically, the conclusion is that h_vmem just makes things too complex for software that reserves a lot of memory it may never use. The only thing I found on this topic is: https://stackoverflow.com/questions/18708085/sge-h-vmem-vs-java-xmx-xms

brianwalenz commented 5 years ago

Per queue_conf:

       The resource limit parameters s_vmem and h_vmem are implemented by Grid
       Engine as a job limit.  They impose a limit on the amount of combined
       virtual memory consumed by all the processes in the job. If h_vmem is
       exceeded by a job running in the queue, it is aborted via a SIGKILL
       signal (see kill(1)).  If s_vmem is exceeded, the job is sent a SIGXCPU
       signal which can be caught by the job.  If you wish to allow a job to
       be "warned" so it can exit gracefully before it is killed, then you
       should set the s_vmem limit to a lower value than h_vmem.  For parallel
       processes, the limit is applied per slot which means that the limit is
       multiplied by the number of slots being used by the job before being
       applied.

Note three things: (1) it's by definition not consumable, so SGE will happily overcommit a node; (2) it is just a limit on job size; (3) it is scaled by number of threads.

I will disagree that h_vmem makes it 'too complex' and instead argue it's the wrong thing to be looking at. Memory-mapped files will vastly increase your process size but aren't resident in memory. https://serverfault.com/questions/806646/what-is-the-difference-between-h-rss-and-h-vmem-in-sun-grid-engine-sge has a lovely example of gcc using 20 TB of virtual space but only 5 MB resident, and chrome using 2TB of address space.

Anyway, do you need to set h_vmem in the job submission at all? Two things to try: (1) remove the h_vmem request from Canu's jobSubmit script and run that by hand; (2) increase the h_vmem request (double it?) to see if that lets jobs run.
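
A sketch of (1), based on the jobSubmit script posted earlier with the -l h_vmem request dropped (run from correction/1-overlapper/):

qsub \
  -pe smp 16 -j y -o precompute.\$TASK_ID.out \
  -cwd -N "cormhap_pb3" \
  -t 1-174 \
  ./precompute.sh 0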

IyadKandalaft commented 5 years ago

h_vmem can be configured in SGE as consumable (albeit against the definition). As per that serverfault post, h_vmem is not recommended as an enforcement tactic, which is one of the reasons I didn't implement it as such. Just to clarify, SGE by default multiplies the requested h_vmem by the number of slots before applying it as a job limit. Hence, "-l h_vmem=1G -pe smp 8" sets a job limit of 8GB, not 1GB as skoren indicated.

Yes, I agree h_vmem is the wrong thing to use. But h_rss isn't enforced on linux, and we don't have a consumable mem variable on this HPC yet. Lastly, h_vmem is not configured as a required complex. Tayab was already using gridEngineMemoryOption="-l h_vmem=MEMORY" as an option, and I merely suggested adding gridEngineThreadsOption="-pe smp THREADS" to specify how many slots are used per job submission, based on the CANU docs. skoren indicated that this would NOT work and I don't understand why that is. Maybe I misunderstood what gridEngineThreadsOption is for or how CANU requests slots?

For your suggestions: (1) removing h_vmem should work as long as not too many jobs are scheduled on the same node, which has been the case for some of my users; (2) doubling h_vmem might work, assuming we don't run into an unpredictable Java memory over-allocation issue. Also, I assume you mean doubling the highest memory job requirement (i.e. 2 x bogart = 32GB)?

Do you know if CANU sets Java -Xmx to the same value as the "MEMORY" parameter it replaces in gridEngineMemoryOption?

skoren commented 5 years ago

The threads option is already included and the jobs still did not run, so adding gridEngineThreadsOption wouldn't change anything; you can see the -pe smp 16 was already included. Canu already requests only 90% of the allocated memory for the JVM and restricts GC threads too. The confusion came from whether h_vmem is per slot or per job, since the SGE documentation doesn't make this very clear.

  1. That's why I suggested relying on the implicit memory per core, which will limit the jobs scheduled on a single node. Ideally you'd add a consumable memory option; that would solve overloading of nodes.
  2. Canu does replace MEMORY with what it wants to use. You'd have to either use the gridOptions* options (canu -options | grep gridOptions) to pass specific -l h_vmem parameters to over-subscribe memory, or provide a value double the max per core that will be used (4gb in this case) without MEMORY (e.g. gridEngineMemoryOption="-l h_vmem=8g"; see the sketch below). However, this changes from run to run, so you'd have to adjust the parameters every time. While you're modifying code, rather than removing the division you can divide by half the threads to accomplish the same effect.
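
A hedged sketch of that second alternative (the 8g value is double the 4gb-per-core maximum mentioned above and would need adjusting for other runs):

canu -p pb3 -d pb3-ont genomeSize=25m \
  -nanopore-raw $NANOPORE_RAW \
  gridEngineMemoryOption="-l h_vmem=8g"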

Fundamentally, I agree with @brianwalenz: the VM isn't the right thing to limit for jobs. There are legitimate uses of virtual memory that don't overload a node's physical ram, and it's hard to predict the overheads of system tools (like the JVM or file I/O, which vary between systems), which can cause crashes we can't control. You could use h_rss or mem_free, but neither of these seems to actually enforce its limit. SGE seems to be mis-designed here: there is no way to enforce a resident limit rather than a virtual limit, which other grid engines, like SLURM, handle properly.

skoren commented 5 years ago

Idle; the original poster's issue was resolved by requesting more cores/memory.