marbl / canu

A single molecule sequence assembler for genomes large and small.
http://canu.readthedocs.io/

canu memory problem #2340

Closed sfh1111 closed 1 month ago

sfh1111 commented 2 months ago

Hi, I am trying to run canu with the following command:

bsub -q long -R "rusage[mem=16384]" -n 16 canu \
  -p pennellii_assembly \
  -d assembly_out/ \
  genomeSize=1.2g \
  -xxx.hifi_reads.fastq.gz \
  useGrid=true \
  maxMemory=16g \
  maxThreads=40 \
  merylMemory=8g \
  merylThreads=4 \
  batMemory=16g \
  batThreads=16 \
  corOutCoverage=100 \
  corMinCoverage=0 \
  mhapSensitivity=normal \
  overlapper=mhap \
  corErrorRate=0.105 \
  correctedErrorRate=0.045

and getting the following error:

-- Failed to submit compute jobs.  Delay 10 seconds and try again.

CRASH:
CRASH: canu 2.2
CRASH: Please panic, this is abnormal.
CRASH:
CRASH:   Failed to submit compute jobs.
CRASH:
CRASH: Failed at xxx.conda/envs/my_mamba_env/envs/canu_env/bin/../lib/site_perl/canu/Execution.pm line 1259.
CRASH:   canu::Execution::submitOrRunParallelJob("pennellii_assembly", "utgmhap", "unitigging/1-overlapper", "precompute", 1, 2, 3, 4, ...) called at xxx/.conda/envs/my_mamba_env/envs/canu_env/bin/../lib/site_perl/canu/OverlapMhap.pm line 674
CRASH:   canu::OverlapMhap::mhapPrecomputeCheck("pennellii_assembly", "utg", "normal") called at xxx/.conda/envs/my_mamba_env/envs/canu_env/bin/canu line 1002
CRASH:   main::overlap("pennellii_assembly", "utg") called at xxs/.conda/envs/my_mamba_env/envs/canu_env/bin/canu line 1120
CRASH:
CRASH: Last 50 lines of the relevant log file (unitigging/1-overlapper/precompute.jobSubmit-01.out):
CRASH:
CRASH: MEMORY LIMIT GREATER THEN MEMORY RESERVED, NOT PERMITED
CRASH: RESERVED MEMORY(MB): 1024 MEMORY LIMIT(MB): 16384
CRASH: Request aborted by esub. Job not submitted.
CRASH:  

Any suggestions?

skoren commented 2 months ago

It looks like the grid doesn't like the memory request from Canu's submission. You may have to adjust the submit command. What is in unitigging/1-overlapper/precompute.jobSubmit-01.sh?
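To make that check concrete, here is a minimal sketch of two helpers for comparing what was requested against what esub reported. `show_bsub_memory_flags` and `parse_esub_memory` are hypothetical names, not part of Canu. The first assumes the jobSubmit script contains a bsub command line using `-M` (memory limit) and/or `-R "rusage[mem=...]"` (memory reservation); the actual contents of that script were not posted in this thread. The second relies only on the esub message format shown in the log above.

```shell
# Hypothetical helper: list memory-related bsub tokens in a submit script.
# Assumes the script uses -M (limit) and/or rusage[...] (reservation).
show_bsub_memory_flags() {
  grep -oE -e '-M +[0-9]+|rusage\[[^]]*\]' "$1"
}

# Hypothetical helper: print "<reserved MB> <limit MB>" from an esub
# rejection line such as:
#   RESERVED MEMORY(MB): 1024 MEMORY LIMIT(MB): 16384
parse_esub_memory() {
  sed -n 's/.*RESERVED MEMORY(MB): *\([0-9][0-9]*\) *MEMORY LIMIT(MB): *\([0-9][0-9]*\).*/\1 \2/p' "$1"
}

# Example usage from the assembly directory:
#   show_bsub_memory_flags unitigging/1-overlapper/precompute.jobSubmit-01.sh
#   parse_esub_memory unitigging/1-overlapper/precompute.jobSubmit-01.out
```

If the submit script sets a limit without a matching reservation, that would be consistent with the "reserved 1024, limit 16384" message. Canu's documentation describes the `gridOptions` and `gridEngineResourceOption` parameters for changing what is passed to the scheduler; whether either one satisfies this site's esub check is not established in this thread.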

You're also using the mhap overlapper for assembly, which isn't recommended. I'd suggest removing all of the following options:

  corOutCoverage=100 \
  corMinCoverage=0 \
  mhapSensitivity=normal \
  overlapper=mhap \
  corErrorRate=0.105 \
  correctedErrorRate=0.045

The cor options aren't used on HiFi data, and the default error rate and overlapper should be used.
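As a sketch of what that trimming leaves behind, the snippet below filters the original option list and prints whatever remains, one argument per line. `drop_options` is a hypothetical helper written for this illustration, not a Canu feature, and it only prints; it does not submit anything. The redacted read argument is passed through exactly as posted.

```shell
# Hypothetical helper: print every argument whose name (the text before
# "=") is not in the space-separated drop list given as $1.
drop_options() {
  drop=" $1 "; shift
  for arg in "$@"; do
    name=${arg%%=*}
    case "$drop" in
      *" $name "*) ;;              # listed for removal: skip it
      *) printf '%s\n' "$arg" ;;   # keep everything else
    esac
  done
}

# Dry run on the options from the original post.
drop_options "corOutCoverage corMinCoverage mhapSensitivity overlapper corErrorRate correctedErrorRate" \
  -p pennellii_assembly -d assembly_out/ genomeSize=1.2g \
  -xxx.hifi_reads.fastq.gz useGrid=true maxMemory=16g maxThreads=40 \
  merylMemory=8g merylThreads=4 batMemory=16g batThreads=16 \
  corOutCoverage=100 corMinCoverage=0 mhapSensitivity=normal \
  overlapper=mhap corErrorRate=0.105 correctedErrorRate=0.045
```

The printed arguments are the ones to keep on the canu command line. This only addresses the option advice; the grid submission failure is a separate problem.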

katievigil commented 1 month ago

Hi! I was wondering the same thing: how can I decrease the amount of memory Canu uses for bigger .fastq files (>10Gb)? I had a 20Gb .fastq file and ended up having to split it into 3 separate .fastq files and run them with Canu on a 1.5Tb bigmem compute node. Is there a better way to change my script so I don't have to do this in the future? I have another 39Gb sample. These are viral metagenomic sequences, so I don't know the genomeSize, but 2m has been working for me.

Canu parameters

canu_base_command="singularity exec -B /ddnB/work/,/ddnB/project,/project,/work /project/containers/images/canu.sif canu genomeSize=2m maxinputCoverage=10000 corOutCoverage=10000 corMhapSensitivity=high corMinCoverage=0 redMemory=32 oeaMemory=32 batMemory=200 correctedErrorRate=0.16 minInputCoverage=0 stopOnLowCoverage=0 -useGrid=false -nanopore"

Thanks! @skoren

skoren commented 1 month ago

@katievigil that's not really related to the above, which is a grid submission issue. None of the steps should be using that much memory. Which step is taking that much?

katievigil commented 1 month ago

@skoren I moved this to a new issue. Thanks for your help! https://github.com/marbl/canu/issues/2346

skoren commented 1 month ago

Idle, looks like an issue with the submit command.