marbl / canu

A single molecule sequence assembler for genomes large and small.
http://canu.readthedocs.io/

Don't panic, but a mostly harmless error occurred and Canu stopped. Please help. #958

Closed AndrewLeeGla closed 6 years ago

AndrewLeeGla commented 6 years ago

Hi, can anybody help me? I have a problem with Canu. I wrote a bash script and submitted it to our server with the "qsub" command, but the job always fails. So I ran the script directly with the "bash" command; the shell output is shown below the script. Here is the script:

#!/bin/bash
#PBS -N Gla_canu_1

#PBS -l nodes=node76:ppn=48

#PBS -l walltime=150:00:00
#PBS -j oe
#PBS -q superfat 
cd /histor/Gla
export PATH=/histor/software/java/jre1.8.0_171/bin:$PATH
canu useGrid=false -d /histor/Gla -p Gla genomeSize=100m -pacbio-raw 1.fastq -nanopore-raw 2.fastq

What happened? Is something wrong with my job script? What should I do to get the task to finish?
I would appreciate your help. Thank you.

-- Canu 1.7
--
-- CITATIONS
--
...
-- 
-- CONFIGURE CANU
--
-- Detected Java(TM) Runtime Environment '1.8.0_171' (from 'java').
-- Detected gnuplot version '4.2 patchlevel 6 ' (from 'gnuplot') and image format 'png'.
-- Detected 24 CPUs and 63 gigabytes of memory.
-- Detected PBS/Torque '4.2.7' with 'pbsnodes' binary in /opt/gridview/pbs/dispatcher/bin/pbsnodes.
-- Grid engine disabled per useGrid=false option.
--
--                            (tag)Concurrency
--                     (tag)Threads          |
--            (tag)Memory         |          |
--        (tag)         |         |          |     total usage     algorithm
--        -------  ------  --------   --------  -----------------  -----------------------------
-- Local: meryl     63 GB   16 CPUs x   1 job     63 GB   16 CPUs  (k-mer counting)
-- Local: cormhap   13 GB   12 CPUs x   2 jobs    26 GB   24 CPUs  (overlap detection with mhap)
-- Local: obtovl     8 GB    8 CPUs x   3 jobs    24 GB   24 CPUs  (overlap detection)
-- Local: utgovl     8 GB    8 CPUs x   3 jobs    24 GB   24 CPUs  (overlap detection)
-- Local: ovb        2 GB    1 CPU  x  24 jobs    48 GB   24 CPUs  (overlap store bucketizer)
-- Local: ovs        8 GB    1 CPU  x   7 jobs    56 GB    7 CPUs  (overlap store sorting)
-- Local: red        8 GB    4 CPUs x   6 jobs    48 GB   24 CPUs  (read error detection)
-- Local: oea        4 GB    1 CPU  x  15 jobs    60 GB   15 CPUs  (overlap error adjustment)
-- Local: bat       63 GB    8 CPUs x   1 job     63 GB    8 CPUs  (contig construction)
-- Local: gfa        8 GB    8 CPUs x   1 job      8 GB    8 CPUs  (GFA alignment and processing)
--
-- In 'Glauconema_picbio_nanoporeraw.gkpStore', found both PacBio and Nanopore reads:
--   Raw:        1165390
--   Corrected:  0
--   Trimmed:    0
--
-- Generating assembly 'Glauconema_picbio_nanoporeraw' in '/histor/zhao/zhaolab/lc/Glauconema/canu_pacbio_nanoporeraw_assembled'
--
-- Parameters:
--
--  genomeSize        100000000
--
--  Overlap Generation Limits:
--    corOvlErrorRate 0.3200 ( 32.00%)
--    obtOvlErrorRate 0.1440 ( 14.40%)
--    utgOvlErrorRate 0.1440 ( 14.40%)
--
--  Overlap Processing Limits:
--    corErrorRate    0.5000 ( 50.00%)
--    obtErrorRate    0.1440 ( 14.40%)
--    utgErrorRate    0.1440 ( 14.40%)
--    cnsErrorRate    0.1920 ( 19.20%)
--
--
-- BEGIN CORRECTION
--
--
-- Running jobs.  First attempt out of 2.
----------------------------------------
-- Starting 'meryl' concurrent execution on Sun Jun 17 01:36:29 2018 with 153364.179 GB free disk space (1 processes; 1 concurrently)

    cd correction/0-mercounts
    ./meryl.sh 1 > ./meryl.000001.out 2>&1

-- Finished on Sun Jun 17 01:36:29 2018 (lickety-split) with 153364.179 GB free disk space
----------------------------------------
--
-- Meryl failed, retry.
--
--
-- Running jobs.  Second attempt out of 2.
----------------------------------------
-- Starting 'meryl' concurrent execution on Sun Jun 17 01:36:29 2018 with 153364.178 GB free disk space (1 processes; 1 concurrently)

    cd correction/0-mercounts
    ./meryl.sh 1 > ./meryl.000001.out 2>&1

-- Finished on Sun Jun 17 01:36:29 2018 (lickety-split) with 153364.178 GB free disk space
----------------------------------------
--
-- Meryl failed, tried 2 times, giving up.
--

ABORT:
ABORT: Canu 1.7
ABORT: Don't panic, but a mostly harmless error occurred and Canu stopped.
ABORT: Try restarting.  If that doesn't work, ask for help.
ABORT:

By the way, here is the task_name.o[number] file:

-- Canu 1.7
--
-- CITATIONS
--
...
-- 
-- CONFIGURE CANU
--
-- Detected Java(TM) Runtime Environment '1.8.0_171' (from 'java').
-- Detected 128 CPUs and 2020 gigabytes of memory.
-- Detected PBS/Torque '4.2.7' with 'pbsnodes' binary in /opt/gridview/pbs/dispatcher/bin/pbsnodes.
-- Detecting PBS/Torque resources.
-- 
-- Found   1 host  with  48 cores and  504 GB memory under PBS/Torque control.
-- Found   9 hosts with  16 cores and  126 GB memory under PBS/Torque control.
-- Found   3 hosts with  28 cores and  126 GB memory under PBS/Torque control.
-- Found   1 host  with  32 cores and  126 GB memory under PBS/Torque control.
-- Found  29 hosts with  24 cores and   47 GB memory under PBS/Torque control.
-- Found  17 hosts with  28 cores and  125 GB memory under PBS/Torque control.
-- Found   4 hosts with  28 cores and  757 GB memory under PBS/Torque control.
-- Found   1 host  with 128 cores and 2019 GB memory under PBS/Torque control.
-- Found   1 host  with  24 cores and   39 GB memory under PBS/Torque control.
--
--                     (tag)Threads
--            (tag)Memory         |
--        (tag)         |         |  algorithm
--        -------  ------  --------  -----------------------------
-- Grid:  meryl     64 GB   16 CPUs  (k-mer counting)
-- Grid:  cormhap   13 GB    8 CPUs  (overlap detection with mhap)
-- Grid:  obtovl     8 GB    8 CPUs  (overlap detection)
-- Grid:  utgovl     8 GB    8 CPUs  (overlap detection)
-- Grid:  ovb        2 GB    1 CPU   (overlap store bucketizer)
-- Grid:  ovs        8 GB    1 CPU   (overlap store sorting)
-- Grid:  red        6 GB    4 CPUs  (read error detection)
-- Grid:  oea        4 GB    1 CPU   (overlap error adjustment)
-- Grid:  bat       64 GB    8 CPUs  (contig construction)
-- Grid:  gfa        8 GB    8 CPUs  (GFA alignment and processing)
--
-- In 'Gla_picbio_nanoporeraw.gkpStore', found both PacBio and Nanopore reads:
--   Raw:        1165390
--   Corrected:  0
--   Trimmed:    0
--
-- Generating assembly 'Glauconema_picbio_nanoporeraw' in '/histor/Gla'
--
-- Parameters:
--
--  genomeSize        100000000
--
--  Overlap Generation Limits:
--    corOvlErrorRate 0.3200 ( 32.00%)
--    obtOvlErrorRate 0.1440 ( 14.40%)
--    utgOvlErrorRate 0.1440 ( 14.40%)
--
--  Overlap Processing Limits:
--    corErrorRate    0.5000 ( 50.00%)
--    obtErrorRate    0.1440 ( 14.40%)
--    utgErrorRate    0.1440 ( 14.40%)
--    cnsErrorRate    0.1920 ( 19.20%)
--
--
-- BEGIN CORRECTION
--
--
-- Running jobs.  First attempt out of 2.
--
-- 'meryl.jobSubmit-01.sh' -> job 518953[].manager task 1.
--
----------------------------------------
-- Starting command on Mon Jun 11 07:22:48 2018 with 152636.181 GB free disk space

    cd /histor/Gla
    qsub \
      -j oe \
      -d `pwd` \
      -W depend=afteranyarray:518953[].manager \
      -l mem=8g \
      -l nodes=1:ppn=1   \
      -N 'canu_Gla_picbio_nanoporeraw' \
      -o canu-scripts/canu.07.out canu-scripts/canu.07.sh
518954.manager

-- Finished on Mon Jun 11 07:22:48 2018 (lickety-split) with 152636.176 GB free disk space
brianwalenz commented 6 years ago

The immediate exit of 'meryl' makes me think it's not installed correctly.

What's in 'correction/0-mercounts/meryl.000001.out'?

What is reported if you run ./meryl.sh 0 from that same directory?

The preferred way to run canu on a grid is to set your grid-specific options in "gridOptions", and run canu directly on the head node. Canu will submit itself to the grid.
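
For example, something like this, run directly on the head node, should do it (a sketch based on your script; adjust the queue and walltime for your site):

    canu -d /histor/Gla -p Gla genomeSize=100m \
         gridOptions="-q superfat -l walltime=150:00:00" \
         -pacbio-raw 1.fastq -nanopore-raw 2.fastq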

AndrewLeeGla commented 6 years ago

Here is correction/0-mercounts/meryl.000001.out.

seqFactory::openFile()-- Cannot determine type of file '../Glauconema_picbio_nanoporecorrected.gkpStore'. Tried:
seqFactory::openFile()--   'FastA'
seqFactory::openFile()--   'FastAstream'
seqFactory::openFile()--   'Fastq'
seqFactory::openFile()--   'FastQstream'
seqFactory::openFile()--   'seqStore'
seqFactory::openFile()--   'GKSTORE'
./meryl.sh: line 58: 26255 segmentation fault (core dumped) $bin/estimate-mer-threshold -h ./Glauconema_picbio_nanoporecorrected.ms16.histogram -c 0 > ./Glauconema_picbio_nanoporecorrected.ms16.estMerThresh.out.WORKING 2> ./Glauconema_picbio_nanoporecorrected.ms16.estMerThresh.err

When I run ./meryl.sh 0 from that directory, the following information is displayed.

seqFactory::openFile()-- Cannot determine type of file '../Glauconema_picbio_nanoporecorrected.gkpStore'. Tried:
seqFactory::openFile()--   'FastA'
seqFactory::openFile()--   'FastAstream'
seqFactory::openFile()--   'Fastq'
seqFactory::openFile()--   'FastQstream'
seqFactory::openFile()--   'seqStore'
seqFactory::openFile()--   'GKSTORE'
./meryl.sh: line 58: 14527 segmentation fault (core dumped) $bin/estimate-mer-threshold -h ./Glauconema_picbio_nanoporecorrected.ms16.histogram -c 0 > ./Glauconema_picbio_nanoporecorrected.ms16.estMerThresh.out.WORKING 2> ./Glauconema_picbio_nanoporecorrected.ms16.estMerThresh.err

What should I do? I would appreciate your help. By the way, I want to run Canu on a superfat node. The superfat node doesn't have a queue system like PBS. Will it work if I run Canu with an option like "-t 16" (I want to run Canu with 16 CPUs rather than the entire server)? If that won't work, which option should I use? Thank you very much!

brianwalenz commented 6 years ago

What files (ls -l) exist in Glauconema_picbio_nanoporecorrected.gkpStore? One of the files it needs isn't there. Is anything odd reported in Glauconema_picbio_nanoporecorrected.gkpStore.err?

You'll want to use maxThreads=16 and maxMemory= to limit both CPU and memory usage. 32 to 48GB memory is reasonable.
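
For example, something like this (a sketch reusing the paths from your script; adjust the memory limit to whatever you want to allow):

    canu useGrid=false maxThreads=16 maxMemory=48g \
         -d /histor/Gla -p Gla genomeSize=100m \
         -pacbio-raw 1.fastq -nanopore-raw 2.fastq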

AndrewLeeGla commented 6 years ago

Thank you for your reply. Got it. Here are the contents of Glauconema_picbio_nanoporecorrected.gkpStore:

-rw-rw-r-- 1 zhaolab zhaolab 3468222464 Jun 12 00:40 blobs
-rw-rw-r-- 1 zhaolab zhaolab   38273024 Jun 12 00:40 errorLog
-rw-rw-r-- 1 zhaolab zhaolab          0 Jun 12 00:54 libraries.txt
-rw-rw-r-- 1 zhaolab zhaolab          0 Jun 12 00:30 load.dat
-rw-rw-r-- 1 zhaolab zhaolab   69005312 Jun 12 00:40 readNames.txt

But I'm not sure what you mean by Glauconema_picbio_nanoporecorrected.gkpStore.err; I couldn't find a file with that name. Do you mean Glauconema_picbio_nanoporecorrected.gkpStore/errorLog? It's a huge file, and everything in it looks like this:

read 'add32aeb-0c48-4b23-ba93-d501fc98c097 runid=5e01e6b7010f3d0257811f99e5a9743d5fdb96a9 read=12814 ch=165 start_time=2018-04-13T13:21:48Z' of length 387 in file '/histor/zhao/zhaolab/lc/Glauconema/Gla_nanopore/Gla_nanopore_data/Gla_nanopore_aL_q8_flowcell_1.fastq' at line 1421145 is too short, skipping.

Or do you mean the file 0-mercounts/Glauconema_picbio_nanoporecorrected.ms16.estMerThresh.err? That file is empty.

Looking forward to your reply.

brianwalenz commented 6 years ago

Weird. It's missing the 'reads' file, which stores the read metadata, and the 'libraries' file, which stores metadata on each input file.

So, the only thing you can do is to delete this attempt and try again. :-(
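
If it helps, a minimal sketch of what I mean, assuming everything under your -d directory belongs to this run (double-check the paths before deleting anything):

    cd /histor/Gla
    rm -rf *.gkpStore *.gkpStore.err correction canu-scripts

and then rerun the same canu command.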

AndrewLeeGla commented 6 years ago

Alright. I removed all the files and directories under the assembly directory, then tried running the script directly by typing bash filename.sh in the shell. This time it didn't stop at the meryl step, so I qsubbed the task; it's now in the queue. Thank you very much. But I have another question: I have the same problem on another server. Here is the script:

> #!/bin/bash
> #PBS -k o
> #PBS -l nodes=1:ppn=16,walltime=300:00:00
> #PBS -m abe
> #PBS -N canu_asm_second
> #PBS -l mem=250gb,pmem=250b,vmem=250gb
> module load gnuplot
> module unload java
> module load java
> module load canu
> cd /canu_assembled_genome/
> 
> /software/canu/Linux-amd64/bin/canu useGrid=false -d /canu_assembled_genome/canu_assemble_secondtime -p Gla_pac_nanopore genomeSize=100m -pacbio-raw /Gla_pacbio/Gla_pacbio_data/Gla_pacbio_all.fastq -nanopore-raw /Gla_nanopore/Gla_nanopore_data/Gla_nanopore_aL_q8_flowcell_1.fastq
> 

I ran this script directly in the shell with the bash command.
Below is the shell output.

> gd version 2.2.4 loaded.
> libcerf (GNU) version 1.5 loaded.
> gnuplot version 5.0.6 loaded.
> Sun/Oracle Java SE Development Kit version 1.8.0_131 loaded.
> canu version 1.6 loaded.
> -- Canu snapshot v1.7 +251 changes (r8943 be1d566a783eb4a04dda3fb4993f508b49109aef)
> --
> -- CITATIONS
> --
> -- Koren S, Walenz BP, Berlin K, Miller JR, Phillippy AM.
> -- Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation.
> -- Genome Res. 2017 May;27(5):722-736.
> -- http://doi.org/10.1101/gr.215087.116
> -- 
> -- Read and contig alignments during correction, consensus and GFA building use:
> --   Šošic M, Šikic M.
> --   Edlib: a C/C ++ library for fast, exact sequence alignment using edit distance.
> --   Bioinformatics. 2017 May 1;33(9):1394-1395.
> --   http://doi.org/10.1093/bioinformatics/btw753
> -- 
> -- Overlaps are generated using:
> --   Berlin K, et al.
> --   Assembling large genomes with single-molecule sequencing and locality-sensitive hashing.
> --   Nat Biotechnol. 2015 Jun;33(6):623-30.
> --   http://doi.org/10.1038/nbt.3238
> -- 
> --   Myers EW, et al.
> --   A Whole-Genome Assembly of Drosophila.
> --   Science. 2000 Mar 24;287(5461):2196-204.
> --   http://doi.org/10.1126/science.287.5461.2196
> -- 
> -- Corrected read consensus sequences are generated using an algorithm derived from FALCON-sense:
> --   Chin CS, et al.
> --   Phased diploid genome assembly with single-molecule real-time sequencing.
> --   Nat Methods. 2016 Dec;13(12):1050-1054.
> --   http://doi.org/10.1038/nmeth.4035
> -- 
> -- Contig consensus sequences are generated using an algorithm derived from pbdagcon:
> --   Chin CS, et al.
> --   Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data.
> --   Nat Methods. 2013 Jun;10(6):563-9
> --   http://doi.org/10.1038/nmeth.2474
> -- 
> -- CONFIGURE CANU
> --
> -- Detected Java(TM) Runtime Environment '1.8.0_131' (from '/N/soft/rhel7/java/1.8.0_131//bin/java') with -d64 support.
> -- Detected gnuplot version '5.0 patchlevel 6' (from 'gnuplot') and image format 'svg'.
> -- Detected 48 CPUs and 252 gigabytes of memory.
> -- Detected PBS/Torque '6.0.2' with 'pbsnodes' binary in /usr/bin/pbsnodes.
> -- Grid engine disabled per useGrid=false option.
> --
> --                            (tag)Concurrency
> --                     (tag)Threads          |
> --            (tag)Memory         |          |
> --        (tag)         |         |          |     total usage     algorithm
> --        -------  ------  --------   --------  -----------------  -----------------------------
> -- Local: meryl     64 GB   16 CPUs x   1 job     64 GB   16 CPUs  (k-mer counting)
> -- Local: cormhap   13 GB   16 CPUs x   3 jobs    39 GB   48 CPUs  (overlap detection with mhap)
> -- Local: obtovl     8 GB    8 CPUs x   6 jobs    48 GB   48 CPUs  (overlap detection)
> -- Local: utgovl     8 GB    8 CPUs x   6 jobs    48 GB   48 CPUs  (overlap detection)
> -- Local: ovb        4 GB    1 CPU  x  48 jobs   192 GB   48 CPUs  (overlap store bucketizer)
> -- Local: ovs        8 GB    1 CPU  x  31 jobs   248 GB   31 CPUs  (overlap store sorting)
> -- Local: red       10 GB    6 CPUs x   8 jobs    80 GB   48 CPUs  (read error detection)
> -- Local: oea        4 GB    1 CPU  x  48 jobs   192 GB   48 CPUs  (overlap error adjustment)
> -- Local: bat       64 GB    8 CPUs x   1 job     64 GB    8 CPUs  (contig construction with bogart)
> -- Local: gfa        8 GB    8 CPUs x   1 job      8 GB    8 CPUs  (GFA alignment and processing)
> --
> -- In 'Gla_pac_nanopore.seqStore', found both PacBio and Nanopore reads:
> --   Raw:        1165390
> --   Corrected:  0
> --   Trimmed:    0
> --
> -- Generating assembly 'Gla_pac_nanopore' in '/canu_assembled_genome/canu_assemble_secondtime'
> --
> -- Parameters:
> --
> --  genomeSize        100000000
> --
> --  Overlap Generation Limits:
> --    corOvlErrorRate 0.3200 ( 32.00%)
> --    obtOvlErrorRate 0.1440 ( 14.40%)
> --    utgOvlErrorRate 0.1440 ( 14.40%)
> --
> --  Overlap Processing Limits:
> --    corErrorRate    0.5000 ( 50.00%)
> --    obtErrorRate    0.1440 ( 14.40%)
> --    utgErrorRate    0.1440 ( 14.40%)
> --    cnsErrorRate    0.1920 ( 19.20%)
> --
> --
> -- BEGIN CORRECTION
> --
> --
> -- Running jobs.  First attempt out of 2.
> ----------------------------------------
> -- Starting 'meryl' concurrent execution on Wed Jun 20 01:11:57 2018 with 526000.17 GB free disk space (1 processes; 1 concurrently)
> 
>     cd correction/0-mercounts
>     ./meryl.sh 1 > ./meryl.000001.out 2>&1
> 
> -- Finished on Wed Jun 20 01:12:48 2018 (51 seconds) with 525997.412 GB free disk space
> ----------------------------------------
> --
> -- Meryl failed, retry.
> --
> --
> -- Running jobs.  Second attempt out of 2.
> ----------------------------------------
> -- Starting 'meryl' concurrent execution on Wed Jun 20 01:12:48 2018 with 525997.412 GB free disk space (1 processes; 1 concurrently)
> 
>     cd correction/0-mercounts
>     ./meryl.sh 1 > ./meryl.000001.out 2>&1
> 
> -- Finished on Wed Jun 20 01:13:37 2018 (49 seconds) with 525993.082 GB free disk space
> ----------------------------------------
> --
> -- Meryl failed, tried 2 times, giving up.
> --
> 
> ABORT:
> ABORT: Canu snapshot v1.7 +251 changes (r8943 be1d566a783eb4a04dda3fb4993f508b49109aef)
> ABORT: Don't panic, but a mostly harmless error occurred and Canu stopped.
> ABORT: Try restarting.  If that doesn't work, ask for help.
> ABORT:
> 

Here is correction/0-mercounts/meryl.000001.out.

> Computing 16 segments using 16 threads and 52416MB memory (7204MB if in one batch).
>   numMersActual      = 14060759264
>   mersPerBatch       = 4657250304
>   basesPerBatch      = 879890008
>   numBuckets         = 33554432 (25 bits)
>   bucketPointerWidth = 30
>   merDataWidth       = 7
> Computing segment 1 of 16.
>  Allocating 120MB for bucket pointer table (30 bits wide).
>  Allocating 128MB for counting the size of each bucket.
> Computing segment 6 of 16.
>  Allocating 120MB for bucket pointer table (30 bits wide).
> Computing segment 16 of 16.
>  Allocating 120MB for bucket pointer table (30 bits wide).
>  Allocating 128MB for counting the size of each bucket.
> Computing segment 8 of 16.
>  Allocating 120MB for bucket pointer table (30 bits wide).
> Computing segment 12 of 16.
>  Allocating 120MB for bucket pointer table (30 bits wide).
>  Allocating 128MB for counting the size of each bucket.
> Computing segment 4 of 16.
>  Allocating 120MB for bucket pointer table (30 bits wide).
> Computing segment 3 of 16.
>  Allocating 120MB for bucket pointer table (30 bits wide).
> Computing segment 15 of 16.
>  Allocating 120MB for bucket pointer table (30 bits wide).
> Computing segment 5 of 16.
>  Allocating 120MB for bucket pointer table (30 bits wide).
>  Allocating 128MB for counting the size of each bucket.
> Computing segment 9 of 16.
>  Allocating 120MB for bucket pointer table (30 bits wide).
> Computing segment 10 of 16.
>  Allocating 120MB for bucket pointer table (30 bits wide).
> Computing segment 2 of 16.
>  Allocating 120MB for bucket pointer table (30 bits wide).
> Computing segment 13 of 16.
>  Allocating 120MB for bucket pointer table (30 bits wide).
>  Allocating 128MB for counting the size of each bucket.
>  Allocating 128MB for counting the size of each bucket.
> Computing segment 11 of 16.
>  Allocating 120MB for bucket pointer table (30 bits wide).
>  Allocating 128MB for counting the size of each bucket.
>  Allocating 128MB for counting the size of each bucket.
> Computing segment 14 of 16.
>  Allocating 120MB for bucket pointer table (30 bits wide).
> Computing segment 7 of 16.
>  Allocating 120MB for bucket pointer table (30 bits wide).
>  Allocating 128MB for counting the size of each bucket.
>  Allocating 128MB for counting the size of each bucket.
>  Allocating 128MB for counting the size of each bucket.
>  Allocating 128MB for counting the size of each bucket.
>  Allocating 128MB for counting the size of each bucket.
>  Allocating 128MB for counting the size of each bucket.
>  Allocating 128MB for counting the size of each bucket.
>  Allocating 128MB for counting the size of each bucket.
>  Counting mers in buckets:  878.93 Mmers -- 18.52 Mmers/second
>  Creating bucket pointers.
>  Counting mers in buckets:  878.98 Mmers -- 18.49 Mmers/second
>  Creating bucket pointers.
>  Counting mers in buckets:  878.92 Mmers -- 18.45 Mmers/second
>  Creating bucket pointers.
>  Releasing 128MB from counting the size of each bucket./second
>  Allocating 734MB for mer storage (7 bits wide).
>  Releasing 128MB from counting the size of each bucket./second
>  Allocating 734MB for mer storage (7 bits wide).
>  Counting mers in buckets:  877.98 Mmers -- 18.39 Mmers/second
>  Creating bucket pointers.
>  Releasing 128MB from counting the size of each bucket./second
>  Allocating 734MB for mer storage (7 bits wide).
>  Counting mers in buckets:  878.93 Mmers -- 18.38 Mmers/second
>  Creating bucket pointers.
>  Counting mers in buckets:  878.91 Mmers -- 18.35 Mmers/second
>  Creating bucket pointers.
>  Counting mers in buckets:  878.66 Mmers -- 18.35 Mmers/second
>  Creating bucket pointers.
>  Counting mers in buckets:  877.97 Mmers -- 18.33 Mmers/second
>  Creating bucket pointers.
>  Counting mers in buckets:  878.97 Mmers -- 18.34 Mmers/second
>  Creating bucket pointers.
>  Counting mers in buckets:  878.93 Mmers -- 18.34 Mmers/second
>  Creating bucket pointers.
>  Releasing 128MB from counting the size of each bucket.
>  Allocating 734MB for mer storage (7 bits wide).
>  Releasing 128MB from counting the size of each bucket./second
>  Allocating 734MB for mer storage (7 bits wide).
>  Releasing 128MB from counting the size of each bucket./second
>  Counting mers in buckets:  878.92 Mmers -- 18.28 Mmers/second
>  Allocating 734MB for mer storage (7 bits wide).
>  Creating bucket pointers.
> terminate called after throwing an instance of 'std::bad_alloc'
>   what():  std::bad_alloc
> Can fit 74516004864 mers into table with prefix of 28 bits, using 52428.000MB (0.000MB for positions)
> ./meryl.sh: line 35: 35928 Aborted                 $bin/meryl -B -C -L 2 -v -m 16 -threads 16 -memory 52428 -s ../../Gla_pac_nanopore.seqStore -o ./Gla_pac_nanopore.ms16.WORKING
> ./meryl.sh: line 60: 36532 Segmentation fault      $bin/estimate-mer-threshold -h ./Gla_pac_nanopore.ms16.histogram -c 140 > ./Gla_pac_nanopore.ms16.estMerThresh.out.WORKING 2> ./Gla_pac_nanopore.ms16.estMerThresh.err
> 

When I run ./meryl.sh 0 from that directory, the following information is displayed.
> Can fit 74516004864 mers into table with prefix of 28 bits, using 52428.000MB (0.000MB for positions)
> Computing 16 segments using 16 threads and 52416MB memory (7204MB if in one batch).
>   numMersActual      = 14060759264
>   mersPerBatch       = 4657250304
>   basesPerBatch      = 879890008
>   numBuckets         = 33554432 (25 bits)
>   bucketPointerWidth = 30
>   merDataWidth       = 7
> Computing segment 1 of 16.
>  Allocating 120MB for bucket pointer table (30 bits wide).
>  Allocating 128MB for counting the size of each bucket.
> Computing segment 3 of 16.
>  Allocating 120MB for bucket pointer table (30 bits wide).
> Computing segment 11 of 16.
>  Allocating 120MB for bucket pointer table (30 bits wide).
>  Allocating 128MB for counting the size of each bucket.
>  Allocating 128MB for counting the size of each bucket.
> Computing segment 2 of 16.
>  Allocating 120MB for bucket pointer table (30 bits wide).
> Computing segment 8 of 16.
>  Allocating 120MB for bucket pointer table (30 bits wide).
> Computing segment 13 of 16.
>  Allocating 120MB for bucket pointer table (30 bits wide).
> Computing segment 4 of 16.
>  Allocating 120MB for bucket pointer table (30 bits wide).
> Computing segment 12 of 16.
>  Allocating 120MB for bucket pointer table (30 bits wide).
> Computing segment 16 of 16.
>  Allocating 120MB for bucket pointer table (30 bits wide).
>  Allocating 128MB for counting the size of each bucket.
> Computing segment 5 of 16.
>  Allocating 120MB for bucket pointer table (30 bits wide).
>  Allocating 128MB for counting the size of each bucket.
> Computing segment 10 of 16.
>  Allocating 120MB for bucket pointer table (30 bits wide).
> Computing segment 14 of 16.
>  Allocating 120MB for bucket pointer table (30 bits wide).
>  Allocating 128MB for counting the size of each bucket.
> Computing segment 6 of 16.
>  Allocating 120MB for bucket pointer table (30 bits wide).
> Computing segment 7 of 16.
>  Allocating 120MB for bucket pointer table (30 bits wide).
> Computing segment 9 of 16.
>  Allocating 120MB for bucket pointer table (30 bits wide).
> Computing segment 15 of 16.
>  Allocating 120MB for bucket pointer table (30 bits wide).
>  Allocating 128MB for counting the size of each bucket.
>  Allocating 128MB for counting the size of each bucket.
>  Allocating 128MB for counting the size of each bucket.
>  Allocating 128MB for counting the size of each bucket.
>  Allocating 128MB for counting the size of each bucket.
>  Allocating 128MB for counting the size of each bucket.
>  Allocating 128MB for counting the size of each bucket.
>  Allocating 128MB for counting the size of each bucket.
>  Allocating 128MB for counting the size of each bucket.
>  Allocating 128MB for counting the size of each bucket.
>  Counting mers in buckets:  878.93 Mmers -- 18.54 Mmers/second
>  Creating bucket pointers.
>  Counting mers in buckets:  878.91 Mmers -- 18.51 Mmers/second
>  Creating bucket pointers.
>  Counting mers in buckets:  877.97 Mmers -- 18.46 Mmers/second
>  Creating bucket pointers.
>  Releasing 128MB from counting the size of each bucket.
>  Allocating 734MB for mer storage (7 bits wide).6 Mmers/second
>  Counting mers in buckets:  878.99 Mmers -- 18.44 Mmers/second
>  Creating bucket pointers.
>  Releasing 128MB from counting the size of each bucket./second
>  Allocating 734MB for mer storage (7 bits wide).
>  Releasing 128MB from counting the size of each bucket./second
>  Allocating 734MB for mer storage (7 bits wide).
>  Releasing 128MB from counting the size of each bucket./second
>  Allocating 734MB for mer storage (7 bits wide).
>  Counting mers in buckets:  878.98 Mmers -- 18.37 Mmers/second
>  Creating bucket pointers.
>  Counting mers in buckets:  878.66 Mmers -- 18.36 Mmers/second
>  Creating bucket pointers.
>  Counting mers in buckets:  878.97 Mmers -- 18.36 Mmers/second
>  Creating bucket pointers.
>  Counting mers in buckets:  878.93 Mmers -- 18.33 Mmers/second
>  Creating bucket pointers.
>  Counting mers in buckets:  878.91 Mmers -- 18.31 Mmers/second
>  Creating bucket pointers.
>  Releasing 128MB from counting the size of each bucket.
>  Allocating 734MB for mer storage (7 bits wide).
>  Releasing 128MB from counting the size of each bucket.
>  Allocating 734MB for mer storage (7 bits wide).
> terminate called after throwing an instance of 'std::bad_alloc'
>   what():  std::bad_alloc
> ./meryl.sh: line 35: 42313 Aborted                 $bin/meryl -B -C -L 2 -v -m 16 -threads 16 -memory 52428 -s ../../Gla_pac_nanopore.seqStore -o ./Gla_pac_nanopore.ms16.WORKING
> ./meryl.sh: line 60: 42996 Segmentation fault      $bin/estimate-mer-threshold -h ./Gla_pac_nanopore.ms16.histogram -c 140 > ./Gla_pac_nanopore.ms16.estMerThresh.out.WORKING 2> ./Gla_pac_nanopore.ms16.estMerThresh.err
> 

Here is the [taskname].o[number] file:

> 409946[].s1
> 409947.s1

And here is the [taskname].e[number] file:

> gd version 2.2.4 loaded.
> libcerf (GNU) version 1.5 loaded.
> gnuplot version 5.0.6 loaded.
> Sun/Oracle Java SE Development Kit version 1.8.0_131 loaded.
> -- Detected Java(TM) Runtime Environment '1.8.0_131' (from '/N/soft/rhel7/java/1.8.0_131//bin/java').
> -- Detected 24 CPUs and 251 gigabytes of memory.
> Version: 6.0.2
> Commit: d9a34839a0f975d5c487bbfcf5dcb10b6a8f1e79
> -- Detected PBS/Torque '' with 'pbsnodes' binary in /bin/pbsnodes.
> Version: 6.0.2
> Commit: d9a34839a0f975d5c487bbfcf5dcb10b6a8f1e79
> -- Detecting PBS/Torque resources.
> -- 
> -- Found   1 host  with  48 cores and  251 GB memory under PBS/Torque control.
> -- Found   3 hosts with  16 cores and  503 GB memory under PBS/Torque control.
> -- Found  71 hosts with  24 cores and  251 GB memory under PBS/Torque control.
> -- Found   1 host  with  32 cores and  251 GB memory under PBS/Torque control.
> -- Found   8 hosts with  24 cores and  503 GB memory under PBS/Torque control.
> --
> -- Allowed to run under grid control, and use up to   8 compute threads and   41 GB memory for stage 'bogart (unitigger)'.
> -- Allowed to run under grid control, and use up to   8 compute threads and   13 GB memory for stage 'mhap (overlapper)'.
> -- Allowed to run under grid control, and use up to   8 compute threads and   13 GB memory for stage 'mhap (overlapper)'.
> -- Allowed to run under grid control, and use up to   8 compute threads and   13 GB memory for stage 'mhap (overlapper)'.
> -- Allowed to run under grid control, and use up to   4 compute threads and    6 GB memory for stage 'read error detection (overlap error adjustment)'.
> -- Allowed to run under grid control, and use up to   1 compute thread  and    2 GB memory for stage 'overlap error adjustment'.
> -- Allowed to run under grid control, and use up to   8 compute threads and   41 GB memory for stage 'utgcns (consensus)'.
> -- Allowed to run under grid control, and use up to   1 compute thread  and    4 GB memory for stage 'overlap store parallel bucketizer'.
> -- Allowed to run under grid control, and use up to   1 compute thread  and    8 GB memory for stage 'overlap store parallel sorting'.
> -- Allowed to run under grid control, and use up to   1 compute thread  and    5 GB memory for stage 'overlapper'.
> -- Allowed to run under grid control, and use up to   8 compute threads and    8 GB memory for stage 'overlapper'.
> -- Allowed to run under grid control, and use up to   8 compute threads and    8 GB memory for stage 'overlapper'.
> -- Allowed to run under grid control, and use up to   8 compute threads and   41 GB memory for stage 'meryl (k-mer counting)'.
> -- Allowed to run under grid control, and use up to   4 compute threads and   20 GB memory for stage 'falcon_sense (read correction)'.
> -- Allowed to run under grid control, and use up to   8 compute threads and   13 GB memory for stage 'minimap (overlapper)'.
> -- Allowed to run under grid control, and use up to   8 compute threads and   13 GB memory for stage 'minimap (overlapper)'.
> -- Allowed to run under grid control, and use up to   8 compute threads and   13 GB memory for stage 'minimap (overlapper)'.
> --
> -- This is canu parallel iteration #1, out of a maximum of 2 attempts.
> --
> -- Final error rates before starting pipeline:
> --   
> --   genomeSize          -- 100000000
> --   errorRate           -- 0.10
> --   
> --   corOvlErrorRate     -- 0.3
> --   obtOvlErrorRate     -- 0.3
> --   utgOvlErrorRate     -- 0.3
> --   
> --   obtErrorRate        -- 0.3
> --   
> --   cnsErrorRate        -- 0.3
> --
> --
> -- BEGIN CORRECTION
> --
> -- Meryl attempt 1 begins.
> ----------------------------------------
> -- Starting command on Thu Jun 14 05:55:02 2018 with 602948.575 GB free disk space
> 
>       qsub \
>         -l mem=41g -l nodes=1:ppn=8 \
>         -d `pwd` -N "meryl_Gla_pac_nanopore" \
>         -t 1-1 \
>         -j oe -o /canu_assembled_genome/canu_assemble_secondtime/correction/0-mercounts/meryl.\$PBS_ARRAYID.out \
>         /canu_assembled_genome/canu_assemble_secondtime/correction/0-mercounts/meryl.sh -F "0"
>     
> warning: vmem resource not provided, default vmem of 16gb will be applied.  See /etc/motd for details.
> 
> -- Finished on Thu Jun 14 05:55:03 2018 (1 second) with 602948.575 GB free disk space
> ----------------------------------------
> ----------------------------------------
> -- Starting command on Thu Jun 14 05:55:03 2018 with 602948.575 GB free disk space
> 
>     qsub \
>       -l mem=8g \
>       -l nodes=1:ppn=1   \
>       -W depend=afteranyarray:409946[].s1 \
>       -d `pwd` \
>       -N "canu_Gla_pac_nanopore" \
>       -j oe \
>       -o /canu_assembled_genome/canu_assemble_secondtime/canu-scripts/canu.04.out /canu_assembled_genome/canu_assemble_secondtime/canu-scripts/canu.04.sh
> warning: vmem resource not provided, default vmem of 16gb will be applied.  See /etc/motd for details.
> 
> -- Finished on Thu Jun 14 05:55:03 2018 (lickety-split) with 602948.575 GB free disk space
> ----------------------------------------
> 

It's a really long story. Thank you for your patience and time. I hope you can help. Thanks!

AndrewLeeGla commented 6 years ago

By the way, on this server, deleting the whole attempt and starting over didn't fix it. Thank you.

AndrewLeeGla commented 6 years ago

OK, I don't know what happened. I just deleted all the files and directories under the assembly directory and qsubbed the script again. The task has now been running for forty minutes and seems to be running normally. Anyway, thank you very much! Have a nice day!

brianwalenz commented 6 years ago

In the long story comment, Meryl seems to be running out of memory. Whatever pmem is in your submit script, it's asking for 250 bytes.

 #PBS -l mem=250gb,pmem=250b,vmem=250gb
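
Presumably that was meant to be gigabytes, i.e. something like:

 #PBS -l mem=250gb,pmem=250gb,vmem=250gb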

I'd prefer you let canu do the job submission, instead of submitting it to a single node with useGrid=false. It looks like Canu is ignoring the useGrid flag and trying to do this anyway.

The warning:

warning: vmem resource not provided, default vmem of 16gb will be applied. See /etc/motd for details.

is because canu doesn't know how to request memory. You need to set Canu's gridEngineMemoryOption to supply this to PBS (possibly gridEngineMemoryOption="-l vmem=MEMORY" will work). See http://canu.readthedocs.io/en/latest/parameter-reference.html#grid-engine-support
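
For example, run from the head node, the whole thing might look something like this (untested; the paths and walltime are copied from your submit script):

    /software/canu/Linux-amd64/bin/canu \
      gridEngineMemoryOption="-l vmem=MEMORY" \
      gridOptions="-l walltime=300:00:00" \
      -d /canu_assembled_genome/canu_assemble_secondtime -p Gla_pac_nanopore genomeSize=100m \
      -pacbio-raw /Gla_pacbio/Gla_pacbio_data/Gla_pacbio_all.fastq \
      -nanopore-raw /Gla_nanopore/Gla_nanopore_data/Gla_nanopore_aL_q8_flowcell_1.fastq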

AndrewLeeGla commented 6 years ago

Ok, I will try it. Thank you very much!

skoren commented 6 years ago

Idle, closing.