marbl / canu

A single molecule sequence assembler for genomes large and small.
http://canu.readthedocs.io/
659 stars 179 forks source link

Canu possible failed on PBSpro #1090

Closed mictadlo closed 6 years ago

mictadlo commented 6 years ago

Hello, I ran Canu in the following way:

conda create --name canu171 canu
conda activate canu171
canu -d canu2 -p canu2 useGrid=true genomeSize=1.2g -pacbio-corrected pacbio-corrected/*.fasta "batOptions=-dg 3 -db 3 -dr 1 -ca 500 -cp 50" gridOptionsJobName=C852 "gridOptions=-V -l walltime=150:00:00 -W umask=0007"

Here is the console output:

-- CONFIGURE CANU
--
-- Detected Java(TM) Runtime Environment '1.8.0_152-release' (from '/work/waterhouse_team/miniconda2/envs/canu171/bin/java').
-- Detected gnuplot version '5.2 patchlevel 3' (from 'gnuplot') and image format 'png'.
-- Detected 384 CPUs and 5795 gigabytes of memory.
-- Detected PBSPro '14.2.4.20171012010902' with 'pbsnodes' binary in /opt/pbs/bin/pbsnodes.
-- Grid engine disabled per useGrid=false option.
--
--                            (tag)Concurrency
--                     (tag)Threads          |
--            (tag)Memory         |          |
--        (tag)         |         |          |     total usage     algorithm
--        -------  ------  --------   --------  -----------------  -----------------------------
-- Local: meryl    256 GB   32 CPUs x   1 job    256 GB   32 CPUs  (k-mer counting)
-- Local: cormhap   32 GB   16 CPUs x  24 jobs   768 GB  384 CPUs  (overlap detection with mhap)
-- Local: obtovl    16 GB   16 CPUs x  24 jobs   384 GB  384 CPUs  (overlap detection)
-- Local: utgovl    16 GB   16 CPUs x  24 jobs   384 GB  384 CPUs  (overlap detection)
-- Local: ovb        4 GB    1 CPU  x 384 jobs  1536 GB  384 CPUs  (overlap store bucketizer)
-- Local: ovs       32 GB    1 CPU  x 181 jobs  5792 GB  181 CPUs  (overlap store sorting)
-- Local: red        8 GB    4 CPUs x  96 jobs   768 GB  384 CPUs  (read error detection)
-- Local: oea        4 GB    1 CPU  x 384 jobs  1536 GB  384 CPUs  (overlap error adjustment)
-- Local: bat      256 GB   16 CPUs x   1 job    256 GB   16 CPUs  (contig construction)
-- Local: gfa       16 GB   16 CPUs x   1 job     16 GB   16 CPUs  (GFA alignment and processing)
--
-- Found PacBio corrected reads in the input files.
--
-- Generating assembly 'canu' in '/lustre/scratch/waterhouse_team/banana/852/canu'
--
-- Parameters:
--
--  genomeSize        1200000000
--
--  Overlap Generation Limits:
--    corOvlErrorRate 0.2400 ( 24.00%)
--    obtOvlErrorRate 0.0450 (  4.50%)
--    utgOvlErrorRate 0.0450 (  4.50%)
--
--  Overlap Processing Limits:
--    corErrorRate    0.3000 ( 30.00%)
--    obtErrorRate    0.0450 (  4.50%)
--    utgErrorRate    0.0450 (  4.50%)
--    cnsErrorRate    0.0750 (  7.50%)
--
--
-- BEGIN TRIMMING
--
----------------------------------------
-- Starting command on Thu Sep 13 07:44:58 2018 with 576805.164 GB free disk space

    cd .
    /lustre/work-lustre/waterhouse_team/miniconda2/envs/canu171/libexec/bin/gatekeeperCreate \
      -minlength 1000 \
      -o ./canu.gkpStore.BUILDING \
      ./canu.gkpStore.gkp \
    > ./canu.gkpStore.BUILDING.err 2>&1

-- Finished on Thu Sep 13 08:19:43 2018 (2085 seconds) with 576775.342 GB free disk space
----------------------------------------
--
-- In gatekeeper store './canu.gkpStore':
--   Found 6270841 reads.
--   Found 109174862665 bases (90.97 times coverage).
--
--   Read length histogram (one '*' equals 23782.91 reads):
--        0   4999  46600 *
--     5000   9999 1664804 **********************************************************************
--    10000  14999 1307926 ******************************************************
--    15000  19999 1210813 **************************************************
--    20000  24999 810500 **********************************
--    25000  29999 510572 *********************
--    30000  34999 317274 *************
--    35000  39999 188294 *******
--    40000  44999 105826 ****
--    45000  49999  56178 **
--    50000  54999  28151 *
--    55000  59999  13582 
--    60000  64999   6010 
--    65000  69999   2637 
--    70000  74999   1090 
--    75000  79999    411 
--    80000  84999    122 
--    85000  89999     42 
--    90000  94999      8 
--    95000  99999      0 
--   100000 104999      0 
--   105000 109999      0 
--   110000 114999      0 
--   115000 119999      0 
--   120000 124999      0 
--   125000 129999      0 
--   130000 134999      0 
--   135000 139999      0 
--   140000 144999      1 
--
-- Running jobs.  First attempt out of 2.
----------------------------------------
-- Starting 'meryl' concurrent execution on Thu Sep 13 08:21:02 2018 with 576775.369 GB free disk space (1 processes; 1 concurrently)

    cd trimming/0-mercounts
    ./meryl.sh 1 > ./meryl.000001.out 2>&1

-- Finished on Thu Sep 13 12:09:13 2018 (13691 seconds) with 575748.409 GB free disk space
----------------------------------------
-- Meryl finished successfully.
--
--  22-mers                                                                                           Fraction
--    Occurrences   NumMers                                                                         Unique Total
--       1-     1 23570188139 *******************************************************************--> 0.8929 0.2162
--       2-     2 1316118841 ********************************************************************** 0.9428 0.2403
--       3-     4 615135039 ********************************                                       0.9585 0.2517
--       5-     7 248844359 *************                                                          0.9705 0.2645
--       8-    11 114714413 ******                                                                 0.9770 0.2751
--      12-    16  60412219 ***                                                                    0.9805 0.2836
--      17-    22  36021595 *                                                                      0.9824 0.2906
--      23-    29  24146265 *                                                                      0.9837 0.2967
--      30-    37  17944371                                                                        0.9845 0.3022
--      38-    46  14539737                                                                        0.9852 0.3076
--      47-    56  12751267                                                                        0.9857 0.3131
--      57-    67  13344839                                                                        0.9862 0.3191
--      68-    79  20561279 *                                                                      0.9867 0.3270
--      80-    92  34540665 *                                                                      0.9875 0.3417
--      93-   106  36211843 *                                                                      0.9888 0.3699
--     107-   121  20269484 *                                                                      0.9902 0.4023
--     122-   137   9519554                                                                        0.9909 0.4222
--     138-   154  10499649                                                                        0.9912 0.4332
--     155-   172  26883636 *                                                                      0.9917 0.4479
--     173-   191  63860856 ***                                                                    0.9927 0.4912
--     192-   211  69063355 ***                                                                    0.9952 0.6015
--     212-   232  26011361 *                                                                      0.9978 0.7260
--     233-   254   4954600                                                                        0.9987 0.7748
--     255-   277   2565598                                                                        0.9989 0.7851
--     278-   301   2187151                                                                        0.9989 0.7913
--     302-   326   1897335                                                                        0.9990 0.7971
--     327-   352   1870969                                                                        0.9991 0.8025
--     353-   379   2085248                                                                        0.9992 0.8083
--     380-   407   2014924                                                                        0.9993 0.8154
--     408-   436   1766621                                                                        0.9993 0.8226
--     437-   466   1551363                                                                        0.9994 0.8294
--     467-   497   1412976                                                                        0.9995 0.8358
--     498-   529   1382423                                                                        0.9995 0.8421
--     530-   562   1224239                                                                        0.9996 0.8486
--     563-   596   1086300                                                                        0.9996 0.8547
--     597-   631    961617                                                                        0.9996 0.8604
--     632-   667    757542                                                                        0.9997 0.8658
--     668-   704    666055                                                                        0.9997 0.8703
--     705-   742    564679                                                                        0.9997 0.8745
--     743-   781    514639                                                                        0.9998 0.8782
--     782-   821    483349                                                                        0.9998 0.8818
--
--    11186619 (max occurrences)
-- 85472986865 (total mers, non-unique)
--  2826804565 (distinct mers, non-unique)
-- 23570188139 (unique mers)
----------------------------------------
-- Starting command on Thu Sep 13 12:09:19 2018 with 575751.42 GB free disk space

    cd trimming/0-mercounts
    /lustre/work-lustre/waterhouse_team/miniconda2/envs/canu171/libexec/bin/meryl \
      -Dt \
      -n 4501 \
      -s ./canu.ms22 \
    > ./canu.ms22.frequentMers.fasta \
    2> ./canu.ms22.frequentMers.fasta.err

-- Finished on Thu Sep 13 12:19:01 2018 (582 seconds) with 576028.808 GB free disk space
----------------------------------------
-- Reset obtOvlMerThreshold from auto to 4501.
--
-- Found 109043175004 22-mers; 26396992704 distinct and 23570188139 unique.  Largest count 11186619.
--
-- OVERLAPPER (normal) (trimming) erate=0.045
--
----------------------------------------
-- Starting command on Thu Sep 13 12:19:01 2018 with 576028.808 GB free disk space

    cd trimming/1-overlapper
    /lustre/work-lustre/waterhouse_team/miniconda2/envs/canu171/libexec/bin/overlapInCorePartition \
     -G  ../canu.gkpStore \
     -hl 341248000 \
     -rl 15000000000 \
     -ol 500 \
     -o  ./canu.partition \
    > ./canu.partition.err 2>&1

-- Finished on Thu Sep 13 12:19:04 2018 (3 seconds) with 576028.858 GB free disk space
----------------------------------------
--
-- Configured 1335 overlapInCore jobs.
--
-- Running jobs.  First attempt out of 2.
----------------------------------------
-- Starting 'obtovl' concurrent execution on Thu Sep 13 12:19:05 2018 with 576028.858 GB free disk space (1335 processes; 24 concurrently)

    cd trimming/1-overlapper
    ./overlap.sh 1 > ./overlap.000001.out 2>&1
    ./overlap.sh 2 > ./overlap.000002.out 2>&1
    ./overlap.sh 3 > ./overlap.000003.out 2>&1
    ./overlap.sh 4 > ./overlap.000004.out 2>&1
    ./overlap.sh 5 > ./overlap.000005.out 2>&1
    ./overlap.sh 6 > ./overlap.000006.out 2>&1
    ./overlap.sh 7 > ./overlap.000007.out 2>&1
    ./overlap.sh 8 > ./overlap.000008.out 2>&1
    ./overlap.sh 9 > ./overlap.000009.out 2>&1
    ./overlap.sh 10 > ./overlap.000010.out 2>&1
    ./overlap.sh 11 > ./overlap.000011.out 2>&1
    ./overlap.sh 12 > ./overlap.000012.out 2>&1
    ./overlap.sh 13 > ./overlap.000013.out 2>&1
    ./overlap.sh 14 > ./overlap.000014.out 2>&1
    ./overlap.sh 15 > ./overlap.000015.out 2>&1
    ./overlap.sh 16 > ./overlap.000016.out 2>&1
    ./overlap.sh 17 > ./overlap.000017.out 2>&1
    ./overlap.sh 18 > ./overlap.000018.out 2>&1
    ./overlap.sh 19 > ./overlap.000019.out 2>&1
    ./overlap.sh 20 > ./overlap.000020.out 2>&1
    ./overlap.sh 21 > ./overlap.000021.out 2>&1
    ./overlap.sh 22 > ./overlap.000022.out 2>&1
    ./overlap.sh 23 > ./overlap.000023.out 2>&1
    ./overlap.sh 24 > ./overlap.000024.out 2>&1
    ./overlap.sh 25 > ./overlap.000025.out 2>&1
    ./overlap.sh 26 > ./overlap.000026.out 2>&1
    ./overlap.sh 27 > ./overlap.000027.out 2>&1
    ./overlap.sh 28 > ./overlap.000028.out 2>&1
    ./overlap.sh 29 > ./overlap.000029.out 2>&1
    ./overlap.sh 30 > ./overlap.000030.out 2>&1
    ./overlap.sh 31 > ./overlap.000031.out 2>&1
    ./overlap.sh 32 > ./overlap.000032.out 2>&1
    ./overlap.sh 33 > ./overlap.000033.out 2>&1
    ./overlap.sh 34 > ./overlap.000034.out 2>&1
    ./overlap.sh 35 > ./overlap.000035.out 2>&1
    ./overlap.sh 36 > ./overlap.000036.out 2>&1
    ./overlap.sh 37 > ./overlap.000037.out 2>&1
    ./overlap.sh 38 > ./overlap.000038.out 2>&1
    ./overlap.sh 39 > ./overlap.000039.out 2>&1
    ./overlap.sh 40 > ./overlap.000040.out 2>&1
    ./overlap.sh 41 > ./overlap.000041.out 2>&1
    ./overlap.sh 42 > ./overlap.000042.out 2>&1
    ./overlap.sh 43 > ./overlap.000043.out 2>&1
    ./overlap.sh 44 > ./overlap.000044.out 2>&1
    ./overlap.sh 45 > ./overlap.000045.out 2>&1
    ./overlap.sh 46 > ./overlap.000046.out 2>&1
    ./overlap.sh 47 > ./overlap.000047.out 2>&1
    ./overlap.sh 48 > ./overlap.000048.out 2>&1
    ./overlap.sh 49 > ./overlap.000049.out 2>&1
    ./overlap.sh 50 > ./overlap.000050.out 2>&1
    ./overlap.sh 51 > ./overlap.000051.out 2>&1
    ./overlap.sh 52 > ./overlap.000052.out 2>&1
    ./overlap.sh 53 > ./overlap.000053.out 2>&1
    ./overlap.sh 54 > ./overlap.000054.out 2>&1
    ./overlap.sh 55 > ./overlap.000055.out 2>&1
    ./overlap.sh 56 > ./overlap.000056.out 2>&1
    ./overlap.sh 57 > ./overlap.000057.out 2>&1
    ./overlap.sh 58 > ./overlap.000058.out 2>&1
    ./overlap.sh 59 > ./overlap.000059.out 2>&1
    ./overlap.sh 60 > ./overlap.000060.out 2>&1
    ./overlap.sh 61 > ./overlap.000061.out 2>&1
    ./overlap.sh 62 > ./overlap.000062.out 2>&1
    ./overlap.sh 63 > ./overlap.000063.out 2>&1
    ./overlap.sh 64 > ./overlap.000064.out 2>&1
    ./overlap.sh 65 > ./overlap.000065.out 2>&1
    ./overlap.sh 66 > ./overlap.000066.out 2>&1
    ./overlap.sh 67 > ./overlap.000067.out 2>&1
    ./overlap.sh 68 > ./overlap.000068.out 2>&1

-----
PBS Job 3521218.pbs
CPU time  : 612:53:42
Wall time : 24:14:43
Mem usage : 192988368kb

The canu2 folder contains the following files:

ls -hl canu2 
total 16G
-rw-rw---- 1 lorencm default  16G Sep 14 02:34 canu2.trimmedReads.fasta.gz
drwxrwx--- 2 lorencm default 692K Sep 14 13:09 canu-logs
drwxrwx--- 2 lorencm default 4.0K Sep 14 13:02 canu-scripts
drwxrwx--- 7 lorencm default 4.0K Sep 14 01:52 trimming
-rw-rw---- 1 lorencm default  22K Sep 14 13:09 trimming.html
drwxrwx--- 2 lorencm default 4.0K Sep 14 02:02 trimming.html.files
drwxrwx--- 7 lorencm default 4.0K Sep 14 09:15 unitigging
-rw-rw---- 1 lorencm default 6.5K Sep 14 13:01 unitigging.html
drwxrwx--- 2 lorencm default 4.0K Sep 14 04:47 unitigging.html.files

What did I miss?

Thank you in advance.

Michal

brianwalenz commented 6 years ago

It looks like it might have been killed after 24 hours wall time.

Though you specified 'useGrid=true', Canu is reporting

-- Grid engine disabled per useGrid=false option.

Is this listed in ~/.canu or $bin/canu.defaults ($bin is wherever the canu executable is)? This is restricting canu to using only one (very big!) node. Since Canu isn't using its own grid support, your gridOption walltime is never used - something else submitted this to the grid.

skoren commented 6 years ago

Any updates on this?

skoren commented 6 years ago

Idle, grid was disabled per Canu config on system, nothing to fix.