marbl / canu

A single molecule sequence assembler for genomes large and small.
http://canu.readthedocs.io/

grid #317

Closed: sunnycqcn closed this issue 7 years ago.

sunnycqcn commented 7 years ago

I used the command:

#PBS -l walltime=336:00:00
#PBS -q gcore
#PBS -l naccesspolicy=shared,nodes=1:ppn=20
cd $PBS_O_WORKDIR
module purge
module load bioinfo
module load canu
/home/fu115/DIRECTORY/canu/canu-1.4/Linux-amd64/bin/canu \
 -p asm -d strigaA \
 genomeSize=1638.1m \
 errorRate=0.035 \
 -pacbio-raw /scratch/snyder/f/fu115/Genome_assembly/fastq/seq/filtered_subreads.fastq

Then I get the error below. Could you help me check what is wrong? Thanks, Fuyou

/var/spool/torque/mom_priv/jobs/469384.snyder-adm.rcac.purdue.edu.SC: line 4: fg: no job control
make: *** No targets specified and no makefile found.  Stop.
/var/spool/torque/mom_priv/jobs/469384.snyder-adm.rcac.purdue.edu.SC: line 6: make-dedicated: command not found
/var/spool/torque/mom_priv/jobs/469384.snyder-adm.rcac.purdue.edu.SC: line 7: thread: command not found
-- Detected Java(TM) Runtime Environment '1.8.0_111' (from '/group/bioinfo/apps/apps/jdk1.8.0_111/bin/java').
-- Detected gnuplot version '4.6 patchlevel 6' (from 'gnuplot') and image format 'png'.
-- Detected 20 CPUs and 505 gigabytes of memory.
-- Detected PBS/Torque '5.0.1' with 'pbsnodes' binary in /usr/pbs/bin/pbsnodes.
-- Detecting PBS/Torque resources.
-- 
-- Found  50 hosts with  20 cores and  252 GB memory under PBS/Torque control.
-- Found  16 hosts with  20 cores and  504 GB memory under PBS/Torque control.
--
-- Allowed to run under grid control, and use up to   4 compute threads and   16 GB memory for stage 'bogart (unitigger)'.
-- Allowed to run under grid control, and use up to  10 compute threads and    6 GB memory for stage 'mhap (overlapper)'.
-- Allowed to run under grid control, and use up to  10 compute threads and    6 GB memory for stage 'mhap (overlapper)'.
-- Allowed to run under grid control, and use up to  10 compute threads and    6 GB memory for stage 'mhap (overlapper)'.
-- Allowed to run under grid control, and use up to   4 compute threads and    2 GB memory for stage 'read error detection (overlap error adjustment)'.
-- Allowed to run under grid control, and use up to   1 compute thread  and    1 GB memory for stage 'overlap error adjustment'.
-- Allowed to run under grid control, and use up to   4 compute threads and   32 GB memory for stage 'utgcns (consensus)'.
-- Allowed to run under grid control, and use up to   1 compute thread  and    4 GB memory for stage 'overlap store parallel bucketizer'.
-- Allowed to run under grid control, and use up to   1 compute thread  and    8 GB memory for stage 'overlap store parallel sorting'.
-- Allowed to run under grid control, and use up to   1 compute thread  and    6 GB memory for stage 'overlapper'.
-- Allowed to run under grid control, and use up to   5 compute threads and    8 GB memory for stage 'overlapper'.
-- Allowed to run under grid control, and use up to   5 compute threads and    8 GB memory for stage 'overlapper'.
-- Allowed to run under grid control, and use up to   4 compute threads and    8 GB memory for stage 'meryl (k-mer counting)'.
-- Allowed to run under grid control, and use up to   2 compute threads and   16 GB memory for stage 'falcon_sense (read correction)'.
-- Allowed to run under grid control, and use up to  10 compute threads and    6 GB memory for stage 'minimap (overlapper)'.
-- Allowed to run under grid control, and use up to  10 compute threads and    6 GB memory for stage 'minimap (overlapper)'.
-- Allowed to run under grid control, and use up to  10 compute threads and    6 GB memory for stage 'minimap (overlapper)'.
--
-- This is canu parallel iteration #1, out of a maximum of 2 attempts.
--
-- Final error rates before starting pipeline:
--   
--   genomeSize          -- 4800000
--   errorRate           -- 0.015
--   
--   corOvlErrorRate     -- 0.045
--   obtOvlErrorRate     -- 0.045
--   utgOvlErrorRate     -- 0.045
--   
--   obtErrorRate        -- 0.045
--   
--   cnsErrorRate        -- 0.045
--
--
-- BEGIN CORRECTION
--
----------------------------------------
-- Starting command on Mon Dec 26 13:08:30 2016 with 739766.226 GB free disk space

    /home/fu115/DIRECTORY/canu/canu-1.4/Linux-amd64/bin/gatekeeperCreate \
      -minlength 1000 \
      -o /scratch/snyder/f/fu115/Genome_assembly/PBonly/canutest/ecoli-autoa/correction/ecoli.gkpStore.BUILDING \
      /scratch/snyder/f/fu115/Genome_assembly/PBonly/canutest/ecoli-autoa/correction/ecoli.gkpStore.gkp \
    > /scratch/snyder/f/fu115/Genome_assembly/PBonly/canutest/ecoli-autoa/correction/ecoli.gkpStore.BUILDING.err 2>&1

-- Finished on Mon Dec 26 13:08:31 2016 (1 second) with 739766.22 GB free disk space
----------------------------------------
--
-- In gatekeeper store '/scratch/snyder/f/fu115/Genome_assembly/PBonly/canutest/ecoli-autoa/correction/ecoli.gkpStore':
--   Found 12528 reads.
--   Found 115899341 bases (24.14 times coverage).
--
--   Read length histogram (one '*' equals 20.62 reads):
--        0    999      0 
--     1000   1999   1444 **********************************************************************
--     2000   2999   1328 ****************************************************************
--     3000   3999   1065 ***************************************************
--     4000   4999    774 *************************************
--     5000   5999    668 ********************************
--     6000   6999    619 ******************************
--     7000   7999    618 *****************************
--     8000   8999    607 *****************************
--     9000   9999    560 ***************************
--    10000  10999    523 *************************
--    11000  11999    478 ***********************
--    12000  12999    429 ********************
--    13000  13999    379 ******************
--    14000  14999    366 *****************
--    15000  15999    353 *****************
--    16000  16999    329 ***************
--    17000  17999    297 **************
--    18000  18999    294 **************
--    19000  19999    283 *************
--    20000  20999    251 ************
--    21000  21999    195 *********
--    22000  22999    152 *******
--    23000  23999    132 ******
--    24000  24999     75 ***
--    25000  25999     66 ***
--    26000  26999     56 **
--    27000  27999     44 **
--    28000  28999     35 *
--    29000  29999     16 
--    30000  30999     21 *
--    31000  31999     18 
--    32000  32999     11 
--    33000  33999      8 
--    34000  34999      6 
--    35000  35999      6 
--    36000  36999     10 
--    37000  37999      2 
--    38000  38999      3 
--    39000  39999      2 
--    40000  40999      2 
--    41000  41999      2 
--    42000  42999      1 
-- Meryl attempt 1 begins.
----------------------------------------
-- Starting command on Mon Dec 26 13:08:32 2016 with 739766.22 GB free disk space

      qsub \
        -l mem=8g -l nodes=1:ppn=4 \
        -d `pwd` -N "meryl_ecoli" \
        -t 1-1 \
        -j oe -o /scratch/snyder/f/fu115/Genome_assembly/PBonly/canutest/ecoli-autoa/correction/0-mercounts/meryl.\$PBS_ARRAYID.out \
        /scratch/snyder/f/fu115/Genome_assembly/PBonly/canutest/ecoli-autoa/correction/0-mercounts/meryl.sh 

Array jobs are currently not supported.
See https://www.rcac.purdue.edu/news/detail.cfm?NewsID=616 for information
on converting your array job to a supported workflow.

qsub: Your job has been administratively rejected by the queueing system.
qsub: There may be a more detailed explanation prior to this notice.

-- Finished on Mon Dec 26 13:08:32 2016 (lickety-split) with 739766.22 GB free disk space
----------------------------------------
ERROR:
ERROR:  Failed with exit code 1.  (rc=256)
ERROR:
================================================================================
Please panic.  canu failed, and it shouldn't have.

Stack trace:

 at /home/fu115/DIRECTORY/canu/canu-1.4/Linux-amd64/bin/lib/canu/Execution.pm line 1305.
    canu::Execution::caFailure("Failed to submit batch jobs", undef) called at /home/fu115/DIRECTORY/canu/canu-1.4/Linux-amd64/bin/lib/canu/Execution.pm line 1010
    canu::Execution::submitOrRunParallelJob("/scratch/snyder/f/fu115/Genome_assembly/PBonly/canutest/ecoli"..., "ecoli", "meryl", "/scratch/snyder/f/fu115/Genome_assembly/PBonly/canutest/ecoli"..., "meryl", 1) called at /home/fu115/DIRECTORY/canu/canu-1.4/Linux-amd64/bin/lib/canu/Meryl.pm line 373
    canu::Meryl::merylCheck("/scratch/snyder/f/fu115/Genome_assembly/PBonly/canutest/ecoli"..., "ecoli", "cor") called at /home/fu115/DIRECTORY/canu/canu-1.4/Linux-amd64/bin/canu line 470

canu failed with 'Failed to submit batch jobs'.
skoren commented 7 years ago

Your grid does not support array jobs, which Canu requires to run. Array jobs are normally a standard feature of grid systems; yours is the first we've seen where they are administratively disabled.

I would ask whether the admins can enable array jobs for you, as it would be non-trivial for you to modify Canu to run without array support. Otherwise, you would have to run Canu with useGrid=remote. Every time it reaches a submit command like the following:

      qsub \
        -l mem=8g -l nodes=1:ppn=4 \
        -d `pwd` -N "meryl_ecoli" \
        -t 1-1 \
        -j oe -o /scratch/snyder/f/fu115/Genome_assembly/PBonly/canutest/ecoli-autoa/correction/0-mercounts/meryl.\$PBS_ARRAYID.out \
        /scratch/snyder/f/fu115/Genome_assembly/PBonly/canutest/ecoli-autoa/correction/0-mercounts/meryl.sh 

Canu would stop, and you would need to manually edit the command as directed on your grid's support page: https://www.rcac.purdue.edu/news/detail.cfm?NewsID=616. Then, once that job is done, re-run your Canu command; it will pick up at the next step and stop again when it reaches the next array job to submit. Alternatively, you can submit the Canu command to a single node and run it with useGrid=false, which means it will run on only a single machine; that is OK for smaller genomes (<500 Mb).
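
For reference, hand-converting the rejected array submission above into plain jobs might look like the sketch below. This is a minimal illustration, not something Canu generates for you: it assumes each array index can be submitted as its own job (the pattern the RCAC page describes) and that meryl.sh reads $PBS_ARRAYID to pick its task (Canu's generated stage scripts also accept the index as a first argument).

      # "-t 1-1" above means the only array index is 1
      for i in 1; do
        qsub \
          -l mem=8g -l nodes=1:ppn=4 \
          -d `pwd` -N "meryl_ecoli_$i" \
          -v PBS_ARRAYID=$i \
          -j oe -o /scratch/snyder/f/fu115/Genome_assembly/PBonly/canutest/ecoli-autoa/correction/0-mercounts/meryl.$i.out \
          /scratch/snyder/f/fu115/Genome_assembly/PBonly/canutest/ecoli-autoa/correction/0-mercounts/meryl.sh
      done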

sunnycqcn commented 7 years ago

Hi, thanks. If I set the command as follows:

    #PBS -l walltime=336:00:00
    #PBS -q gcore
    #PBS -l naccesspolicy=shared,nodes=2:ppn=20

    cd $PBS_O_WORKDIR
    module purge
    module load bioinfo
    module load canu
    /home/fu115/DIRECTORY/canu/canu-1.4/Linux-amd64/bin/canu \
      -p asm -d strigaC \
      genomeSize=1638.1m \
      errorRate=0.035 \
      -pacbio-raw /scratch/snyder/f/fu115/Genome_assembly/fastq/seq/filtered_subreads.fastq \
      maxMemory=80g maxThreads=20 \
      useGrid=true gridEngine="pbs" \
      gridEngineThreadsOption="-pe smp THREADS" \
      gridEngineMemoryOption="-l h_vmem=MEMORY" \
      gridOptions="-V -S /bin/bash" \
      gridOptions="-l h=blacklace01.blacklace" \
      gridEngineArrayMaxJobs=75000 \
      useGrid=remote

Will both nodes run? I ask because I have the right to use 2 nodes. Thanks, Fuyou


skoren commented 7 years ago

No. Without array jobs, or without following my suggestion above, you can only run on a single node and have to use useGrid=false.

sunnycqcn commented 7 years ago

Thanks. I got it. Fuyou


sunnycqcn commented 7 years ago

Hi, thanks. I tried another server and get the following error:

    [fuf@cc03]$ sh canuS.sh
    -- Canu v0.0 (+0 commits) r0 unknown-hash-tag-no-repository-available.
    -- Detected Java(TM) Runtime Environment '1.8.0_111' (from 'java').
    -- Detected gnuplot version '4.4 patchlevel 0' (from 'gnuplot') and image format 'png'.
    -- Detected 64 CPUs and 252 gigabytes of memory.
    -- Detecting PBS/Torque resources.
    -- Undefined subroutine &canu::Configure::caExit called at /home/u1/fuf/snow/canu-1.4/Linux-amd64/bin/lib/canu/Configure.pm line 192.

Could you help me check it? Thanks, Fuyou


skoren commented 7 years ago

What is in your canuS.sh script? It looks like you're setting both a grid engine and useGrid=false, and the machine you are running on is not reporting the grid configuration. Set both useGrid=0 and gridEngine=undefined to make sure it won't poll your grid.

sunnycqcn commented 7 years ago

Hi, this is my sh file. Thanks, Fuyou

    #!/bin/bash

    /home/u1/fuf/snow/canu-1.4/Linux-amd64/bin/canu \
      -p asm -d strigaC \
      genomeSize=1638.1m \
      errorRate=0.035 \
      -pacbio-raw p6.25x.fastq \
      maxMemory=80g maxThreads=20 \
      useGrid=true gridEngine="pbs" \
      gridEngineThreadsOption="-pe smp THREADS" \
      gridEngineMemoryOption="-l h_vmem=MEMORY" \
      gridOptions="-V -S /bin/bash" \
      gridOptions="-l h=blacklace01.blacklace" \
      gridEngineArrayMaxJobs=75000


skoren commented 7 years ago

You are still setting useGrid=true gridEngine="pbs".

You want to submit the above script to your grid and let Canu run only on the single scheduled node, so you want useGrid=false gridEngine=undefined, as I said above.
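
With those two options changed, the script above would look roughly like this (a sketch; with useGrid=false the grid* options should go unused, so they are dropped here):

      #!/bin/bash

      /home/u1/fuf/snow/canu-1.4/Linux-amd64/bin/canu \
        -p asm -d strigaC \
        genomeSize=1638.1m \
        errorRate=0.035 \
        maxMemory=80g maxThreads=20 \
        useGrid=false gridEngine=undefined \
        -pacbio-raw p6.25x.fastq

Submit this script itself to the scheduler, and Canu will run everything on the one node it is given.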

sunnycqcn commented 7 years ago

Thanks, I got it. Fuyou


sunnycqcn commented 7 years ago

Hi Koren, thanks for your suggestions. I can run my job now, but how can I rerun? For example, I can finish my meryl.jobSubmit.sh and meryl.sh, but if I rerun with the same parameters, the job still stops with the above error. Thanks,


skoren commented 7 years ago

You can't continue the run that way; running meryl.jobSubmit.sh will just fail to submit the job again, since it relies on arrays. You can run meryl.sh by hand, which will take a while; once it finishes, Canu will continue to the next step, but you have to wait for it to complete before resuming. It would be easiest to start from scratch off grid.
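
Running it by hand would look roughly like this (a sketch, using the earlier E. coli test run's paths as an example; it assumes Canu's generated stage scripts fall back to taking the array index as their first argument when no grid array variable is set, and the "-t 1-1" submission above means the only index is 1):

      cd /scratch/snyder/f/fu115/Genome_assembly/PBonly/canutest/ecoli-autoa/correction/0-mercounts
      sh meryl.sh 1
      # once meryl.sh exits cleanly, re-run the original canu command;
      # it should detect the finished outputs and move on to the next stage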