marbl / canu

A single molecule sequence assembler for genomes large and small.
http://canu.readthedocs.io/

canu failed after cormhap step #1167

Closed dtzhu337 closed 5 years ago

dtzhu337 commented 6 years ago

I ran Canu on the campus server using the batch submission script below:

#!/bin/bash
#PBS -l nodes=1:ppn=1
#PBS -l walltime=10:00:00
#PBS -l pmem=120gb
#PBS -A open

# Get started
echo " "
echo "Job started on `hostname` at `date`"
echo " "

# Go to the correct place
cd /storage/home/d/aaa/canu-1.8/Linux-amd64/bin/

# Run the job itself
./canu -p Pacbioassembly -d /storage/home/d/aaa/work/Pacassembly1129 genomeSize=280m -pacbio-raw /storage/home/d/aaa/work/allreads.fq
# Finish up
$ ./canu -version
Use of implicit split to @_ is deprecated at /storage/home/d/aaa/canu-1.8/Linux-amd64/bin/../lib/site_perl/canu/Grid_Cloud.pm line 73.
Canu 1.8

I tried several times; Canu automatically created other jobs (with different job IDs). Then, after the cormhap step (checking the job status, I found a job named cormhap_antPacbi, which should have been generated by Canu itself), it could not proceed. The working directory only contained the files below, and canu.out was empty.

 antPacbio.report  antPacbio.seqStore  antPacbio.seqStore.err  antPacbio.seqStore.ssi  canu-logs  canu.out  canu-scripts  correction

Does anyone know how to fix this problem? Is it a problem with my script, with the Canu software, or with the server?

Thank you

skoren commented 6 years ago

That sounds like your grid might not be letting Canu submit the jobs. What's the output in correction/1-overlapper (any out and sh files in there)? What's in canu-scripts?
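For example, from the assembly directory (the -d path in your canu command), something like this would collect what I'm asking about (a rough sketch):

    ls correction/1-overlapper/                 # any *.sh and *.out files in there?
    tail -n 50 correction/1-overlapper/*.out    # last lines of each overlap job log
    ls -l canu-scripts/                         # whatever Canu wrote for grid submission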

dtzhu337 commented 6 years ago

Hi Skoren,

Because we have very limited storage on the server, I have already deleted those files.

The server administrator also told me it was probably due to the job submission, and recommended using the useGrid=false option. It seems to be running well so far, at least for more than 10 hours.

Thank you

dtzhu337 commented 5 years ago

Hi Skoren,

I found that my job finished after using the useGrid=false option, but the problem is that there are no fasta files with the assembly results. I've only got the directories/files below.

antPacbio.report antPacbio.seqStore.err canu-logs correction antPacbio.seqStore antPacbio.seqStore.ssi canu-scripts haplotype

What do you think is the problem?

Thank you

skoren commented 5 years ago

If the output didn't get generated but the job stopped, it was probably terminated by your scheduler. You'd have to check the history of the job and the output of stdout/stderr from Canu to get that information.

When you run with useGrid=false, you're restricting canu to that single node where you requested 120gb of memory. If you land on a node with more memory than this, Canu might exceed your memory request and fail; it is safer to reserve a full node in these cases. Running with useGrid=false is also going to be much slower than using the grid, so you may still want to ask your IT to diagnose the previous submission issue.
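As a rough sketch (PBS directives only; the core count and walltime are placeholders, so use whatever one of your nodes actually has), reserving a whole node could look like:

    #PBS -l nodes=1:ppn=20          # all cores on the node (placeholder count)
    #PBS -l mem=120gb               # the node's full memory, instead of pmem per process
    #PBS -l walltime=96:00:00       # long enough for a useGrid=false run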

dtzhu337 commented 5 years ago

The server administrator told me that I should use the useGrid=false option.

  1. Do you have any idea how long Canu needs to assemble the genome? The estimated size is ~280 Mb, and I've got 36 GB of read data (already converted from .bam files to .fq). I am now using 1000 GB of memory to continue running the previously stopped job.

  2. From the instructions on the website, I think continuing with the same script is okay for Canu. I just want to make sure this is right.

Best wishes and Thank you

skoren commented 5 years ago

Using useGrid=false is the easiest solution since you then don't have to figure out why the grid submission from Canu was rejected. However, others have run Canu on PBS grids, so it should work. The only issue would be if your compute nodes aren't allowed to submit jobs (see the FAQ).

A 280 Mb genome is not too big, so I would guess less than a week. Rather than picking a machine with lots of memory, tell Canu how much memory and how many threads it is allowed to use. That is, if you reserve 200gb/16 cores then add the options maxMemory=200 maxThreads=16 and it will configure itself to fit.
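For example, a sketch of the canu call for a 200gb/16-core reservation, using the same inputs as your original command:

    ./canu -p Pacbioassembly -d /storage/home/d/aaa/work/Pacassembly1129 \
        genomeSize=280m useGrid=false maxMemory=200 maxThreads=16 \
        -pacbio-raw /storage/home/d/aaa/work/allreads.fq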

Yes, you can restart with the same script.

dtzhu337 commented 5 years ago

Hi skoren,

The process has been running for several days now. There is still no canu.out, and I am not sure which step it is currently performing.

There is a file named .seqStore.err showing the following information.

Starting file './antPacbio.seqStore.ssi'.

  Loading reads from '/storage/home/d/duz193/work/allreads.fq'
    Processed 11204220 lines.
    Loaded 18715844996 bp from:
      2240844 FASTQ format reads (18715844996 bp).
    WARNING: 215253 reads (9.6059%) with 100624002 bp (0.5348%) were too short (< 1000bp) and were ignored.

Finished with:
  0 warnings (bad base or qv, too short, too long)

Loaded into store:
  18715844996 bp.
  2025591 reads.

Skipped (too short):
  100624002 bp (0.5348%).
  215253 reads (9.6059%).

sqStoreCreate finished successfully.

The canu-scripts directory is empty, and the correction directory only has 0-mercounts and 1-overlapper.

The canu-logs directory shows the following:

1543631692_comp-bc-0222.acib.production.int.aci.ics.psu.edu_8151_canu                 1543717443_comp-bc-0195.acib.production.int.aci.ics.psu.edu_93063_sqStoreDumpFASTQ
1543631692_comp-bc-0222.acib.production.int.aci.ics.psu.edu_8200_sqStoreCreate        1543717697_comp-bc-0195.acib.production.int.aci.ics.psu.edu_95250_sqStoreDumpFASTQ
1543632415_comp-bc-0222.acib.production.int.aci.ics.psu.edu_8615_sqStoreDumpMetaData  1543725852_comp-bc-0195.acib.production.int.aci.ics.psu.edu_105152_sqStoreDumpFASTQ
1543632416_comp-bc-0222.acib.production.int.aci.ics.psu.edu_8620_sqStoreDumpMetaData  1543725853_comp-bc-0195.acib.production.int.aci.ics.psu.edu_105186_sqStoreDumpFASTQ
1543632442_comp-bc-0222.acib.production.int.aci.ics.psu.edu_8671_meryl                1543733598_comp-bc-0195.acib.production.int.aci.ics.psu.edu_117807_sqStoreDumpFASTQ
1543632443_comp-bc-0222.acib.production.int.aci.ics.psu.edu_8679_meryl                1543733599_comp-bc-0195.acib.production.int.aci.ics.psu.edu_117841_sqStoreDumpFASTQ
1543632444_comp-bc-0222.acib.production.int.aci.ics.psu.edu_8687_meryl                1543741497_comp-bc-0195.acib.production.int.aci.ics.psu.edu_131019_sqStoreDumpFASTQ
1543632445_comp-bc-0222.acib.production.int.aci.ics.psu.edu_8695_meryl                1543741554_comp-bc-0195.acib.production.int.aci.ics.psu.edu_131115_sqStoreDumpFASTQ
1543632446_comp-bc-0222.acib.production.int.aci.ics.psu.edu_8703_meryl                1543749366_comp-bc-0195.acib.production.int.aci.ics.psu.edu_142514_sqStoreDumpFASTQ
1543632446_comp-bc-0222.acib.production.int.aci.ics.psu.edu_8711_meryl                1543749525_comp-bc-0195.acib.production.int.aci.ics.psu.edu_143787_sqStoreDumpFASTQ
1543632447_comp-bc-0222.acib.production.int.aci.ics.psu.edu_8719_meryl                1543757178_comp-bc-0195.acib.production.int.aci.ics.psu.edu_153236_sqStoreDumpFASTQ
1543632448_comp-bc-0222.acib.production.int.aci.ics.psu.edu_8729_meryl                1543757284_comp-bc-0195.acib.production.int.aci.ics.psu.edu_153371_sqStoreDumpFASTQ
1543632449_comp-bc-0222.acib.production.int.aci.ics.psu.edu_8737_meryl                1543765038_comp-bc-0195.acib.production.int.aci.ics.psu.edu_164764_sqStoreDumpFASTQ
1543632450_comp-bc-0222.acib.production.int.aci.ics.psu.edu_8745_meryl                1543765221_comp-bc-0195.acib.production.int.aci.ics.psu.edu_164940_sqStoreDumpFASTQ
1543632450_comp-bc-0222.acib.production.int.aci.ics.psu.edu_8753_meryl                1543773055_comp-bc-0195.acib.production.int.aci.ics.psu.edu_175584_sqStoreDumpFASTQ
1543632451_comp-bc-0222.acib.production.int.aci.ics.psu.edu_8761_meryl                1543773346_comp-bc-0195.acib.production.int.aci.ics.psu.edu_175845_sqStoreDumpFASTQ
1543632452_comp-bc-0222.acib.production.int.aci.ics.psu.edu_8769_meryl                1543780924_comp-bc-0195.acib.production.int.aci.ics.psu.edu_187154_sqStoreDumpFASTQ
1543632452_comp-bc-0222.acib.production.int.aci.ics.psu.edu_8777_meryl                1543781225_comp-bc-0195.acib.production.int.aci.ics.psu.edu_187388_sqStoreDumpFASTQ
1543632453_comp-bc-0222.acib.production.int.aci.ics.psu.edu_8785_meryl                1543789062_comp-bc-0195.acib.production.int.aci.ics.psu.edu_1956_sqStoreDumpFASTQ
1543632463_comp-bc-0222.acib.production.int.aci.ics.psu.edu_8832_meryl                1543789235_comp-bc-0195.acib.production.int.aci.ics.psu.edu_2118_sqStoreDumpFASTQ
1543633300_comp-bc-0222.acib.production.int.aci.ics.psu.edu_9488_meryl                1543796953_comp-bc-0195.acib.production.int.aci.ics.psu.edu_13585_sqStoreDumpFASTQ
1543633487_comp-bc-0222.acib.production.int.aci.ics.psu.edu_10700_meryl               1543797139_comp-bc-0195.acib.production.int.aci.ics.psu.edu_13777_sqStoreDumpFASTQ
1543633492_comp-bc-0222.acib.production.int.aci.ics.psu.edu_10710_meryl               1543805061_comp-bc-0195.acib.production.int.aci.ics.psu.edu_23605_sqStoreDumpFASTQ
1543633582_comp-bc-0222.acib.production.int.aci.ics.psu.edu_10776_meryl               1543805078_comp-bc-0195.acib.production.int.aci.ics.psu.edu_23675_sqStoreDumpFASTQ
1543633590_comp-bc-0222.acib.production.int.aci.ics.psu.edu_10865_sqStoreDumpFASTQ    1543812745_comp-bc-0195.acib.production.int.aci.ics.psu.edu_36475_sqStoreDumpFASTQ
1543633590_comp-bc-0222.acib.production.int.aci.ics.psu.edu_10866_sqStoreDumpFASTQ    1543812904_comp-bc-0195.acib.production.int.aci.ics.psu.edu_36700_sqStoreDumpFASTQ
1543641705_comp-bc-0222.acib.production.int.aci.ics.psu.edu_20844_sqStoreDumpFASTQ    1543820493_comp-bc-0195.acib.production.int.aci.ics.psu.edu_46080_sqStoreDumpFASTQ
1543642008_comp-bc-0222.acib.production.int.aci.ics.psu.edu_21073_sqStoreDumpFASTQ    1543820566_comp-bc-0195.acib.production.int.aci.ics.psu.edu_46188_sqStoreDumpFASTQ
1543649487_comp-bc-0222.acib.production.int.aci.ics.psu.edu_32770_sqStoreDumpFASTQ    1543828222_comp-bc-0195.acib.production.int.aci.ics.psu.edu_61248_sqStoreDumpFASTQ
1543650298_comp-bc-0222.acib.production.int.aci.ics.psu.edu_34460_sqStoreDumpFASTQ    1543828381_comp-bc-0195.acib.production.int.aci.ics.psu.edu_61424_sqStoreDumpFASTQ
1543657697_comp-bc-0222.acib.production.int.aci.ics.psu.edu_47352_sqStoreDumpFASTQ    1543835957_comp-bc-0195.acib.production.int.aci.ics.psu.edu_71879_sqStoreDumpFASTQ
1543658445_comp-bc-0222.acib.production.int.aci.ics.psu.edu_47895_sqStoreDumpFASTQ    1543836297_comp-bc-0195.acib.production.int.aci.ics.psu.edu_72181_sqStoreDumpFASTQ
1543665887_comp-bc-0222.acib.production.int.aci.ics.psu.edu_60288_sqStoreDumpFASTQ    1543843623_comp-bc-0195.acib.production.int.aci.ics.psu.edu_83417_sqStoreDumpFASTQ
1543666520_comp-bc-0222.acib.production.int.aci.ics.psu.edu_60771_sqStoreDumpFASTQ    1543844207_comp-bc-0195.acib.production.int.aci.ics.psu.edu_83860_sqStoreDumpFASTQ
1543676835_comp-bc-0195.acib.production.int.aci.ics.psu.edu_36686_canu                1543850132_comp-bc-0284.acib.production.int.aci.ics.psu.edu_180102_canu
1543676836_comp-bc-0195.acib.production.int.aci.ics.psu.edu_36824_sqStoreDumpFASTQ    1543850132_comp-bc-0284.acib.production.int.aci.ics.psu.edu_180251_sqStoreDumpFASTQ
1543676836_comp-bc-0195.acib.production.int.aci.ics.psu.edu_36825_sqStoreDumpFASTQ    1543850132_comp-bc-0284.acib.production.int.aci.ics.psu.edu_180252_sqStoreDumpFASTQ
1543684916_comp-bc-0195.acib.production.int.aci.ics.psu.edu_47675_sqStoreDumpFASTQ    1543857851_comp-bc-0284.acib.production.int.aci.ics.psu.edu_191813_sqStoreDumpFASTQ
1543685026_comp-bc-0195.acib.production.int.aci.ics.psu.edu_47823_sqStoreDumpFASTQ    1543857858_comp-bc-0284.acib.production.int.aci.ics.psu.edu_191849_sqStoreDumpFASTQ
1543693093_comp-bc-0195.acib.production.int.aci.ics.psu.edu_59401_sqStoreDumpFASTQ    1543865634_comp-bc-0284.acib.production.int.aci.ics.psu.edu_6004_sqStoreDumpFASTQ
1543693128_comp-bc-0195.acib.production.int.aci.ics.psu.edu_59485_sqStoreDumpFASTQ    1543865751_comp-bc-0284.acib.production.int.aci.ics.psu.edu_6154_sqStoreDumpFASTQ
1543701164_comp-bc-0195.acib.production.int.aci.ics.psu.edu_70316_sqStoreDumpFASTQ    1543873573_comp-bc-0284.acib.production.int.aci.ics.psu.edu_16718_sqStoreDumpFASTQ
1543701308_comp-bc-0195.acib.production.int.aci.ics.psu.edu_70476_sqStoreDumpFASTQ    1543873676_comp-bc-0284.acib.production.int.aci.ics.psu.edu_16872_sqStoreDumpFASTQ
1543709263_comp-bc-0195.acib.production.int.aci.ics.psu.edu_82062_sqStoreDumpFASTQ    1544026325_comp-hc-0012.acib.production.int.aci.ics.psu.edu_105202_canu
1543709435_comp-bc-0195.acib.production.int.aci.ics.psu.edu_82259_sqStoreDumpFASTQ

Do you think the software is still running well? The last thing I want is to wait so many days and end up with nothing. Thank you in advance for your kind help.

Warm regards

skoren commented 5 years ago

Yes, it's probably fine; there isn't going to be a canu.out when you run with useGrid=false, and the canu-scripts folder will be empty as well. All the logging goes to stdout/stderr, which should be captured by your grid engine and put into whatever file is the default (I didn't see an output file specification in your script).

You should also be able to use your grid engine to monitor the submitted job to see its resource utilization.
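On PBS that is usually something like this (a sketch; the exact tools depend on what your site has installed):

    qstat -f <jobid>      # current state plus resources_used (mem, cput, walltime)
    tracejob <jobid>      # accounting history for a finished job, if tracejob is available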

dtzhu337 commented 5 years ago

Hi again,

It turns out the task has finished; however, I could not find any fasta files showing the contigs.

skoren commented 5 years ago

Same as before: if the fasta files are not there, the job did not terminate correctly. Post the stdout/stderr from the submitted job, which should have more details on what happened, along with the job history/accounting for that job (e.g. how long it ran, memory used, etc.).

dtzhu337 commented 5 years ago
BEGIN CORRECTION
--
--
-- Running jobs.  First attempt out of 2.
----------------------------------------
-- Starting 'cormhap' concurrent execution on Wed Dec  5 11:12:06 2018 with 655197.215 GB free disk space (46 processes; 4 concurrently)

    cd correction/1-overlapper
    ./mhap.sh 101 > ./mhap.000101.out 2>&1
    ./mhap.sh 102 > ./mhap.000102.out 2>&1
    ./mhap.sh 103 > ./mhap.000103.out 2>&1
    ./mhap.sh 104 > ./mhap.000104.out 2>&1
    ./mhap.sh 105 > ./mhap.000105.out 2>&1
    ./mhap.sh 106 > ./mhap.000106.out 2>&1
    ./mhap.sh 107 > ./mhap.000107.out 2>&1
    ./mhap.sh 108 > ./mhap.000108.out 2>&1
    ./mhap.sh 109 > ./mhap.000109.out 2>&1
    ./mhap.sh 110 > ./mhap.000110.out 2>&1
    ./mhap.sh 111 > ./mhap.000111.out 2>&1
    ./mhap.sh 112 > ./mhap.000112.out 2>&1
    ./mhap.sh 113 > ./mhap.000113.out 2>&1
    ./mhap.sh 114 > ./mhap.000114.out 2>&1
    ./mhap.sh 115 > ./mhap.000115.out 2>&1
    ./mhap.sh 116 > ./mhap.000116.out 2>&1
    ./mhap.sh 117 > ./mhap.000117.out 2>&1
    ./mhap.sh 118 > ./mhap.000118.out 2>&1
    ./mhap.sh 119 > ./mhap.000119.out 2>&1
    ./mhap.sh 120 > ./mhap.000120.out 2>&1
    ./mhap.sh 121 > ./mhap.000121.out 2>&1
    ./mhap.sh 122 > ./mhap.000122.out 2>&1
    ./mhap.sh 123 > ./mhap.000123.out 2>&1
    ./mhap.sh 124 > ./mhap.000124.out 2>&1
    ./mhap.sh 125 > ./mhap.000125.out 2>&1
    ./mhap.sh 126 > ./mhap.000126.out 2>&1
    ./mhap.sh 127 > ./mhap.000127.out 2>&1
    ./mhap.sh 128 > ./mhap.000128.out 2>&1
    ./mhap.sh 129 > ./mhap.000129.out 2>&1
    ./mhap.sh 130 > ./mhap.000130.out 2>&1
    ./mhap.sh 131 > ./mhap.000131.out 2>&1
    ./mhap.sh 132 > ./mhap.000132.out 2>&1
    ./mhap.sh 133 > ./mhap.000133.out 2>&1
    ./mhap.sh 134 > ./mhap.000134.out 2>&1
    ./mhap.sh 135 > ./mhap.000135.out 2>&1
    ./mhap.sh 136 > ./mhap.000136.out 2>&1
    ./mhap.sh 137 > ./mhap.000137.out 2>&1
    ./mhap.sh 138 > ./mhap.000138.out 2>&1
    ./mhap.sh 139 > ./mhap.000139.out 2>&1
    ./mhap.sh 140 > ./mhap.000140.out 2>&1
    ./mhap.sh 141 > ./mhap.000141.out 2>&1
    ./mhap.sh 142 > ./mhap.000142.out 2>&1
    ./mhap.sh 143 > ./mhap.000143.out 2>&1
    ./mhap.sh 144 > ./mhap.000144.out 2>&1
    ./mhap.sh 145 > ./mhap.000145.out 2>&1
    ./mhap.sh 146 > ./mhap.000146.out 2>&1

-- Finished on Thu Dec  6 05:23:29 2018 (65483 seconds, fashionably late) with 655101.323 GB free disk space
----------------------------------------
--
-- Mhap overlap jobs failed, retry.
--   job correction/1-overlapper/results/000114.ovb FAILED.
--   job correction/1-overlapper/results/000116.ovb FAILED.
--   job correction/1-overlapper/results/000118.ovb FAILED.
--   job correction/1-overlapper/results/000119.ovb FAILED.
--   job correction/1-overlapper/results/000121.ovb FAILED.
--   job correction/1-overlapper/results/000122.ovb FAILED.
--
--
-- Running jobs.  Second attempt out of 2.
----------------------------------------
-- Starting 'cormhap' concurrent execution on Thu Dec  6 05:23:30 2018 with 655101.323 GB free disk space (6 processes; 4 concurrently)

    cd correction/1-overlapper
    ./mhap.sh 114 > ./mhap.000114.out 2>&1
    ./mhap.sh 116 > ./mhap.000116.out 2>&1
    ./mhap.sh 118 > ./mhap.000118.out 2>&1
    ./mhap.sh 119 > ./mhap.000119.out 2>&1
    ./mhap.sh 121 > ./mhap.000121.out 2>&1
    ./mhap.sh 122 > ./mhap.000122.out 2>&1

-- Finished on Thu Dec  6 05:24:38 2018 (68 seconds) with 655102.071 GB free disk space
----------------------------------------
--
-- Mhap overlap jobs failed, tried 2 times, giving up.
--   job correction/1-overlapper/results/000114.ovb FAILED.
--   job correction/1-overlapper/results/000116.ovb FAILED.
--   job correction/1-overlapper/results/000118.ovb FAILED.
--   job correction/1-overlapper/results/000119.ovb FAILED.
--   job correction/1-overlapper/results/000121.ovb FAILED.
--   job correction/1-overlapper/results/000122.ovb FAILED.
--

ABORT:
ABORT: Canu 1.8
ABORT: Don't panic, but a mostly harmless error occurred and Canu stopped.
ABORT: Try restarting.  If that doesn't work, ask for help.
ABORT:

Hi skoren,

This is from the stderr file. I tried to restart Canu and submitted another job; its stderr showed:

ERROR
-- ERROR  Limited to at most 1000 GB memory via maxMemory option
-- ERROR  Limited to at most 1 threads via maxThreads option
-- ERROR
-- ERROR  Found 1 machine configuration:
-- ERROR    class0 - 1 machines with 1 cores with 1000 GB memory each.
-- ERROR
-- ERROR  Task hap can't run on any available machines.
-- ERROR  It is requesting:
-- ERROR    hapMemory=6-12 memory (gigabytes)
-- ERROR    hapThreads=8-24 threads
-- ERROR
-- ERROR  No available machine configuration can run this task.
-- ERROR
-- ERROR  Possible solutions:
-- ERROR    Increase maxMemory
-- ERROR    Change hapMemory and/or hapThreads
-- ERROR

Does it mean I should change some parameters in the script, or do I need to request more resources?

Thank you and best wishes

skoren commented 5 years ago

A subset of your jobs failed; what is the output in the failing files (correction/1-overlapper/*122*out, for example)?

As for the second error, you restricted canu to one thread; I don't think that's what you want. The error is saying that for your size genome it wants at least 8 threads to run, and I assume your 1tb node has more than the 1 core that you're reserving.
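Concretely, something along these lines (a sketch; 8 cores is just the minimum the hap task asks for, and the canu line is abridged to the changed options):

    #PBS -l nodes=1:ppn=8
    # ...then let Canu use what was reserved:
    ./canu ... useGrid=false maxMemory=1000 maxThreads=8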

dtzhu337 commented 5 years ago
Found perl:
   /usr/bin/perl

Found java:
   /usr/bin/java
   openjdk version "1.8.0_171"

Found canu:
   /storage/home/d/duz193/canu-1.8/Linux-amd64/bin/canu
Use of implicit split to @_ is deprecated at /storage/home/d/duz193/canu-1.8/Linux-amd64/bin/../lib/site_perl/canu/Grid_Cloud.pm line 73.
   Canu 1.8

Running job 122 based on command line options.
Fetch blocks/000040.dat
Fetch blocks/000041.dat
Fetch blocks/000042.dat
Fetch blocks/000043.dat
Fetch blocks/000044.dat
Fetch blocks/000045.dat
Fetch blocks/000046.dat
Fetch blocks/000047.dat
Fetch blocks/000048.dat
Fetch blocks/000049.dat
Fetch blocks/000050.dat
Fetch blocks/000051.dat
Fetch blocks/000052.dat
Fetch blocks/000053.dat

Running block 000039 in query 000122

./mhap.sh: line 1001: 106046 Segmentation fault      (core dumped) $bin/mhapConvert -S ../../antPacbio.seqStore -o ./results/$qry.mhap.ovb.WORKING ./results/$qry.mhap
Found perl:
   /usr/bin/perl

Found java:
   /usr/bin/java
   openjdk version "1.8.0_171"

Found canu:
   /storage/home/d/duz193/canu-1.8/Linux-amd64/bin/canu
Use of implicit split to @_ is deprecated at /storage/home/d/duz193/canu-1.8/Linux-amd64/bin/../lib/site_perl/canu/Grid_Cloud.pm line 73.
   Canu 1.8

Running job 114 based on command line options.
Fetch blocks/000036.dat
Fetch blocks/000037.dat
Fetch blocks/000038.dat
Fetch blocks/000039.dat
Fetch blocks/000040.dat
Fetch blocks/000041.dat
Fetch blocks/000042.dat
Fetch blocks/000043.dat
Fetch blocks/000044.dat
Fetch blocks/000045.dat
Fetch blocks/000046.dat
Fetch blocks/000047.dat
Fetch blocks/000048.dat
Fetch blocks/000049.dat

Running block 000035 in query 000114

writeToFile()-- After writing 14964 out of 818379 'ovFile::writeBuffer::sb' objects (1 bytes each): Disk quota exceeded
Found perl:
   /usr/bin/perl

Found java:
   /usr/bin/java
   openjdk version "1.8.0_171"

Found canu:
   /storage/home/d/duz193/canu-1.8/Linux-amd64/bin/canu
Use of implicit split to @_ is deprecated at /storage/home/d/duz193/canu-1.8/Linux-amd64/bin/../lib/site_perl/canu/Grid_Cloud.pm line 73.
   Canu 1.8

Running job 118 based on command line options.
Fetch blocks/000038.dat
Fetch blocks/000039.dat
Fetch blocks/000040.dat
Fetch blocks/000041.dat
Fetch blocks/000042.dat
Fetch blocks/000043.dat
Fetch blocks/000044.dat
Fetch blocks/000045.dat
Fetch blocks/000046.dat
Fetch blocks/000047.dat
Fetch blocks/000048.dat
Fetch blocks/000049.dat
Fetch blocks/000050.dat
Fetch blocks/000051.dat

Running block 000037 in query 000118

mhapConvert: mhap/mhapConvert.C:119: int main(int, char**): Assertion `W.toint32(6) <= W.toint32(7)' failed.
./mhap.sh: line 1001: 105971 Aborted                 (core dumped) $bin/mhapConvert -S ../../antPacbio.seqStore -o ./results/$qry.mhap.ovb.WORKING ./results/$qry.mhap

I still have over 20 GB free on the disk, by the way.

skoren commented 5 years ago

It looks like you're out of space and as a result have partial/corrupted output. Even though you say you have 20gb of disk available, the quota error indicates you probably reached that limit, or another limit, during the run. At least one job complains that writing its output exceeded the quota. The other errors are likely due to truncated output files caused by the out-of-space issues.

Remove any files named correction/1-overlapper/results/*WORKING* and correction/1-overlapper/results/*mhap*, get your quota increased, and try again.
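In shell terms the cleanup is roughly this, run from the assembly directory (a sketch; double-check the globs before deleting):

    rm -f correction/1-overlapper/results/*WORKING*
    rm -f correction/1-overlapper/results/*mhap*
    quota -s      # or df -h, to confirm the increased quota before restarting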

dtzhu337 commented 5 years ago

Hi skoren,

I have fixed the previous problem now, but a new issue has come up.

ERROR:
ERROR:  Failed with exit code 1.  (rc=256)
ERROR:

ABORT:
ABORT: Canu 1.8
ABORT: Don't panic, but a mostly harmless error occurred and Canu stopped.
ABORT: Try restarting.  If that doesn't work, ask for help.
ABORT:
ABORT: failed to configure the overlap store.
ABORT:
ABORT: Disk space available:  1625651.176 GB
ABORT:
ABORT: Last 50 lines of the relevant log file (correction/antPacbio.ovlStore.config.err):
ABORT:
ABORT:   22.516 1-overlapper/results/000071.ovb
ABORT:   21.477 1-overlapper/results/000072.ovb
ABORT:   16.864 1-overlapper/results/000073.ovb
ABORT:   23.638 1-overlapper/results/000074.ovb
ABORT:   22.394 1-overlapper/results/000075.ovb
ABORT:   15.979 1-overlapper/results/000076.ovb
ABORT:   24.604 1-overlapper/results/000077.ovb
ABORT:   23.469 1-overlapper/results/000078.ovb
ABORT:   14.961 1-overlapper/results/000079.ovb
ABORT:   25.368 1-overlapper/results/000080.ovb
ABORT:   24.425 1-overlapper/results/000081.ovb
ABORT:   13.647 1-overlapper/results/000082.ovb
ABORT:   24.874 1-overlapper/results/000083.ovb
ABORT:   24.101 1-overlapper/results/000084.ovb
ABORT:   11.637 1-overlapper/results/000085.ovb
ABORT:   24.932 1-overlapper/results/000086.ovb
ABORT:   24.525 1-overlapper/results/000087.ovb
ABORT:   9.973 1-overlapper/results/000088.ovb
ABORT:   25.002 1-overlapper/results/000089.ovb
ABORT:   24.824 1-overlapper/results/000090.ovb
ABORT:   8.211 1-overlapper/results/000091.ovb
ABORT:   25.265 1-overlapper/results/000092.ovb
ABORT:   25.159 1-overlapper/results/000093.ovb
ABORT:   6.509 1-overlapper/results/000094.ovb
ABORT:   26.511 1-overlapper/results/000095.ovb
ABORT:   26.537 1-overlapper/results/000096.ovb
ABORT:   4.925 1-overlapper/results/000097.ovb
ABORT:   25.745 1-overlapper/results/000098.ovb
ABORT:   25.909 1-overlapper/results/000099.ovb
ABORT:   3.018 1-overlapper/results/000100.ovb
ABORT:   26.219 1-overlapper/results/000101.ovb
ABORT:   26.431 1-overlapper/results/000102.ovb
ABORT:   1.229 1-overlapper/results/000103.ovb
ABORT:   24.690 1-overlapper/results/000104.ovb
ABORT:   24.413 1-overlapper/results/000105.ovb
ABORT:   25.088 1-overlapper/results/000106.ovb
ABORT:   22.902 1-overlapper/results/000107.ovb
ABORT:   25.614 1-overlapper/results/000108.ovb
ABORT:   21.503 1-overlapper/results/000109.ovb
ABORT:   25.221 1-overlapper/results/000110.ovb
ABORT:   19.298 1-overlapper/results/000111.ovb
ABORT:   25.959 1-overlapper/results/000112.ovb
ABORT:   17.976 1-overlapper/results/000113.ovb
ABORT:   24.360 1-overlapper/results/000114.ovb
ABORT:   15.009 1-overlapper/results/000115.ovb
ABORT:   24.441 1-overlapper/results/000116.ovb
ABORT:   13.203 1-overlapper/results/000117.ovb
ABORT:   24.054 1-overlapper/results/000118.ovb
ABORT:   11.233 1-overlapper/results/000119.ovb
ABORT:   loadFromFile()-- After loading 0 out of 1 'ovStoreHistogram::nr' objects (8 bytes each): End of file

Any ideas about what happened? Could it be because there is not enough overlap?

brianwalenz commented 5 years ago

Basically the same problem - job 119 ran out of space writing the output and left an incomplete or even empty output. Remove correction/1-overlapper/results/*0119* and correction/1-overlapper/*files and retry. There might be more than one such job, so check for any empty files in the results/ directory and remove those too!
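For example (a sketch; the find is only there to spot any other empty or partial outputs):

    cd correction/1-overlapper
    rm -f  results/*0119*
    rm -rf *files
    find results -type f -size 0    # anything listed here is empty and should be removed too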

dtzhu337 commented 5 years ago

Hi,

Last time, I found that even after I deleted the job's output, it still could not get through. So I think something may have been overwritten in the process because of running out of storage, and I tried to redo everything from scratch. But this time, the problem still comes out like this.

-- Mhap overlap jobs failed, tried 2 times, giving up. -- job correction/1-overlapper/results/000002.ovb FAILED. -- job correction/1-overlapper/results/000003.ovb FAILED. -- job correction/1-overlapper/results/000004.ovb FAILED. -- job correction/1-overlapper/results/000005.ovb FAILED. -- job correction/1-overlapper/results/000007.ovb FAILED. -- job correction/1-overlapper/results/000008.ovb FAILED. -- job correction/1-overlapper/results/000009.ovb FAILED. -- job correction/1-overlapper/results/000011.ovb FAILED. -- job correction/1-overlapper/results/000012.ovb FAILED. -- job correction/1-overlapper/results/000013.ovb FAILED. -- job correction/1-overlapper/results/000015.ovb FAILED. -- job correction/1-overlapper/results/000016.ovb FAILED. -- job correction/1-overlapper/results/000017.ovb FAILED. -- job correction/1-overlapper/results/000019.ovb FAILED. -- job correction/1-overlapper/results/000020.ovb FAILED. -- job correction/1-overlapper/results/000021.ovb FAILED. -- job correction/1-overlapper/results/000023.ovb FAILED. -- job correction/1-overlapper/results/000024.ovb FAILED. -- job correction/1-overlapper/results/000025.ovb FAILED. -- job correction/1-overlapper/results/000027.ovb FAILED. -- job correction/1-overlapper/results/000028.ovb FAILED. -- job correction/1-overlapper/results/000029.ovb FAILED. -- job correction/1-overlapper/results/000031.ovb FAILED. -- job correction/1-overlapper/results/000032.ovb FAILED. -- job correction/1-overlapper/results/000033.ovb FAILED. -- job correction/1-overlapper/results/000035.ovb FAILED. -- job correction/1-overlapper/results/000036.ovb FAILED. -- job correction/1-overlapper/results/000037.ovb FAILED. -- job correction/1-overlapper/results/000039.ovb FAILED. -- job correction/1-overlapper/results/000040.ovb FAILED. -- job correction/1-overlapper/results/000041.ovb FAILED. -- job correction/1-overlapper/results/000043.ovb FAILED. -- job correction/1-overlapper/results/000044.ovb FAILED. -- job correction/1-overlapper/results/000045.ovb FAILED. -- job correction/1-overlapper/results/000046.ovb FAILED. -- job correction/1-overlapper/results/000047.ovb FAILED. -- job correction/1-overlapper/results/000048.ovb FAILED. -- job correction/1-overlapper/results/000049.ovb FAILED. -- job correction/1-overlapper/results/000050.ovb FAILED. -- job correction/1-overlapper/results/000051.ovb FAILED. -- job correction/1-overlapper/results/000052.ovb FAILED. -- job correction/1-overlapper/results/000053.ovb FAILED. -- job correction/1-overlapper/results/000054.ovb FAILED. -- job correction/1-overlapper/results/000055.ovb FAILED. -- job correction/1-overlapper/results/000056.ovb FAILED. -- job correction/1-overlapper/results/000057.ovb FAILED. -- job correction/1-overlapper/results/000058.ovb FAILED. -- job correction/1-overlapper/results/000059.ovb FAILED. -- job correction/1-overlapper/results/000060.ovb FAILED. -- job correction/1-overlapper/results/000061.ovb FAILED. -- job correction/1-overlapper/results/000062.ovb FAILED. -- job correction/1-overlapper/results/000063.ovb FAILED. -- job correction/1-overlapper/results/000064.ovb FAILED. -- job correction/1-overlapper/results/000065.ovb FAILED. -- job correction/1-overlapper/results/000066.ovb FAILED. -- job correction/1-overlapper/results/000067.ovb FAILED. -- job correction/1-overlapper/results/000068.ovb FAILED. -- job correction/1-overlapper/results/000069.ovb FAILED. -- job correction/1-overlapper/results/000070.ovb FAILED. -- job correction/1-overlapper/results/000071.ovb FAILED. 
-- job correction/1-overlapper/results/000072.ovb FAILED. -- job correction/1-overlapper/results/000073.ovb FAILED. -- job correction/1-overlapper/results/000074.ovb FAILED. -- job correction/1-overlapper/results/000075.ovb FAILED. -- job correction/1-overlapper/results/000076.ovb FAILED. -- job correction/1-overlapper/results/000077.ovb FAILED. -- job correction/1-overlapper/results/000078.ovb FAILED. -- job correction/1-overlapper/results/000079.ovb FAILED. -- job correction/1-overlapper/results/000080.ovb FAILED. -- job correction/1-overlapper/results/000081.ovb FAILED. -- job correction/1-overlapper/results/000082.ovb FAILED. -- job correction/1-overlapper/results/000083.ovb FAILED. -- job correction/1-overlapper/results/000084.ovb FAILED. -- job correction/1-overlapper/results/000085.ovb FAILED. -- job correction/1-overlapper/results/000086.ovb FAILED. -- job correction/1-overlapper/results/000087.ovb FAILED. -- job correction/1-overlapper/results/000088.ovb FAILED. -- job correction/1-overlapper/results/000089.ovb FAILED. -- job correction/1-overlapper/results/000090.ovb FAILED. -- job correction/1-overlapper/results/000091.ovb FAILED. -- job correction/1-overlapper/results/000092.ovb FAILED. -- job correction/1-overlapper/results/000093.ovb FAILED. -- job correction/1-overlapper/results/000094.ovb FAILED. -- job correction/1-overlapper/results/000095.ovb FAILED. -- job correction/1-overlapper/results/000096.ovb FAILED. -- job correction/1-overlapper/results/000097.ovb FAILED. -- job correction/1-overlapper/results/000098.ovb FAILED. -- job correction/1-overlapper/results/000099.ovb FAILED. -- job correction/1-overlapper/results/000100.ovb FAILED. -- job correction/1-overlapper/results/000101.ovb FAILED. -- job correction/1-overlapper/results/000102.ovb FAILED. -- job correction/1-overlapper/results/000103.ovb FAILED. -- job correction/1-overlapper/results/000104.ovb FAILED. -- job correction/1-overlapper/results/000105.ovb FAILED. -- job correction/1-overlapper/results/000106.ovb FAILED. -- job correction/1-overlapper/results/000107.ovb FAILED. -- job correction/1-overlapper/results/000108.ovb FAILED. -- job correction/1-overlapper/results/000109.ovb FAILED. -- job correction/1-overlapper/results/000110.ovb FAILED. -- job correction/1-overlapper/results/000111.ovb FAILED. -- job correction/1-overlapper/results/000112.ovb FAILED. -- job correction/1-overlapper/results/000113.ovb FAILED. -- job correction/1-overlapper/results/000114.ovb FAILED. -- job correction/1-overlapper/results/000115.ovb FAILED. -- job correction/1-overlapper/results/000116.ovb FAILED. -- job correction/1-overlapper/results/000117.ovb FAILED. -- job correction/1-overlapper/results/000118.ovb FAILED. -- job correction/1-overlapper/results/000119.ovb FAILED. -- job correction/1-overlapper/results/000120.ovb FAILED. -- job correction/1-overlapper/results/000121.ovb FAILED. -- job correction/1-overlapper/results/000122.ovb FAILED. -- job correction/1-overlapper/results/000123.ovb FAILED. -- job correction/1-overlapper/results/000124.ovb FAILED. -- job correction/1-overlapper/results/000125.ovb FAILED. -- job correction/1-overlapper/results/000126.ovb FAILED. -- job correction/1-overlapper/results/000127.ovb FAILED. -- job correction/1-overlapper/results/000128.ovb FAILED. -- job correction/1-overlapper/results/000129.ovb FAILED. -- job correction/1-overlapper/results/000130.ovb FAILED. -- job correction/1-overlapper/results/000131.ovb FAILED. -- job correction/1-overlapper/results/000132.ovb FAILED. 
-- job correction/1-overlapper/results/000133.ovb FAILED. -- job correction/1-overlapper/results/000134.ovb FAILED. -- job correction/1-overlapper/results/000135.ovb FAILED. -- job correction/1-overlapper/results/000136.ovb FAILED. -- job correction/1-overlapper/results/000137.ovb FAILED. -- job correction/1-overlapper/results/000138.ovb FAILED. -- job correction/1-overlapper/results/000139.ovb FAILED. -- job correction/1-overlapper/results/000140.ovb FAILED. -- job correction/1-overlapper/results/000141.ovb FAILED. -- job correction/1-overlapper/results/000142.ovb FAILED. -- job correction/1-overlapper/results/000143.ovb FAILED. -- job correction/1-overlapper/results/000144.ovb FAILED. -- job correction/1-overlapper/results/000145.ovb FAILED. -- job correction/1-overlapper/results/000146.ovb FAILED.

ABORT:
ABORT: Canu 1.8
ABORT: Don't panic, but a mostly harmless error occurred and Canu stopped.
ABORT: Try restarting.  If that doesn't work, ask for help.
ABORT:

I tried to delete the correction/1-overlapper/results/000146 (and all the other numbers) .working files, but it still didn't work. It is not a problem with space this time. Is there anything I can do about this?

Thank you

brianwalenz commented 5 years ago

As you're finding, out of space errors are insidious and really hard to fix.

I'd suggest starting overlaps again. It looks like it thinks nearly every overlap job failed, so a fresh start isn't as drastic as it sounds.

Remove the 1-overlapper directory, and any ovlStore files or directories. This should leave, I think, just 0-mercounts in the correction/ directory.
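Something like this, run from the assembly directory (a sketch; verify the paths with ls before removing anything):

    rm -rf correction/1-overlapper
    rm -rf correction/*ovlStore*
    ls correction/      # should now show little besides 0-mercounts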

brianwalenz commented 5 years ago

Gave up? Success? Or still running? Assuming you restarted, and it misbehaves again, open a new issue and refer back to this one.

dtzhu337 commented 5 years ago

Hi Brian,

Thank you so much for asking. There seems to be a new problem; I just submitted a new issue. I hope you can help me fix it.

Thank you

Dantong
