Closed dtzhu337 closed 5 years ago
That sounds like your grid might not be letting Canu submit the jobs. What's the output in correction/1-overlapper (any out and sh files in there)? What's in canu-scripts?
Hi Skoren,
Because we have very limited storage on the server, I have deleted those files.
The server administrator also told me it was probably due to job submission, and recommended the useGrid=false option. It seems to be running well so far, at least for more than 10 hours.
Thank you
Hi Skoren,
I found my job finished after using the useGrid=false option. But the problem is that there are no fasta files showing the assembly results. I've only got the directories/files below.
antPacbio.report antPacbio.seqStore.err canu-logs correction antPacbio.seqStore antPacbio.seqStore.ssi canu-scripts haplotype
What do you think is the problem?
Thank you
If the output didn't get generated but the job stopped, it was probably terminated by your scheduler. You'd have to check the history of the job and the output of stdout/stderr from Canu to get that information.
When you run with useGrid=false, you're restricting Canu to the single node where you requested 120 GB of memory. If you land on a node with more memory than this, Canu might exceed your memory request and fail; it is safer to reserve a full node in these cases. Running with useGrid=false is also going to be much slower than using the grid, so you may still want to ask your IT to diagnose the previous submission issue.
The server administrator told me that I should use the useGrid=false option.
Do you have any idea how long Canu needs to assemble the genome? The estimated size is ~280 Mb, and I have 36 GB of read data (already converted from .bam to .fq files). I am now using 1000 GB of memory to continue the previously stopped run.
From the instructions on the website, I think using the same script to continue is okay for Canu. I just want to make sure this is right.
Best wishes and Thank you
The useGrid=false is the easiest solution since then you don't have to find out why the grid submission job from canu was rejected. However, others have run Canu on PBS grids so it should work. The only issue would be if your run nodes aren't allowed to submit jobs (see FAQ).
A 280 Mb genome is not too big, so I would guess less than a week. Rather than picking a machine with lots of memory, tell Canu how much memory and how many threads it is allowed to use. That is, if you reserve 200 GB/16 cores, then add the options maxMemory=200 maxThreads=16 and it will configure itself to fit.
You can restart with the same script yes.
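As a concrete sketch of such a restart (the project name, reads path, and genome size are taken from this thread; the resource caps are examples, and the exact command should match your original script):

```shell
# Hedged sketch: rerun Canu in the same assembly directory with
# explicit resource caps; Canu detects already-completed stages and
# resumes from where it stopped. Paths/names mirror this thread.
canu -p antPacbio -d antPacbio-asm \
     genomeSize=280m \
     useGrid=false \
     maxMemory=200 maxThreads=16 \
     -pacbio-raw /storage/home/d/duz193/work/allreads.fq
```

With maxMemory/maxThreads set, Canu sizes each stage to fit inside the reservation instead of probing the node's full hardware.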
Hi skoren,
The process has been running for several days now. There is still no canu.out, and I am not sure which step it is currently performing.
There is a file named .seqStore.err showing the following information.
Starting file './antPacbio.seqStore.ssi'.
Loading reads from '/storage/home/d/duz193/work/allreads.fq'
Processed 11204220 lines.
Loaded 18715844996 bp from:
2240844 FASTQ format reads (18715844996 bp).
WARNING: 215253 reads (9.6059%) with 100624002 bp (0.5348%) were too short (< 1000bp) and were ignored.
Finished with:
0 warnings (bad base or qv, too short, too long)
Loaded into store:
18715844996 bp.
2025591 reads.
Skipped (too short):
100624002 bp (0.5348%).
215253 reads (9.6059%).
sqStoreCreate finished successfully.
The canu-scripts directory is empty, and the correction directory only has 0-mercounts and 1-overlapper.
The canu-logs directory shows the following:
1543631692_comp-bc-0222.acib.production.int.aci.ics.psu.edu_8151_canu 1543717443_comp-bc-0195.acib.production.int.aci.ics.psu.edu_93063_sqStoreDumpFASTQ
1543631692_comp-bc-0222.acib.production.int.aci.ics.psu.edu_8200_sqStoreCreate 1543717697_comp-bc-0195.acib.production.int.aci.ics.psu.edu_95250_sqStoreDumpFASTQ
1543632415_comp-bc-0222.acib.production.int.aci.ics.psu.edu_8615_sqStoreDumpMetaData 1543725852_comp-bc-0195.acib.production.int.aci.ics.psu.edu_105152_sqStoreDumpFASTQ
1543632416_comp-bc-0222.acib.production.int.aci.ics.psu.edu_8620_sqStoreDumpMetaData 1543725853_comp-bc-0195.acib.production.int.aci.ics.psu.edu_105186_sqStoreDumpFASTQ
1543632442_comp-bc-0222.acib.production.int.aci.ics.psu.edu_8671_meryl 1543733598_comp-bc-0195.acib.production.int.aci.ics.psu.edu_117807_sqStoreDumpFASTQ
1543632443_comp-bc-0222.acib.production.int.aci.ics.psu.edu_8679_meryl 1543733599_comp-bc-0195.acib.production.int.aci.ics.psu.edu_117841_sqStoreDumpFASTQ
1543632444_comp-bc-0222.acib.production.int.aci.ics.psu.edu_8687_meryl 1543741497_comp-bc-0195.acib.production.int.aci.ics.psu.edu_131019_sqStoreDumpFASTQ
1543632445_comp-bc-0222.acib.production.int.aci.ics.psu.edu_8695_meryl 1543741554_comp-bc-0195.acib.production.int.aci.ics.psu.edu_131115_sqStoreDumpFASTQ
1543632446_comp-bc-0222.acib.production.int.aci.ics.psu.edu_8703_meryl 1543749366_comp-bc-0195.acib.production.int.aci.ics.psu.edu_142514_sqStoreDumpFASTQ
1543632446_comp-bc-0222.acib.production.int.aci.ics.psu.edu_8711_meryl 1543749525_comp-bc-0195.acib.production.int.aci.ics.psu.edu_143787_sqStoreDumpFASTQ
1543632447_comp-bc-0222.acib.production.int.aci.ics.psu.edu_8719_meryl 1543757178_comp-bc-0195.acib.production.int.aci.ics.psu.edu_153236_sqStoreDumpFASTQ
1543632448_comp-bc-0222.acib.production.int.aci.ics.psu.edu_8729_meryl 1543757284_comp-bc-0195.acib.production.int.aci.ics.psu.edu_153371_sqStoreDumpFASTQ
1543632449_comp-bc-0222.acib.production.int.aci.ics.psu.edu_8737_meryl 1543765038_comp-bc-0195.acib.production.int.aci.ics.psu.edu_164764_sqStoreDumpFASTQ
1543632450_comp-bc-0222.acib.production.int.aci.ics.psu.edu_8745_meryl 1543765221_comp-bc-0195.acib.production.int.aci.ics.psu.edu_164940_sqStoreDumpFASTQ
1543632450_comp-bc-0222.acib.production.int.aci.ics.psu.edu_8753_meryl 1543773055_comp-bc-0195.acib.production.int.aci.ics.psu.edu_175584_sqStoreDumpFASTQ
1543632451_comp-bc-0222.acib.production.int.aci.ics.psu.edu_8761_meryl 1543773346_comp-bc-0195.acib.production.int.aci.ics.psu.edu_175845_sqStoreDumpFASTQ
1543632452_comp-bc-0222.acib.production.int.aci.ics.psu.edu_8769_meryl 1543780924_comp-bc-0195.acib.production.int.aci.ics.psu.edu_187154_sqStoreDumpFASTQ
1543632452_comp-bc-0222.acib.production.int.aci.ics.psu.edu_8777_meryl 1543781225_comp-bc-0195.acib.production.int.aci.ics.psu.edu_187388_sqStoreDumpFASTQ
1543632453_comp-bc-0222.acib.production.int.aci.ics.psu.edu_8785_meryl 1543789062_comp-bc-0195.acib.production.int.aci.ics.psu.edu_1956_sqStoreDumpFASTQ
1543632463_comp-bc-0222.acib.production.int.aci.ics.psu.edu_8832_meryl 1543789235_comp-bc-0195.acib.production.int.aci.ics.psu.edu_2118_sqStoreDumpFASTQ
1543633300_comp-bc-0222.acib.production.int.aci.ics.psu.edu_9488_meryl 1543796953_comp-bc-0195.acib.production.int.aci.ics.psu.edu_13585_sqStoreDumpFASTQ
1543633487_comp-bc-0222.acib.production.int.aci.ics.psu.edu_10700_meryl 1543797139_comp-bc-0195.acib.production.int.aci.ics.psu.edu_13777_sqStoreDumpFASTQ
1543633492_comp-bc-0222.acib.production.int.aci.ics.psu.edu_10710_meryl 1543805061_comp-bc-0195.acib.production.int.aci.ics.psu.edu_23605_sqStoreDumpFASTQ
1543633582_comp-bc-0222.acib.production.int.aci.ics.psu.edu_10776_meryl 1543805078_comp-bc-0195.acib.production.int.aci.ics.psu.edu_23675_sqStoreDumpFASTQ
1543633590_comp-bc-0222.acib.production.int.aci.ics.psu.edu_10865_sqStoreDumpFASTQ 1543812745_comp-bc-0195.acib.production.int.aci.ics.psu.edu_36475_sqStoreDumpFASTQ
1543633590_comp-bc-0222.acib.production.int.aci.ics.psu.edu_10866_sqStoreDumpFASTQ 1543812904_comp-bc-0195.acib.production.int.aci.ics.psu.edu_36700_sqStoreDumpFASTQ
1543641705_comp-bc-0222.acib.production.int.aci.ics.psu.edu_20844_sqStoreDumpFASTQ 1543820493_comp-bc-0195.acib.production.int.aci.ics.psu.edu_46080_sqStoreDumpFASTQ
1543642008_comp-bc-0222.acib.production.int.aci.ics.psu.edu_21073_sqStoreDumpFASTQ 1543820566_comp-bc-0195.acib.production.int.aci.ics.psu.edu_46188_sqStoreDumpFASTQ
1543649487_comp-bc-0222.acib.production.int.aci.ics.psu.edu_32770_sqStoreDumpFASTQ 1543828222_comp-bc-0195.acib.production.int.aci.ics.psu.edu_61248_sqStoreDumpFASTQ
1543650298_comp-bc-0222.acib.production.int.aci.ics.psu.edu_34460_sqStoreDumpFASTQ 1543828381_comp-bc-0195.acib.production.int.aci.ics.psu.edu_61424_sqStoreDumpFASTQ
1543657697_comp-bc-0222.acib.production.int.aci.ics.psu.edu_47352_sqStoreDumpFASTQ 1543835957_comp-bc-0195.acib.production.int.aci.ics.psu.edu_71879_sqStoreDumpFASTQ
1543658445_comp-bc-0222.acib.production.int.aci.ics.psu.edu_47895_sqStoreDumpFASTQ 1543836297_comp-bc-0195.acib.production.int.aci.ics.psu.edu_72181_sqStoreDumpFASTQ
1543665887_comp-bc-0222.acib.production.int.aci.ics.psu.edu_60288_sqStoreDumpFASTQ 1543843623_comp-bc-0195.acib.production.int.aci.ics.psu.edu_83417_sqStoreDumpFASTQ
1543666520_comp-bc-0222.acib.production.int.aci.ics.psu.edu_60771_sqStoreDumpFASTQ 1543844207_comp-bc-0195.acib.production.int.aci.ics.psu.edu_83860_sqStoreDumpFASTQ
1543676835_comp-bc-0195.acib.production.int.aci.ics.psu.edu_36686_canu 1543850132_comp-bc-0284.acib.production.int.aci.ics.psu.edu_180102_canu
1543676836_comp-bc-0195.acib.production.int.aci.ics.psu.edu_36824_sqStoreDumpFASTQ 1543850132_comp-bc-0284.acib.production.int.aci.ics.psu.edu_180251_sqStoreDumpFASTQ
1543676836_comp-bc-0195.acib.production.int.aci.ics.psu.edu_36825_sqStoreDumpFASTQ 1543850132_comp-bc-0284.acib.production.int.aci.ics.psu.edu_180252_sqStoreDumpFASTQ
1543684916_comp-bc-0195.acib.production.int.aci.ics.psu.edu_47675_sqStoreDumpFASTQ 1543857851_comp-bc-0284.acib.production.int.aci.ics.psu.edu_191813_sqStoreDumpFASTQ
1543685026_comp-bc-0195.acib.production.int.aci.ics.psu.edu_47823_sqStoreDumpFASTQ 1543857858_comp-bc-0284.acib.production.int.aci.ics.psu.edu_191849_sqStoreDumpFASTQ
1543693093_comp-bc-0195.acib.production.int.aci.ics.psu.edu_59401_sqStoreDumpFASTQ 1543865634_comp-bc-0284.acib.production.int.aci.ics.psu.edu_6004_sqStoreDumpFASTQ
1543693128_comp-bc-0195.acib.production.int.aci.ics.psu.edu_59485_sqStoreDumpFASTQ 1543865751_comp-bc-0284.acib.production.int.aci.ics.psu.edu_6154_sqStoreDumpFASTQ
1543701164_comp-bc-0195.acib.production.int.aci.ics.psu.edu_70316_sqStoreDumpFASTQ 1543873573_comp-bc-0284.acib.production.int.aci.ics.psu.edu_16718_sqStoreDumpFASTQ
1543701308_comp-bc-0195.acib.production.int.aci.ics.psu.edu_70476_sqStoreDumpFASTQ 1543873676_comp-bc-0284.acib.production.int.aci.ics.psu.edu_16872_sqStoreDumpFASTQ
1543709263_comp-bc-0195.acib.production.int.aci.ics.psu.edu_82062_sqStoreDumpFASTQ 1544026325_comp-hc-0012.acib.production.int.aci.ics.psu.edu_105202_canu
1543709435_comp-bc-0195.acib.production.int.aci.ics.psu.edu_82259_sqStoreDumpFASTQ
Do you think the software is still running well? The last thing I want to see is that after so many days of waiting, it shows nothing. Thank you in advance for your kind help.
Warm regards
Yes, it's probably fine; there isn't going to be a canu.out when you run with useGrid=false, and the canu-scripts folder will be empty as well. All the logging goes to stdout/stderr, which should be captured by your grid engine and put into whatever file is the default (I didn't see an output file specification in your script).
You should also be able to use your grid engine to monitor the submitted job to see its resource utilization.
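On a PBS-style grid (which this thread appears to be using), the monitoring commands are typically along these lines; exact tools vary by site, so treat this as a sketch and substitute your own job id:

```shell
# Hedged sketch for PBS/Torque sites (command availability varies):
qstat -f 123456     # full details of a queued/running job, including
                    #   resources_used.mem and resources_used.walltime
tracejob 123456     # Torque: scheduler log entries for the job,
                    #   useful after the job has ended
```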
Hi again,
It turns out the task has finished; however, I could not find any fasta files containing the contigs.
Same as before: if the fasta files are not there, the job did not terminate correctly. Post the stdout/stderr from the submitted job, which should have more details on what happened, along with the job history/accounting (e.g. how long it ran, memory used, etc.).
BEGIN CORRECTION
--
--
-- Running jobs. First attempt out of 2.
----------------------------------------
-- Starting 'cormhap' concurrent execution on Wed Dec 5 11:12:06 2018 with 655197.215 GB free disk space (46 processes; 4 concurrently)
cd correction/1-overlapper
./mhap.sh 101 > ./mhap.000101.out 2>&1
./mhap.sh 102 > ./mhap.000102.out 2>&1
./mhap.sh 103 > ./mhap.000103.out 2>&1
./mhap.sh 104 > ./mhap.000104.out 2>&1
./mhap.sh 105 > ./mhap.000105.out 2>&1
./mhap.sh 106 > ./mhap.000106.out 2>&1
./mhap.sh 107 > ./mhap.000107.out 2>&1
./mhap.sh 108 > ./mhap.000108.out 2>&1
./mhap.sh 109 > ./mhap.000109.out 2>&1
./mhap.sh 110 > ./mhap.000110.out 2>&1
./mhap.sh 111 > ./mhap.000111.out 2>&1
./mhap.sh 112 > ./mhap.000112.out 2>&1
./mhap.sh 113 > ./mhap.000113.out 2>&1
./mhap.sh 114 > ./mhap.000114.out 2>&1
./mhap.sh 115 > ./mhap.000115.out 2>&1
./mhap.sh 116 > ./mhap.000116.out 2>&1
./mhap.sh 117 > ./mhap.000117.out 2>&1
./mhap.sh 118 > ./mhap.000118.out 2>&1
./mhap.sh 119 > ./mhap.000119.out 2>&1
./mhap.sh 120 > ./mhap.000120.out 2>&1
./mhap.sh 121 > ./mhap.000121.out 2>&1
./mhap.sh 122 > ./mhap.000122.out 2>&1
./mhap.sh 123 > ./mhap.000123.out 2>&1
./mhap.sh 124 > ./mhap.000124.out 2>&1
./mhap.sh 125 > ./mhap.000125.out 2>&1
./mhap.sh 126 > ./mhap.000126.out 2>&1
./mhap.sh 127 > ./mhap.000127.out 2>&1
./mhap.sh 128 > ./mhap.000128.out 2>&1
./mhap.sh 129 > ./mhap.000129.out 2>&1
./mhap.sh 130 > ./mhap.000130.out 2>&1
./mhap.sh 131 > ./mhap.000131.out 2>&1
./mhap.sh 132 > ./mhap.000132.out 2>&1
./mhap.sh 133 > ./mhap.000133.out 2>&1
./mhap.sh 134 > ./mhap.000134.out 2>&1
./mhap.sh 135 > ./mhap.000135.out 2>&1
./mhap.sh 136 > ./mhap.000136.out 2>&1
./mhap.sh 137 > ./mhap.000137.out 2>&1
./mhap.sh 138 > ./mhap.000138.out 2>&1
./mhap.sh 139 > ./mhap.000139.out 2>&1
./mhap.sh 140 > ./mhap.000140.out 2>&1
./mhap.sh 141 > ./mhap.000141.out 2>&1
./mhap.sh 142 > ./mhap.000142.out 2>&1
./mhap.sh 143 > ./mhap.000143.out 2>&1
./mhap.sh 144 > ./mhap.000144.out 2>&1
./mhap.sh 145 > ./mhap.000145.out 2>&1
./mhap.sh 146 > ./mhap.000146.out 2>&1
-- Finished on Thu Dec 6 05:23:29 2018 (65483 seconds, fashionably late) with 655101.323 GB free disk space
----------------------------------------
--
-- Mhap overlap jobs failed, retry.
-- job correction/1-overlapper/results/000114.ovb FAILED.
-- job correction/1-overlapper/results/000116.ovb FAILED.
-- job correction/1-overlapper/results/000118.ovb FAILED.
-- job correction/1-overlapper/results/000119.ovb FAILED.
-- job correction/1-overlapper/results/000121.ovb FAILED.
-- job correction/1-overlapper/results/000122.ovb FAILED.
--
--
-- Running jobs. Second attempt out of 2.
----------------------------------------
-- Starting 'cormhap' concurrent execution on Thu Dec 6 05:23:30 2018 with 655101.323 GB free disk space (6 processes; 4 concurrently)
cd correction/1-overlapper
./mhap.sh 114 > ./mhap.000114.out 2>&1
./mhap.sh 116 > ./mhap.000116.out 2>&1
./mhap.sh 118 > ./mhap.000118.out 2>&1
./mhap.sh 119 > ./mhap.000119.out 2>&1
./mhap.sh 121 > ./mhap.000121.out 2>&1
./mhap.sh 122 > ./mhap.000122.out 2>&1
-- Finished on Thu Dec 6 05:24:38 2018 (68 seconds) with 655102.071 GB free disk space
----------------------------------------
--
-- Mhap overlap jobs failed, tried 2 times, giving up.
-- job correction/1-overlapper/results/000114.ovb FAILED.
-- job correction/1-overlapper/results/000116.ovb FAILED.
-- job correction/1-overlapper/results/000118.ovb FAILED.
-- job correction/1-overlapper/results/000119.ovb FAILED.
-- job correction/1-overlapper/results/000121.ovb FAILED.
-- job correction/1-overlapper/results/000122.ovb FAILED.
--
ABORT:
ABORT: Canu 1.8
ABORT: Don't panic, but a mostly harmless error occurred and Canu stopped.
ABORT: Try restarting. If that doesn't work, ask for help.
ABORT:
Hi skoren,
This is from the stderr file. I tried to restart Canu and submitted another job. The stderr showed:
ERROR
-- ERROR Limited to at most 1000 GB memory via maxMemory option
-- ERROR Limited to at most 1 threads via maxThreads option
-- ERROR
-- ERROR Found 1 machine configuration:
-- ERROR class0 - 1 machines with 1 cores with 1000 GB memory each.
-- ERROR
-- ERROR Task hap can't run on any available machines.
-- ERROR It is requesting:
-- ERROR hapMemory=6-12 memory (gigabytes)
-- ERROR hapThreads=8-24 threads
-- ERROR
-- ERROR No available machine configuration can run this task.
-- ERROR
-- ERROR Possible solutions:
-- ERROR Increase maxMemory
-- ERROR Change hapMemory and/or hapThreads
-- ERROR
Does it mean I should change some parameters in the script, or do I need to request more resources?
Thank you and best wishes
A subset of your jobs failed; what is the output in the failing files (correction/1-overlapper/*122*out, for example)?
As for the second error, you restricted Canu to one thread; I don't think that's what you want. The error is saying that for your genome size it wants at least 8 threads to run, and I assume your 1 TB node has more cores than the single one you're reserving.
Found perl:
/usr/bin/perl
Found java:
/usr/bin/java
openjdk version "1.8.0_171"
Found canu:
/storage/home/d/duz193/canu-1.8/Linux-amd64/bin/canu
Use of implicit split to @_ is deprecated at /storage/home/d/duz193/canu-1.8/Linux-amd64/bin/../lib/site_perl/canu/Grid_Cloud.pm line 73.
Canu 1.8
Running job 122 based on command line options.
Fetch blocks/000040.dat
Fetch blocks/000041.dat
Fetch blocks/000042.dat
Fetch blocks/000043.dat
Fetch blocks/000044.dat
Fetch blocks/000045.dat
Fetch blocks/000046.dat
Fetch blocks/000047.dat
Fetch blocks/000048.dat
Fetch blocks/000049.dat
Fetch blocks/000050.dat
Fetch blocks/000051.dat
Fetch blocks/000052.dat
Fetch blocks/000053.dat
Running block 000039 in query 000122
./mhap.sh: line 1001: 106046 Segmentation fault (core dumped) $bin/mhapConvert -S ../../antPacbio.seqStore -o ./results/$qry.mhap.ovb.WORKING ./results/$qry.mhap
Found perl:
/usr/bin/perl
Found java:
/usr/bin/java
openjdk version "1.8.0_171"
Found canu:
/storage/home/d/duz193/canu-1.8/Linux-amd64/bin/canu
Use of implicit split to @_ is deprecated at /storage/home/d/duz193/canu-1.8/Linux-amd64/bin/../lib/site_perl/canu/Grid_Cloud.pm line 73.
Canu 1.8
Running job 114 based on command line options.
Fetch blocks/000036.dat
Fetch blocks/000037.dat
Fetch blocks/000038.dat
Fetch blocks/000039.dat
Fetch blocks/000040.dat
Fetch blocks/000041.dat
Fetch blocks/000042.dat
Fetch blocks/000043.dat
Fetch blocks/000044.dat
Fetch blocks/000045.dat
Fetch blocks/000046.dat
Fetch blocks/000047.dat
Fetch blocks/000048.dat
Fetch blocks/000049.dat
Running block 000035 in query 000114
writeToFile()-- After writing 14964 out of 818379 'ovFile::writeBuffer::sb' objects (1 bytes each): Disk quota exceeded
Found perl:
/usr/bin/perl
Found java:
/usr/bin/java
openjdk version "1.8.0_171"
Found canu:
/storage/home/d/duz193/canu-1.8/Linux-amd64/bin/canu
Use of implicit split to @_ is deprecated at /storage/home/d/duz193/canu-1.8/Linux-amd64/bin/../lib/site_perl/canu/Grid_Cloud.pm line 73.
Canu 1.8
Running job 118 based on command line options.
Fetch blocks/000038.dat
Fetch blocks/000039.dat
Fetch blocks/000040.dat
Fetch blocks/000041.dat
Fetch blocks/000042.dat
Fetch blocks/000043.dat
Fetch blocks/000044.dat
Fetch blocks/000045.dat
Fetch blocks/000046.dat
Fetch blocks/000047.dat
Fetch blocks/000048.dat
Fetch blocks/000049.dat
Fetch blocks/000050.dat
Fetch blocks/000051.dat
Running block 000037 in query 000118
mhapConvert: mhap/mhapConvert.C:119: int main(int, char**): Assertion `W.toint32(6) <= W.toint32(7)' failed.
./mhap.sh: line 1001: 105971 Aborted (core dumped) $bin/mhapConvert -S ../../antPacbio.seqStore -o ./results/$qry.mhap.ovb.WORKING ./results/$qry.mhap
By the way, I still have over 20 GB free on the disk.
It looks like you're out of space and, as a result, have partial/corrupted output. Even if you have 20 GB of disk available, the quota error indicates you probably hit that limit, or another limit, during the run. At least one job complains that writing output exceeded quota; the other errors are likely due to output files truncated by the same out-of-space issue.
Remove any files named correction/1-overlapper/results/*WORKING* and correction/1-overlapper/results/*mhap*, get your quota increased, and try again.
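In shell terms, that cleanup amounts to something like this (run from the assembly directory; the patterns are the ones named above, so double-check them against your tree before deleting):

```shell
# Hedged sketch: delete partial overlap outputs left behind by the
# out-of-space failure so Canu recomputes those jobs on restart.
rm -f correction/1-overlapper/results/*WORKING*
rm -f correction/1-overlapper/results/*mhap*
```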
Hi skoren,
I fixed the previous problem, but a new issue has come up.
ERROR: ERROR: Failed with exit code 1. (rc=256) ERROR:
Any idea what happened? Is it because there were not enough overlaps?
Basically the same problem: job 119 ran out of space writing its output and left an incomplete or even empty file. Remove correction/1-overlapper/results/*0119* and correction/1-overlapper/*files and retry. There might be more than one such job, so check for any empty files in the results/ directory and remove those too!
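A sketch of that cleanup, including a sweep for zero-length result files that a disk-quota failure may have left behind (paths as named above; list them first to verify before deleting):

```shell
# Hedged sketch: drop job 119's outputs, the overlapper file lists,
# and any empty result files, then restart Canu with the same script.
rm -f correction/1-overlapper/results/*0119*
rm -f correction/1-overlapper/*files
find correction/1-overlapper/results -type f -empty -print -delete
```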
Hi,
Last time, even after I deleted that job's files, it still could not get through. I think something may have been overwritten during the run because of the lack of storage, so I redid everything from scratch. But this time the error still comes out like this:
ABORT:
ABORT: Canu 1.8
ABORT: Don't panic, but a mostly harmless error occurred and Canu stopped.
ABORT: Try restarting. If that doesn't work, ask for help.
ABORT:
I tried deleting the correction/1-overlapper/results/*.WORKING files (all the numbers, up to 000146), but it still didn't work. This time it's not a space problem. Is there anything I can do?
Thank you
As you're finding, out of space errors are insidious and really hard to fix.
I'd suggest starting overlaps again. It looks like it thinks nearly every overlap job failed, so a fresh start isn't as drastic as it sounds.
Remove the 1-overlapper directory, and any ovlStore files or directories. This should leave, I think, just 0-mercounts in the correction/ directory.
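A sketch of that reset (this assumes the default store naming from this thread's antPacbio run; list the correction/ directory first to confirm what is actually there):

```shell
# Hedged sketch: wipe the overlap stage so it is recomputed from
# scratch, keeping only the k-mer counts in 0-mercounts.
rm -rf correction/1-overlapper
rm -rf correction/*.ovlStore*
```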
Gave up? Success? Or still running? Assuming you restarted, and it misbehaves again, open a new issue and refer back to this one.
Hi Brian,
Thank you so much for asking. It seems to have a new problem. I just submitted a new issue. Hope you can help me fix it.
Thank you
Dantong
I run Canu on the campus server using the batch submission script below:
I tried several times; Canu automatically created other jobs (with different job IDs). Then, after the cormhap step (I checked the job status and found the job name cormhap_antPacbi, which should be generated by Canu itself), it could not proceed. The working directory only contained the files below, and canu.out was empty.
Does anyone know how to fix this problem? Is it an issue with my script, with Canu, or with the server?
Thank you