marbl / canu

A single molecule sequence assembler for genomes large and small.
http://canu.readthedocs.io/

unitigging failed #1594

Closed noor-albader closed 4 years ago

noor-albader commented 4 years ago

I'm not sure how to recover from a unitigging failure. Are there intermediate files I can remove to restart canu? Is there a step I have to run manually before restarting?

Using canu version 1.8, running on a grid. My canu command:

#!/bin/bash
#SBATCH --time=240:10:00
module load canu/1.8/gnu6.4.0;
time canu -d palm_canu -p palm genomeSize=1000m -pacbio-raw /home/albadenm/c2042/data/palm/assembly_ready/palm.PB.fa.gz usegrid=1 gridOptions="--time=5-00:00:00 --partition=batch --mem-per-cpu=16g" gridOptionsJobName=palm-using-grid
echo "canu is done!"

My canu.out log with crash report:

Found perl:
   /usr/bin/perl
   This is perl 5, version 16, subversion 3 (v5.16.3) built for x86_64-linux-thread-multi

Found java:
   /usr/bin/java
   openjdk version "1.8.0_212"

Found canu:
   /sw/csi/canu/1.8/el7.5_gnu6.4.0/canu-1.8/Linux-amd64/bin/canu
   Canu 1.8

-- Canu 1.8
--
-- CITATIONS
--
-- Koren S, Walenz BP, Berlin K, Miller JR, Phillippy AM.
-- Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation.
-- Genome Res. 2017 May;27(5):722-736.
-- http://doi.org/10.1101/gr.215087.116
--
-- Koren S, Rhie A, Walenz BP, Dilthey AT, Bickhart DM, Kingan SB, Hiendleder S, Williams JL, Smith TPL, Phillippy AM.
-- De novo assembly of haplotype-resolved genomes with trio binning.
-- Nat Biotechnol. 2018
-- https//doi.org/10.1038/nbt.4277
--
-- Read and contig alignments during correction, consensus and GFA building use:
--   Šošic M, Šikic M.
--   Edlib: a C/C ++ library for fast, exact sequence alignment using edit distance.
--   Bioinformatics. 2017 May 1;33(9):1394-1395.
--   http://doi.org/10.1093/bioinformatics/btw753
--
-- Overlaps are generated using:
--   Berlin K, et al.
--   Assembling large genomes with single-molecule sequencing and locality-sensitive hashing.
--   Nat Biotechnol. 2015 Jun;33(6):623-30.
--   http://doi.org/10.1038/nbt.3238
--
--   Myers EW, et al.
--   A Whole-Genome Assembly of Drosophila.
--   Science. 2000 Mar 24;287(5461):2196-204.
--   http://doi.org/10.1126/science.287.5461.2196
--
-- Corrected read consensus sequences are generated using an algorithm derived from FALCON-sense:
--   Chin CS, et al.
--   Phased diploid genome assembly with single-molecule real-time sequencing.
--   Nat Methods. 2016 Dec;13(12):1050-1054.
--   http://doi.org/10.1038/nmeth.4035
--
-- Contig consensus sequences are generated using an algorithm derived from pbdagcon:
--   Chin CS, et al.
--   Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data.
--   Nat Methods. 2013 Jun;10(6):563-9
--   http://doi.org/10.1038/nmeth.2474
--
-- CONFIGURE CANU
--
-- Detected Java(TM) Runtime Environment '1.8.0_212' (from 'java') with -d64 support.
--
-- WARNING:
-- WARNING:  Failed to run gnuplot using command 'gnuplot'.
-- WARNING:  Plots will be disabled.
-- WARNING:
--
-- Detected 40 CPUs and 377 gigabytes of memory.
-- Detected Slurm with 'sinfo' binary in /opt/slurm/cluster/ibex/install/bin/sinfo.
-- Detected Slurm with 'MaxArraySize' limited to 1048575 jobs.
--
-- Found  10 hosts with  20 cores and  246 GB memory under Slurm control.
-- Found   2 hosts with  64 cores and 2010 GB memory under Slurm control.
-- Found   2 hosts with  16 cores and  246 GB memory under Slurm control.
-- Found   4 hosts with  32 cores and 3007 GB memory under Slurm control.
-- Found   6 hosts with  64 cores and  990 GB memory under Slurm control.
-- Found   1 host  with  16 cores and  246 GB memory under Slurm control.
-- Found 154 hosts with  20 cores and  118 GB memory under Slurm control.
-- Found   3 hosts with  64 cores and 1506 GB memory under Slurm control.
-- Found  16 hosts with  36 cores and  246 GB memory under Slurm control.
-- Found   1 host  with  64 cores and 1375 GB memory under Slurm control.
-- Found  74 hosts with  16 cores and   54 GB memory under Slurm control.
-- Found   2 hosts with  16 cores and  120 GB memory under Slurm control.
-- Found  22 hosts with  64 cores and  246 GB memory under Slurm control.
-- Found  14 hosts with  48 cores and 3007 GB memory under Slurm control.
-- Found 108 hosts with  40 cores and  366 GB memory under Slurm control.
-- Found   1 host  with  64 cores and  498 GB memory under Slurm control.
-- Found   1 host  with  80 cores and 1503 GB memory under Slurm control.
-- Found   1 host  with  80 cores and 2010 GB memory under Slurm control.
-- Found  16 hosts with  16 cores and  118 GB memory under Slurm control.
-- Found   1 host  with  16 cores and   50 GB memory under Slurm control.
-- Found   3 hosts with  64 cores and  750 GB memory under Slurm control.
-- Found  16 hosts with  32 cores and  366 GB memory under Slurm control.
-- Found   2 hosts with  64 cores and  988 GB memory under Slurm control.
-- Found  30 hosts with  48 cores and  745 GB memory under Slurm control.
--                     (tag)Threads
--            (tag)Memory         |
--        (tag)         |         |  algorithm
--        -------  ------  --------  -----------------------------
-- Grid:  meryl     25 GB    8 CPUs  (k-mer counting)
-- Grid:  hap       16 GB   16 CPUs  (read-to-haplotype assignment)
-- Grid:  cormhap   16 GB    4 CPUs  (overlap detection with mhap)
-- Grid:  obtovl    16 GB    4 CPUs  (overlap detection)
-- Grid:  utgovl    16 GB    4 CPUs  (overlap detection)
-- Grid:  ovb        4 GB    1 CPU   (overlap store bucketizer)
-- Grid:  ovs       32 GB    1 CPU   (overlap store sorting)
-- Grid:  red       12 GB    4 CPUs  (read error detection)
-- Grid:  oea        4 GB    1 CPU   (overlap error adjustment)
-- Grid:  bat      256 GB   16 CPUs  (contig construction with bogart)
-- Grid:  gfa       16 GB   16 CPUs  (GFA alignment and processing)
--
-- In 'palm.seqStore', found PacBio reads:
--   Raw:        4967706
--   Corrected:  947614
--   Trimmed:    915230
--
-- Generating assembly 'palm' in '/ibex/scratch/projects/c2042/analysis/genome_assembly_palm/palm_canu/palm_canu'
--
-- Parameters:
--
--  genomeSize        1000000000
--
--  Overlap Generation Limits:
--    corOvlErrorRate 0.2400 ( 24.00%)
--    obtOvlErrorRate 0.0450 (  4.50%)
--    utgOvlErrorRate 0.0450 (  4.50%)
--
--  Overlap Processing Limits:
--    corErrorRate    0.3000 ( 30.00%)
--    obtErrorRate    0.0450 (  4.50%)
--    utgErrorRate    0.0450 (  4.50%)
--    cnsErrorRate    0.0750 (  7.50%)
--
--
-- BEGIN ASSEMBLY
--
--
-- Creating overlap store unitigging/palm.ovlStore using:
--      2 buckets
--      2 slices
--        using at most 16 GB memory each
--
-- Overlap store bucketizer jobs failed, tried 2 times, giving up.
--   job unitigging/palm.ovlStore.BUILDING/bucket0002 FAILED.
--

ABORT:
ABORT: Canu 1.8
ABORT: Don't panic, but a mostly harmless error occurred and Canu stopped.
ABORT: Try restarting.  If that doesn't work, ask for help.
ABORT:

My unitigging/ovlStore.BUILDING/logs report:

Found perl:
   /usr/bin/perl
   This is perl 5, version 16, subversion 3 (v5.16.3) built for x86_64-linux-thread-multi

Found java:
   /usr/bin/java
   openjdk version "1.8.0_212"

Found canu:
   /sw/csi/canu/1.8/el7.5_gnu6.4.0/canu-1.8/Linux-amd64/bin/canu
   Canu 1.8

Running job 2 based on SLURM_ARRAY_TASK_ID=2 and offset=0.

Attempting to increase maximum allowed processes and open files.
  Max processes per user limited to 1542347, no increase possible.
  Max open files limited to 131072, no increase possible.

Overwriting incomplete result from presumed crashed job in directory './palm.ovlStore.BUILDING/create0002'.

Opened '../palm.seqStore' with 4967706 reads.

Constructing slice 2 for store './palm.ovlStore.BUILDING'.
 - Filtering overlaps over 1.0000 fraction error.

Bucketizing input    1 out of   99 - '1-overlapper/001/000002.ovb'
Bucketizing input    2 out of   99 - '1-overlapper/001/000004.ovb'
Bucketizing input    3 out of   99 - '1-overlapper/001/000006.ovb'
ERROR: short read on file '1-overlapper/001/000006': read 0 bytes, expected 13715.

skoren commented 4 years ago

You have disk corruption in one of the files, so you'll have to re-run the job that produced it. Remove the palm.ovlStore.BUILDING/ folder, cd to the 1-overlapper folder, remove the 001/000006.* files, run sh overlap.sh 6, and then re-run the canu command you used before from the top level. It may fail again if more than one file is corrupt, in which case you'd have to repeat the same steps for that file as well. Also make sure you're not out of space on your system.
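Because more than one overlap output may be damaged, it can help to scan the overlapper directory before restarting. The following is a minimal sketch, not part of Canu: the function name find_suspect_overlaps is hypothetical, and the two checks (empty .ovb files and leftover .WORKING temporaries, a suffix that shows up in a log later in this thread) are only heuristics. A file can be corrupt and still pass both checks.

```shell
#!/bin/bash
# Hypothetical helper (not shipped with Canu): list overlap outputs that look
# incomplete. Heuristic only; a corrupt file may still have a plausible size.
find_suspect_overlaps() {
    local dir="$1"
    # Overlap outputs that exist but are empty.
    find "$dir" -type f -name '*.ovb' -size 0
    # Temporary files that an interrupted job never renamed.
    find "$dir" -type f -name '*.WORKING'
}
```

For example, running `find_suspect_overlaps unitigging/1-overlapper` from the assembly directory prints one path per suspect file; each of those is a candidate for the remove-and-re-run steps described above.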

noor-albader commented 4 years ago

Hi, I was able to remove the palm.ovlStore.BUILDING/ folder and the 1-overlapper/001/000006.* files.

I'm not sure what you mean by sh overlap.sh 6, or how to run overlap.sh, since I can't find it in the canu module. Also, where would I run the script?

Thanks

skoren commented 4 years ago

As I said in the initial reply, you have to go into the overlapper folder to run it. The script is generated by Canu already:

cd unitigging/1-overlapper
sh overlap.sh 6
cd ../../../
<re-run initial canu command assuming above is successful>

You should do this on a node with at least 16 GB of memory reserved, as that is what your overlap jobs expect to have available.
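On a Slurm cluster, one way to get such a node is to wrap the manual run in srun. The sketch below only prints a candidate command so it can be reviewed before it is run. Assumptions: the partition name batch is taken from the gridOptions in the original post, the 16G and 4-CPU defaults mirror the utgovl row of the resource table in the log, and the helper name overlap_rerun_cmd is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print (do not execute) an srun command that re-runs a
# single overlap job with the resources the overlap jobs expect.
# Arguments: job index, memory (default 16G), CPUs (default 4).
overlap_rerun_cmd() {
    local job="$1" mem="${2:-16G}" cpus="${3:-4}"
    # The partition name "batch" is an assumption copied from the original post.
    printf 'srun --partition=batch --mem=%s --cpus-per-task=%s sh overlap.sh %s\n' \
        "$mem" "$cpus" "$job"
}
```

Calling `overlap_rerun_cmd 6` prints `srun --partition=batch --mem=16G --cpus-per-task=4 sh overlap.sh 6`, which would then be run from the unitigging/1-overlapper folder.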

noor-albader commented 4 years ago

Thank you for your reply! But I do not see an overlap.sh in my unitigging/1-overlapper folder.

Here are the contents of my unitigging/1-overlapper:

drwxr-sr-x 2 albadenm ibex-c2042   579 Jan 16 17:42 001
-rw-r--r-- 1 albadenm ibex-c2042  6685 Jan 11 07:16 overlap.8265962_100.out
-rw-r--r-- 1 albadenm ibex-c2042  6740 Jan 11 17:31 overlap.8265962_101.out
-rw-r--r-- 1 albadenm ibex-c2042  6806 Jan 11 07:53 overlap.8265962_102.out
-rw-r--r-- 1 albadenm ibex-c2042  6738 Jan 11 17:44 overlap.8265962_103.out
-rw-r--r-- 1 albadenm ibex-c2042  6809 Jan 11 07:48 overlap.8265962_104.out
-rw-r--r-- 1 albadenm ibex-c2042  6743 Jan 11 17:15 overlap.8265962_105.out
-rw-r--r-- 1 albadenm ibex-c2042  6808 Jan 11 08:20 overlap.8265962_106.out
-rw-r--r-- 1 albadenm ibex-c2042  6745 Jan 11 16:57 overlap.8265962_107.out
-rw-r--r-- 1 albadenm ibex-c2042  6808 Jan 11 08:38 overlap.8265962_108.out
-rw-r--r-- 1 albadenm ibex-c2042  6616 Jan 11 17:15 overlap.8265962_109.out
-rw-r--r-- 1 albadenm ibex-c2042  6484 Jan 10 23:30 overlap.8265962_10.out
-rw-r--r-- 1 albadenm ibex-c2042  6694 Jan 11 08:57 overlap.8265962_110.out
-rw-r--r-- 1 albadenm ibex-c2042  6612 Jan 11 18:32 overlap.8265962_111.out
-rw-r--r-- 1 albadenm ibex-c2042  6709 Jan 11 10:18 overlap.8265962_112.out
-rw-r--r-- 1 albadenm ibex-c2042  6606 Jan 11 17:05 overlap.8265962_113.out
-rw-r--r-- 1 albadenm ibex-c2042  6713 Jan 11 09:55 overlap.8265962_114.out
-rw-r--r-- 1 albadenm ibex-c2042  6605 Jan 11 18:15 overlap.8265962_115.out
-rw-r--r-- 1 albadenm ibex-c2042  6713 Jan 11 11:20 overlap.8265962_116.out
-rw-r--r-- 1 albadenm ibex-c2042  6618 Jan 11 17:47 overlap.8265962_117.out
-rw-r--r-- 1 albadenm ibex-c2042  6713 Jan 11 11:16 overlap.8265962_118.out
-rw-r--r-- 1 albadenm ibex-c2042  6616 Jan 11 18:11 overlap.8265962_119.out
-rw-r--r-- 1 albadenm ibex-c2042  6484 Jan 11 00:29 overlap.8265962_11.out
-rw-r--r-- 1 albadenm ibex-c2042  6713 Jan 11 11:59 overlap.8265962_120.out
-rw-r--r-- 1 albadenm ibex-c2042  6732 Jan 11 18:10 overlap.8265962_121.out
-rw-r--r-- 1 albadenm ibex-c2042  6837 Jan 11 12:23 overlap.8265962_122.out
-rw-r--r-- 1 albadenm ibex-c2042  6745 Jan 11 17:01 overlap.8265962_123.out
-rw-r--r-- 1 albadenm ibex-c2042  6838 Jan 11 12:13 overlap.8265962_124.out
-rw-r--r-- 1 albadenm ibex-c2042  6740 Jan 11 17:45 overlap.8265962_125.out
-rw-r--r-- 1 albadenm ibex-c2042  6837 Jan 11 13:26 overlap.8265962_126.out
-rw-r--r-- 1 albadenm ibex-c2042  6737 Jan 11 16:54 overlap.8265962_127.out
-rw-r--r-- 1 albadenm ibex-c2042  6836 Jan 11 12:48 overlap.8265962_128.out
-rw-r--r-- 1 albadenm ibex-c2042  6742 Jan 11 18:08 overlap.8265962_129.out
-rw-r--r-- 1 albadenm ibex-c2042  6490 Jan 11 00:52 overlap.8265962_12.out
-rw-r--r-- 1 albadenm ibex-c2042  6838 Jan 11 14:33 overlap.8265962_130.out
-rw-r--r-- 1 albadenm ibex-c2042  6742 Jan 11 17:42 overlap.8265962_131.out
-rw-r--r-- 1 albadenm ibex-c2042  6838 Jan 11 14:24 overlap.8265962_132.out
-rw-r--r-- 1 albadenm ibex-c2042  6610 Jan 11 18:08 overlap.8265962_133.out
-rw-r--r-- 1 albadenm ibex-c2042  6713 Jan 11 15:32 overlap.8265962_134.out
-rw-r--r-- 1 albadenm ibex-c2042  6615 Jan 11 17:48 overlap.8265962_135.out
-rw-r--r-- 1 albadenm ibex-c2042  6713 Jan 11 15:59 overlap.8265962_136.out
-rw-r--r-- 1 albadenm ibex-c2042  6620 Jan 11 17:32 overlap.8265962_137.out
-rw-r--r-- 1 albadenm ibex-c2042  6713 Jan 11 15:33 overlap.8265962_138.out
-rw-r--r-- 1 albadenm ibex-c2042  6615 Jan 11 17:24 overlap.8265962_139.out
-rw-r--r-- 1 albadenm ibex-c2042  6495 Jan 11 01:22 overlap.8265962_13.out
-rw-r--r-- 1 albadenm ibex-c2042  6710 Jan 11 16:11 overlap.8265962_140.out
-rw-r--r-- 1 albadenm ibex-c2042  6612 Jan 12 00:28 overlap.8265962_141.out
-rw-r--r-- 1 albadenm ibex-c2042  6700 Jan 11 21:48 overlap.8265962_142.out
-rw-r--r-- 1 albadenm ibex-c2042  6425 Jan 10 18:26 overlap.8265962_143.out
-rw-r--r-- 1 albadenm ibex-c2042  6615 Jan 12 00:42 overlap.8265962_144.out
-rw-r--r-- 1 albadenm ibex-c2042  6709 Jan 11 23:52 overlap.8265962_145.out
-rw-r--r-- 1 albadenm ibex-c2042  6554 Jan 10 18:53 overlap.8265962_146.out
-rw-r--r-- 1 albadenm ibex-c2042  6617 Jan 11 23:47 overlap.8265962_147.out
-rw-r--r-- 1 albadenm ibex-c2042  6714 Jan 11 23:31 overlap.8265962_148.out
-rw-r--r-- 1 albadenm ibex-c2042  6578 Jan 10 19:31 overlap.8265962_149.out
-rw-r--r-- 1 albadenm ibex-c2042  6495 Jan 11 01:40 overlap.8265962_14.out
-rw-r--r-- 1 albadenm ibex-c2042  6737 Jan 12 00:03 overlap.8265962_150.out
-rw-r--r-- 1 albadenm ibex-c2042  6828 Jan 11 22:36 overlap.8265962_151.out
-rw-r--r-- 1 albadenm ibex-c2042  6731 Jan 10 20:03 overlap.8265962_152.out
-rw-r--r-- 1 albadenm ibex-c2042  6616 Jan 11 20:31 overlap.8265962_153.out
-rw-r--r-- 1 albadenm ibex-c2042  6714 Jan 11 19:53 overlap.8265962_154.out
-rw-r--r-- 1 albadenm ibex-c2042  6610 Jan 10 20:30 overlap.8265962_155.out
-rw-r--r-- 1 albadenm ibex-c2042  6618 Jan 11 21:37 overlap.8265962_156.out
-rw-r--r-- 1 albadenm ibex-c2042  6712 Jan 11 19:51 overlap.8265962_157.out
-rw-r--r-- 1 albadenm ibex-c2042  6621 Jan 10 20:57 overlap.8265962_158.out
-rw-r--r-- 1 albadenm ibex-c2042  6617 Jan 11 21:47 overlap.8265962_159.out
-rw-r--r-- 1 albadenm ibex-c2042  6495 Jan 11 02:10 overlap.8265962_15.out
-rw-r--r-- 1 albadenm ibex-c2042  6711 Jan 11 21:01 overlap.8265962_160.out
-rw-r--r-- 1 albadenm ibex-c2042  6660 Jan 10 21:30 overlap.8265962_161.out
-rw-r--r-- 1 albadenm ibex-c2042  6610 Jan 11 20:18 overlap.8265962_162.out
-rw-r--r-- 1 albadenm ibex-c2042  6707 Jan 11 19:42 overlap.8265962_163.out
-rw-r--r-- 1 albadenm ibex-c2042  6675 Jan 10 22:05 overlap.8265962_164.out
-rw-r--r-- 1 albadenm ibex-c2042  6607 Jan 11 20:27 overlap.8265962_165.out
-rw-r--r-- 1 albadenm ibex-c2042  6701 Jan 11 19:36 overlap.8265962_166.out
-rw-r--r-- 1 albadenm ibex-c2042  6677 Jan 10 22:38 overlap.8265962_167.out
-rw-r--r-- 1 albadenm ibex-c2042  6607 Jan 11 20:52 overlap.8265962_168.out
-rw-r--r-- 1 albadenm ibex-c2042  6692 Jan 11 20:00 overlap.8265962_169.out
-rw-r--r-- 1 albadenm ibex-c2042  6499 Jan 11 02:42 overlap.8265962_16.out
-rw-r--r-- 1 albadenm ibex-c2042  6677 Jan 10 23:13 overlap.8265962_170.out
-rw-r--r-- 1 albadenm ibex-c2042  6616 Jan 11 22:11 overlap.8265962_171.out
-rw-r--r-- 1 albadenm ibex-c2042  6713 Jan 11 21:26 overlap.8265962_172.out
-rw-r--r-- 1 albadenm ibex-c2042  6677 Jan 10 23:55 overlap.8265962_173.out
-rw-r--r-- 1 albadenm ibex-c2042  6741 Jan 11 20:56 overlap.8265962_174.out
-rw-r--r-- 1 albadenm ibex-c2042  6837 Jan 11 20:35 overlap.8265962_175.out
-rw-r--r-- 1 albadenm ibex-c2042  6802 Jan 11 00:26 overlap.8265962_176.out
-rw-r--r-- 1 albadenm ibex-c2042  6742 Jan 11 20:55 overlap.8265962_177.out
-rw-r--r-- 1 albadenm ibex-c2042  6836 Jan 11 20:16 overlap.8265962_178.out
-rw-r--r-- 1 albadenm ibex-c2042  6805 Jan 11 00:55 overlap.8265962_179.out
-rw-r--r-- 1 albadenm ibex-c2042  6499 Jan 11 03:15 overlap.8265962_17.out
-rw-r--r-- 1 albadenm ibex-c2042  6620 Jan 12 16:26 overlap.8265962_180.out
-rw-r--r-- 1 albadenm ibex-c2042  6714 Jan 12 13:25 overlap.8265962_181.out
-rw-r--r-- 1 albadenm ibex-c2042  6680 Jan 11 07:00 overlap.8265962_182.out
-rw-r--r-- 1 albadenm ibex-c2042  6610 Jan 12 17:31 overlap.8265962_183.out
-rw-r--r-- 1 albadenm ibex-c2042  6701 Jan 12 11:46 overlap.8265962_184.out
-rw-r--r-- 1 albadenm ibex-c2042  6680 Jan 11 09:44 overlap.8265962_185.out
-rw-r--r-- 1 albadenm ibex-c2042  6615 Jan 12 22:56 overlap.8265962_186.out
-rw-r--r-- 1 albadenm ibex-c2042  6712 Jan 12 20:24 overlap.8265962_187.out
-rw-r--r-- 1 albadenm ibex-c2042  6680 Jan 11 10:37 overlap.8265962_188.out
-rw-r--r-- 1 albadenm ibex-c2042  6614 Jan 12 21:51 overlap.8265962_189.out
-rw-r--r-- 1 albadenm ibex-c2042  6499 Jan 11 03:52 overlap.8265962_18.out
-rw-r--r-- 1 albadenm ibex-c2042  6706 Jan 12 23:14 overlap.8265962_190.out
-rw-r--r-- 1 albadenm ibex-c2042  6680 Jan 11 10:31 overlap.8265962_191.out
-rw-r--r-- 1 albadenm ibex-c2042  6588 Jan 11 08:56 overlap.8265962_192.out
-rw-r--r-- 1 albadenm ibex-c2042  6682 Jan 11 09:12 overlap.8265962_193.out
-rw-r--r-- 1 albadenm ibex-c2042  6618 Jan 10 23:31 overlap.8265962_194.out
-rw-r--r-- 1 albadenm ibex-c2042  6501 Jan 11 04:16 overlap.8265962_19.out
-rw-r--r-- 1 albadenm ibex-c2042  6197 Jan 10 18:43 overlap.8265962_1.out
-rw-r--r-- 1 albadenm ibex-c2042  6505 Jan 11 04:28 overlap.8265962_20.out
-rw-r--r-- 1 albadenm ibex-c2042  6505 Jan 11 05:02 overlap.8265962_21.out
-rw-r--r-- 1 albadenm ibex-c2042  6628 Jan 11 05:52 overlap.8265962_22.out
-rw-r--r-- 1 albadenm ibex-c2042  6502 Jan 11 06:13 overlap.8265962_23.out
-rw-r--r-- 1 albadenm ibex-c2042  6650 Jan 11 06:43 overlap.8265962_24.out
-rw-r--r-- 1 albadenm ibex-c2042  6535 Jan 11 07:11 overlap.8265962_25.out
-rw-r--r-- 1 albadenm ibex-c2042  6658 Jan 11 07:55 overlap.8265962_26.out
-rw-r--r-- 1 albadenm ibex-c2042  6657 Jan 11 08:42 overlap.8265962_27.out
-rw-r--r-- 1 albadenm ibex-c2042  6658 Jan 11 08:25 overlap.8265962_28.out
-rw-r--r-- 1 albadenm ibex-c2042  6539 Jan 11 09:30 overlap.8265962_29.out
-rw-r--r-- 1 albadenm ibex-c2042  6262 Jan 10 19:18 overlap.8265962_2.out
-rw-r--r-- 1 albadenm ibex-c2042  6543 Jan 11 09:57 overlap.8265962_30.out
-rw-r--r-- 1 albadenm ibex-c2042  6547 Jan 11 10:38 overlap.8265962_31.out
-rw-r--r-- 1 albadenm ibex-c2042  6561 Jan 11 11:26 overlap.8265962_32.out
-rw-r--r-- 1 albadenm ibex-c2042  6568 Jan 11 11:37 overlap.8265962_33.out
-rw-r--r-- 1 albadenm ibex-c2042  6567 Jan 11 12:04 overlap.8265962_34.out
-rw-r--r-- 1 albadenm ibex-c2042  6588 Jan 11 12:38 overlap.8265962_35.out
-rw-r--r-- 1 albadenm ibex-c2042  6714 Jan 11 12:27 overlap.8265962_36.out
-rw-r--r-- 1 albadenm ibex-c2042  6592 Jan 11 12:56 overlap.8265962_37.out
-rw-r--r-- 1 albadenm ibex-c2042  6601 Jan 11 13:21 overlap.8265962_38.out
-rw-r--r-- 1 albadenm ibex-c2042  6601 Jan 11 13:28 overlap.8265962_39.out
-rw-r--r-- 1 albadenm ibex-c2042  6446 Jan 10 19:51 overlap.8265962_3.out
-rw-r--r-- 1 albadenm ibex-c2042  6605 Jan 11 14:50 overlap.8265962_40.out
-rw-r--r-- 1 albadenm ibex-c2042  6729 Jan 11 14:54 overlap.8265962_41.out
-rw-r--r-- 1 albadenm ibex-c2042  6609 Jan 11 14:27 overlap.8265962_42.out
-rw-r--r-- 1 albadenm ibex-c2042  6734 Jan 11 15:58 overlap.8265962_43.out
-rw-r--r-- 1 albadenm ibex-c2042  6612 Jan 11 15:40 overlap.8265962_44.out
-rw-r--r-- 1 albadenm ibex-c2042  6609 Jan 11 19:15 overlap.8265962_45.out
-rw-r--r-- 1 albadenm ibex-c2042  6612 Jan 11 19:07 overlap.8265962_46.out
-rw-r--r-- 1 albadenm ibex-c2042  6740 Jan 11 18:22 overlap.8265962_47.out
-rw-r--r-- 1 albadenm ibex-c2042  6479 Jan 10 18:25 overlap.8265962_48.out
-rw-r--r-- 1 albadenm ibex-c2042  6616 Jan 11 18:03 overlap.8265962_49.out
-rw-r--r-- 1 albadenm ibex-c2042  6479 Jan 10 20:25 overlap.8265962_4.out
-rw-r--r-- 1 albadenm ibex-c2042  6547 Jan 10 18:42 overlap.8265962_50.out
-rw-r--r-- 1 albadenm ibex-c2042  6618 Jan 11 18:51 overlap.8265962_51.out
-rw-r--r-- 1 albadenm ibex-c2042  6572 Jan 10 19:10 overlap.8265962_52.out
-rw-r--r-- 1 albadenm ibex-c2042  6615 Jan 11 18:08 overlap.8265962_53.out
-rw-r--r-- 1 albadenm ibex-c2042  6601 Jan 10 19:42 overlap.8265962_54.out
-rw-r--r-- 1 albadenm ibex-c2042  6740 Jan 11 18:15 overlap.8265962_55.out
-rw-r--r-- 1 albadenm ibex-c2042  6733 Jan 10 20:11 overlap.8265962_56.out
-rw-r--r-- 1 albadenm ibex-c2042  6614 Jan 11 17:16 overlap.8265962_57.out
-rw-r--r-- 1 albadenm ibex-c2042  6609 Jan 10 20:37 overlap.8265962_58.out
-rw-r--r-- 1 albadenm ibex-c2042  6618 Jan 11 17:11 overlap.8265962_59.out
-rw-r--r-- 1 albadenm ibex-c2042  6396 Jan 10 21:03 overlap.8265962_5.out
-rw-r--r-- 1 albadenm ibex-c2042  6624 Jan 10 21:19 overlap.8265962_60.out
-rw-r--r-- 1 albadenm ibex-c2042  6735 Jan 11 17:20 overlap.8265962_61.out
-rw-r--r-- 1 albadenm ibex-c2042  6797 Jan 10 21:45 overlap.8265962_62.out
-rw-r--r-- 1 albadenm ibex-c2042  6616 Jan 11 19:29 overlap.8265962_63.out
-rw-r--r-- 1 albadenm ibex-c2042  6673 Jan 10 22:05 overlap.8265962_64.out
-rw-r--r-- 1 albadenm ibex-c2042  6618 Jan 11 16:53 overlap.8265962_65.out
-rw-r--r-- 1 albadenm ibex-c2042  6673 Jan 10 22:32 overlap.8265962_66.out
-rw-r--r-- 1 albadenm ibex-c2042  6616 Jan 11 16:37 overlap.8265962_67.out
-rw-r--r-- 1 albadenm ibex-c2042  6675 Jan 10 22:58 overlap.8265962_68.out
-rw-r--r-- 1 albadenm ibex-c2042  6613 Jan 11 17:30 overlap.8265962_69.out
-rw-r--r-- 1 albadenm ibex-c2042  7519 Jan 11 00:07 overlap.8265962_6.out
-rw-r--r-- 1 albadenm ibex-c2042  6675 Jan 10 23:41 overlap.8265962_70.out
-rw-r--r-- 1 albadenm ibex-c2042  6618 Jan 11 17:02 overlap.8265962_71.out
-rw-r--r-- 1 albadenm ibex-c2042  6677 Jan 11 00:00 overlap.8265962_72.out
-rw-r--r-- 1 albadenm ibex-c2042  6614 Jan 11 17:33 overlap.8265962_73.out
-rw-r--r-- 1 albadenm ibex-c2042  6678 Jan 11 00:42 overlap.8265962_74.out
-rw-r--r-- 1 albadenm ibex-c2042  6613 Jan 11 17:19 overlap.8265962_75.out
-rw-r--r-- 1 albadenm ibex-c2042  6678 Jan 11 01:35 overlap.8265962_76.out
-rw-r--r-- 1 albadenm ibex-c2042  6618 Jan 11 17:25 overlap.8265962_77.out
-rw-r--r-- 1 albadenm ibex-c2042  6678 Jan 11 01:40 overlap.8265962_78.out
-rw-r--r-- 1 albadenm ibex-c2042  6618 Jan 11 18:15 overlap.8265962_79.out
-rw-r--r-- 1 albadenm ibex-c2042  8188 Jan 11 00:41 overlap.8265962_7.out
-rw-r--r-- 1 albadenm ibex-c2042  6680 Jan 11 02:06 overlap.8265962_80.out
-rw-r--r-- 1 albadenm ibex-c2042  6611 Jan 11 17:48 overlap.8265962_81.out
-rw-r--r-- 1 albadenm ibex-c2042  6678 Jan 11 02:32 overlap.8265962_82.out
-rw-r--r-- 1 albadenm ibex-c2042  6739 Jan 11 17:35 overlap.8265962_83.out
-rw-r--r-- 1 albadenm ibex-c2042  6805 Jan 11 03:07 overlap.8265962_84.out
-rw-r--r-- 1 albadenm ibex-c2042  6611 Jan 11 17:43 overlap.8265962_85.out
-rw-r--r-- 1 albadenm ibex-c2042  6680 Jan 11 03:56 overlap.8265962_86.out
-rw-r--r-- 1 albadenm ibex-c2042  6611 Jan 11 18:09 overlap.8265962_87.out
-rw-r--r-- 1 albadenm ibex-c2042  6680 Jan 11 04:22 overlap.8265962_88.out
-rw-r--r-- 1 albadenm ibex-c2042  6609 Jan 11 18:02 overlap.8265962_89.out
-rw-r--r-- 1 albadenm ibex-c2042  8793 Jan 11 01:40 overlap.8265962_8.out
-rw-r--r-- 1 albadenm ibex-c2042  6680 Jan 11 04:45 overlap.8265962_90.out
-rw-r--r-- 1 albadenm ibex-c2042  6618 Jan 11 17:46 overlap.8265962_91.out
-rw-r--r-- 1 albadenm ibex-c2042  6680 Jan 11 05:13 overlap.8265962_92.out
-rw-r--r-- 1 albadenm ibex-c2042  6610 Jan 11 17:55 overlap.8265962_93.out
-rw-r--r-- 1 albadenm ibex-c2042  6669 Jan 11 06:01 overlap.8265962_94.out
-rw-r--r-- 1 albadenm ibex-c2042  6616 Jan 11 17:29 overlap.8265962_95.out
-rw-r--r-- 1 albadenm ibex-c2042  6706 Jan 11 05:53 overlap.8265962_96.out
-rw-r--r-- 1 albadenm ibex-c2042  6611 Jan 11 17:08 overlap.8265962_97.out
-rw-r--r-- 1 albadenm ibex-c2042  6689 Jan 11 06:34 overlap.8265962_98.out
-rw-r--r-- 1 albadenm ibex-c2042  6610 Jan 11 17:53 overlap.8265962_99.out
-rw-r--r-- 1 albadenm ibex-c2042  8820 Jan 11 01:52 overlap.8265962_9.out
-rw-r--r-- 1 albadenm ibex-c2042    28 Jan 10 18:23 overlap.jobSubmit-01.out
-rwxr-xr-x 1 albadenm ibex-c2042   246 Jan 10 18:23 overlap.jobSubmit-01.sh
-rwxr-xr-x 1 albadenm ibex-c2042 27000 Jan 10 18:23 overlap.sh
-rw-r--r-- 1 albadenm ibex-c2042  5432 Jan 12 23:14 ovljob.files
-rw-r--r-- 1 albadenm ibex-c2042 11058 Jan 12 23:14 ovljob.more.files
-rw-r--r-- 1 albadenm ibex-c2042 22201 Jan 10 18:23 palm.partition.err
-rw-r--r-- 1 albadenm ibex-c2042   776 Jan 10 18:23 palm.partition.ovlbat
-rw-r--r-- 1 albadenm ibex-c2042  1358 Jan 10 18:23 palm.partition.ovljob
-rw-r--r-- 1 albadenm ibex-c2042 11281 Jan 10 18:23 palm.partition.ovlopt

Is there another way to run sh overlap.sh 6, or can I download that script from somewhere?

skoren commented 4 years ago

The script is listed in your output:

-rwxr-xr-x 1 albadenm ibex-c2042 27000 Jan 10 18:23 overlap.sh

It's created at runtime by canu, so there's nowhere to download it from, but you should be able to run it.

noor-albader commented 4 years ago

Thank you for your quick responses! Super useful!

After removing the palm.ovlStore.BUILDING/ folder and the 1-overlapper/001/000006.* files, I was able to run the following without errors:

cd unitigging/1-overlapper
sh overlap.sh 6

and three new files (1-overlapper/001/000006.*) were created, one of which was labeled stats:

head  000006.stats
 Kmer hits without olaps = 74114865
 Kmer hits with olaps = 303985
 Multiple overlaps/pair = 0
 Total overlaps produced = 303985
      Contained overlaps = 81441
       Dovetail overlaps = 222544
Rejected by short window = 0
Rejected by long window = 0

But when trying to rerun the original canu command:

cd ../../../
<re-run initial canu command assuming above is successful>

I get the same errors in my unitigging/ovlStore.BUILDING/logs report and canu.out as in my original post.

Should I try re-running Canu from scratch? It doesn't seem like I can overcome the following error: ERROR: short read on file '1-overlapper/001/000006': read 0 bytes, expected 13715

skoren commented 4 years ago

What are the sizes of the 1-overlapper/001/000006* files? Have you confirmed you're not running out of disk/quota space?
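Both questions can be answered from the shell. The sketch below is one way to collect exact byte counts; the function name file_sizes is hypothetical, and the paths in the usage note are the ones from this thread.

```shell
#!/bin/bash
# Hypothetical helper: print "<bytes> <path>" for every regular file whose
# name starts with the given prefix.
file_sizes() {
    local prefix="$1" f
    for f in "$prefix"*; do
        [ -f "$f" ] || continue   # skip directories and unmatched globs
        printf '%s %s\n' "$(wc -c < "$f" | tr -d ' ')" "$f"
    done
}
```

From the unitigging directory, `file_sizes 1-overlapper/001/000006` lists the sizes of the three outputs of job 6, and `df -h .` shows the free space on the filesystem holding the assembly. Quota limits are site-specific and may need a separate check.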

noor-albader commented 4 years ago

Here are my 1-overlapper/001/000006* files (also 000005* and 000004* for reference):

-rw-r--r-- 1 albadenm ibex-c2042 19870840 Jan 10 20:25 000004.oc
-rw-r--r-- 1 albadenm ibex-c2042  4046954 Jan 10 20:25 000004.ovb
-rw-r--r-- 1 albadenm ibex-c2042      258 Jan 10 20:25 000004.stats
-rw-r--r-- 1 albadenm ibex-c2042 19870840 Jan 10 21:03 000005.oc
-rw-r--r-- 1 albadenm ibex-c2042  5612616 Jan 10 21:03 000005.ovb
-rw-r--r-- 1 albadenm ibex-c2042      258 Jan 10 21:03 000005.stats
-rw-r--r-- 1 albadenm ibex-c2042 19870840 Jan 21 06:17 000006.oc
-rw-r--r-- 1 albadenm ibex-c2042  6009271 Jan 21 06:17 000006.ovb
-rw-r--r-- 1 albadenm ibex-c2042      258 Jan 21 06:17 000006.stats

Human-readable file sizes:

du -h
5.8M    000006.ovb
19M     000006.oc

I have over 30T left, so it's not running out of disk/quota space.

skoren commented 4 years ago

Very strange. The file seems to be truncated, but its size looks OK, and I assume you saw no errors when you re-ran the job. Are you able to share this data? See the FAQ for instructions on sending it to us. We'd need the offending overlap files unitigging/1-overlapper/001/000002/4/6*, along with unitigging/1-overlapper/overlap.8265962_6.out and any sh files in the unitigging/ovlStore.BUILDING/scripts folder.

noor-albader commented 4 years ago

There is no unitigging/1-overlapper/001/000002/4/6*. The directory tree only goes as deep as unitigging/1-overlapper/001/.

I can send the unitigging/1-overlapper/001/000006* files, along with unitigging/1-overlapper/overlap.8265962_6.out

skoren commented 4 years ago

I mean all the overlapping output files needed for that bucket (unitigging/1-overlapper/001/000002*; unitigging/1-overlapper/001/000004*; unitigging/1-overlapper/001/000006*). The shell scripts in the ovlStore.BUILDING folder are also important as they capture how the bucket is being created on your system.

noor-albader commented 4 years ago

Sorry, there is a time difference between us. When I sent my last comment (~24 hrs ago), I shared the following data using the FAQ instructions, and I hope you have received it: unitigging/1-overlapper/001/000006* and unitigging/1-overlapper/overlap.8265962_6.out

Would you like me to also send over all the files in unitigging/1-overlapper/001/* for this bucket (all even numbers)?

skoren commented 4 years ago

Please send only the unitigging/1-overlapper/001/000002* and unitigging/1-overlapper/001/000004* files, those are the only ones needed in addition to unitigging/1-overlapper/001/000006. Also send over all files in the folder unitigging/ovlStore.BUILDING/scripts

noor-albader commented 4 years ago

Sorry for the late reply!

Ok I have already sent (but will resend, just in case!): 1-overlapper/001/000006*

Additionally, I will now send unitigging/1-overlapper/001/000002*, unitigging/1-overlapper/001/000004*, and unitigging/ovlStore.BUILDING/scripts.

noor-albader commented 4 years ago

Just kidding, the FTP protocol would not let me place the 1-overlapper/001/000006* files in your folder again.

I was able to transfer all of the other files to you.

Thank you

skoren commented 4 years ago

Thanks, I got the files. I'm going to look at the overlaps in each, but in the meantime, the output of the job with the issue looks as if multiple instances were running concurrently:

Starting 1-242523 with 7579 per thread

Thread 00 processes reads 1-7579

...

Bye.
Thread 00 writes    reads 1-7579 (10993 overlaps 10993/2530953/0 kmer hits with/without overlap/skipped)
Thread 00 processes reads 30317-37895

...

Bye.
mv: cannot stat ‘./001/000006.ovb.WORKING’: No such file or directory

Concurrent jobs could definitely cause an issue (the handling for this condition was improved in 1.9). Do any of your other output files have a similar error message? Is it possible this job got run twice (by hand and by Canu) at the same time?
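One way to answer the first question is to search the per-job overlap logs for the same mv failure. This is a rough sketch: the function name find_collision_logs is hypothetical, the log naming pattern overlap.*.out is taken from the directory listing earlier in the thread, and the match assumes mv reports the error in English.

```shell
#!/bin/bash
# Hypothetical helper: print the overlap job logs that contain an mv failure
# on a *.WORKING temporary, like the one quoted above.
# Returns non-zero (grep's exit status) when no log matches.
find_collision_logs() {
    local dir="$1"
    grep -l 'WORKING.*No such file or directory' "$dir"/overlap.*.out 2>/dev/null
}
```

Running `find_collision_logs unitigging/1-overlapper` prints the name of each affected log, and the array index at the end of the name (for example, _6) identifies the overlap job.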

noor-albader commented 4 years ago

Yes, I saw this error after I ran the job by hand, did not realize it had failed, and then ran canu: ‘./001/000006.ovb.WORKING’: No such file or directory

"Is it possible this job got run twice (by hand and by Canu) at the same time?" Yes, once, but this error only appeared after I got the original error from my first post: job unitigging/palm.ovlStore.BUILDING/bucket0002 FAILED

In the meantime, I have started a new assembly run. Hopefully I won't get the same (original) error.

skoren commented 4 years ago

Any updates?

skoren commented 4 years ago

Closing, idle. Not able to reproduce locally (the user's files are corrupt, but we haven't seen this corruption locally), and it seems like it may be a collision of the same job running simultaneously. There was a fix after Canu 1.8 to resolve this race condition (which may be due to some Slurm versions not holding jobs properly).
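When re-running a job by hand, a simple precaution against this kind of collision is to confirm that no Canu jobs from the same assembly are still queued or running. The sketch below is not the fix that went into Canu. It assumes that the suffix passed via gridOptionsJobName (palm-using-grid in the original post) appears in the Slurm job names, and the function name no_jobs_named is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: succeed only if the current user has no queued or
# running Slurm job whose name contains the given string.
# Returns 0 if none are found, 1 if at least one matches, 2 if squeue fails.
no_jobs_named() {
    local pattern="$1" names
    # -h drops the header line; -o '%j' prints only the job name.
    names=$(squeue -u "${USER:-$(id -un)}" -h -o '%j') || return 2
    if printf '%s\n' "$names" | grep -qF -- "$pattern"; then
        return 1
    fi
    return 0
}
```

A guarded manual run could then look like `no_jobs_named palm-using-grid && sh overlap.sh 6`, so that the script starts only when the queue holds no matching jobs.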