marbl / canu

A single molecule sequence assembler for genomes large and small.
http://canu.readthedocs.io/

bucketize fail in unitigging step #1294

Closed: SixPlusSeven closed this issue 5 years ago

SixPlusSeven commented 5 years ago

Here is my error log:

-- Found 1477 hosts with  24 cores and   62 GB memory under Slurm control.
--
--                     (tag)Threads
--            (tag)Memory         |
--        (tag)         |         |  algorithm
--        -------  ------  --------  -----------------------------
-- Grid:  meryl     62 GB   24 CPUs  (k-mer counting)
-- Grid:  hap       16 GB   24 CPUs  (read-to-haplotype assignment)
-- Grid:  cormhap   48 GB   24 CPUs  (overlap detection with mhap)
-- Grid:  obtovl    24 GB   24 CPUs  (overlap detection)
-- Grid:  utgovl    24 GB   24 CPUs  (overlap detection)
-- Grid:  ovb        4 GB   24 CPUs  (overlap store bucketizer)
-- Grid:  ovs        8 GB   24 CPUs  (overlap store sorting)
-- Grid:  red       16 GB   24 CPUs  (read error detection)
-- Grid:  oea        8 GB   24 CPUs  (overlap error adjustment)
-- Grid:  bat       62 GB   24 CPUs  (contig construction with bogart)
-- Grid:  gfa       32 GB   24 CPUs  (GFA alignment and processing)
--

...

--  Overlap Generation Limits:
--    corOvlErrorRate 0.2400 ( 24.00%)
--    obtOvlErrorRate 0.0400 (  4.00%)
--    utgOvlErrorRate 0.0400 (  4.00%)
--
--  Overlap Processing Limits:
--    corErrorRate    0.3000 ( 30.00%)
--    obtErrorRate    0.0400 (  4.00%)
--    utgErrorRate    0.0400 (  4.00%)
--    cnsErrorRate    0.0400 (  4.00%)
--
--
-- BEGIN ASSEMBLY
--
--
-- Creating overlap store unitigging/out.ovlStore using:
--      7 buckets
--      7 slices
--        using at most 27 GB memory each
--
-- Overlap store bucketizer jobs failed, tried 2 times, giving up.
--   job unitigging/out.ovlStore.BUILDING/bucket0001 FAILED.
--   job unitigging/out.ovlStore.BUILDING/bucket0002 FAILED.
--   job unitigging/out.ovlStore.BUILDING/bucket0003 FAILED.
--   job unitigging/out.ovlStore.BUILDING/bucket0004 FAILED.
--   job unitigging/out.ovlStore.BUILDING/bucket0005 FAILED.
--   job unitigging/out.ovlStore.BUILDING/bucket0006 FAILED.
--   job unitigging/out.ovlStore.BUILDING/bucket0007 FAILED.
--
ABORT:
ABORT: Canu 1.8
ABORT: Don't panic, but a mostly harmless error occurred and Canu stopped.
ABORT: Try restarting.  If that doesn't work, ask for help.
ABORT:

All of my bucketizer jobs failed. Here is one of the detailed logs:

Running job 5 based on SLURM_ARRAY_TASK_ID=5 and offset=0.

Attempting to increase maximum allowed processes and open files.
/tmp/slurmd/job1820556/slurm_script: line 69: [: unlimited: integer expression expected
  Max processes per user limited to 1024, no increase possible.
  Max open files limited to 819200, no increase possible.

Overwriting incomplete result from presumed crashed job in directory './out.ovlStore.BUILDING/create0005'.

Opened '../out.seqStore' with 18033145 reads.

Constructing slice 5 for store './out.ovlStore.BUILDING'.
 - Filtering overlaps over 1.0000 fraction error.

Bucketizing input    1 out of   67 - '1-overlapper/001/000005.ovb'
terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc

Failed with 'Aborted'; backtrace (libbacktrace):
utility/system-stackTrace.C::89 in _Z17AS_UTL_catchCrashiP7siginfoPv()
(null)::0 in (null)()
(null)::0 in (null)()
(null)::0 in (null)()
../../gcc-4.8.2/libstdc++-v3/libsupc++/vterminate.cc::95 in _ZN9__gnu_cxx27__verbose_terminate_handlerEv()
../../gcc-4.8.2/libstdc++-v3/libsupc++/eh_terminate.cc::38 in _ZN10__cxxabiv111__terminateEPFvvE()
../../gcc-4.8.2/libstdc++-v3/libsupc++/eh_terminate.cc::48 in _ZSt9terminatev()
../../gcc-4.8.2/libstdc++-v3/libsupc++/eh_throw.cc::84 in __cxa_throw()
../../gcc-4.8.2/libstdc++-v3/libsupc++/new_op.cc::56 in _Znwm()
../../gcc-4.8.2/libstdc++-v3/libsupc++/new_opv.cc::32 in _Znam()
utility/arrays.H::106 in setArraySize<char, long unsigned int>()
utility/arrays.H::145 in resizeArray<char, long unsigned int>()
stores/ovStoreFile.C::365 in _ZN6ovFile10readBufferEv()
stores/ovStoreFile.C::391 in _ZN6ovFile11readOverlapEP9ovOverlap()
stores/ovStoreBucketizer.C::238 in main()
(null)::0 in (null)()
(null)::0 in (null)()

I have about 99 GB of files in 1-overlapper/001/; the largest one is 365 MB. How should I deal with this problem?

Thanks, Alex

skoren commented 5 years ago

This usually happens with disk corruption in a previous step (see also #1134, #1264) but it is suspicious that all the jobs failed. Are you out of space on disk?
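
For example, from the directory Canu is running in (the paths below are just an illustration; adjust them to your layout):

df -h .                          # free space on the filesystem holding the assembly
du -sh unitigging/1-overlapper   # how much the overlap outputs currently use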

If you still have space, this seems to be an intermittent issue on your system, because none of your previous overlap steps (correction, trimming) had problems. Each of the failed jobs should report the file it last tried to open before failing. We need to validate the output from the previous step; use the ASCII overlap dump utility to scan the files. From your unitigging folder run:

overlapConvert -G asm.gkpStore -coords 1-overlapper/001/000001.ovb

for every ovb file you have. This will print something like:

    299737          2  N     682    8131   8809       0    679  0.013300
    289432          2  N    2527    1960   4475       0   2509  0.014700

If columns 4, 5, 6, or 7 are larger than the read lengths in your dataset, the file is corrupt. overlapConvert may also crash while reading a corrupt file.

SixPlusSeven commented 5 years ago

I get the output below; which column is the read length?

overlapConvert -S ../out.seqStore -coords 1-overlapper/001/000001.ovb
    310544     141312  N    9317    5350  14600       0   9238  0.019300
    141312     206231  N    9136    1301  10310       0   9060  0.025900
    141312     857506  N    7301    3093  10310       0   7239  0.024100
    834818     141312  N   10402    8868  19183       0  10310  0.020000
    141312     856493  N    9094     890   9856       0   9002  0.030600
    141312     171044  N    6563    3832  10310       0   6486  0.028900
    141312     843809  N    9347    1040  10310       0   9289  0.017000
    338668     141312  N   10447    7231  17599       0  10310  0.025200
    853958     141312  N    7404    8059  15379       0   7336  0.026100
    863214     141312  N    7348   12830  20135       0   7288  0.017200
    171350     141312  N   10409    1154  11489       0  10310  0.019000
    256803     141312  N    9385    3678  12934       0   9238  0.036500
    158118     141312  N    2087    8962  11025       0   2050  0.037600
    141312     235335  N    9991     459  10310       0   9902  0.027100
    309094     141312  N    2261    8686  10922       0   2222  0.036500
    173939     141312  N    2934   11099  14012       0   2906  0.021000
    246297     141312  N    9374    3362  12656       0   9268  0.023600
    865263     141312  N    9866   14792  24581       0   9783  0.018300
    141312     838421  N    7450    1317   8700       0   7383  0.021500
    310880     141312  N    7228    3271  10434       0   7126  0.028500
    843325     141312  N   10412   14868  25193       0  10310  0.022000
    321282     141312  N    7569    2812  10318       0   7483  0.022500
......

It seems that there is no problem...

skoren commented 5 years ago

There is no column for read length; the values 5350 14600 0 9238 are the coordinates of the overlap in the two reads. These should not be overly large (e.g. larger than your longest read). Did you check every ovb file? The failing one in the log was 000005.ovb. What do the rest of the failed bucketizing jobs report?
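
To check every file at once, a bash loop along these lines should work (a rough sketch, run from your unitigging folder with the same -S ../out.seqStore you used above; compare the reported maxima against your longest read length):

for f in 1-overlapper/001/*.ovb; do
  if overlapConvert -S ../out.seqStore -coords "$f" > coords.tmp 2>/dev/null; then
    # report the largest coordinate in this file; values far above your
    # longest read length point at corruption
    awk -v f="$f" '{ for (i = 4; i <= 8; i++) if ($i > max) max = $i }
                   END { print f, "max coordinate:", max }' coords.tmp
  else
    echo "$f: overlapConvert failed, file is likely corrupt"
  fi
done
rm -f coords.tmp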

SixPlusSeven commented 5 years ago

Here are the reports from the rest of the failed bucketizing jobs.

1-bucketize.1820390_1.out:

Running job 1 based on SLURM_ARRAY_TASK_ID=1 and offset=0.

Attempting to increase maximum allowed processes and open files.
/tmp/slurmd/job1820390/slurm_script: line 69: [: unlimited: integer expression expected
  Max processes per user limited to 1024, no increase possible.
  Max open files limited to 819200, no increase possible.

Overwriting incomplete result from presumed crashed job in directory './out.ovlStore.BUILDING/create0001'.

Opened '../out.seqStore' with 18033145 reads.

Constructing slice 1 for store './out.ovlStore.BUILDING'.
 - Filtering overlaps over 1.0000 fraction error.

Bucketizing input    1 out of   69 - '1-overlapper/001/000001.ovb'
Bucketizing input    2 out of   69 - '1-overlapper/001/000008.ovb'
terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc

Failed with 'Aborted'; backtrace (libbacktrace):
utility/system-stackTrace.C::89 in _Z17AS_UTL_catchCrashiP7siginfoPv()
(null)::0 in (null)()
(null)::0 in (null)()
(null)::0 in (null)()
../../gcc-4.8.2/libstdc++-v3/libsupc++/vterminate.cc::95 in _ZN9__gnu_cxx27__verbose_terminate_handlerEv()
../../gcc-4.8.2/libstdc++-v3/libsupc++/eh_terminate.cc::38 in _ZN10__cxxabiv111__terminateEPFvvE()
../../gcc-4.8.2/libstdc++-v3/libsupc++/eh_terminate.cc::48 in _ZSt9terminatev()
../../gcc-4.8.2/libstdc++-v3/libsupc++/eh_throw.cc::84 in __cxa_throw()
../../gcc-4.8.2/libstdc++-v3/libsupc++/new_op.cc::56 in _Znwm()
../../gcc-4.8.2/libstdc++-v3/libsupc++/new_opv.cc::32 in _Znam()
utility/arrays.H::106 in setArraySize<char, long unsigned int>()
utility/arrays.H::145 in resizeArray<char, long unsigned int>()
stores/ovStoreFile.C::365 in _ZN6ovFile10readBufferEv()
stores/ovStoreFile.C::391 in _ZN6ovFile11readOverlapEP9ovOverlap()
stores/ovStoreBucketizer.C::238 in main()
(null)::0 in (null)()
(null)::0 in (null)()

1-bucketize.1820390_2.out:

Running job 2 based on SLURM_ARRAY_TASK_ID=2 and offset=0.

Attempting to increase maximum allowed processes and open files.
/tmp/slurmd/job1820391/slurm_script: line 69: [: unlimited: integer expression expected
  Max processes per user limited to 1024, no increase possible.
  Max open files limited to 819200, no increase possible.

Overwriting incomplete result from presumed crashed job in directory './out.ovlStore.BUILDING/create0002'.

Opened '../out.seqStore' with 18033145 reads.

Constructing slice 2 for store './out.ovlStore.BUILDING'.
 - Filtering overlaps over 1.0000 fraction error.

Bucketizing input    1 out of   70 - '1-overlapper/001/000002.ovb'
terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc

Failed with 'Aborted'; backtrace (libbacktrace):
utility/system-stackTrace.C::89 in _Z17AS_UTL_catchCrashiP7siginfoPv()
(null)::0 in (null)()
(null)::0 in (null)()
(null)::0 in (null)()
../../gcc-4.8.2/libstdc++-v3/libsupc++/vterminate.cc::95 in _ZN9__gnu_cxx27__verbose_terminate_handlerEv()
../../gcc-4.8.2/libstdc++-v3/libsupc++/eh_terminate.cc::38 in _ZN10__cxxabiv111__terminateEPFvvE()
../../gcc-4.8.2/libstdc++-v3/libsupc++/eh_terminate.cc::48 in _ZSt9terminatev()
../../gcc-4.8.2/libstdc++-v3/libsupc++/eh_throw.cc::84 in __cxa_throw()
../../gcc-4.8.2/libstdc++-v3/libsupc++/new_op.cc::56 in _Znwm()
../../gcc-4.8.2/libstdc++-v3/libsupc++/new_opv.cc::32 in _Znam()
utility/arrays.H::106 in setArraySize<char, long unsigned int>()
utility/arrays.H::145 in resizeArray<char, long unsigned int>()
stores/ovStoreFile.C::365 in _ZN6ovFile10readBufferEv()
stores/ovStoreFile.C::391 in _ZN6ovFile11readOverlapEP9ovOverlap()
stores/ovStoreBucketizer.C::238 in main()
(null)::0 in (null)()
(null)::0 in (null)()

1-bucketize.1820390_3.out:

Running job 3 based on SLURM_ARRAY_TASK_ID=3 and offset=0.

Attempting to increase maximum allowed processes and open files.
/tmp/slurmd/job1820392/slurm_script: line 69: [: unlimited: integer expression expected
  Max processes per user limited to 1024, no increase possible.
  Max open files limited to 819200, no increase possible.

Overwriting incomplete result from presumed crashed job in directory './out.ovlStore.BUILDING/create0003'.

Opened '../out.seqStore' with 18033145 reads.

Constructing slice 3 for store './out.ovlStore.BUILDING'.
 - Filtering overlaps over 1.0000 fraction error.

Bucketizing input    1 out of   69 - '1-overlapper/001/000003.ovb'
Bucketizing input    2 out of   69 - '1-overlapper/001/000010.ovb'
terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc

Failed with 'Aborted'; backtrace (libbacktrace):
utility/system-stackTrace.C::89 in _Z17AS_UTL_catchCrashiP7siginfoPv()
(null)::0 in (null)()
(null)::0 in (null)()
(null)::0 in (null)()
../../gcc-4.8.2/libstdc++-v3/libsupc++/vterminate.cc::95 in _ZN9__gnu_cxx27__verbose_terminate_handlerEv()
../../gcc-4.8.2/libstdc++-v3/libsupc++/eh_terminate.cc::38 in _ZN10__cxxabiv111__terminateEPFvvE()
../../gcc-4.8.2/libstdc++-v3/libsupc++/eh_terminate.cc::48 in _ZSt9terminatev()
../../gcc-4.8.2/libstdc++-v3/libsupc++/eh_throw.cc::84 in __cxa_throw()
../../gcc-4.8.2/libstdc++-v3/libsupc++/new_op.cc::56 in _Znwm()
../../gcc-4.8.2/libstdc++-v3/libsupc++/new_opv.cc::32 in _Znam()
utility/arrays.H::106 in setArraySize<char, long unsigned int>()
utility/arrays.H::145 in resizeArray<char, long unsigned int>()
stores/ovStoreFile.C::365 in _ZN6ovFile10readBufferEv()
stores/ovStoreFile.C::391 in _ZN6ovFile11readOverlapEP9ovOverlap()
stores/ovStoreBucketizer.C::238 in main()
(null)::0 in (null)()
(null)::0 in (null)()

1-bucketize.1820390_4.out:

Running job 4 based on SLURM_ARRAY_TASK_ID=4 and offset=0.

Attempting to increase maximum allowed processes and open files.
/tmp/slurmd/job1820393/slurm_script: line 69: [: unlimited: integer expression expected
  Max processes per user limited to 1024, no increase possible.
  Max open files limited to 819200, no increase possible.

Overwriting incomplete result from presumed crashed job in directory './out.ovlStore.BUILDING/create0004'.

Opened '../out.seqStore' with 18033145 reads.

Constructing slice 4 for store './out.ovlStore.BUILDING'.
 - Filtering overlaps over 1.0000 fraction error.

Bucketizing input    1 out of   74 - '1-overlapper/001/000004.ovb'
terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc

Failed with 'Aborted'; backtrace (libbacktrace):
utility/system-stackTrace.C::89 in _Z17AS_UTL_catchCrashiP7siginfoPv()
(null)::0 in (null)()
(null)::0 in (null)()
(null)::0 in (null)()
../../gcc-4.8.2/libstdc++-v3/libsupc++/vterminate.cc::95 in _ZN9__gnu_cxx27__verbose_terminate_handlerEv()
../../gcc-4.8.2/libstdc++-v3/libsupc++/eh_terminate.cc::38 in _ZN10__cxxabiv111__terminateEPFvvE()
../../gcc-4.8.2/libstdc++-v3/libsupc++/eh_terminate.cc::48 in _ZSt9terminatev()
../../gcc-4.8.2/libstdc++-v3/libsupc++/eh_throw.cc::84 in __cxa_throw()
../../gcc-4.8.2/libstdc++-v3/libsupc++/new_op.cc::56 in _Znwm()
../../gcc-4.8.2/libstdc++-v3/libsupc++/new_opv.cc::32 in _Znam()
utility/arrays.H::106 in setArraySize<char, long unsigned int>()
utility/arrays.H::145 in resizeArray<char, long unsigned int>()
stores/ovStoreFile.C::365 in _ZN6ovFile10readBufferEv()
stores/ovStoreFile.C::391 in _ZN6ovFile11readOverlapEP9ovOverlap()
stores/ovStoreBucketizer.C::238 in main()
(null)::0 in (null)()
(null)::0 in (null)()

1-bucketize.1820390_5.out:

Running job 5 based on SLURM_ARRAY_TASK_ID=5 and offset=0.

Attempting to increase maximum allowed processes and open files.
/tmp/slurmd/job1820394/slurm_script: line 69: [: unlimited: integer expression expected
  Max processes per user limited to 1024, no increase possible.
  Max open files limited to 819200, no increase possible.

Overwriting incomplete result from presumed crashed job in directory './out.ovlStore.BUILDING/create0005'.

Opened '../out.seqStore' with 18033145 reads.

Constructing slice 5 for store './out.ovlStore.BUILDING'.
 - Filtering overlaps over 1.0000 fraction error.

Bucketizing input    1 out of   67 - '1-overlapper/001/000005.ovb'
terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc

Failed with 'Aborted'; backtrace (libbacktrace):
utility/system-stackTrace.C::89 in _Z17AS_UTL_catchCrashiP7siginfoPv()
(null)::0 in (null)()
(null)::0 in (null)()
(null)::0 in (null)()
../../gcc-4.8.2/libstdc++-v3/libsupc++/vterminate.cc::95 in _ZN9__gnu_cxx27__verbose_terminate_handlerEv()
../../gcc-4.8.2/libstdc++-v3/libsupc++/eh_terminate.cc::38 in _ZN10__cxxabiv111__terminateEPFvvE()
../../gcc-4.8.2/libstdc++-v3/libsupc++/eh_terminate.cc::48 in _ZSt9terminatev()
../../gcc-4.8.2/libstdc++-v3/libsupc++/eh_throw.cc::84 in __cxa_throw()
../../gcc-4.8.2/libstdc++-v3/libsupc++/new_op.cc::56 in _Znwm()
../../gcc-4.8.2/libstdc++-v3/libsupc++/new_opv.cc::32 in _Znam()
utility/arrays.H::106 in setArraySize<char, long unsigned int>()
utility/arrays.H::145 in resizeArray<char, long unsigned int>()
stores/ovStoreFile.C::365 in _ZN6ovFile10readBufferEv()
stores/ovStoreFile.C::391 in _ZN6ovFile11readOverlapEP9ovOverlap()
stores/ovStoreBucketizer.C::238 in main()
(null)::0 in (null)()
(null)::0 in (null)()

1-bucketize.1820390_6.out:

Running job 6 based on SLURM_ARRAY_TASK_ID=6 and offset=0.

Attempting to increase maximum allowed processes and open files.
/tmp/slurmd/job1820395/slurm_script: line 69: [: unlimited: integer expression expected
  Max processes per user limited to 1024, no increase possible.
  Max open files limited to 819200, no increase possible.

Overwriting incomplete result from presumed crashed job in directory './out.ovlStore.BUILDING/create0006'.

Opened '../out.seqStore' with 18033145 reads.

Constructing slice 6 for store './out.ovlStore.BUILDING'.
 - Filtering overlaps over 1.0000 fraction error.

Bucketizing input    1 out of   69 - '1-overlapper/001/000006.ovb'
terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc

Failed with 'Aborted'; backtrace (libbacktrace):
utility/system-stackTrace.C::89 in _Z17AS_UTL_catchCrashiP7siginfoPv()
(null)::0 in (null)()
(null)::0 in (null)()
(null)::0 in (null)()
../../gcc-4.8.2/libstdc++-v3/libsupc++/vterminate.cc::95 in _ZN9__gnu_cxx27__verbose_terminate_handlerEv()
../../gcc-4.8.2/libstdc++-v3/libsupc++/eh_terminate.cc::38 in _ZN10__cxxabiv111__terminateEPFvvE()
../../gcc-4.8.2/libstdc++-v3/libsupc++/eh_terminate.cc::48 in _ZSt9terminatev()
../../gcc-4.8.2/libstdc++-v3/libsupc++/eh_throw.cc::84 in __cxa_throw()
../../gcc-4.8.2/libstdc++-v3/libsupc++/new_op.cc::56 in _Znwm()
../../gcc-4.8.2/libstdc++-v3/libsupc++/new_opv.cc::32 in _Znam()
utility/arrays.H::106 in setArraySize<char, long unsigned int>()
utility/arrays.H::145 in resizeArray<char, long unsigned int>()
stores/ovStoreFile.C::365 in _ZN6ovFile10readBufferEv()
stores/ovStoreFile.C::391 in _ZN6ovFile11readOverlapEP9ovOverlap()
stores/ovStoreBucketizer.C::238 in main()
(null)::0 in (null)()
(null)::0 in (null)()

1-bucketize.1820390_7.out:

Running job 7 based on SLURM_ARRAY_TASK_ID=7 and offset=0.

Attempting to increase maximum allowed processes and open files.
/tmp/slurmd/job1820396/slurm_script: line 69: [: unlimited: integer expression expected
  Max processes per user limited to 1024, no increase possible.
  Max open files limited to 819200, no increase possible.

Overwriting incomplete result from presumed crashed job in directory './out.ovlStore.BUILDING/create0007'.

Opened '../out.seqStore' with 18033145 reads.

Constructing slice 7 for store './out.ovlStore.BUILDING'.
 - Filtering overlaps over 1.0000 fraction error.

Bucketizing input    1 out of   69 - '1-overlapper/001/000007.ovb'
Bucketizing input    2 out of   69 - '1-overlapper/001/000014.ovb'
terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc

Failed with 'Aborted'; backtrace (libbacktrace):
utility/system-stackTrace.C::89 in _Z17AS_UTL_catchCrashiP7siginfoPv()
(null)::0 in (null)()
(null)::0 in (null)()
(null)::0 in (null)()
../../gcc-4.8.2/libstdc++-v3/libsupc++/vterminate.cc::95 in _ZN9__gnu_cxx27__verbose_terminate_handlerEv()
../../gcc-4.8.2/libstdc++-v3/libsupc++/eh_terminate.cc::38 in _ZN10__cxxabiv111__terminateEPFvvE()
../../gcc-4.8.2/libstdc++-v3/libsupc++/eh_terminate.cc::48 in _ZSt9terminatev()
../../gcc-4.8.2/libstdc++-v3/libsupc++/eh_throw.cc::84 in __cxa_throw()
../../gcc-4.8.2/libstdc++-v3/libsupc++/new_op.cc::56 in _Znwm()
../../gcc-4.8.2/libstdc++-v3/libsupc++/new_opv.cc::32 in _Znam()
utility/arrays.H::106 in setArraySize<char, long unsigned int>()
utility/arrays.H::145 in resizeArray<char, long unsigned int>()
stores/ovStoreFile.C::365 in _ZN6ovFile10readBufferEv()
stores/ovStoreFile.C::391 in _ZN6ovFile11readOverlapEP9ovOverlap()
stores/ovStoreBucketizer.C::238 in main()
(null)::0 in (null)()
(null)::0 in (null)()

I also get an error when using overlapConvert:

overlapConvert -S ../out.seqStore -coords 1-overlapper/001/000005.ovb > tt
terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc
Aborted (core dumped)

But 000001.ovb is fine. Does this mean that part of my overlap output is corrupt?

skoren commented 5 years ago

Yes, it seems you have corrupted files for part of your result. Are you sure you didn't run out of space?

To continue, you'd have to identify all the bad ovb files, remove them, remove the out.ovlStore.BUILDING folder, and re-start Canu. Confirm you have space available, though; you can remove some of Canu's earlier output (trimming/*.ovlStore, for example) if needed.
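
In shell terms the steps look roughly like this (a sketch only; 000005.ovb stands in for whichever files actually fail the check, and the final step is simply your original canu command, unchanged):

cd unitigging
rm 1-overlapper/001/000005.ovb    # remove each ovb file that failed the check (example name)
rm -rf out.ovlStore.BUILDING      # discard the partially built overlap store
cd ..
# now restart Canu with the same command line you used originally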

SixPlusSeven commented 5 years ago

Thanks! I'll check all the *.ovb files with overlapConvert. Can you tell why this happened? I'd like to avoid it in the future.

skoren commented 5 years ago

There is no way to know why this happened: the program did not get any errors from the OS or file system, so either the corruption happened after the run finished, or you ran out of space as I suggested before, in which case Linux is not very good at returning error messages correctly.

SixPlusSeven commented 5 years ago

It may not have been caused by disk space:

Disk quotas for user :
     Filesystem    used   quota   limit   grace   files   quota   limit   grace
    /PARA/   4.25T   40.5T  40.51T       - 1118959       0       0       -

Do I need to delete the oc files and stats files in the 1-overlapper/001/ folder at the same time?

skoren commented 5 years ago

You can remove the oc files, though it isn't required; if the ovb files are missing, the oc files will be regenerated.

SixPlusSeven commented 5 years ago

I identified all the bad ovb files (18 out of 487), removed them, removed the out.ovlStore.BUILDING folder, removed out.ovlStore.config and out.ovlStore.config.txt, and re-started Canu. But Canu does not rerun the overlapper. Should I rerun the failed overlap jobs manually?

SixPlusSeven commented 5 years ago

This problem was solved by rerunning the failed overlap jobs. Thanks!

oushujun commented 4 years ago

For what it's worth, here are the command lines I used to find the corrupted ovb files:

# enter the log folder
cd out.ovlStore.BUILDING/log

# find the jobs that did not finish successfully
for i in *out; do echo -n "$i "; tail -1 $i; done | grep -v Success | awk '{print $1}' > fail.list

# get the failed ovb file names and IDs
for i in `cat fail.list`; do grep Bucketizing $i | tail -1 | perl -nle 's/.*\///; s/\x27//; my $num=$1 if /0+([0-9]+).ovb/; print "$_\t$num"'; done | sort -u > fail.ovb

# rerun the failed ovb jobs in unitigging/1-overlapper
# (the numbers after -a are the failed ovb IDs from the second column of fail.ovb;
#  this line will differ if your job manager is qsub)
sbatch --mem-per-cpu=5536m --cpus-per-task=16 -t 35:00:00 -o overlap.%A_%a.out -D `pwd` -J "utgovl_ab10" -a 26,27,42,44,46,62,64,103 ./overlap.sh 0

Hope this helps!

Shujun