SixPlusSeven closed this issue 5 years ago.
This usually happens when output from a previous step is corrupted on disk (see also #1134, #1264), but it is suspicious that all the jobs failed. Are you out of disk space?
If you still have space, this seems to be an intermittent issue on your system, because none of your previous overlap steps (correction, trimming) had problems. Each failed job should report the last file it tried to open before failing. We need to validate the output from the previous step: use the ASCII overlap dump utility to scan the files. From your unitigging folder, run:
overlapConvert -G asm.gkpStore -coords 1-overlapper/001/000001.ovb
for every ovb file you have. This will print something like:
299737 2 N 682 8131 8809 0 679 0.013300
289432 2 N 2527 1960 4475 0 2509 0.014700
If the values in columns 4, 5, 6, and 7 are larger than the read lengths in your dataset, the file is corrupt. A corrupt file may also make overlapConvert crash while reading it.
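To scan every file in one pass, the check can be wrapped in a loop. This is a minimal sketch, assuming Bash, that overlapConvert is on your PATH, and that you run it from the unitigging folder. It uses the `-S ../out.seqStore` form from the later reply; older Canu releases take `-G` with a gkpStore, as in the command above. The function name find_unreadable_ovb is hypothetical. It only catches files that make overlapConvert exit with an error (for example an abort on std::bad_alloc); files that read cleanly still need their coordinates checked.

```shell
# Hypothetical helper: print every ovb file that overlapConvert fails to read.
# Usage: find_unreadable_ovb STORE FILE.ovb [FILE.ovb ...]
find_unreadable_ovb() {
  local store=$1
  shift
  local f
  for f in "$@"; do
    # Throw away the dump itself; only the exit status matters here.
    # A crash (e.g. abort on std::bad_alloc) gives a non-zero status.
    if ! overlapConvert -S "$store" -coords "$f" > /dev/null 2>&1; then
      echo "$f"
    fi
  done
}
```

For example, `find_unreadable_ovb ../out.seqStore 1-overlapper/*/*.ovb > bad_ovb.list` collects the unreadable files in one list.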
I get this. Which column is the read length?
overlapConvert -S ../out.seqStore -coords 1-overlapper/001/000001.ovb
310544 141312 N 9317 5350 14600 0 9238 0.019300
141312 206231 N 9136 1301 10310 0 9060 0.025900
141312 857506 N 7301 3093 10310 0 7239 0.024100
834818 141312 N 10402 8868 19183 0 10310 0.020000
141312 856493 N 9094 890 9856 0 9002 0.030600
141312 171044 N 6563 3832 10310 0 6486 0.028900
141312 843809 N 9347 1040 10310 0 9289 0.017000
338668 141312 N 10447 7231 17599 0 10310 0.025200
853958 141312 N 7404 8059 15379 0 7336 0.026100
863214 141312 N 7348 12830 20135 0 7288 0.017200
171350 141312 N 10409 1154 11489 0 10310 0.019000
256803 141312 N 9385 3678 12934 0 9238 0.036500
158118 141312 N 2087 8962 11025 0 2050 0.037600
141312 235335 N 9991 459 10310 0 9902 0.027100
309094 141312 N 2261 8686 10922 0 2222 0.036500
173939 141312 N 2934 11099 14012 0 2906 0.021000
246297 141312 N 9374 3362 12656 0 9268 0.023600
865263 141312 N 9866 14792 24581 0 9783 0.018300
141312 838421 N 7450 1317 8700 0 7383 0.021500
310880 141312 N 7228 3271 10434 0 7126 0.028500
843325 141312 N 10412 14868 25193 0 10310 0.022000
321282 141312 N 7569 2812 10318 0 7483 0.022500
......
It seems that there is no problem...
There is no column for read length; values such as 5350 14600 0 9238 are the coordinates of the overlap in the reads. These should not be overly large (e.g. larger than your longest read). Did you check every ovb file? The failing one in the log was 000005.ovb. What do the rest of the failed bucketizing jobs report?
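For files that do read, the "not larger than your longest read" check can be automated with awk. This is a minimal sketch, assuming Bash and awk, and assuming that every field between the third column (N in the output above) and the final error-rate column is a coordinate or length that should not exceed your longest read. The function name flag_suspicious_coords is hypothetical, and the threshold is something you supply from your own read-length statistics.

```shell
# Hypothetical helper: read overlapConvert -coords output on stdin and print
# any line where a field between column 4 and the last-but-one column exceeds
# MAXLEN. Exits non-zero if at least one line was flagged.
# Usage: overlapConvert ... -coords FILE.ovb | flag_suspicious_coords MAXLEN
flag_suspicious_coords() {
  local maxlen=$1
  awk -v max="$maxlen" '
    {
      for (i = 4; i < NF; i++) {
        if ($i + 0 > max + 0) {     # force a numeric comparison
          print "line " NR ": " $0
          bad++
          break                     # report each line only once
        }
      }
    }
    END { exit (bad > 0) }          # status 1 if anything was flagged
  '
}
```

For example, `overlapConvert -S ../out.seqStore -coords 1-overlapper/001/000001.ovb | flag_suspicious_coords 100000`, with your own longest read length in place of 100000.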
Here are the reports from the rest of the failed bucketizing jobs.
1-bucketize.1820390_1.out:
Running job 1 based on SLURM_ARRAY_TASK_ID=1 and offset=0.
Attempting to increase maximum allowed processes and open files.
/tmp/slurmd/job1820390/slurm_script: line 69: [: unlimited: integer expression expected
Max processes per user limited to 1024, no increase possible.
Max open files limited to 819200, no increase possible.
Overwriting incomplete result from presumed crashed job in directory './out.ovlStore.BUILDING/create0001'.
Opened '../out.seqStore' with 18033145 reads.
Constructing slice 1 for store './out.ovlStore.BUILDING'.
- Filtering overlaps over 1.0000 fraction error.
Bucketizing input 1 out of 69 - '1-overlapper/001/000001.ovb'
Bucketizing input 2 out of 69 - '1-overlapper/001/000008.ovb'
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
Failed with 'Aborted'; backtrace (libbacktrace):
utility/system-stackTrace.C::89 in _Z17AS_UTL_catchCrashiP7siginfoPv()
(null)::0 in (null)()
(null)::0 in (null)()
(null)::0 in (null)()
../../gcc-4.8.2/libstdc++-v3/libsupc++/vterminate.cc::95 in _ZN9__gnu_cxx27__verbose_terminate_handlerEv()
../../gcc-4.8.2/libstdc++-v3/libsupc++/eh_terminate.cc::38 in _ZN10__cxxabiv111__terminateEPFvvE()
../../gcc-4.8.2/libstdc++-v3/libsupc++/eh_terminate.cc::48 in _ZSt9terminatev()
../../gcc-4.8.2/libstdc++-v3/libsupc++/eh_throw.cc::84 in __cxa_throw()
../../gcc-4.8.2/libstdc++-v3/libsupc++/new_op.cc::56 in _Znwm()
../../gcc-4.8.2/libstdc++-v3/libsupc++/new_opv.cc::32 in _Znam()
utility/arrays.H::106 in setArraySize<char, long unsigned int>()
utility/arrays.H::145 in resizeArray<char, long unsigned int>()
stores/ovStoreFile.C::365 in _ZN6ovFile10readBufferEv()
stores/ovStoreFile.C::391 in _ZN6ovFile11readOverlapEP9ovOverlap()
stores/ovStoreBucketizer.C::238 in main()
(null)::0 in (null)()
(null)::0 in (null)()
1-bucketize.1820390_2.out:
Running job 2 based on SLURM_ARRAY_TASK_ID=2 and offset=0.
Attempting to increase maximum allowed processes and open files.
/tmp/slurmd/job1820391/slurm_script: line 69: [: unlimited: integer expression expected
Max processes per user limited to 1024, no increase possible.
Max open files limited to 819200, no increase possible.
Overwriting incomplete result from presumed crashed job in directory './out.ovlStore.BUILDING/create0002'.
Opened '../out.seqStore' with 18033145 reads.
Constructing slice 2 for store './out.ovlStore.BUILDING'.
- Filtering overlaps over 1.0000 fraction error.
Bucketizing input 1 out of 70 - '1-overlapper/001/000002.ovb'
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
Failed with 'Aborted'; backtrace (libbacktrace):
utility/system-stackTrace.C::89 in _Z17AS_UTL_catchCrashiP7siginfoPv()
(null)::0 in (null)()
(null)::0 in (null)()
(null)::0 in (null)()
../../gcc-4.8.2/libstdc++-v3/libsupc++/vterminate.cc::95 in _ZN9__gnu_cxx27__verbose_terminate_handlerEv()
../../gcc-4.8.2/libstdc++-v3/libsupc++/eh_terminate.cc::38 in _ZN10__cxxabiv111__terminateEPFvvE()
../../gcc-4.8.2/libstdc++-v3/libsupc++/eh_terminate.cc::48 in _ZSt9terminatev()
../../gcc-4.8.2/libstdc++-v3/libsupc++/eh_throw.cc::84 in __cxa_throw()
../../gcc-4.8.2/libstdc++-v3/libsupc++/new_op.cc::56 in _Znwm()
../../gcc-4.8.2/libstdc++-v3/libsupc++/new_opv.cc::32 in _Znam()
utility/arrays.H::106 in setArraySize<char, long unsigned int>()
utility/arrays.H::145 in resizeArray<char, long unsigned int>()
stores/ovStoreFile.C::365 in _ZN6ovFile10readBufferEv()
stores/ovStoreFile.C::391 in _ZN6ovFile11readOverlapEP9ovOverlap()
stores/ovStoreBucketizer.C::238 in main()
(null)::0 in (null)()
(null)::0 in (null)()
1-bucketize.1820390_3.out:
Running job 3 based on SLURM_ARRAY_TASK_ID=3 and offset=0.
Attempting to increase maximum allowed processes and open files.
/tmp/slurmd/job1820392/slurm_script: line 69: [: unlimited: integer expression expected
Max processes per user limited to 1024, no increase possible.
Max open files limited to 819200, no increase possible.
Overwriting incomplete result from presumed crashed job in directory './out.ovlStore.BUILDING/create0003'.
Opened '../out.seqStore' with 18033145 reads.
Constructing slice 3 for store './out.ovlStore.BUILDING'.
- Filtering overlaps over 1.0000 fraction error.
Bucketizing input 1 out of 69 - '1-overlapper/001/000003.ovb'
Bucketizing input 2 out of 69 - '1-overlapper/001/000010.ovb'
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
Failed with 'Aborted'; backtrace (libbacktrace):
utility/system-stackTrace.C::89 in _Z17AS_UTL_catchCrashiP7siginfoPv()
(null)::0 in (null)()
(null)::0 in (null)()
(null)::0 in (null)()
../../gcc-4.8.2/libstdc++-v3/libsupc++/vterminate.cc::95 in _ZN9__gnu_cxx27__verbose_terminate_handlerEv()
../../gcc-4.8.2/libstdc++-v3/libsupc++/eh_terminate.cc::38 in _ZN10__cxxabiv111__terminateEPFvvE()
../../gcc-4.8.2/libstdc++-v3/libsupc++/eh_terminate.cc::48 in _ZSt9terminatev()
../../gcc-4.8.2/libstdc++-v3/libsupc++/eh_throw.cc::84 in __cxa_throw()
../../gcc-4.8.2/libstdc++-v3/libsupc++/new_op.cc::56 in _Znwm()
../../gcc-4.8.2/libstdc++-v3/libsupc++/new_opv.cc::32 in _Znam()
utility/arrays.H::106 in setArraySize<char, long unsigned int>()
utility/arrays.H::145 in resizeArray<char, long unsigned int>()
stores/ovStoreFile.C::365 in _ZN6ovFile10readBufferEv()
stores/ovStoreFile.C::391 in _ZN6ovFile11readOverlapEP9ovOverlap()
stores/ovStoreBucketizer.C::238 in main()
(null)::0 in (null)()
(null)::0 in (null)()
1-bucketize.1820390_4.out:
Running job 4 based on SLURM_ARRAY_TASK_ID=4 and offset=0.
Attempting to increase maximum allowed processes and open files.
/tmp/slurmd/job1820393/slurm_script: line 69: [: unlimited: integer expression expected
Max processes per user limited to 1024, no increase possible.
Max open files limited to 819200, no increase possible.
Overwriting incomplete result from presumed crashed job in directory './out.ovlStore.BUILDING/create0004'.
Opened '../out.seqStore' with 18033145 reads.
Constructing slice 4 for store './out.ovlStore.BUILDING'.
- Filtering overlaps over 1.0000 fraction error.
Bucketizing input 1 out of 74 - '1-overlapper/001/000004.ovb'
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
Failed with 'Aborted'; backtrace (libbacktrace):
utility/system-stackTrace.C::89 in _Z17AS_UTL_catchCrashiP7siginfoPv()
(null)::0 in (null)()
(null)::0 in (null)()
(null)::0 in (null)()
../../gcc-4.8.2/libstdc++-v3/libsupc++/vterminate.cc::95 in _ZN9__gnu_cxx27__verbose_terminate_handlerEv()
../../gcc-4.8.2/libstdc++-v3/libsupc++/eh_terminate.cc::38 in _ZN10__cxxabiv111__terminateEPFvvE()
../../gcc-4.8.2/libstdc++-v3/libsupc++/eh_terminate.cc::48 in _ZSt9terminatev()
../../gcc-4.8.2/libstdc++-v3/libsupc++/eh_throw.cc::84 in __cxa_throw()
../../gcc-4.8.2/libstdc++-v3/libsupc++/new_op.cc::56 in _Znwm()
../../gcc-4.8.2/libstdc++-v3/libsupc++/new_opv.cc::32 in _Znam()
utility/arrays.H::106 in setArraySize<char, long unsigned int>()
utility/arrays.H::145 in resizeArray<char, long unsigned int>()
stores/ovStoreFile.C::365 in _ZN6ovFile10readBufferEv()
stores/ovStoreFile.C::391 in _ZN6ovFile11readOverlapEP9ovOverlap()
stores/ovStoreBucketizer.C::238 in main()
(null)::0 in (null)()
(null)::0 in (null)()
1-bucketize.1820390_5.out:
Running job 5 based on SLURM_ARRAY_TASK_ID=5 and offset=0.
Attempting to increase maximum allowed processes and open files.
/tmp/slurmd/job1820394/slurm_script: line 69: [: unlimited: integer expression expected
Max processes per user limited to 1024, no increase possible.
Max open files limited to 819200, no increase possible.
Overwriting incomplete result from presumed crashed job in directory './out.ovlStore.BUILDING/create0005'.
Opened '../out.seqStore' with 18033145 reads.
Constructing slice 5 for store './out.ovlStore.BUILDING'.
- Filtering overlaps over 1.0000 fraction error.
Bucketizing input 1 out of 67 - '1-overlapper/001/000005.ovb'
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
Failed with 'Aborted'; backtrace (libbacktrace):
utility/system-stackTrace.C::89 in _Z17AS_UTL_catchCrashiP7siginfoPv()
(null)::0 in (null)()
(null)::0 in (null)()
(null)::0 in (null)()
../../gcc-4.8.2/libstdc++-v3/libsupc++/vterminate.cc::95 in _ZN9__gnu_cxx27__verbose_terminate_handlerEv()
../../gcc-4.8.2/libstdc++-v3/libsupc++/eh_terminate.cc::38 in _ZN10__cxxabiv111__terminateEPFvvE()
../../gcc-4.8.2/libstdc++-v3/libsupc++/eh_terminate.cc::48 in _ZSt9terminatev()
../../gcc-4.8.2/libstdc++-v3/libsupc++/eh_throw.cc::84 in __cxa_throw()
../../gcc-4.8.2/libstdc++-v3/libsupc++/new_op.cc::56 in _Znwm()
../../gcc-4.8.2/libstdc++-v3/libsupc++/new_opv.cc::32 in _Znam()
utility/arrays.H::106 in setArraySize<char, long unsigned int>()
utility/arrays.H::145 in resizeArray<char, long unsigned int>()
stores/ovStoreFile.C::365 in _ZN6ovFile10readBufferEv()
stores/ovStoreFile.C::391 in _ZN6ovFile11readOverlapEP9ovOverlap()
stores/ovStoreBucketizer.C::238 in main()
(null)::0 in (null)()
(null)::0 in (null)()
1-bucketize.1820390_6.out:
Running job 6 based on SLURM_ARRAY_TASK_ID=6 and offset=0.
Attempting to increase maximum allowed processes and open files.
/tmp/slurmd/job1820395/slurm_script: line 69: [: unlimited: integer expression expected
Max processes per user limited to 1024, no increase possible.
Max open files limited to 819200, no increase possible.
Overwriting incomplete result from presumed crashed job in directory './out.ovlStore.BUILDING/create0006'.
Opened '../out.seqStore' with 18033145 reads.
Constructing slice 6 for store './out.ovlStore.BUILDING'.
- Filtering overlaps over 1.0000 fraction error.
Bucketizing input 1 out of 69 - '1-overlapper/001/000006.ovb'
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
Failed with 'Aborted'; backtrace (libbacktrace):
utility/system-stackTrace.C::89 in _Z17AS_UTL_catchCrashiP7siginfoPv()
(null)::0 in (null)()
(null)::0 in (null)()
(null)::0 in (null)()
../../gcc-4.8.2/libstdc++-v3/libsupc++/vterminate.cc::95 in _ZN9__gnu_cxx27__verbose_terminate_handlerEv()
../../gcc-4.8.2/libstdc++-v3/libsupc++/eh_terminate.cc::38 in _ZN10__cxxabiv111__terminateEPFvvE()
../../gcc-4.8.2/libstdc++-v3/libsupc++/eh_terminate.cc::48 in _ZSt9terminatev()
../../gcc-4.8.2/libstdc++-v3/libsupc++/eh_throw.cc::84 in __cxa_throw()
../../gcc-4.8.2/libstdc++-v3/libsupc++/new_op.cc::56 in _Znwm()
../../gcc-4.8.2/libstdc++-v3/libsupc++/new_opv.cc::32 in _Znam()
utility/arrays.H::106 in setArraySize<char, long unsigned int>()
utility/arrays.H::145 in resizeArray<char, long unsigned int>()
stores/ovStoreFile.C::365 in _ZN6ovFile10readBufferEv()
stores/ovStoreFile.C::391 in _ZN6ovFile11readOverlapEP9ovOverlap()
stores/ovStoreBucketizer.C::238 in main()
(null)::0 in (null)()
(null)::0 in (null)()
1-bucketize.1820390_7.out:
Running job 7 based on SLURM_ARRAY_TASK_ID=7 and offset=0.
Attempting to increase maximum allowed processes and open files.
/tmp/slurmd/job1820396/slurm_script: line 69: [: unlimited: integer expression expected
Max processes per user limited to 1024, no increase possible.
Max open files limited to 819200, no increase possible.
Overwriting incomplete result from presumed crashed job in directory './out.ovlStore.BUILDING/create0007'.
Opened '../out.seqStore' with 18033145 reads.
Constructing slice 7 for store './out.ovlStore.BUILDING'.
- Filtering overlaps over 1.0000 fraction error.
Bucketizing input 1 out of 69 - '1-overlapper/001/000007.ovb'
Bucketizing input 2 out of 69 - '1-overlapper/001/000014.ovb'
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
Failed with 'Aborted'; backtrace (libbacktrace):
utility/system-stackTrace.C::89 in _Z17AS_UTL_catchCrashiP7siginfoPv()
(null)::0 in (null)()
(null)::0 in (null)()
(null)::0 in (null)()
../../gcc-4.8.2/libstdc++-v3/libsupc++/vterminate.cc::95 in _ZN9__gnu_cxx27__verbose_terminate_handlerEv()
../../gcc-4.8.2/libstdc++-v3/libsupc++/eh_terminate.cc::38 in _ZN10__cxxabiv111__terminateEPFvvE()
../../gcc-4.8.2/libstdc++-v3/libsupc++/eh_terminate.cc::48 in _ZSt9terminatev()
../../gcc-4.8.2/libstdc++-v3/libsupc++/eh_throw.cc::84 in __cxa_throw()
../../gcc-4.8.2/libstdc++-v3/libsupc++/new_op.cc::56 in _Znwm()
../../gcc-4.8.2/libstdc++-v3/libsupc++/new_opv.cc::32 in _Znam()
utility/arrays.H::106 in setArraySize<char, long unsigned int>()
utility/arrays.H::145 in resizeArray<char, long unsigned int>()
stores/ovStoreFile.C::365 in _ZN6ovFile10readBufferEv()
stores/ovStoreFile.C::391 in _ZN6ovFile11readOverlapEP9ovOverlap()
stores/ovStoreBucketizer.C::238 in main()
(null)::0 in (null)()
(null)::0 in (null)()
And I get an error when using overlapConvert:
overlapConvert -S ../out.seqStore -coords 1-overlapper/001/000005.ovb > tt
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
Aborted (core dumped)
But 000001.ovb is fine. Does this mean that part of my overlap output is corrupt?
Yes, it seems you have corrupted files for part of your result. Are you sure you didn't run out of space?
To continue, you'd have to identify all the bad ovb files, remove them, remove the out.ovlStore.BUILDING folder, and restart Canu. Confirm you have space available first, though; if needed, you can remove some of Canu's earlier output (trimming/*.ovlStore, for example).
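The cleanup steps above can be scripted. This is a minimal sketch, assuming Bash, that you run it from the unitigging folder, and that bad_ovb.list is a hypothetical file you have filled with one corrupt ovb path per line; clean_bad_overlaps is likewise a hypothetical name. It ends by printing free space with df so you can confirm there is room before restarting Canu.

```shell
# Hypothetical helper: delete the corrupt ovb files listed in LIST, remove the
# partially built store directory, then report free disk space.
# Usage: clean_bad_overlaps LIST STORE_DIR
clean_bad_overlaps() {
  local list=$1 store_dir=$2 f
  # Delete each ovb path named in the list (blank lines are skipped).
  while IFS= read -r f; do
    if [ -n "$f" ]; then
      rm -f -- "$f"
    fi
  done < "$list"
  # Remove the incomplete store directory, as suggested above.
  rm -rf -- "$store_dir"
  # Show free space on the current filesystem before restarting Canu.
  df -h .
}
```

For example, `clean_bad_overlaps bad_ovb.list out.ovlStore.BUILDING`, then restart Canu with your original command.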
Thanks! I'll check all the *.ovb files with overlapConvert. Do you know why this happened? I'd like to avoid it in the future.
There's no way to know why this happened. The program did not get any errors from the OS or file system, so either the corruption happened after your jobs finished running, or you ran out of space as I suggested before, in which case Linux is not very good at reporting errors correctly.
It may not be caused by disk space:
Disk quotas for user :
Filesystem used quota limit grace files quota limit grace
/PARA/ 4.25T 40.5T 40.51T - 1118959 0 0 -
Do I also need to delete the oc files and stats files in the 1-overlapper/001/ folder?
You can remove the oc files, but it isn't required: if the ovb files are missing, the oc files will be replaced.
I identified all the bad ovb files (18 of 487), removed them, removed the out.ovlStore.BUILDING folder, out.ovlStore.config, and out.ovlStore.config.txt, and restarted Canu. But Canu did not rerun the overlapper. Should I rerun the failed overlap jobs manually?
This problem was solved by rerunning the failed overlap jobs. Thanks!
For what it's worth, here are the command lines I used to find the corrupted ovb files:
# enter the log folder
cd out.ovlStore.BUILDING/log
# find the logs of unsuccessful jobs (their last line is not "Success")
for i in *out; do echo -n "$i "; tail -1 "$i"; done | grep -v Success | awk '{print $1}' > fail.list
# get the name and ID of the ovb file each failed job was reading last
for i in `cat fail.list`; do grep Bucketizing "$i" | tail -1 | perl -nle 's/.*\///; s/\x27//; my $num=$1 if /0+([0-9]+).ovb/; print "$_\t$num"'; done | sort -u > fail.ovb
# rerun the failed ovb files in unitigging/1-overlapper (the numbers following -a
# are the failed ovb IDs from the second column of fail.ovb;
# this line will be different if your job manager is qsub)
sbatch --mem-per-cpu=5536m --cpus-per-task=16 -t 35:00:00 -o overlap.%A_%a.out -D `pwd` -J "utgovl_ab10" -a 26,27,42,44,46,62,64,103 ./overlap.sh 0
Hope this helps!
Shujun
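The IDs after `-a` in the sbatch command above can be generated instead of typed by hand. This is a minimal sketch, assuming fail.ovb is tab-separated with the numeric ID in the second column, as the perl step writes it (`print "$_\t$num"`); ids_for_array is a hypothetical name.

```shell
# Hypothetical helper: join the numeric IDs in column 2 of fail.ovb into a
# sorted, de-duplicated, comma-separated list suitable for sbatch -a.
# Usage: ids_for_array path/to/fail.ovb
ids_for_array() {
  cut -f2 "$1" | sort -n -u | paste -s -d, -
}
```

For example, keep the other options from the sbatch command above and pass `-a "$(ids_for_array path/to/fail.ovb)"`, adjusting the path to wherever fail.ovb was written.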
Here is my error log:
All of my bucketizer jobs failed. Here is one of the detailed logs:
I have about 99G of files in 1-overlapper/001/; the largest one is 365M. How should I deal with this problem?
Thanks, Alex