Closed sunnycqcn closed 7 years ago
The fact that all the jobs fail, and in <1s, makes me suspect there is an issue with the input from the previous step. Was this assembly run completely with Canu 1.4? Canu 1.4 is not backwards compatible with 1.3, so if you ran part of the assembly with 1.3 you cannot switch to 1.4. What is the output in the /scratch/snyder/f/fu115/Genome_assembly/PBonly/canutest/strigaA/correction/asm.ovlStore.BUILDING/logs/1-bucketize.*.out files?
Hi Koren, I only used Canu 1.4 for assembly. Thanks,
Running job 1 based on command line options.
Removing incomplete bucket /scratch/snyder/f/fu115/Genome_assembly/PBonly/canutest/strigaA/correction/asm.ovlStore.BUILDING/create0001
/scratch/snyder/f/fu115/Genome_assembly/PBonly/canutest/strigaA/correction/asm.ovlStore.BUILDING/scripts/1-bucketize.sh: line 126: [: unlimited: integer expression expected
Max processes per user limited to unlimited, no increase possible.
Max open files limited to 32768, no increase possible.
maxError fraction: 1.000 percent: 100.000 encoded: 4095
Bucketizing /scratch/snyder/f/fu115/Genome_assembly/PBonly/canutest/strigaA/correction/1-overlapper/results/000001.ovb
Failed with 'Segmentation fault'
Backtrace (mangled):
/home/fu115/DIRECTORY/canu/canu-1.4/Linux-amd64/bin/ovStoreBucketizer(_Z17AS_UTL_catchCrashiP7siginfoPv+0x27)[0x404b97]
/lib64/libpthread.so.0[0x34e5c0f7e0]
/home/fu115/DIRECTORY/canu/canu-1.4/Linux-amd64/bin/ovStoreBucketizer[0x40afae]
/home/fu115/DIRECTORY/canu/canu-1.4/Linux-amd64/bin/ovStoreBucketizer[0x4027ed]
/lib64/libc.so.6(__libc_start_main+0xfd)[0x34e501ed1d]
/home/fu115/DIRECTORY/canu/canu-1.4/Linux-amd64/bin/ovStoreBucketizer[0x402159]
Backtrace (demangled):
[0] /home/fu115/DIRECTORY/canu/canu-1.4/Linux-amd64/bin/ovStoreBucketizer::AS_UTL_catchCrash(int, siginfo*, void*) + 0x27 [0x404b97]
[1] /lib64/libpthread.so.0() [0x34e5c0f7e0]
[2] /home/fu115/DIRECTORY/canu/canu-1.4/Linux-amd64/bin/ovStoreBucketizer() [0x40afae]
[3] /home/fu115/DIRECTORY/canu/canu-1.4/Linux-amd64/bin/ovStoreBucketizer() [0x4027ed]
[4] /lib64/libc.so.6::(null) + 0xfd [0x34e501ed1d]
[5] /home/fu115/DIRECTORY/canu/canu-1.4/Linux-amd64/bin/ovStoreBucketizer() [0x402159]
GDB:
/scratch/snyder/f/fu115/Genome_assembly/PBonly/canutest/strigaA/correction/asm.ovlStore.BUILDING/scripts/1-bucketize.sh: line 173: 30974 Segmentation fault (core dumped) $bin/ovStoreBucketizer -O /scratch/snyder/f/fu115/Genome_assembly/PBonly/canutest/strigaA/correction/asm.ovlStore.BUILDING -G /scratch/snyder/f/fu115/Genome_assembly/PBonly/canutest/strigaA/correction/asm.gkpStore -C /scratch/snyder/f/fu115/Genome_assembly/PBonly/canutest/strigaA/correction/asm.ovlStore.BUILDING/config -job $jobid -i $jn
-- Fuyou Fu, Ph.D. Department of Botany and Plant Pathology Purdue University USA
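A side note on the "[: unlimited: integer expression expected" message in the log above. Bash prints this when an integer test such as -lt or -gt is given a non-numeric operand, here the string "unlimited" that ulimit reports. The script carries on to the bucketizing step afterwards, so the message looks unrelated to the segmentation fault. The contents of line 126 of 1-bucketize.sh are not shown in this thread, so the following is only a sketch of the general idea in Python, not Canu's code. The function names and the requested limits are hypothetical.

```python
from typing import Optional


def parse_limit(text: str) -> Optional[int]:
    """Parse a value as printed by `ulimit -u` or `ulimit -n`.

    Returns None for "unlimited", otherwise the limit as an integer.
    """
    text = text.strip()
    if text == "unlimited":
        return None
    return int(text)


def needs_increase(current: str, wanted: int) -> bool:
    """True only when the current limit is a finite number below `wanted`."""
    limit = parse_limit(current)
    return limit is not None and limit < wanted


# Hypothetical requested limits, for illustration only.
print(needs_increase("unlimited", 1024))  # "unlimited" is never compared as an integer
print(needs_increase("32768", 65536))
```

Checking for "unlimited" before any numeric comparison avoids the warning without changing what the script does when the limit is finite.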
I have a very similar problem. I've seen it with 1.4, but not with the same data and 1.3 (no, I'm not switching mid-run between the two).
With gdb I got to:
with: gkStore_getRead (this=0xbcf5d0, foverlap=..., roverlap=...) at stores/gkStore.H:502
502 (_readIDtoPartitionID[id] != _partitionID)) {
the _readIDtoPartitionID was indeed not a valid address.
Hoping that narrows the search some. Thanks!
@frogsicle is that the line GDB reported for the crash? _readIDtoPartitionID should be NULL (or 0) because partitions are not used here. Is it a different value?
@sunnycqcn are you able to share your /scratch/snyder/f/fu115/Genome_assembly/PBonly/canutest/strigaA/correction/asm.gkpStore and /scratch/snyder/f/fu115/Genome_assembly/PBonly/canutest/strigaA/correction/1-overlapper/results/000001.ovb files? You can upload them to our FTP site here: ftp://ftp.cbcb.umd.edu/incoming/sergek/
So far as I understand gdb, yes.
(gdb) print _readsPerPartition
$3 = (uint32 *) 0x676f6c2f474e4944
(gdb) print *_readsPerPartition
Cannot access memory at address 0x676f6c2f474e4944
And in case it slipped a line (due to me adding a print statement or such):
500 gkRead *gkStore_getRead(uint32 id) {
501 if ((_readIDtoPartitionID) &&
502 (_readIDtoPartitionID[id] != _partitionID)) {
_readsPerPartition should not be used in line 502. What's the value of _readIDtoPartitionID?
Can you share your asm.gkpStore and any of the ovb/counts files to our FTP site so I can see if I can reproduce the error locally?
oops, sorry grabbed the wrong one, but same idea:
(gdb) print _readIDtoPartitionID
$1 = (uint32 *) 0x36312e657a697465
(gdb) print *_readIDtoPartitionID
Cannot access memory at address 0x36312e657a697465
where's the ftp site?
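One observation on the two invalid pointer values printed by gdb in this thread (0x676f6c2f474e4944 and 0x36312e657a697465): every byte in them is printable ASCII. Read in little-endian byte order, as stored on x86_64, they spell text that looks like pieces of a file path. That would be consistent with a string write running past its buffer and over the gkStore object, but this is only an inference from the values; the thread does not state the root cause. A minimal Python sketch for decoding such values follows. The helper name is made up for illustration.

```python
def pointer_as_ascii(value: int, width: int = 8) -> str:
    """Interpret a pointer-sized integer as little-endian bytes and decode as ASCII.

    Non-ASCII bytes are shown as the Unicode replacement character.
    """
    raw = value.to_bytes(width, byteorder="little")
    return raw.decode("ascii", errors="replace")


# The two values gdb printed for _readsPerPartition and _readIDtoPartitionID.
for value in (0x676F6C2F474E4944, 0x36312E657A697465):
    print(hex(value), repr(pointer_as_ascii(value)))
```

A pointer that decodes to readable text is a common sign of memory being overwritten by string data, which narrows the search to code that builds or copies path names.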
Hmm very interesting, I confirmed this should be initialized to NULL, what does
ls -lha correction/asm.gkpStore
show? The ftp site is: ftp://ftp.cbcb.umd.edu/incoming/sergek/
$ ls -lha correction/Sp40d_02.gkpStore
total 12G
drwxrwxr-x. 2 alisandra alisandra 15 23. Dez 17:10 .
drwxrwxr-x. 6 alisandra alisandra 8 23. Dez 22:47 ..
-rw-rw-r--. 1 alisandra alisandra 11G 23. Dez 17:09 blobs
-rw-rw-r--. 1 alisandra alisandra 65M 23. Dez 17:09 errorLog
-rw-rw-r--. 1 alisandra alisandra 56 23. Dez 17:09 info
-rw-rw-r--. 1 alisandra alisandra 280 23. Dez 17:09 info.txt
-rw-rw-r--. 1 alisandra alisandra 5,3K 23. Dez 17:09 libraries
-rw-rw-r--. 1 alisandra alisandra 2,5K 23. Dez 17:10 libraries.txt
-rw-rw-r--. 1 alisandra alisandra 8,9K 23. Dez 17:09 load.dat
-rw-rw-r--. 1 alisandra alisandra 2,4K 23. Dez 17:10 readlengthhistogram.txt
-rw-rw-r--. 1 alisandra alisandra 748 23. Dez 17:10 readlengths.gp
-rw-rw-r--. 1 alisandra alisandra 19M 23. Dez 17:10 readlengths.txt
-rw-rw-r--. 1 alisandra alisandra 241M 23. Dez 17:09 readNames.txt
-rw-rw-r--. 1 alisandra alisandra 54M 23. Dez 17:09 reads
-rw-rw-r--. 1 alisandra alisandra 97M 23. Dez 17:10 reads.txt
and thanks, I'll go upload stuff.
Thanks. While it's uploading, can you set a breakpoint at gkStore.C:650 and print the values of path, partID, gkStore_mode, and partID==UINT32_MAX?
(gdb) print partID
$1 = 4294967295
(gdb) print path
$2 = 0x7fff3cd578f3 "/mnt/data/alisandra/Lost/canu/Sp40d_02/correction/Sp40d_02.gkpStore"
(gdb) print mode
$3 = gkStore_readOnly
(gdb) print UINT32_MAX
No symbol "UINT32_MAX" in current context.
Via the (uncommented) logs, it ends up inside the first if statement, though, so I guess that's a True.
[gkStore()-- opening '/mnt/data/alisandra/Lost/canu/Sp40d_02/correction/Sp40d_02.gkpStore' for read-only access.]
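On the "No symbol UINT32_MAX" message: UINT32_MAX is a preprocessor macro from <stdint.h>, so gdb can only evaluate it when the binary was built with macro debug information (for example with -g3). The comparison can still be checked by hand, since the largest 32-bit unsigned value is 2^32 - 1. A small sketch, using the partID value printed above:

```python
# Largest value representable in an unsigned 32-bit integer.
UINT32_MAX = (1 << 32) - 1

# Value gdb printed for partID in this thread.
part_id = 4294967295

print(UINT32_MAX)
print(part_id == UINT32_MAX)
```

This agrees with the log line showing that the code took the first if branch.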
I tried several times and got the same problem.
I can confirm with the exact same parameters the overlap bucketizing runs without error on our systems. Are you using Canu 1.4 compiled from source or a pre-compiled binary? If it is pre-compiled, try building it locally from source in debug mode (make BUILDDEBUG=1) and see if it runs then.
@frogsicle can you post the output from uname -a and gcc --version. Also for the crash, print the values of _partitionID and _numberOfPartitions. I want to see exactly why your store seems to think it should be partitioned when it is not.
Finally, can you try running the attached binary, it has (a lot) of debugging information added so should hopefully better capture when the crash is happening. ovStoreBucketizer.gz
So, an update. For completeness:
$ uname -a
Linux kerry.plabipd.de 2.6.32-504.8.1.el6.x86_64 #1 SMP Wed Jan 28 21:11:36 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
$ gcc --version
gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-11)
Further examination indicates that _readIDtoPartitionID is indeed NULL for the first gkStore_getNumReads() + 1 iterations. Then it changes to a value that is not NULL but also not valid, and the seg fault occurs. Similarly, _partitionID and _numberOfPartitions start at 0, but right before the seg fault they hold invalid memory addresses. Indeed, the whole gkStore instance seemed to go from valid -> invalid.
While attempting to figure out what was going on, I added a print statement to track the addresses of various gkStore references in ovStoreBucketizer.C, and to my surprise it ran through. Playing with this further indicated that the print statements were changing the address where things were stored (ASLR disabled for troubleshooting purposes). But I'm in a little over my head. I pushed the print statements in question to a fork of canu under my account so you can at least see what I'm talking about.
Let me know how I can help further, and thanks again!
This should be fixed now. Thanks for the print statements. Reopen if needed.
Thanks, I will update my results. I think I can get the results today. Fuyou
Hi Brian, my run did not finish; it stopped at 4-unitigger. The error file is as follows: unitigger.err:
Graph error threshold = 0.105 (10.500%)
Max error threshold = 0.105 (10.500%)
Minimum overlap length = 500 bases
number of threads = 10 (command line)
==> LOADING AND FILTERING OVERLAPS.
ReadInfo()-- Using 943770 reads, no minimum read length used.
OverlapCache()-- limited to 258048MB memory (user supplied).
OverlapCache()-- 3MB for read data.
OverlapCache()-- 14MB for best edges.
OverlapCache()-- 25MB for unitig layouts.
OverlapCache()-- 0MB for tigs.
OverlapCache()-- 7MB for id maps.
OverlapCache()-- 57MB for error profiles.
OverlapCache()-- 10MB for overlap cache pointers.
OverlapCache()-- 48MB for overlap cache initial bucket.
OverlapCache()-- 160MB for overlap cache thread data.
OverlapCache()-- 3MB for number of overlaps per read.
OverlapCache()-- 0MB for other processes.
OverlapCache()-- ---------
OverlapCache()-- 273MB for data structures (sum of above).
OverlapCache()-- 257774MB available for overlaps.
OverlapCache()-- Loading number of overlaps per read.
OverlapCache()-- Retain at least 18 overlaps/read, based on 3.04x coverage.
OverlapCache()-- Initial guess at 17900 overlaps/read (maximum 19702 overlaps/read).
OverlapCache()-- 17900 overlaps/read - load all for 943759 reads, some for 11 reads - 947722812 overlaps to load - 14461MB
OverlapCache()-- 19702 overlaps/read - load all for 943770 reads, some for 0 reads - 947729700 overlaps to load - 14461MB
OverlapCache()-- minPer = 18 overlaps/reads
OverlapCache()-- maxPer = 19702 overlaps/reads
OverlapCache()-- numBelow = 943769 reads (all overlaps loaded)
OverlapCache()-- numEqual = 1 reads (all overlaps loaded)
OverlapCache()-- numAbove = 0 reads (some overlaps loaded)
OverlapCache()-- totalLoad = 947729700 overlaps (100.00%)
OverlapCache()-- availForOverlaps = 257774MB
OverlapCache()-- totalMemory = 0MB for organization
OverlapCache()-- totalMemory = 14461MB for overlaps
OverlapCache()-- totalMemory = 14461MB used
OverlapCache()-- Loading: overlaps processed 189 (000.00%) loaded 189 (000.00%) droppeddupe 0 (000.00%)
bogart: bogart/AS_BAT_OverlapCache.C:468: uint32 OverlapCache::filterDuplicates(uint32&): Assertion `_ovs[jj].a_iid != 0' failed.
Failed with 'Aborted'
Backtrace (mangled):
/home/fu115/DIRECTORY/canu/canu/Linux-amd64/bin/bogart(_Z17AS_UTL_catchCrashiP7siginfoPv+0x27)[0x4f0887]
/lib64/libpthread.so.0[0x313b20f7e0]
/lib64/libc.so.6(gsignal+0x35)[0x313a6325e5]
/lib64/libc.so.6(abort+0x175)[0x313a633dc5]
/lib64/libc.so.6[0x313a62b70e]
/lib64/libc.so.6(__assert_perror_fail+0x0)[0x313a62b7d0]
/home/fu115/DIRECTORY/canu/canu/Linux-amd64/bin/bogart[0x4b139e]
/home/fu115/DIRECTORY/canu/canu/Linux-amd64/bin/bogart[0x4b1c05]
/home/fu115/DIRECTORY/canu/canu/Linux-amd64/bin/bogart[0x4b2c41]
/home/fu115/DIRECTORY/canu/canu/Linux-amd64/bin/bogart[0x404372]
/lib64/libc.so.6(__libc_start_main+0xfd)[0x313a61ed1d]
/home/fu115/DIRECTORY/canu/canu/Linux-amd64/bin/bogart[0x403399]
Backtrace (demangled):
[0] /home/fu115/DIRECTORY/canu/canu/Linux-amd64/bin/bogart::AS_UTL_catchCrash(int, siginfo*, void*) + 0x27 [0x4f0887]
[1] /lib64/libpthread.so.0() [0x313b20f7e0]
[2] /lib64/libc.so.6::(null) + 0x35 [0x313a6325e5]
[3] /lib64/libc.so.6::(null) + 0x175 [0x313a633dc5]
[4] /lib64/libc.so.6() [0x313a62b70e]
[5] /lib64/libc.so.6::(null) + 0 [0x313a62b7d0]
[6] /home/fu115/DIRECTORY/canu/canu/Linux-amd64/bin/bogart() [0x4b139e]
[7] /home/fu115/DIRECTORY/canu/canu/Linux-amd64/bin/bogart() [0x4b1c05]
[8] /home/fu115/DIRECTORY/canu/canu/Linux-amd64/bin/bogart() [0x4b2c41]
[9] /home/fu115/DIRECTORY/canu/canu/Linux-amd64/bin/bogart() [0x404372]
[10] /lib64/libc.so.6::(null) + 0xfd [0x313a61ed1d]
[11] /home/fu115/DIRECTORY/canu/canu/Linux-amd64/bin/bogart() [0x403399]
GDB:
-- Starting command on Fri Dec 30 13:11:19 2016 with 3261.968 GB free disk space
/depot/bioinfo/apps/apps/canu-1.4/Linux-amd64/bin/gatekeeperCreate \
-minlength 1000 \
-o /depot/tmengist/etc/strigaB/correction/asm.gkpStore.BUILDING \
/depot/tmengist/etc/strigaB/correction/asm.gkpStore.gkp \
> /depot/tmengist/etc/strigaB/correction/asm.gkpStore.BUILDING.err 2>&1
-- Read length histogram (one '*' equals 2729.27 reads):
-- 0 999 0
-- 1000 1999 191049
-- 2000 2999 171449
-- Starting concurrent execution on Fri Dec 30 13:13:14 2016 with 3253.855 GB free disk space (1 processes; 1 concurrently)
/depot/tmengist/etc/strigaB/correction/0-mercounts/meryl.sh 1 > /depot/tmengist/etc/strigaB/correction/0-mercounts/meryl.000001.out 2>&1
-- Starting command on Fri Dec 30 13:31:21 2016 with 3164.578 GB free disk space
/depot/bioinfo/apps/apps/canu-1.4/Linux-amd64/bin/meryl \
-Dh \
-s /depot/tmengist/etc/strigaB/correction/0-mercounts/asm.ms16 \
> /depot/tmengist/etc/strigaB/correction/0-mercounts/asm.ms16.histogram \
2> /depot/tmengist/etc/strigaB/correction/0-mercounts/asm.ms16.histogram.info
-- Starting concurrent execution on Fri Dec 30 13:32:22 2016 with 3230.638 GB free disk space (32 processes; 2 concurrently)
/depot/tmengist/etc/strigaB/correction/1-overlapper/precompute.sh 1 > /depot/tmengist/etc/strigaB/correction/1-overlapper/precompute.000001.out 2>&1
...
/depot/tmengist/etc/strigaB/correction/1-overlapper/precompute.sh 32 > /depot/tmengist/etc/strigaB/correction/1-overlapper/precompute.000032.out 2>&1
-- Starting concurrent execution on Fri Dec 30 19:01:28 2016 with 3047.425 GB free disk space (77 processes; 2 concurrently)
/depot/tmengist/etc/strigaB/correction/1-overlapper/mhap.sh 1 > /depot/tmengist/etc/strigaB/correction/1-overlapper/mhap.000001.out 2>&1
...
/depot/tmengist/etc/strigaB/correction/1-overlapper/mhap.sh 77 > /depot/tmengist/etc/strigaB/correction/1-overlapper/mhap.000077.out 2>&1
-- Starting command on Sat Dec 31 14:24:09 2016 with 2537.82 GB free disk space
/depot/tmengist/etc/strigaB/correction/asm.ovlStore.BUILDING/scripts/0-config.sh \
/depot/tmengist/etc/strigaB/correction/asm.ovlStore.BUILDING/config.err 2>&1
-- Starting concurrent execution on Sat Dec 31 14:24:12 2016 with 2537.796 GB free disk space (77 processes; 20 concurrently)
/depot/tmengist/etc/strigaB/correction/asm.ovlStore.BUILDING/scripts/1-bucketize.sh 1 > /depot/tmengist/etc/strigaB/correction/asm.ovlStore.BUILDING/logs/1-bucketize.000001.out 2>&1
...
/depot/tmengist/etc/strigaB/correction/asm.ovlStore.BUILDING/scripts/1-bucketize.sh 77 > /depot/tmengist/etc/strigaB/correction/asm.ovlStore.BUILDING/logs/1-bucketize.000077.out 2>&1
-- Starting concurrent execution on Sat Dec 31 14:29:23 2016 with 1864.818 GB free disk space (158 processes; 20 concurrently)
/depot/tmengist/etc/strigaB/correction/asm.ovlStore.BUILDING/scripts/2-sort.sh 1 > /depot/tmengist/etc/strigaB/correction/asm.ovlStore.BUILDING/logs/2-sort.000001.out 2>&1
Thanks a bunch, it ran all the way through for me nicely!
Hi, when I ran Canu 1.4, my job stopped at overlapStoreBucketize. My command is