openlink / virtuoso-opensource

Virtuoso is a high-performance and scalable Multi-Model RDBMS, Data Integration Middleware, Linked Data Deployment, and HTTP Application Server Platform
https://vos.openlinksw.com

Core dump segfault Virtuoso 7.1 #150

Open ghost opened 10 years ago

ghost commented 10 years ago

Hi,

My Virtuoso instance crashed with a segfault and I don't know why. Here are all the messages and the analysis I can provide about the error. Syslog messages:

Mar 13 14:36:32 bdcbdd001.conso.qualif.gen01.ke.p.fti.net kernel: [4222175.862511] virtuoso-t[23768]: segfault at 7f7be1506dc0 ip 00000000006785be sp 00007f7d70153ae0 error 4 in virtuoso-t[400000+b5d000]
Mar 13 14:39:46 bdcbdd001.conso.qualif.gen01.ke.p.fti.net kernel: [4222369.808728] INFO: task virtuoso-t:23753 blocked for more than 120 seconds.
Mar 13 14:39:46 bdcbdd001.conso.qualif.gen01.ke.p.fti.net kernel: [4222369.808748] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 13 14:39:46 bdcbdd001.conso.qualif.gen01.ke.p.fti.net kernel: [4222369.808767] virtuoso-t    D 0000000000000001     0 23753      1 0x00000000
Mar 13 14:39:46 bdcbdd001.conso.qualif.gen01.ke.p.fti.net kernel: [4222369.808773]  ffff881fd41f7ce8 0000000000000086 0000000000015e00 0000000000015e00
Mar 13 14:39:46 bdcbdd001.conso.qualif.gen01.ke.p.fti.net kernel: [4222369.808778]  ffff881fcd2d9ad0 ffff881fd41f7fd8 0000000000015e00 ffff881fcd2d9700
Mar 13 14:39:46 bdcbdd001.conso.qualif.gen01.ke.p.fti.net kernel: [4222369.808782]  0000000000015e00 ffff881fd41f7fd8 0000000000015e00 ffff881fcd2d9ad0
Mar 13 14:39:46 bdcbdd001.conso.qualif.gen01.ke.p.fti.net kernel: [4222369.808787] Call Trace:
Mar 13 14:39:46 bdcbdd001.conso.qualif.gen01.ke.p.fti.net kernel: [4222369.808802]  [<ffffffff8106c075>] exit_mm+0x95/0x150
Mar 13 14:39:46 bdcbdd001.conso.qualif.gen01.ke.p.fti.net kernel: [4222369.808806]  [<ffffffff8106c375>] do_exit+0x135/0x390
Mar 13 14:39:46 bdcbdd001.conso.qualif.gen01.ke.p.fti.net kernel: [4222369.808813]  [<ffffffff8156067e>] ? _spin_lock+0xe/0x20
Mar 13 14:39:46 bdcbdd001.conso.qualif.gen01.ke.p.fti.net kernel: [4222369.808818]  [<ffffffff8106c625>] do_group_exit+0x55/0xd0
Mar 13 14:39:46 bdcbdd001.conso.qualif.gen01.ke.p.fti.net kernel: [4222369.808825]  [<ffffffff8107cf37>] get_signal_to_deliver+0x1d7/0x3d0
Mar 13 14:39:46 bdcbdd001.conso.qualif.gen01.ke.p.fti.net kernel: [4222369.808833]  [<ffffffff81012a05>] do_signal+0x75/0x1c0
Mar 13 14:39:46 bdcbdd001.conso.qualif.gen01.ke.p.fti.net kernel: [4222369.808839]  [<ffffffff81097a52>] ? futex_wake+0x112/0x130
Mar 13 14:39:46 bdcbdd001.conso.qualif.gen01.ke.p.fti.net kernel: [4222369.808843]  [<ffffffff81099f19>] ? do_futex+0xc9/0x1b0
Mar 13 14:39:46 bdcbdd001.conso.qualif.gen01.ke.p.fti.net kernel: [4222369.808847]  [<ffffffff81012bad>] do_notify_resume+0x5d/0x80
Mar 13 14:39:46 bdcbdd001.conso.qualif.gen01.ke.p.fti.net kernel: [4222369.808855]  [<ffffffff8114773a>] ? sys_write+0x7a/0x80
Mar 13 14:39:46 bdcbdd001.conso.qualif.gen01.ke.p.fti.net kernel: [4222369.808859]  [<ffffffff8101343e>] int_signal+0x12/0x17
Mar 13 14:39:46 bdcbdd001.conso.qualif.gen01.ke.p.fti.net kernel: [4222369.808863] INFO: task virtuoso-t:23762 blocked for more than 120 seconds.
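
The first kernel line (14:36:32) can be decoded mechanically. The sketch below is a minimal Python parser that assumes the x86 Linux message format quoted above. `decode_segfault` is a hypothetical helper, not part of Virtuoso or any library. It reads the error code as x86 page-fault bits (bit 0: protection violation rather than page not present, bit 1: write rather than read, bit 2: fault raised in user mode) and checks whether the instruction pointer falls inside the mapping printed in brackets.

```python
import re

# Assumed format of the x86 kernel "segfault at ..." message shown in the syslog above.
# The kernel prints the error code and the mapping base/size in hexadecimal.
SEGFAULT_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) "
    r"error (?P<err>[0-9a-f]+) in (?P<obj>\S+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def decode_segfault(line: str) -> dict:
    """Decode a kernel segfault line into its addresses and page-fault flags."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        raise ValueError("not a kernel segfault line")
    err = int(m["err"], 16)
    ip = int(m["ip"], 16)
    base = int(m["base"], 16)
    size = int(m["size"], 16)
    return {
        "fault_addr": int(m["addr"], 16),
        "ip": ip,
        # x86 page-fault error code bits.
        "protection_violation": bool(err & 1),  # 0 means the page was not present
        "write": bool(err & 2),                  # 0 means a read access
        "user_mode": bool(err & 4),              # 1 means the fault came from user space
        "object": m["obj"],
        # Did the faulting instruction lie inside the mapping named in brackets?
        "ip_in_object": base <= ip < base + size,
        "offset_in_object": ip - base,
    }


line = ("virtuoso-t[23768]: segfault at 7f7be1506dc0 ip 00000000006785be "
        "sp 00007f7d70153ae0 error 4 in virtuoso-t[400000+b5d000]")
info = decode_segfault(line)
```

For this line the decoder reports a user-mode read of a non-present page at 0x7f7be1506dc0, with the faulting instruction at offset 0x2785be inside the virtuoso-t mapping that starts at 0x400000. The ip 0x6785be is the same address gdb reports for frame #0 further down.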

And the GDB analysis:

/ke/local/toolchain3-x86_64-nptl/tools/bin/gdb -se /usr/bin/virtuoso-t -c virtuoso-t.1394717792.23753.11.bdcbdd001.conso.qualif.gen01.ke.p.fti.net
GNU gdb (GDB) 7.2
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /usr/bin/virtuoso-t...(no debugging symbols found)...done.
[New Thread 23768]
[New Thread 23771]
[New Thread 23774]
[New Thread 23779]
[New Thread 23777]
[New Thread 23773]
[New Thread 23819]
[New Thread 23778]
[New Thread 23769]
[New Thread 23770]
[New Thread 23797]
[New Thread 23780]
[New Thread 23859]
[New Thread 23867]
[New Thread 23766]
[New Thread 23869]
[New Thread 23818]
[New Thread 23824]
[New Thread 23775]
[New Thread 23776]
[New Thread 23784]
[New Thread 4992]
[New Thread 23786]
[New Thread 32359]
[New Thread 23799]
[New Thread 23829]
[New Thread 23787]
[New Thread 23763]
[New Thread 23783]
[New Thread 23872]
[New Thread 23820]
[New Thread 19147]
[New Thread 23781]
[New Thread 23822]
[New Thread 23821]
[New Thread 23868]
[New Thread 23860]
[New Thread 23772]
[New Thread 23836]
[New Thread 23800]
[New Thread 23828]
[New Thread 23785]
[New Thread 23782]
[New Thread 23823]
[New Thread 23825]
[New Thread 23765]
[New Thread 23863]
[New Thread 23871]
[New Thread 23767]
[New Thread 23801]
[New Thread 23832]
[New Thread 23753]
[New Thread 23841]
[New Thread 23788]
[New Thread 23789]
[New Thread 23826]
[New Thread 23833]
[New Thread 23764]
[New Thread 23862]
[New Thread 23870]
[New Thread 23817]
[New Thread 23803]
[New Thread 23840]
[New Thread 23790]
[New Thread 23848]
[New Thread 19148]
[New Thread 23794]
[New Thread 23827]
[New Thread 23837]
[New Thread 25985]
[New Thread 23861]
[New Thread 23866]
[New Thread 23804]
[New Thread 23844]
[New Thread 23816]
[New Thread 23791]
[New Thread 23853]
[New Thread 23798]
[New Thread 23846]
[New Thread 23830]
[New Thread 23814]
[New Thread 23874]
[New Thread 23858]
[New Thread 23805]
[New Thread 23851]
[New Thread 23865]
[New Thread 23792]
[New Thread 23864]
[New Thread 23815]
[New Thread 23802]
[New Thread 23849]
[New Thread 23831]
[New Thread 23810]
[New Thread 23855]
[New Thread 23856]
[New Thread 23807]
[New Thread 23793]
[New Thread 23806]
[New Thread 23857]
[New Thread 23813]
[New Thread 23834]
[New Thread 23875]
[New Thread 23762]
[New Thread 23808]
[New Thread 23854]
[New Thread 23795]
[New Thread 23873]
[New Thread 23812]
[New Thread 23835]
[New Thread 23811]
[New Thread 23796]
[New Thread 23809]
[New Thread 23852]
[New Thread 23838]
[New Thread 23850]
[New Thread 23847]
[New Thread 23845]
[New Thread 23843]
[New Thread 23839]
[New Thread 23842]

warning: Can't read pathname for load map: Input/output error.
Reading symbols from /lib/libssl.so.0.9.8...(no debugging symbols found)...done.
Loaded symbols for /lib/libssl.so.0.9.8
Reading symbols from /lib/libcrypto.so.0.9.8...(no debugging symbols found)...done.
Loaded symbols for /lib/libcrypto.so.0.9.8
Reading symbols from /lib/libpthread.so.0...(no debugging symbols found)...done.
Loaded symbols for /lib/libpthread.so.0
Reading symbols from /lib/libdl.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/libdl.so.2
Reading symbols from /lib/libm.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib/libm.so.6
Reading symbols from /lib/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib/libc.so.6
Reading symbols from /lib/libz.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib/libz.so.1
Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Reading symbols from /usr/lib/virtuoso/hosting/hosting_perl.so...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/virtuoso/hosting/hosting_perl.so
Reading symbols from /usr/lib/libperl.so.5.10...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libperl.so.5.10
Reading symbols from /lib/libcrypt.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib/libcrypt.so.1
Reading symbols from /lib/libnss_compat.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/libnss_compat.so.2
Reading symbols from /lib/libnsl.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib/libnsl.so.1
Reading symbols from /lib/libnss_nis.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/libnss_nis.so.2
Reading symbols from /lib/libnss_files.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/libnss_files.so.2
Core was generated by `/usr/bin/virtuoso-t +wait +configfile /etc/virtuoso/virtuoso.ini +debug'.
Program terminated with signal 11, Segmentation fault.
#0  0x00000000006785be in ?? ()
(gdb)
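
The core file name passed to gdb also carries metadata. Assuming kernel.core_pattern on this host is something like %e.%t.%p.%s.%h (executable, Unix time, PID, signal number, hostname; this is an assumption, since the pattern itself is not shown in the report), the hypothetical helper below splits it. The hostname contains dots, so the split is limited to four.

```python
from datetime import datetime, timezone


def parse_core_name(name: str) -> dict:
    """Split a core file name produced by the (assumed) pattern %e.%t.%p.%s.%h."""
    # maxsplit=4: the executable name is assumed to contain no dots,
    # while the trailing hostname may contain several.
    exe, epoch, pid, sig, host = name.split(".", 4)
    return {
        "executable": exe,
        "time_utc": datetime.fromtimestamp(int(epoch), tz=timezone.utc),
        "pid": int(pid),
        "signal": int(sig),
        "host": host,
    }


core = "virtuoso-t.1394717792.23753.11.bdcbdd001.conso.qualif.gen01.ke.p.fti.net"
meta = parse_core_name(core)
```

Under that assumption the name decodes to virtuoso-t, 2014-03-13 13:36:32 UTC, PID 23753 and signal 11 (SIGSEGV). That lines up with the 14:36:32 syslog entry if the host clock is UTC+1, and with task virtuoso-t:23753 in the hung-task messages.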

Virtuoso version:

Virtuoso version 07.10.3207 on Linux (x86_64-pc-linux-gnu), Single Server Edition

I hope this information will be useful.

Best.

Julien.

HughWilliams commented 10 years ago

Can you type "bt" (or "backtrace") at the gdb prompt to unwind the stack and see the function calls leading up to the crash?

If your Virtuoso binary is stripped, i.e. its symbols have been removed (check with "file virtuoso-t"), you may need to rebuild it without stripping, as follows:

  1. Edit ~//Makefile
  2. To CONFIGURE_ARGS, add --with-debug
  3. To CONFIGURE_ENV, prepend CC='cc -g'
  4. Then run "make clean all deinstall reinstall" to build and install a new, unstripped debug binary (virtuoso-t)
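
Before rebuilding, the "file virtuoso-t" check can be scripted. A minimal sketch, assuming the usual wording of the `file` utility for ELF binaries ("stripped" or "not stripped"). `is_stripped` and `binary_is_stripped` are hypothetical helpers; /usr/bin/virtuoso-t is the path from the report.

```python
import subprocess


def is_stripped(file_output: str) -> bool:
    """Interpret the output of `file` for an ELF binary."""
    # Ignore the "path:" prefix so a path containing "stripped" cannot confuse the check.
    description = file_output.split(":", 1)[-1]
    if "not stripped" in description:
        return False
    if "stripped" in description:
        return True
    raise ValueError("`file` output does not mention stripping: " + file_output.strip())


def binary_is_stripped(path: str = "/usr/bin/virtuoso-t") -> bool:
    """Run `file` on the binary and report whether its symbols were removed."""
    # -L makes `file` follow symlinks to the real binary.
    result = subprocess.run(["file", "-L", path], capture_output=True, text=True, check=True)
    return is_stripped(result.stdout)
```

On the server, `binary_is_stripped()` returning True means the rebuild steps above apply.
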
ghost commented 10 years ago

Here is the bt output:

(gdb) bt
#0  0x00000000006785be in ?? ()
#1  0x0000000000657c7e in ?? ()
#2  0x00000000006183b9 in ?? ()
#3  0x000000000061f053 in table_source_input ()
#4  0x000000000061872a in ?? ()
#5  0x00000000006189f1 in ?? ()
#6  0x000000000061f38f in table_source_input ()
#7  0x00000000005e8162 in ?? ()
#8  0x0000000000654600 in aq_qr_func ()
#9  0x0000000000462e7d in aq_thread_func ()
#10 0x0000000000930d2f in ?? ()
#11 0x00007f7da3d889ca in start_thread () from /lib/libpthread.so.0
#12 0x00007f7da365e21d in clone () from /lib/libc.so.6
#13 0x0000000000000000 in ?? ()

(gdb) frame 0x00000000006785be
#0  0x0000000000000000 in ?? ()

(gdb) info locals 
No symbol table info available.

We have a large core dump of around 47 GB (1.3 GB compressed). I can provide it too.

HughWilliams commented 10 years ago

The binary is stripped, so it needs to be rebuilt with symbols in place to get the missing symbol information in the backtrace.

ghost commented 10 years ago

OK, but I don't know how to reproduce the segfault to regenerate the call trace. Is that a problem?

HughWilliams commented 10 years ago

If it is not readily reproducible, just leave it running with the debug binary; the next time it crashes, the resulting core will contain a backtrace with symbol information.
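
Once the debug build produces a new core, the backtraces can be captured non-interactively into a small text file instead of shipping the whole core. A minimal sketch, assuming gdb is on PATH; `gdb_backtrace_cmd` and `save_backtrace` are hypothetical helpers, and the core and output paths are placeholders. It uses only standard gdb options: -batch, -ex, and the program and core positional arguments.

```python
import subprocess


def gdb_backtrace_cmd(binary: str, core: str) -> list:
    """Build a gdb command line that prints backtraces from a core file and exits."""
    return [
        "gdb", "-batch",
        "-ex", "bt full",              # crashing thread, including local variables
        "-ex", "thread apply all bt",  # every thread, including the crashing one
        binary, core,
    ]


def save_backtrace(binary: str, core: str, out_path: str) -> int:
    """Run gdb on the core and write its combined stdout/stderr to out_path."""
    with open(out_path, "w") as out:
        proc = subprocess.run(gdb_backtrace_cmd(binary, core),
                              stdout=out, stderr=subprocess.STDOUT)
    return proc.returncode
```

For example, `save_backtrace("/usr/bin/virtuoso-t", "/path/to/core", "virtuoso-bt.txt")` writes a text file that is far smaller than a 47 GB core and easy to attach to the issue.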

ghost commented 10 years ago

OK, I'll do that!