intel / murphy

Resource Policy
Other
18 stars 15 forks source link

crash - double free? #6

Open pohly opened 10 years ago

pohly commented 10 years ago

In the SyncEvolution nightly testing, murphyd randomly crashes. This breaks roughly one out of five test runs.

I managed to catch one such crash in gdb. Looks like memory corruption in response to a client disconnecting from D-Bus without freeing its resources. In the nightly testing, I rely on that at a lot, so it occurs frequently.

I am still using the (now outdated) 98c7740638d4af4c code revision. I'll update to afd08b5004f28b and check whether the problem still occurs.

Program received signal SIGABRT, Aborted. 0x00007f4a61f9d3d5 in raise () from /lib64/libc.so.6 Thread 1 (Thread 0x7f4a649e0700 (LWP 8481)):

0 0x00007f4a61f9d3d5 in raise () from /lib64/libc.so.6

1 0x00007f4a61f9e858 in abort () from /lib64/libc.so.6

2 0x00007f4a61fdd44b in __libc_message () from /lib64/libc.so.6

3 0x00007f4a61fe2fc6 in malloc_printerr () from /lib64/libc.so.6

4 0x00007f4a61fe3d43 in _int_free () from /lib64/libc.so.6

5 0x00007f4a63076013 in free_entry (free=0, entry=0xae9230, ht=) at common/hashtbl.c:143

6 mrp_htbl_remove (ht=0xae9a00, key=0x1ec, free=free@entry=0) at common/hashtbl.c:282

7 0x00007f4a63f607d9 in remove_from_id_hash (id=) at resource/resource-lua.c:587

8 mrp_resource_lua_unregister_resource_set (rset=rset@entry=0xb9a8a0) at resource/resource-lua.c:231

9 0x00007f4a63f57649 in mrp_resource_set_destroy (rset=0xb9a8a0) at resource/resource-set.c:134

10 0x000000000040a91a in destroy_rset (rset=0xba5440) at plugins/plugin-resource-dbus.c:910

11 0x00007f4a63075ff8 in free_entry (free=1, entry=0xb9cd90, ht=) at common/hashtbl.c:142

12 mrp_htbl_remove (ht=0xaf2aa0, key=0xb52e30, free=free@entry=1) at common/hashtbl.c:282

13 0x000000000040ba3e in dbus_name_cb (user_data=0xba5440, dbus=, name=,

up=<optimized out>, owner=<optimized out>) at plugins/plugin-resource-dbus.c:1041

14 dbus_name_cb (dbus=, name=, up=, owner=,

user_data=0xba5440) at plugins/plugin-resource-dbus.c:1030

15 0x00007f4a645c92ea in name_owner_change_cb (msg=, dbus=0xaedd70, data=)

at common/dbus.c:494

16 name_owner_change_cb (dbus=0xaedd70, msg=, data=) at common/dbus.c:456

17 0x00007f4a645c8fb0 in dispatch_signal (c=, msg=0xbaee10, data=0xaedd70) at common/dbus.c:1093

18 0x00007f4a6438e57e in dbus_connection_dispatch () from /lib64/libdbus-1.so.3

19 0x00007f4a645cc66d in pump_cb (d=0xaee610, user_data=0xaef570) at common/dbus-glue.c:306

20 0x00007f4a63079eca in dispatch_deferred (ml=0xada570) at common/mainloop.c:1837

21 mrp_mainloop_dispatch (ml=ml@entry=0xada570) at common/mainloop.c:2015

22 0x00007f4a6307af61 in mrp_mainloop_iterate (ml=ml@entry=0xada570) at common/mainloop.c:2039

23 0x00007f4a6307af90 in mrp_mainloop_run (ml=0xada570) at common/mainloop.c:2046

24 0x00000000004053b4 in run_mainloop (ctx=0xada4a0) at daemon/daemon.c:241

25 main (argc=, argv=, envp=0x7fffbac83da8) at daemon/daemon.c:292

ipuustin commented 10 years ago

Thanks for the bug report! If you update to the later version and still see the bug, could you consider attaching here a valgrind log? That should find the double frees effectively. Just install valgrind to your machine and run murphy with the following command line (or similar):

murphyd -f -vvvv -V --log-file=valgrind.log --leak-check=full --num-callers=40

File valgrind.log in your working directory should then contain the errors (and the memory leaks). If you can't reproduce the bug, remove --leak-check=full to improve the timings a bit.

pohly commented 10 years ago

I managed to catch the error under valgrind:

==2604== Invalid write of size 8 ==2604== at 0x69FFFB5: luaV_gettable (in /usr/lib64/liblua.so.5.1) ==2604== by 0x69F2464: lua_getfield (in /usr/lib64/liblua.so.5.1) ==2604== by 0x65D8715: mrp_lua_create_object (object.c:483) ==2604== by 0x54AF7ED: mrp_resource_lua_register_resource_set (resource-lua.c:222) ==2604== by 0x54A4D72: mrp_resource_set_create (resource-set.c:121) ==2604== by 0x40D29C: create_rset (plugin-resource-dbus.c:1042) ==2604== by 0x40F3DD: mgr_cb (plugin-resource-dbus.c:1905) ==2604== by 0x4E3ACA6: dispatch_method (dbus-libdbus.c:1146) ==2604== by 0x5065364: ??? (in /lib64/libdbus-1.so.3.7.2) ==2604== by 0x50575CF: dbus_connection_dispatch (in /lib64/libdbus-1.so.3.7.2) ==2604== by 0x4E3DD57: pump_cb (dbus-libdbus-glue.c:306) ==2604== by 0x63A3E1A: dispatch_deferred (mainloop.c:1835) ==2604== by 0x63A4BD2: mrp_mainloop_dispatch (mainloop.c:2013) ==2604== by 0x63A4C6C: mrp_mainloop_iterate (mainloop.c:2037) ==2604== by 0x63A4CA5: mrp_mainloop_run (mainloop.c:2044) ==2604== by 0x405800: run_mainloop (daemon.c:241) ==2604== by 0x405990: main (daemon.c:292) ==2604== Address 0x81e3820 is 0 bytes after a block of size 864 alloc'd ==2604== at 0x4C2C48E: realloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) ==2604== by 0x69FA26B: luaMrealloc (in /usr/lib64/liblua.so.5.1) ==2604== by 0x69F5FF8: luaD_reallocstack (in /usr/lib64/liblua.so.5.1) ==2604== by 0x69F8149: ??? (in /usr/lib64/liblua.so.5.1) ==2604== by 0x69F8BDC: luaC_step (in /usr/lib64/liblua.so.5.1) ==2604== by 0x69F210C: lua_pushlstring (in /usr/lib64/liblua.so.5.1) ==2604== by 0x65D8A66: mrp_lua_destroy_object (object.c:566) ==2604== by 0x54AF8E1: mrp_resource_lua_unregister_resource_set (resource-lua.c:237) ==2604== by 0x54A4DB4: mrp_resource_set_destroy (resource-set.c:138) ==2604== by 0x40CE2B: destroy_rset (plugin-resource-dbus.c:941) ==2604== by 0x40D3F1: htbl_free_rsets (plugin-resource-dbus.c:1084) ==2604== by 0x639F08A: free_entry (hashtbl.c:142) ==2604== by 0x639F518: mrp_htbl_remove (hashtbl.c:282) ==2604== by 0x40D3A5: dbus_name_cb (plugin-resource-dbus.c:1072) ==2604== by 0x4E38D69: name_owner_change_cb (dbus-libdbus.c:525) ==2604== by 0x4E3B082: dispatch_signal (dbus-libdbus.c:1207) ==2604== by 0x505757D: dbus_connection_dispatch (in /lib64/libdbus-1.so.3.7.2) ==2604== by 0x4E3DD57: pump_cb (dbus-libdbus-glue.c:306) ==2604== by 0x63A3E1A: dispatch_deferred (mainloop.c:1835) ==2604== by 0x63A4BD2: mrp_mainloop_dispatch (mainloop.c:2013) ... --2604-- VALGRIND INTERNAL ERROR: Valgrind received a signal 11 (SIGSEGV) - exiting --2604-- si_code=1; Faulting address: 0x11F464C8; sp: 0x402899da0

There are more errors like the first one before the crash. I can't attach the full log here (only image attachements supported?!).

This is on some kind of OpenSUSE, using lua 5.1.5.

The distro also has lua 5.2.5 but at least when I checked, murphy did not compile against that.

pohly commented 10 years ago

My murphy is afd08b50.

ipuustin commented 10 years ago

Thanks! I'll take a look. Krisztian recently committed patches which make Murphy compile against Lua5.2.

klihub commented 10 years ago

Thanks,

Hmm... do those other errors you mentioned have exactly identical stack fingerprints ? I'd be especially interested in the first one...

Cheers, Krisztian

PS. Just in case, could you send the full log to my personal e-mail address (kli@iki.fi) ? Maybe I can spot something and get enlightened...

On Tue, Mar 18, 2014 at 4:38 PM, Patrick Ohly notifications@github.comwrote:

I managed to catch the error under valgrind:

==2604== Invalid write of size 8 ==2604== at 0x69FFFB5: luaV_gettable (in /usr/lib64/liblua.so.5.1) ==2604== by 0x69F2464: lua_getfield (in /usr/lib64/liblua.so.5.1) ==2604== by 0x65D8715: mrp_lua_create_object (object.c:483) ==2604== by 0x54AF7ED: mrp_resource_lua_register_resource_set (resource-lua.c:222) ==2604== by 0x54A4D72: mrp_resource_set_create (resource-set.c:121) ==2604== by 0x40D29C: create_rset (plugin-resource-dbus.c:1042) ==2604== by 0x40F3DD: mgr_cb (plugin-resource-dbus.c:1905) ==2604== by 0x4E3ACA6: dispatch_method (dbus-libdbus.c:1146) ==2604== by 0x5065364: ??? (in /lib64/libdbus-1.so.3.7.2) ==2604== by 0x50575CF: dbus_connection_dispatch (in /lib64/libdbus-1.so.3.7.2) ==2604== by 0x4E3DD57: pump_cb (dbus-libdbus-glue.c:306) ==2604== by 0x63A3E1A: dispatch_deferred (mainloop.c:1835) ==2604== by 0x63A4BD2: mrp_mainloop_dispatch (mainloop.c:2013) ==2604== by 0x63A4C6C: mrp_mainloop_iterate (mainloop.c:2037) ==2604== by 0x63A4CA5: mrp_mainloop_run (mainloop.c:2044) ==2604== by 0x405800: run_mainloop (daemon.c:241) ==2604== by 0x405990: main (daemon.c:292) ==2604== Address 0x81e3820 is 0 bytes after a block of size 864 alloc'd ==2604== at 0x4C2C48E: realloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) ==2604== by 0x69FA26B: luaMrealloc (in /usr/lib64/liblua.so.5.1) ==2604== by 0x69F5FF8: luaD_reallocstack (in /usr/lib64/liblua.so.5.1) ==2604== by 0x69F8149: ??? (in /usr/lib64/liblua.so.5.1) ==2604== by 0x69F8BDC: luaC_step (in /usr/lib64/liblua.so.5.1) ==2604== by 0x69F210C: lua_pushlstring (in /usr/lib64/liblua.so.5.1) ==2604== by 0x65D8A66: mrp_lua_destroy_object (object.c:566) ==2604== by 0x54AF8E1: mrp_resource_lua_unregister_resource_set (resource-lua.c:237) ==2604== by 0x54A4DB4: mrp_resource_set_destroy (resource-set.c:138) ==2604== by 0x40CE2B: destroy_rset (plugin-resource-dbus.c:941) ==2604== by 0x40D3F1: htbl_free_rsets (plugin-resource-dbus.c:1084) ==2604== by 0x639F08A: free_entry (hashtbl.c:142) ==2604== by 0x639F518: mrp_htbl_remove (hashtbl.c:282) ==2604== by 0x40D3A5: dbus_name_cb (plugin-resource-dbus.c:1072) ==2604== by 0x4E38D69: name_owner_change_cb (dbus-libdbus.c:525) ==2604== by 0x4E3B082: dispatch_signal (dbus-libdbus.c:1207) ==2604== by 0x505757D: dbus_connection_dispatch (in /lib64/libdbus-1.so.3.7.2) ==2604== by 0x4E3DD57: pump_cb (dbus-libdbus-glue.c:306) ==2604== by 0x63A3E1A: dispatch_deferred (mainloop.c:1835) ==2604== by 0x63A4BD2: mrp_mainloop_dispatch (mainloop.c:2013) ... --2604-- VALGRIND INTERNAL ERROR: Valgrind received a signal 11 (SIGSEGV)

  • exiting --2604-- si_code=1; Faulting address: 0x11F464C8; sp: 0x402899da0

There are more errors like the first one before the crash. I can't attach the full log here (only image attachements supported?!).

This is on some kind of OpenSUSE, using lua 5.1.5.

The distro also has lua 5.2.5 but at least when I checked, murphy did not compile against that.

— Reply to this email directly or view it on GitHubhttps://github.com/01org/murphy/issues/6#issuecomment-37939784 .

klihub commented 10 years ago

Hi,

I pushed a workaround for the bug causing this. Could you give it a spin and check if you can still reproduce the problem ? The fix in question is commit 7daa1664de56063a90257b1f8825e66c2893d539. Pulling in 01.org/mastershould give you that. Thanks in advance,

Krisztian

On Tue, Mar 18, 2014 at 4:38 PM, Patrick Ohly notifications@github.comwrote:

I managed to catch the error under valgrind:

==2604== Invalid write of size 8 ==2604== at 0x69FFFB5: luaV_gettable (in /usr/lib64/liblua.so.5.1) ==2604== by 0x69F2464: lua_getfield (in /usr/lib64/liblua.so.5.1) ==2604== by 0x65D8715: mrp_lua_create_object (object.c:483) ==2604== by 0x54AF7ED: mrp_resource_lua_register_resource_set (resource-lua.c:222) ==2604== by 0x54A4D72: mrp_resource_set_create (resource-set.c:121) ==2604== by 0x40D29C: create_rset (plugin-resource-dbus.c:1042) ==2604== by 0x40F3DD: mgr_cb (plugin-resource-dbus.c:1905) ==2604== by 0x4E3ACA6: dispatch_method (dbus-libdbus.c:1146) ==2604== by 0x5065364: ??? (in /lib64/libdbus-1.so.3.7.2) ==2604== by 0x50575CF: dbus_connection_dispatch (in /lib64/libdbus-1.so.3.7.2) ==2604== by 0x4E3DD57: pump_cb (dbus-libdbus-glue.c:306) ==2604== by 0x63A3E1A: dispatch_deferred (mainloop.c:1835) ==2604== by 0x63A4BD2: mrp_mainloop_dispatch (mainloop.c:2013) ==2604== by 0x63A4C6C: mrp_mainloop_iterate (mainloop.c:2037) ==2604== by 0x63A4CA5: mrp_mainloop_run (mainloop.c:2044) ==2604== by 0x405800: run_mainloop (daemon.c:241) ==2604== by 0x405990: main (daemon.c:292) ==2604== Address 0x81e3820 is 0 bytes after a block of size 864 alloc'd ==2604== at 0x4C2C48E: realloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) ==2604== by 0x69FA26B: luaMrealloc (in /usr/lib64/liblua.so.5.1) ==2604== by 0x69F5FF8: luaD_reallocstack (in /usr/lib64/liblua.so.5.1) ==2604== by 0x69F8149: ??? (in /usr/lib64/liblua.so.5.1) ==2604== by 0x69F8BDC: luaC_step (in /usr/lib64/liblua.so.5.1) ==2604== by 0x69F210C: lua_pushlstring (in /usr/lib64/liblua.so.5.1) ==2604== by 0x65D8A66: mrp_lua_destroy_object (object.c:566) ==2604== by 0x54AF8E1: mrp_resource_lua_unregister_resource_set (resource-lua.c:237) ==2604== by 0x54A4DB4: mrp_resource_set_destroy (resource-set.c:138) ==2604== by 0x40CE2B: destroy_rset (plugin-resource-dbus.c:941) ==2604== by 0x40D3F1: htbl_free_rsets (plugin-resource-dbus.c:1084) ==2604== by 0x639F08A: free_entry (hashtbl.c:142) ==2604== by 0x639F518: mrp_htbl_remove (hashtbl.c:282) ==2604== by 0x40D3A5: dbus_name_cb (plugin-resource-dbus.c:1072) ==2604== by 0x4E38D69: name_owner_change_cb (dbus-libdbus.c:525) ==2604== by 0x4E3B082: dispatch_signal (dbus-libdbus.c:1207) ==2604== by 0x505757D: dbus_connection_dispatch (in /lib64/libdbus-1.so.3.7.2) ==2604== by 0x4E3DD57: pump_cb (dbus-libdbus-glue.c:306) ==2604== by 0x63A3E1A: dispatch_deferred (mainloop.c:1835) ==2604== by 0x63A4BD2: mrp_mainloop_dispatch (mainloop.c:2013) ... --2604-- VALGRIND INTERNAL ERROR: Valgrind received a signal 11 (SIGSEGV)

  • exiting --2604-- si_code=1; Faulting address: 0x11F464C8; sp: 0x402899da0

There are more errors like the first one before the crash. I can't attach the full log here (only image attachements supported?!).

This is on some kind of OpenSUSE, using lua 5.1.5.

The distro also has lua 5.2.5 but at least when I checked, murphy did not compile against that.

— Reply to this email directly or view it on GitHubhttps://github.com/01org/murphy/issues/6#issuecomment-37939784 .

pohly commented 10 years ago

I'm using 7daa166 now. I'll let you know if and/or when I see the issue again. However, remember that it happened only occasionally before, so I can't say for sure whether it is gone.