labapart / gattlib

Library to access GATT information from BLE (Bluetooth Low Energy) devices
http://labapart.com/

Segmentation fault when gattlib closes a connection #48

Open joeBlbs opened 6 years ago

joeBlbs commented 6 years ago

This is related to #10, which has apparently been solved. However, I am still seeing similar behavior even though I am using the most recent versions of gattlib and BlueZ on a Raspberry Pi. My code is a bit different: it is a loop that connects to a Bluetooth device and reads a characteristic every 2 seconds. If the read operation fails, it disconnects and reconnects. The crash sometimes happens on the first attempt to disconnect, sometimes later. This is a typical stack trace:

```
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/arm-linux-gnueabihf/libthread_db.so.1".
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x766f8bc8 in g_source_destroy () from /lib/arm-linux-gnueabihf/libglib-2.0.so.0
(gdb) bt
#0  0x766f8bc8 in g_source_destroy () from /lib/arm-linux-gnueabihf/libglib-2.0.so.0
#1  0x76f1f37c in io_destroy (io=0x742086e8) at /home/pi/gattlib/bluez/bluez5/src/shared/io-glib.c:123
#2  0x76f0b440 in bt_att_free (att=0x74204d00) at /home/pi/gattlib/bluez/bluez5/src/shared/att.c:955
#3  0x76f0b8f4 in bt_att_unref (att=0x74204d00) at /home/pi/gattlib/bluez/bluez5/src/shared/att.c:1065
#4  0x76ed8c00 in g_attrib_unref (attrib=0x7420a9c8) at /home/pi/gattlib/bluez/bluez5/attrib/gattrib.c:195
#5  0x76ed0c24 in gattlib_disconnect (connection=0xeb2110) at /home/pi/gattlib/bluez/gattlib_connect.c:389
```

As is typically the case with segfaults, the actual problem arises from memory corruption that happened in an earlier operation. I dug in and found that the code that reads a characteristic sporadically does something unusual, and almost always immediately after that I get an error when reading, followed by the crash when closing the connection. Specifically, I found that the following code in gattlib_connect.c modifies the first four bytes of user_data after g_source_set_callback returns (the line marked HERE):

```c
GSource* gattlib_watch_connection_full(GIOChannel* io, GIOCondition condition,
                                       GIOFunc func, gpointer user_data, GDestroyNotify notify)
{
    // Create a main loop source
    GSource *source = g_io_create_watch (io, condition);
    assert(source != NULL);

    g_source_set_callback (source, (GSourceFunc)func, user_data, notify); /*HERE*/

    // Attaches it to the main loop context
    guint id = g_source_attach(source, g_gattlib_thread.loop_context);
    g_source_unref (source);
    assert(id != 0);

    return source;
}
```

In the vast majority of cases those four bytes are filled with zeros. Sometimes they are replaced with a value that looks like a data pointer.

This function does not normally change the value of user_data, but when it does, I consistently get the segmentation fault later on when I try to close the connection.

My wild guess is that it is related to multithreading. I have noticed that if I slow down my application by adding multiple printfs to stdout, the error is less likely to appear.

joeBlbs commented 6 years ago

I see that it is also related to https://github.com/labapart/gattlib/issues/47

Buguito commented 6 years ago

I'm having the same problem, but after a connection and before or during a characteristic read. I don't know which one, since gdb couldn't backtrace very far (it says the previous frame is identical).

In my case I got it in libglib-2.0.so.0, and it just shows two lines:

#0 .......... in ?? () from /lib/arm-linux-gnueabihf/libglib-2.0.so.0

#1 .......... in g_main_loop_quit() from /lib/arm-linux-gnueabihf/libglib-2.0.so.0

There is another thread showing #0 in poll () at ../sysdeps/unix/syscall-template.S:81

Can't get more info since it says previous frame identical to this frame (corrupt stack?)

Any help appreciated.

Buguito commented 6 years ago

Well, as far as I can tell my case is different: I'm using the DBUS version. The discovery works OK. The first connect to the device goes OK and I get to read the characteristic.

The second attempt to connect to the device crashes with a segfault. I've traced it back to `stop_scan_func`, which in theory is called due to a timeout (even though it triggers immediately on connecting and not after the 4-second CONNECT_TIMEOUT).

That function calls g_main_loop_quit(data), where data is the "loop", and it apparently crashes there, inside GLib.

So I think I'm hitting a different issue here, since I'm running the DBUS version of gattlib.

joeBlbs commented 6 years ago

@Buguito, our issues might or might not be related. It is common for me to see stack traces like yours when my program fails.

greymfm commented 6 years ago

I can confirm that I also get segfaults (Raspberry Pi W, BlueZ 5.43), both after a disconnect and after a connect followed by a notify. I see them in both cases (DBUS and non-DBUS versions).

dbonnell commented 4 years ago

I'm having the same issue as @Buguito reported on Nov 7.

Beginning scan ...
[New Thread 0x76745390 (LWP 8885)]
[New Thread 0x75dff390 (LWP 8886)]
Discovered 9F:BE:B5:C6:62:DE - 'JBBluetooth_62DE'
...
Discovered 14:F4:8F:98:4D:1B - ' BT_4D1B'
Discovered 6E:B8:45:0A:13:4F - 'Galaxy S10'

Thread 1 "cc3501_A1" hit Breakpoint 2, stop_scan_func (data=0x53fc0) at /home/pi/CC3501_libs/gattlib/dbus/gattlib.c:61
61              g_main_loop_quit(data);
(gdb) c
Continuing.
Scan completed
Opening a BLE connection to Galaxy S10 [6E:B8:45:0A:13:4F] ...
Connected 1 device(s)
Waiting for notifications ...

Thread 1 "cc3501_A1" hit Breakpoint 2, stop_scan_func (data=0x37c88) at /home/pi/CC3501_libs/gattlib/dbus/gattlib.c:61
61              g_main_loop_quit(data);
(gdb) p data
$2 = (gpointer) 0x37c88
(gdb) c
Continuing.

Thread 1 "cc3501_A1" received signal SIGSEGV, Segmentation fault.
0x76ef61f0 in ?? () from /lib/arm-linux-gnueabihf/libglib-2.0.so.0

It doesn't segfault every time, but it does so very often. The notifications are also flaky, only working in maybe 1 in 5 runs.

UPDATE: The call to g_main_loop_quit that causes the segfault comes from the timeout in gattlib_connect, not the scan. The timeout fires AFTER the connection has been established, by which point the loop has gone out of scope and been freed or overwritten.

I've changed the code in gattlib_connect to remove the timer:

g_main_loop_run(loop);
g_source_remove(timer);
g_main_loop_unref(loop);

No more segfaults but you will get a GLib-CRITICAL warning that the source was not found when attempting to remove it. I'll look into avoiding that later but for now the warning is better than a crash!