weliem / bluez_inc

A C library for Bluez (BLE) that hides all DBus communication. It doesn't get easier than this. This library can also be used in C++.
MIT License

Raspberry Pi - Service Discovery Slow / Disconnect #38

Closed bojennett closed 3 months ago

bojennett commented 4 months ago

I'm really excited about this project. I just compiled it on a RPI4 running Raspbian, and am trying to connect to my custom device, which has 6 services (device information, battery, heart rate, and 3 custom services). The device information service has several characteristics (obviously), and each of the other services has 1 or at most 2.

On the RPI4 it is taking anywhere from 29 to 35 seconds after the connection for me to see the logs that the services were discovered, and within 2 seconds of that, the device is disconnected. My device doesn't initiate the disconnection as far as I can tell - there is no logic to auto-disconnect. But it was surprising how long it took and then to immediately disconnect.

I also installed this on an older Linux laptop running Ubuntu, and the device connected, but it never found any services - the callback returned with "found 0 services", again taking about 30 seconds, and then disconnected.

Not sure what I can do here to help show the issue or debug it?

bojennett commented 4 months ago

I'm happy to be a contributor to this project. I have written iOS and Android bluetooth libraries for this device (along with several others), and would LOVE to have a Linux library for my device. I'm not strong (at all) in dbus, but am strong with bluetooth.

bojennett commented 4 months ago

This project is such a great idea - you are essentially creating CoreBluetooth / BluetoothGatt directly in C on Linux.

weliem commented 4 months ago

Hi, hard to say what is going on in your case. In my experience connections and service discovery are fairly quick. What version of Bluez are you using? What Linux kernel version?

Try updating your Bluez version first....

weliem commented 4 months ago

Do you get the same results if you use 'bluetoothctl'?

bojennett commented 4 months ago

let me see if I can find out more info on that. what I can say is I instead tried to connect to a Polar H10, and yes, things went very quick. Our device has no problem working on iOS or Android, but I wouldn't put it past the manufacturer of the SoC on our device (it's an Ambiq Apollo Blue 3+) that maybe there is something undocumented in the bluetooth flow, and that's confusing things. Let me see what I can find out.

bojennett commented 4 months ago

Bluez on the RPI -> 5.66-1-rpt1-deb12u1

bojennett commented 4 months ago

note that I don't think it's your library, because I've also installed bleak - a Python library for bluetooth. Not sure if you're familiar with this? But it had similar problems. I could see it connect (our device will blink an LED when a successful bluetooth connection is made) but then I see it disconnect. So the behavior is, I think, underneath you - the core bluetooth driver or whatever.

bojennett commented 4 months ago

btw, here is the device I'm talking about. I work for a company named Biostrap. We build wrist wearables mostly for research, and this is our newest device, it's called "Kairos": https://shop.biostrap.com

bojennett commented 4 months ago

I would LOOOOOOVE for this to work. Regardless of the issue I'm experiencing currently, this library is such a great idea.

bojennett commented 4 months ago

here's the bleak project: https://github.com/hbldh/bleak. I tried the second example where it connects and reads the model number, and it eventually times out, but I can see a connection being made.

bojennett commented 4 months ago

OK, I get the same issue with bluetoothctl. I will connect to the device, and it takes a while, and eventually the services are returned, with "resolved", and then it goes to services resolved "no", and then it disconnects.

So it must be something with the device and how it interacts with Linux. I don't know what it could be, though. It's all APIs on the device firmware supplied by Apollo, so ¯_(ツ)_/¯

I guess we can close this issue out as "device has a problem"

weliem commented 4 months ago

I have heard of BioStrap! Impressive device! I would love to try one!

Bluez 5.66 is a bit old. Try building Bluez yourself in order to get the latest version running. I am using an Intel Nuc with an AX200 adapter and most devices work totally fine.

It could of course be that your device is not playing nice with Bluez. You could make a log with the 'btmon' command and look at that with your firmware engineer.

bojennett commented 4 months ago

ok, just updated to 5.75, which I think is the latest. I did this on the PC, not the RPI. The PC was Ubuntu and was running 5.53 of Bluez, and its failure was that it didn't find any services before disconnecting.

After updating to 5.75, the PC now finds all the services, but again after this long delay, and then disconnects (so it behaves like the RPI running 5.66).

I guess I have to dig into what Bluez thinks it is seeing from the device that makes it unhappy. It's very strange, because everything at the firmware level is running through Arm MbedOS. It's not like our user-written firmware twiddles bits or anything - we just say to generate a notification, or respond to a write with an error or whatever through the available APIs. We don't even participate in service and characteristic discovery... we don't get involved until they do things like turn notifications on, just so we know that it is now ok for us as a user to generate them.

But there must be something we are doing that is actually not kosher that iOS and Android can seem to deal with and Linux simply cannot.

weliem commented 4 months ago

You could try playing with the Bluez configuration:

https://github.com/bluez/bluez/blob/master/src/main.conf

Try modifying the connection intervals....

bojennett commented 4 months ago

Nothing for this code to do here. This is all on the device side or potentially BlueZ configuration.

bojennett commented 4 months ago

Hey, I found the major issue with our device. We have two characteristics that are indicate only. However, in the attributes for these characteristics, we had logged that it had write permission. Not sure why that would confuse Linux as it doesn't confuse iOS and Android, but I fixed that.
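The mismatch described above (properties that advertise indicate only, while the attribute's permissions allow writes) is the kind of thing a small firmware-side sanity check can catch before a device ships. The sketch below is only an illustration: the property bits are the ones defined for characteristic declarations in the Bluetooth Core Specification, but the `ATTR_PERM_*` flags and `characteristic_is_consistent()` are hypothetical names, not taken from the Ambiq/Mbed stack or from Bluez, and real stacks track permissions in more detail (security levels, descriptors).

```c
#include <stdbool.h>
#include <stdint.h>

/* Characteristic property bits from the Bluetooth Core Specification. */
#define GATT_PROP_READ              0x02
#define GATT_PROP_WRITE_NO_RESP     0x04
#define GATT_PROP_WRITE             0x08
#define GATT_PROP_NOTIFY            0x10
#define GATT_PROP_INDICATE          0x20
#define GATT_PROP_AUTH_SIGNED_WRITE 0x40

/* Hypothetical attribute permission flags; real stacks use their own names. */
#define ATTR_PERM_READ  0x01
#define ATTR_PERM_WRITE 0x02

/*
 * Returns true when the advertised properties and the value attribute's
 * permissions agree about whether the value can be read and written.
 */
static bool characteristic_is_consistent(uint8_t properties, uint8_t permissions)
{
    const uint8_t write_props = GATT_PROP_WRITE | GATT_PROP_WRITE_NO_RESP |
                                GATT_PROP_AUTH_SIGNED_WRITE;

    bool declares_write = (properties & write_props) != 0;
    bool permits_write  = (permissions & ATTR_PERM_WRITE) != 0;
    bool declares_read  = (properties & GATT_PROP_READ) != 0;
    bool permits_read   = (permissions & ATTR_PERM_READ) != 0;

    return declares_write == permits_write && declares_read == permits_read;
}
```

With these definitions, an indicate-only characteristic whose value attribute still carries write permission, `characteristic_is_consistent(GATT_PROP_INDICATE, ATTR_PERM_WRITE)`, is reported as inconsistent. Whether Bluez rejects exactly this combination is not established in this thread.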

I've run into a second problem and I don't know if this is a Linux problem or a device problem. The data behaves as if I have "too many" characteristics. If I have everything in there, I still get the same problem - it takes 30 seconds to do service discovery and then times out and disconnects. However, if I remove just ONE single characteristic, then services are discovered quickly and I don't disconnect (and I can do everything - read the device information service, etc.)

I can't find anything online about this for Bluez / bluetoothctl, which makes me think there isn't a limit, but I'm wondering if you've found anything like this.

bojennett commented 4 months ago

Right now, I have a total of 5 services - battery, device information, heart rate, OTA, and my custom. In battery, there is one characteristic. In heart rate, there are two (the value and body location). In DIS, there are 7. In OTA, there are two (basically two serial ports), and in my custom one there are 3. I can't do anything about the OTA or my own. I can maybe not have as many DIS, and I could probably get rid of body location in HRS, but it seems like a weird thing that I would have "too many"

weliem commented 4 months ago

it shouldn't take this long. I have worked with other devices that have a similar amount of services and characteristics without any issues. For example, Omron blood pressure meters...

It must be something else. Did you make a detailed log with the btmon command? That will give you a HCI level log that may provide insights as to why it is taking this long....

bojennett commented 4 months ago

Yes, I don't get it. Arm MBED is kind of a mess. Like I said, I had a configuration set in one of the characteristics that nobody else seems to care about, but Linux did. What I had was wrong, but wasn't fatal until Linux. I'm sure there must be something else that I'm doing that it doesn't like, but no idea what, but first I wanted to better understand if there were Linux blueZ limitations I wasn't aware of. Given that I hadn't seen anything on StackOverflow, I was sure there weren't, but I thought I'd ask.

bojennett commented 4 months ago

I would like to volunteer my services to this project if you're looking for additional contributors

weliem commented 4 months ago

I very much welcome any good contributions!

Do you already have something in mind?

bojennett commented 4 months ago

Only one immediate - for iOS and Android, I have built an enumerated type that has all the assigned standard UUIDs for bluetooth gatt services, characteristics, and descriptors, with properties to return the title. So if you have characteristic "2a19", it would return the title of "Battery Level". Having these all present could make life easier for somebody who is just using standard devices in that they don't have to look anything up - they can just refer to the enumerated value (CHAR_BATTERY_LEVEL_UUID or something for example).
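A minimal sketch of what such a table could look like in C. The UUID values are the 16-bit numbers assigned by the Bluetooth SIG; the constant names and `assigned_uuid_title()` are hypothetical and are not part of this library's API. Only a handful of entries are shown.

```c
#include <stddef.h>
#include <stdint.h>

/* 16-bit UUIDs assigned by the Bluetooth SIG (small illustrative subset). */
enum {
    SERVICE_DEVICE_INFORMATION_UUID  = 0x180A,
    SERVICE_HEART_RATE_UUID          = 0x180D,
    SERVICE_BATTERY_UUID             = 0x180F,
    CHAR_BATTERY_LEVEL_UUID          = 0x2A19,
    CHAR_MODEL_NUMBER_STRING_UUID    = 0x2A24,
    CHAR_HEART_RATE_MEASUREMENT_UUID = 0x2A37,
    CHAR_BODY_SENSOR_LOCATION_UUID   = 0x2A38,
    DESC_CLIENT_CHAR_CONFIG_UUID     = 0x2902
};

struct assigned_uuid {
    uint16_t uuid;
    const char *title;
};

static const struct assigned_uuid assigned_uuids[] = {
    { SERVICE_DEVICE_INFORMATION_UUID,  "Device Information" },
    { SERVICE_HEART_RATE_UUID,          "Heart Rate" },
    { SERVICE_BATTERY_UUID,             "Battery" },
    { CHAR_BATTERY_LEVEL_UUID,          "Battery Level" },
    { CHAR_MODEL_NUMBER_STRING_UUID,    "Model Number String" },
    { CHAR_HEART_RATE_MEASUREMENT_UUID, "Heart Rate Measurement" },
    { CHAR_BODY_SENSOR_LOCATION_UUID,   "Body Sensor Location" },
    { DESC_CLIENT_CHAR_CONFIG_UUID,     "Client Characteristic Configuration" },
};

/* Returns the title for a 16-bit assigned UUID, or NULL if it is not in the table. */
static const char *assigned_uuid_title(uint16_t uuid)
{
    for (size_t i = 0; i < sizeof assigned_uuids / sizeof assigned_uuids[0]; i++) {
        if (assigned_uuids[i].uuid == uuid) {
            return assigned_uuids[i].title;
        }
    }
    return NULL;
}
```

A linear scan is fine for a table of this size; a complete list of assigned numbers could be kept sorted and searched with `bsearch` instead.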

Secondarily, I don't know if we could beef up the logger function so that it could contain FILE, FUNCTION, and LINE and it could then produce an output that we could use with adb's logcat maybe?
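One possible shape for this, sketched with standard C only: a variadic macro captures `__FILE__`, `__func__`, and `__LINE__` at the call site and hands them to a formatting function. `log_format()`, `LOG_WITH_LOCATION`, and the output layout are assumptions made for illustration; they are not the library's existing logger, and the layout is not claimed to match logcat.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Writes "<level> <file>:<line> <func>(): <message>" into out.
 * Returns the number of characters the full line needs (snprintf-style),
 * or a negative value on a formatting error.
 */
static int log_format(char *out, size_t out_size, const char *level,
                      const char *file, const char *func, int line,
                      const char *fmt, ...)
{
    int prefix = snprintf(out, out_size, "%s %s:%d %s(): ",
                          level, file, line, func);
    if (prefix < 0 || (size_t)prefix >= out_size) {
        return prefix; /* error, or the prefix alone filled the buffer */
    }

    va_list args;
    va_start(args, fmt);
    int body = vsnprintf(out + prefix, out_size - (size_t)prefix, fmt, args);
    va_end(args);

    return body < 0 ? body : prefix + body;
}

/* Captures the call site; the format string is the first variadic argument. */
#define LOG_WITH_LOCATION(level, ...)                                   \
    do {                                                                \
        char line_[256] = "";                                           \
        log_format(line_, sizeof line_, (level), __FILE__, __func__,    \
                   __LINE__, __VA_ARGS__);                              \
        fprintf(stderr, "%s\n", line_);                                 \
    } while (0)
```

A call such as `LOG_WITH_LOCATION("DEBUG", "found %d services", count);` then prints the message prefixed with the file, line, and function it was called from.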

weliem commented 4 months ago

Adding a list of known UUIDs could be useful. It does come with maintenance since the BT SiG keeps on adding new services.

Adding file + line number could also be useful.

Feel free to raise a PR

bojennett commented 4 months ago

We figured the problem out on the device. We didn't allocate enough memory to the "WSF" section in firmware, and we had a lot of services and characteristics. We saw that at the end of service discovery, Linux sends a hash back and then it disconnects. Maybe the hash is failing because we're out of memory, and Linux sees that and quits whereas iOS and Android don't care, or maybe the write never finishes due to lack of memory and Linux times out waiting (this would explain the 30 - 40 second wait after connection), and again iOS/Android don't care.

Not sure the reasons for the differences, but as expected, the error was on our end in trying to optimize memory so that we had the most available for our application, and so we short-changed pieces of the underlying OS and it bit us, but for whatever reason, iOS/Android didn't care.

Man, that was driving us CRAZY

bojennett commented 4 months ago

So we had two underlying errors.

We set attributes on two of our characteristics that didn't line up. We said the device didn't have write permissions, but then a different attribute said it did. This seemed to confuse Linux to the point of failure.

The second was we didn't have enough memory allocated for how the service discovery was stored, and this caused a communication problem and Linux failed.

It's one of those situations where we were out of spec, but iOS and Android had decided to live with it to ensure the widest array of devices would work, but Linux is more of a stickler to the spec. As a guy who used to work for Intel, I'm familiar with "this is out of spec but we're 90% of the market, so we dictate what is in spec"

weliem commented 4 months ago

Well that's good news then! Yes, Linux is not as forgiving as iOS/Android. I guess it is good to test your device on Linux because once it works on Linux it'll work everywhere...

weliem commented 3 months ago

Closing the issue since it was not related to this library and has been resolved anyway.