Open agners opened 1 year ago
Bluetooth modem initialisation has changed recently and is now managed by the kernel, but from that point on it should be the same, so I'm surprised at the nature of the errors you and others are seeing.
I'm pleased to hear that you are using the recommended (and default) configuration of using UART0 (ttyAMA0) to drive the BT modem; it is the best option. The Pi 3 is the most troublesome because it doesn't connect up the flow-control signals (those pins were used for something else), and for that reason the baudrate is limited to 921600. However, that shouldn't have changed with the move to kernel modem initialisation.
Wait... tag 1.20230405 predates the big switchover, which went live three weeks later (2023-04-26), so something else must have changed between 5.15.84 and 6.1.21. In a way that's a relief, but it may make it harder to locate the breaking change.
I'm pleased to hear that you are using the recommended (and default) configuration of using UART0 (ttyAMA0) to drive the BT modem; it is the best option. The Pi 3 is the most troublesome because it doesn't connect up the flow-control signals (those pins were used for something else), and for that reason the baudrate is limited to 921600. However, that shouldn't have changed with the move to kernel modem initialisation.
Just double checked, hci0 is indeed running on ttyAMA0. Also, we are using 460800 as baudrate. It seems we've added a patch a while ago to lower the baudrate, and there was also a reference to the https://github.com/raspberrypi/firmware/issues/1150 issue.
That said, this is my test instance. It could be that other folks which report the problem (especially on RPi4/CM4) actually use a different config.txt
. I'll have to check.
That's definitely using the old hciattach utility - if the kernel set up the modem there would be no /dev/ttyAMA0 to see.
Is the WiFi connected and in use?
Is the WiFi connected and in use?
WiFi is not in use and disconnected.
Thanks, that means my current test should be at least somewhat representative.
Bluetooth modem initialisation has changed recently and is now managed by the kernel, but from that point on it should be the same, so I'm surprised at the nature of the errors you and others are seeing.
I see, this is the change you are referring to: https://github.com/raspberrypi/linux/commit/ec08e93837efa2e75e5cab4d80427fdf7ea23da1
Good to see we can push this out to the kernel, it will make userspace simpler. I'll definitely give this a try sometime in the future.
But I think the problem at hand is unrelated to the error we are seeing, as the HCI communication at this point is already under the kernels control. To me it looks like different parsing/error recovery behavior of the Bluetooth HCI handling logic in the kernel.
That said, I wonder why we even have these hci0: Frame reassembly failed
. Is the low-level UART communication issues or is this higher level buffer overflows (lack of flow control)?
But I think the problem at hand is unrelated to the error we are seeing, as the HCI communication at this point is already under the kernels control. To me it looks like different parsing/error recovery behavior of the Bluetooth HCI handling logic in the kernel.
Indeed.
That said, I wonder why we even have these hci0: Frame reassembly failed. Is the low-level UART communication issues or is this higher level buffer overflows (lack of flow control)?
If I knew that...
That said, I wonder why we even have these hci0: Frame reassembly failed. Is the low-level UART communication issues or is this higher level buffer overflows (lack of flow control)?
If I knew that...
From what I can tell the driver checks the overrun flag of the FIFO (see https://github.com/raspberrypi/linux/blob/1.20230405/drivers/tty/serial/amba-pl011.c#L334-L335), so I should see overrun counts in /proc/tty/driver/ttyAMA
if the latter is the case. Unfortunately, I just rebooted to try with 115200, so my system has cleared the stats again :cry: I'll check the value next time I encounter frame reassembly errors.
So with 115200 I did not get any overruns for 24h+. Today I switched back to 921600, and got two events. And it seems those are buffer overruns:
# dmesg | grep hci0
[42686.118707] Bluetooth: hci0: Frame reassembly failed (-84)
[42686.118948] Bluetooth: hci0: Frame reassembly failed (-84)
[53440.602688] Bluetooth: hci0: Frame reassembly failed (-84)
[53440.602927] Bluetooth: hci0: Frame reassembly failed (-84)
# cat /proc/tty/driver/ttyAMA
serinfo:1.0 driver revision:
0: uart:PL011 rev2 mmio:0x3F201000 irq:114 tx:197234 rx:3403475 oe:2 RTS|CTS|DTR
I see that the RPi config uses rather conservative values for the scheduler:
CONFIG_PREEMPT_VOLUNTARY=y
CONFIG_HZ=100
Maybe higher HZ and/or fully preemptive kernel helps here?
That said, it is still interesting that the 5.15 was able to recover from these overruns. So ideally we'd have a Bluetooth stack which can handle the reassembly errors :sweat_smile:
For testing I switched to a mainline kernel on the affected device. The mainline kernel has overruns as well, but it seems that the Bluetooth stack can recover in this situation:
# cat /proc/tty/driver/ttyAMA
serinfo:1.0 driver revision:
0: uart:PL011 rev2 mmio:0x3F201000 irq:99 tx:57297 rx:487185 oe:10 RTS|CTS|DTR
# uname -a
Linux homeassistant 6.1.28-v8 #1 SMP PREEMPT Fri May 12 11:49:03 UTC 2023 aarch64 HAOS
# dmesg | grep hci0
[ 9.028494] Bluetooth: hci0: BCM: chip id 94
[ 9.029017] Bluetooth: hci0: BCM: features 0x2e
[ 9.030785] Bluetooth: hci0: BCM43430A1
[ 9.030825] Bluetooth: hci0: BCM43430A1 (001.002.009) build 0000
[ 9.045975] Bluetooth: hci0: BCM43430A1 'brcm/BCM43430A1.hcd' Patch
[ 9.668550] Bluetooth: hci0: BCM: features 0x2e
[ 9.670367] Bluetooth: hci0: BCM43438A1 37.4MHz Raspberry Pi 3-0141
[ 9.670403] Bluetooth: hci0: BCM43430A1 (001.002.009) build 0508
[ 47.651404] Bluetooth: hci0: Opcode 0x c14 failed: -110
[ 47.651404] Bluetooth: hci0: command 0x0c14 tx timeout
[ 48.469660] Bluetooth: hci0: unexpected event for opcode 0x0c14
[ 48.471036] Bluetooth: hci0: Frame reassembly failed (-84)
[ 50.467406] Bluetooth: hci0: Opcode 0x c03 failed: -110
[ 547.339672] Bluetooth: hci0: Frame reassembly failed (-84)
[ 1266.027436] Bluetooth: hci0: Frame reassembly failed (-90)
[ 1266.027576] Bluetooth: hci0: Frame reassembly failed (-84)
[ 3305.180717] Bluetooth: hci0: Frame reassembly failed (-84)
[ 3305.180838] Bluetooth: hci0: Frame reassembly failed (-84)
[ 5532.156662] Bluetooth: hci0: Frame reassembly failed (-84)
[ 5532.156774] Bluetooth: hci0: Frame reassembly failed (-84)
[ 6435.178340] Bluetooth: hci0: Frame reassembly failed (-84)
[ 6435.178495] Bluetooth: hci0: Frame reassembly failed (-84)
[ 6675.408159] Bluetooth: hci0: Frame reassembly failed (-84)
[ 6675.408308] Bluetooth: hci0: Frame reassembly failed (-84)
[ 9848.851944] Bluetooth: hci0: Frame reassembly failed (-84)
I've not yet reviewed/checked downstream changes, but either there is an upstream fix between 6.1.21 and 6.1.28 or there is something broken in the downstream kernel.
I've not yet reviewed/checked downstream changes, but either there is an upstream fix between 6.1.21 and 6.1.28 or there is something broken in the downstream kernel.
rpi-update is now on 6.1.28 so testing that may narrow things down.
I've tested with the kernel tree at https://github.com/raspberrypi/linux/commit/cb9e651718d601c6f2ff1116b4c98c11063302f7. I still use the old initialization mechanism, however, since that is only initialization, it shouldn't matter really.
May 22 16:25:26 homeassistant kernel: Bluetooth: hci0: Frame reassembly failed (-84)
May 22 16:25:26 homeassistant kernel: Bluetooth: hci0: Frame reassembly failed (-84)
May 22 17:00:40 homeassistant kernel: Bluetooth: hci0: Frame reassembly failed (-84)
May 22 17:00:40 homeassistant kernel: Bluetooth: hci0: Frame reassembly failed (-84)
May 22 17:02:28 homeassistant kernel: Bluetooth: hci0: Frame reassembly failed (-84)
May 22 17:02:30 homeassistant kernel: Bluetooth: hci0: command 0x200c tx timeout
May 22 17:02:30 homeassistant kernel: Bluetooth: hci0: Opcode 0x200c failed: -110
May 22 17:02:30 homeassistant kernel: Bluetooth: hci0: Unable to disable scanning: -110
May 22 17:02:30 homeassistant kernel: Bluetooth: hci0: Opcode 0x200c failed: -16
May 22 17:02:30 homeassistant kernel: Bluetooth: hci0: Unable to disable scanning: -16
May 22 17:02:30 homeassistant kernel: Bluetooth: hci0: Unable to disable scanning: -16
May 22 17:02:30 homeassistant kernel: Bluetooth: hci0: Opcode 0x200c failed: -16
May 22 17:02:30 homeassistant kernel: Bluetooth: hci0: Unable to disable scanning: -16
May 22 17:02:30 homeassistant kernel: Bluetooth: hci0: stop background scanning failed: -16
May 22 17:02:40 homeassistant kernel: Bluetooth: hci0: Opcode 0x200c failed: -16
May 22 17:02:40 homeassistant kernel: Bluetooth: hci0: Unable to disable scanning: -16
May 22 17:02:40 homeassistant kernel: Bluetooth: hci0: Unable to disable scanning: -16
May 22 17:02:40 homeassistant kernel: Bluetooth: hci0: Opcode 0x200c failed: -16
May 22 17:02:40 homeassistant kernel: Bluetooth: hci0: Unable to disable scanning: -16
May 22 17:02:40 homeassistant kernel: Bluetooth: hci0: stop background scanning failed: -16
I agree that the initialisation method shouldn't affect the issue, but until I can reproduce and understand it I'm not going to rule it out.
Are you seeing overruns with those errors?
Are you seeing overruns with those errors?
Yes, the overruns strongly correlate with the Frame assembly failed
errors.
Describe the bug
In the latest Home Assistant OS release 10 we moved from the Raspberry Pi Linux kernel 5.15 to 6.1. Since the release 10 we got quite some reports about Bluetooth outages after a couple of hours runtime, specifically on Raspberry Pi devices. The outages on Raspberry Pi devices have the following kerne logs in common:
From this point onward it is no longer possible to use Bluetooth. A reboot recovers the system.
I've been able to reproduce the issue locally on the Raspberry Pi 3 Model B out of three Raspberry Pi devices (Raspberry Pi 3 Model B, 3+ Model B and 4). On the device which shows the problem, I've noticed that
Bluetooth: hci0: Frame reassembly failed (-84)
errors have been quite common even on Linux 5.15, but it seems that the older kernel was able to recover on its own (no successivecommand
/Opcode
etc. errors). On other devices, I've not seen the error even on Linux 5.15, so it seems this problem is device specific (only happens on some). However, from other people reports all device types seem affected (Pi 3/3+/Pi 4/CM4).We are using the full UART for Bluetooth by default.
Since Home Assistant OS also updated other components (like BlueZ) I built a OS dev version using Linux 5.15.84 (tag
1.20230106
), with all other components the latest version as used in HAOS 10.1. That build showed the reassembly errors too, but recovered just fine. So with this it is proven that the problem is a Linux 6.1 issue.This is essentially a upstream report of our Home Assistant OS issue https://github.com/home-assistant/operating-system/issues/2485.
Steps to reproduce the behaviour
Device (s)
Raspberry Pi 3 Mod. B, Raspberry Pi 3 Mod. B+, Raspberry Pi 4 Mod. B, Raspberry Pi CM4, Raspberry Pi CM4 Lite
System
Home Assistant OS 10.1 Linux 6.1.21 (built from the
1.20230405
tag)Logs
No response
Additional context
No response