raspberrypi / firmware

This repository contains pre-compiled binaries of the current Raspberry Pi kernel and modules, userspace libraries, and bootloader/GPU firmware.

Bluetooth: hci0: Frame reassembly failed (-84) #1150

Open risa2000 opened 5 years ago

risa2000 commented 5 years ago

After the last system upgrade of Raspbian to:

pi@hass:~ $ uname -a
Linux hass 4.19.42-v7+ #1219 SMP Tue May 14 21:20:58 BST 2019 armv7l GNU/Linux

I am observing a problem with Bluetooth, which stops working after some time. The kernel log shows a repeating message:

Bluetooth: hci0: Frame reassembly failed (-84)

or sometimes also a stack dump (I do not have one right now).
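A rough way to quantify the problem is to count the error lines in the kernel log. The sketch below assumes a POSIX shell; the function name count_reassembly_errors is made up for illustration and is not part of any tool mentioned here. As background, kernel error codes are negated errno values, and errno 84 on Linux is EILSEQ (illegal byte sequence).

```shell
# Hypothetical helper: count "Frame reassembly failed" lines in log text on stdin.
count_reassembly_errors() {
    # grep -c prints the number of matching lines (0 if there are none);
    # "|| true" keeps the exit status zero when nothing matches.
    grep -c 'Bluetooth: hci0: Frame reassembly failed' || true
}

# Example (not run here): dmesg | count_reassembly_errors
```

Running it periodically and comparing the counts would show whether the errors cluster around particular activity, such as the lighthouse polling.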

The RPi is a 3B; the dmesg log is here: dmesg.log

The HCI configuration is:

pi@hass:~ $ hciconfig -a
hci0:   Type: Primary  Bus: UART
        BD Address: B8:27:EB:A1:A8:DE  ACL MTU: 1021:8  SCO MTU: 64:1
        UP RUNNING
        RX bytes:1733551 acl:20475 sco:1 events:67235 errors:0
        TX bytes:387061 acl:5214 sco:0 commands:27972 errors:0
        Features: 0xbf 0xfe 0xcf 0xfe 0xdb 0xff 0x7b 0x87
        Packet type: DM1 DM3 DM5 DH1 DH3 DH5 HV1 HV2 HV3
        Link policy: RSWITCH SNIFF
        Link mode: SLAVE ACCEPT
        Name: 'hass'
        Class: 0x000000
        Service Classes: Unspecified
        Device Class: Miscellaneous,
        HCI Version: 4.1 (0x7)  Revision: 0x168
        LMP Version: 4.1 (0x7)  Subversion: 0x2209
        Manufacturer: Broadcom Corporation (15)
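
The RX and TX error counters in this output can be extracted for logging over time. This is only a sketch, assuming a POSIX shell and the hciconfig output layout shown above; the function name hci_error_counts is hypothetical.

```shell
# Hypothetical helper: print the RX and TX error counters from
# `hciconfig -a` output supplied on stdin.
hci_error_counts() {
    # Matches lines such as:
    #   RX bytes:1733551 acl:20475 sco:1 events:67235 errors:0
    sed -n 's/^[[:space:]]*\([RT]X\) bytes:.* errors:\([0-9][0-9]*\).*$/\1 errors: \2/p'
}

# Example (not run here): hciconfig -a | hci_error_counts
```

In the output above both counters are 0 even though the kernel log reports failures, so the kernel log is probably the more useful signal.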

Now my BT usage is a bit specific. The main role of this RPi is running homeassistant, which means (among other things) having a USB flash stick (with f2fs, for logging), a Z-Wave USB stick and a Zigbee USB stick for home automation, and also using the built-in BT for tracking BT hygrothermo devices.

Apart from that, I also (ab)use the built-in BT for controlling Valve's lighthouses. Both the hygrothermo devices and the lighthouses use the BTLE protocol. The HT devices are read out every 2 minutes; the lighthouses (when running) are talked to every 20 seconds.

When I do not run the lighthouses there is no communication with them and the error seems far less likely to happen. When I run a VR session with the lighthouse control active, after some time the BT becomes unresponsive, the errors are logged in the kernel log, and the only resolution is a reboot.

Before the last system (and, I assume, firmware) upgrade, the system worked fine in exactly the same configuration for several months.

pelwell commented 4 years ago

Just this evening I found a possible explanation for the main UART dropping bytes, even with flow control enabled. Assuming the fix works, I'd expect it to resolve at least some of the issues people are seeing.

lucagiove commented 4 years ago

Just this evening I found a possible explanation for the main UART dropping bytes, even with flow control enabled. Assuming the fix works, I'd expect it to resolve at least some of the issues people are seeing.

cool, looking forward to seeing the pull request :)

pelwell commented 4 years ago

A patch has just gone into rpi-4.19.y: https://github.com/raspberrypi/linux/commit/65aa6ec0faaa012508489886ac357cbb86cdb9a4 It shouldn't break anything, and I think there's a good chance the data loss is fixed (although there may be a better implementation - this is more of a workaround).

lucagiove commented 4 years ago

Cool, thanks @pelwell, I can test it! What's the fastest way to run this kernel on HassOS? Or is it easier with Raspbian?

pelwell commented 4 years ago

There's likely to be a firmware build by the end of the day that will be available via rpi-update. Or you can build it yourself.

popcornmix commented 4 years ago

rpi-update contains the potential fix. Can you test it?

lucagiove commented 4 years ago

Unfortunately I'm abroad until Monday and, as I said, the environment where I can easily reproduce the bug is the Home Assistant distro HassOS, which does not have rpi-update. I have to figure out how to update the firmware there. If someone has ideas...

talondnb commented 4 years ago
  $HCIATTACH /dev/serial1 bcm43xx 460800 noflow - $BDADDR

Did the change on HassOS with Raspberry 3 and seems to work

Can you please explain how this was done?

Xitro01 commented 4 years ago

Having this exact same issue. I updated from 4.19.86-v7+ to 4.19.97-v7+ (fdb5c37e330e7cb3027ac4fcc5b1cd5f244b351f). The "Frame reassembly failed" issue is still there and keeps spamming my syslog with these messages, but I'll check whether it stops crashing the Bluetooth adapter now.

Xitro01 commented 4 years ago

This problem is not solved yet with the latest firmware! Same issues occur, my bluetooth hardware crashed 8 hours ago.

lucagiove commented 4 years ago
  $HCIATTACH /dev/serial1 bcm43xx 460800 noflow - $BDADDR

Did the change on HassOS with Raspberry 3 and seems to work

Can you please explain how this was done?

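For a permanent change you have to modify the read-only HassOS filesystem: https://unix.stackexchange.com/questions/8907/modifying-a-squashfs#8925

For a quicker but temporary solution:

  • make sure to have ssh access to HassOS (not with the plugin)
  • login
  • ps aux | grep hciattach and copy the running command
  • killall hciattach
  • run the copied command replacing the baud rate with 460800, prefixing nohup and postfixing with &: nohup hciattach /dev/serial1 bcm43xx 460800 noflow - YOURMACADDRESS &

You might have to restart Home Assistant. Make sure not to reboot the system, otherwise the change will be lost.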

pelwell commented 4 years ago

It's disappointing that there appear to be Bluetooth issues beyond the UART data loss, but I had to eliminate that possibility first.

talondnb commented 4 years ago

I’ve upgraded to HassOS 3.9 and I seem to be fixed. No failures yet!

lachlan334 commented 4 years ago

I've been on HassOS 3.9 for a few days and I'm still seeing the issue. How long have you gone without error?

talondnb commented 4 years ago

I spoke too soon, it literally just went to Unavailable! It actually went for a good 8 hours or so.


-- Regards,

Andrew Munday

pelwell commented 4 years ago

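Can I do a quick survey of failure scenarios? The following information would be helpful:

  • Kernel version
  • OS version (e.g. Raspbian Buster - see /etc/os-release if you aren't sure)
  • Relevant non-standard configuration (HostAP, pulseaudio, ofono etc.)
  • Bluetooth usage (what you are using it for, approximate data rate)
  • WiFi usage (onboard or external, approximate data rate, output of iwconfig wlan0)
  • Approximate average time to failure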

talondnb commented 4 years ago

Kernel version: Linux a0d7b954-ssh 4.19.93-v7 #1 SMP Mon Feb 3 19:47:23 UTC 2020 armv7l Linux
OS version (e.g. Raspbian Buster - see /etc/os-release if you aren't sure): alpine
Relevant non-standard configuration (HostAP, pulseaudio, ofono etc.): N/A
Bluetooth usage (what you are using it for, approximate data rate): Miflora sensor monitoring
WiFi usage (onboard or external, approximate data rate, output of iwconfig wlan0): Not used, eth0 only used
Approximate average time to failure: Since HassOS 3.9 update, around 8 hours. Before this, within minutes.

pelwell commented 4 years ago

I've added Model of Pi to the list - I assume yours is a Zero W, @talondnb?
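
The model, kernel and OS fields of the survey can be gathered with a short script. This is a minimal sketch, assuming a POSIX shell on a Raspberry Pi where /proc/device-tree/model and /etc/os-release exist; the function names os_pretty_name and survey_report are hypothetical.

```shell
# Hypothetical helper: print PRETTY_NAME from an os-release style file.
# $1: path to the file (defaults to /etc/os-release).
os_pretty_name() {
    sed -n 's/^PRETTY_NAME="\{0,1\}\([^"]*\)"\{0,1\}$/\1/p' "${1:-/etc/os-release}" 2>/dev/null
}

# Hypothetical helper: print the machine-readable part of the survey.
survey_report() {
    # The device-tree model string is NUL-terminated, so strip the NUL byte.
    if [ -r /proc/device-tree/model ]; then
        printf 'Model of Pi: %s\n' "$(tr -d '\0' < /proc/device-tree/model)"
    fi
    printf 'Kernel version: %s\n' "$(uname -a)"
    printf 'OS version: %s\n' "$(os_pretty_name)"
}

# Example (not run here): survey_report
```

The remaining fields (configuration, Bluetooth and WiFi usage, time to failure) still have to be filled in by hand.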

talondnb commented 4 years ago

I've added Model of Pi to the list - I assume yours is a Zero W, @talondnb?

It's a Pi 3B. Non-plus model.

lachlan334 commented 4 years ago

Model of Pi: 3B+
Kernel version: Linux a0d7b954-ssh 4.19.93-v7 #1 SMP Mon Feb 3 19:47:23 UTC 2020 armv7l Linux
OS version: Alpine 3.11
Relevant non-standard configuration: Running hass.io
Bluetooth usage: Presence detection by tracking nearby Bluetooth devices
WiFi usage: None
Approximate average time to failure: Seems to be completely random. Could be 5 minutes, could be 5 hours.

pelwell commented 4 years ago

Thanks. Does "Presence detection by tracking nearby Bluetooth devices" cover regular Bluetooth, BLE or both?

johtajajake commented 4 years ago

Model: 3B
Kernel: Linux hassio 4.19.93-v7 #1 SMP Sun Jan 12 16:02:44 UTC 2020 armv7l Hassio/OS
OS: HassOS 3.8
Relevant non-standard configuration: Running hass.io, connected with USB CC2531 (zigbee2mqtt.io)
Bluetooth usage: Presence detection of mobile phones. Bluetooth, not BLE (at least I think: "platform: bluetooth_tracker")
WiFi usage: none
Approximate average time to failure: Random. 5min-5h

rsmeral commented 4 years ago

Model: 3B
Kernel: Linux pi 4.19.97-v7+ #1294 SMP Thu Jan 30 13:15:58 GMT 2020 armv7l GNU/Linux
OS: Raspbian Buster
Relevant non-standard configuration: N/A
Bluetooth usage: Collecting BLE sensor data through EspruinoHub – listens to BLE advertise packets, and sends them to MQTT
WiFi usage: none (turned off via dtoverlay=disable-wifi)
Approximate average time to failure: 30min–12h

lachlan334 commented 4 years ago

Thanks. Does "Presence detection by tracking nearby Bluetooth devices" cover regular Bluetooth, BLE or both?

Just regular Bluetooth

pelwell commented 4 years ago

I've got an installation of the hassio Home Assistant, since that seems like it might be a route to reproduce the failures. It booted up in Hebrew, but I've got past that now. Can someone give me a quick guide to configuring the presence detection?

johtajajake commented 4 years ago

I'm relying on my memory and 2-year old config, anyone, please add/correct.

Add to (existing) /config/configuration.yaml (e.g. to the end of the file):

# bluetooth_tracker is a platform of the device_tracker integration
device_tracker:
  - platform: bluetooth_tracker
    new_device_defaults:
      track_new_devices: true

Restart HA (in the UI: Configuration --> Server control --> Restart). That's it. Alternatively, do a hard reboot from the CLI with "hassio host reboot" (this reboots the whole computer). I sometimes have to do this when the UI restart never comes back up.

Optionally, if the device doesn't appear shortly in your UI, you can also create "known_devices.yaml" in the /config folder (note that the MAC address starts with "BT_"):

myiphone:
  mac: BT_11:22:33:44:55:66
  name: MyPrecious

--> the device tracker should appear in your UI. You can check the state in the UI: Developer tools --> States. There should be a "device_tracker.myiphone" if you created the known_devices.yaml. Otherwise the myiphone part can be something else, I can't remember what. Possibly the MAC address.

I suppose you're familiar with YAML, but anyway, the usual note: every space matters. Copy the above as is.

lucagiove commented 4 years ago

Model of Pi: Raspberry Pi 3 Model B Rev 1.2
Kernel version: Linux hassio 4.19.93-v7 #1 SMP Sun Jan 12 16:02:44 UTC 2020 armv7l Hassio/OS
OS version: HassOS 3.8
Relevant non-standard configuration: Running hass.io
Bluetooth usage: Reading Xiaomi BLE Temperature Humidity sensor
WiFi usage: Connected to wifi router
Approximate average time to failure: 2-4 hours more or less

Xitro01 commented 4 years ago

Model of Pi: 3B
Kernel version: Linux pi 4.19.97-v7+
OS version: Raspbian Buster
Relevant non-standard configuration: Running hass.io
Bluetooth usage: Reading Xiaomi BLE Temperature Humidity sensor
WiFi usage: None
Approximate average time to failure: 4-8 hours

Side note: I'm running this non-permanent fix now: "nohup hciattach /dev/serial1 bcm43xx 460800 noflow - YOURMACADDRESS &". I still get errors in my syslog, but not as many, and it hasn't crashed for days now.

pelwell commented 4 years ago

@johtajajake Thanks for the instructions. Unfortunately my installation (hassos_rpi3-3.9.img) doesn't have a "/config" directory, or anything similar:

# uname -a
Linux hassio 4.19.93-v7 #1 SMP Mon Feb 3 19:47:23 UTC 2020 armv7l Hassio/OS
# cat /etc/os-release
NAME=HassOS
VERSION="3.9 (RaspberryPi 3)"
ID=hassos
VERSION_ID=3.9
PRETTY_NAME="HassOS 3.9"
CPE_NAME=cpe:2.3:o:home_assistant:hassos:3.9:*:production:*:*:*:rpi3:*
HOME_URL=https://hass.io/
VARIANT="HassOS RaspberryPi 3"
VARIANT_ID=rpi3
johtajajake commented 4 years ago

That's strange. Did you follow the installation instructions in https://www.home-assistant.io/hassio/installation/ or some other way? See #7 in the instructions, even that's assuming there is the /config. Does the UI work? If yes, then the configuration.yaml must be somewhere.

pelwell commented 4 years ago

I did follow the instructions, but I think my installation was broken from the start - possibly a bad card write. I thought it was suspicious when the UI came up in Hebrew.

With a clean install of 3.10 the config directory is there and I've got my phone being detected by Bluetooth.

pelwell commented 4 years ago

Running Hassio on a 3B+ (which has flow control to the Bluetooth modem) I've found to be reliable (no problems in 24 hours), while on a 3B (with no flow control - this was the last design before we added the GPIO expander, freeing some pins on the SoC) I see the kind of instability that others have reported. I don't understand why Hassio is showing the problem more than Raspbian, but perhaps it is a scheduling issue to do with the kinds of workloads that Hassio requires.

I think Hassio should be modified to only use a baud rate of 460800 on a 3B, as that does make it much more reliable.
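
A startup script could apply this suggestion by checking the board model before calling hciattach. The following is only a sketch, assuming a POSIX shell; the function names is_pi3b_nonplus and bt_uart_args are hypothetical, the "Model B Plus" wording for the 3B+ model string is an assumption, and this is not how any existing script is known to do it.

```shell
# Hypothetical helper: succeed only for a Pi 3 Model B (not the 3B+).
# $1: model string, e.g. the contents of /proc/device-tree/model.
is_pi3b_nonplus() {
    case "$1" in
        *"Pi 3 Model B Plus"*) return 1 ;;  # assumed wording for the 3B+
        *"Pi 3 Model B"*)      return 0 ;;
        *)                     return 1 ;;
    esac
}

# Hypothetical helper: print the speed and flow arguments for hciattach.
bt_uart_args() {
    if is_pi3b_nonplus "$1"; then
        echo "460800 noflow"   # the rate suggested in this thread for the 3B
    else
        echo ""                # no override: keep the distribution's default
    fi
}

# Example (not run here):
#   bt_uart_args "$(tr -d '\0' < /proc/device-tree/model)"
```

Keeping the override limited to the 3B leaves boards with flow control on whatever rate the distribution already uses.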

RoyTrenneman commented 4 years ago
  $HCIATTACH /dev/serial1 bcm43xx 460800 noflow - $BDADDR

Did the change on HassOS with Raspberry 3 and seems to work

Can you please explain how this was done?

For a permanent change you've to change the read only HassOS fs https://unix.stackexchange.com/questions/8907/modifying-a-squashfs#8925

For a quicker but temporary solution:

  • make sure to have ssh access to HassOS (not with the plugin)
  • login
  • ps aux | grep hciattach and copy the running command
  • killall hciattach
  • run the copied command replacing the baud rate with 460800, prefixing nohup and postfixing with &: nohup hciattach /dev/serial1 bcm43xx 460800 noflow - YOURMACADDRESS &

You might have to restart Home Assistant. Make sure not to reboot the system, otherwise the change will be lost.
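
The "replace the baud rate" step above can be sketched as a small text filter, which avoids retyping the copied command. This assumes a POSIX shell and a command line of the form shown in this thread (bcm43xx followed by the speed); the function name with_baud is hypothetical.

```shell
# Hypothetical helper: rewrite the baud rate in an hciattach command line.
# $1: new baud rate; the original command line is read from stdin.
with_baud() {
    # Replace the number that follows the "bcm43xx" device type.
    sed "s/\(bcm43xx\) [0-9][0-9]*/\1 $1/"
}

# Example (not run here): echo "$copied_command" | with_baud 460800
```

The result still has to be started by hand with nohup, as described in the steps.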

This workaround seems stable with a 38400 speed limit: no "Frame reassembly failed" for 2 days, instead of a few hours...

johtajajake commented 4 years ago

Running Hassio on a 3B+ (which has flow control to the Bluetooth modem) I've found to be reliable (no problems in 24 hours), while on a 3B (with no flow control - this was the last design before we added the GPIO expander, freeing some pins on the SoC) I see the kind of instability that others have reported. I don't understand why Hassio is showing the problem more than Raspbian, but perhaps it is a scheduling issue to do with the kinds of workloads that Hassio requires.

I think Hassio should be modified to only use a baud rate of 460800 on a 3B, as that does make it much more reliable.

I had RPi3B with Hassio and RPi3B+ with Hassbian. I switched them over. Now both have had stable BT for a couple of days. Which is nice. Of course doesn't solve the problem in hassio (I'd like to update hassbian to hassio). But a great workaround! Thanks!

mansig88 commented 4 years ago

Hello, I have hassio on an RPi3B, and I get an error when running the nohup process:

ha > login
ps aux | grep hciattach
root 7586 0.0 0.0 3180 460 pts/0 S+ 12:19 0:00 grep hciattach
killall hciattach
killall: hciattach: no process killed
nohup hciattach /dev/serial1 bcm43xx 38400 noflow - B8:27:EB:0A:xx:xx &
nohup: can't open '/root/nohup.out': Read-only file system

[1]+ Done(127) nohup hciattach /dev/serial1 bcm43xx 38400 noflow - B8:27:EB:0A:xx:xx

After this I restarted Home Assistant, but Bluetooth is still not working. Any ideas?

Thanks

RoyTrenneman commented 4 years ago

Hi, It seems hciattach didn't start. nohup tried to write on /root... only user root can write on /root... Try to restart hciattach from /home/pi and check if the process is running, then restart HA
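
Another possible way around the nohup.out error is to redirect the command's output, since nohup only creates nohup.out when its standard output is a terminal. This is a sketch under the assumption that the failure comes solely from nohup being unable to create that file on a read-only filesystem; the function name start_detached is hypothetical.

```shell
# Hypothetical helper: run a command in the background, immune to hangups,
# without letting nohup create a nohup.out file.
start_detached() {
    # With stdout and stderr redirected, nohup has nothing to write to disk.
    nohup "$@" >/dev/null 2>&1 &
}

# Example (not run here):
#   start_detached hciattach /dev/serial1 bcm43xx 460800 noflow - YOURMACADDRESS
```

Afterwards, ps aux | grep hciattach should list the process if it started successfully.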

mansig88 commented 4 years ago

Hi Roy, how can I get access to modify it? I'm root... Error

mansig88 commented 4 years ago

Hello again, I bought a TP-Link UB400 and disabled the RPi3 Bluetooth with: dtoverlay=pi3-miniuart-bt. I restarted the whole system, but the problems continue. Any help, please? Thanks!!

ashah976 commented 4 years ago

This is happening on my setup as well: an RPi 3 Model B with Buster Lite, using the onboard Bluetooth. When streaming audio I get a timeout from hciconfig -a, then the Bluetooth peers (a speaker in my case) disconnect. Modifying /usr/bin/btuart (for a permanent change) and setting the baud rate to 500000 seems okay so far; with the 460800 rate my audio streaming quality was poor.

mansig88 commented 4 years ago

how can I change this parameter?

thanks

pelwell commented 4 years ago

Use sudo to start your editor - "sudo nano /usr/bin/btuart", "sudo vi /usr/bin/btuart" etc.

Unless you are running with a read-only rootfs, in which case the answer will depend on the distribution you are running (Raspbian doesn't do this).

mansig88 commented 4 years ago

I'm using HASSos

ha > login
sudo vi /usr/bin/btuart
/bin/ash: sudo: not found

:(

pelwell commented 4 years ago

How to use HASSos is outside the scope of the issue.

mansig88 commented 4 years ago

Then I can't do anything?

pelwell commented 4 years ago

@lucagiove From a quick search of the home-assistant issues page, you seem to have managed to edit /usr/bin/btuart. Can you explain to @mansig88 how you did that?

mansig88 commented 4 years ago

thanks @pelwell !!

J-Luc16 commented 4 years ago

Take care: the modification of btuart I made last year on my Pi was reverted by the update from Raspbian 9 to Raspbian 10. It seems that Buster does not yet have the correct code/config values.

But the modification of /usr/bin/btuart still works, hopefully ;-)
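
Since an upgrade can silently replace /usr/bin/btuart, it may be worth checking after each update that the edited rate is still in place. A minimal sketch, assuming a POSIX shell and that the script calls hciattach with "bcm43xx" followed by the speed, as in the command lines quoted in this thread; the function name btuart_has_baud is hypothetical.

```shell
# Hypothetical helper: succeed if the script at $1 still calls hciattach
# with baud rate $2 (for example after a distribution upgrade).
btuart_has_baud() {
    grep -q "bcm43xx $2 " "$1"
}

# Example (not run here):
#   btuart_has_baud /usr/bin/btuart 460800 || echo "btuart was reset"
```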

mansig88 commented 4 years ago

But how can I change it on HassOS?

lucagiove commented 4 years ago

@mansig88 it's a bit advanced and the first update will overwrite the changes, so it's not really worth it. As soon as I have some spare time I'll try to see how I can submit a PR to change it in HassOS.

mansig88 commented 4 years ago

thank you so much @lucagiove !!!