home-assistant / operating-system


USB devices not available after update to 12.3 on Yellow. Also preventing boot. #3347

Open mmarc opened 3 weeks ago

mmarc commented 3 weeks ago

Describe the issue you are experiencing

Updated Home Assistant OS from 12.2 to 12.3 on my Yellow and it did not come back online afterwards. After a manual reboot the Yellow is online again but all connected USB devices are missing.

When rebooting several times it seems there is a 50:50 chance it boots at all and if it boots, USB is missing.

Downgrade to 12.2 solves the issue.

What operating system image do you use?

yellow (Home Assistant Yellow)

What version of Home Assistant Operating System is installed?

12.3

Did you upgrade the Operating System?

Yes

Steps to reproduce the issue

Just update from 12.2 to 12.3

Anything in the Supervisor logs that might be useful for us?

Unfortunately too late since I already downgraded to 12.2 again.

Anything in the Host logs that might be useful for us?

Unfortunately too late since I already downgraded to 12.2 again.

System information

No response

Additional information

No response

agners commented 3 weeks ago

Hm, I have two Yellows here and double-checked: both run HAOS 12.3 successfully, with USB devices detected.

What (type of) device is it which is missing?

The ha logs commands now have a --boot parameter that lets you get logs from a previous boot, e.g.

ha host logs --boot -1 --lines 10000

mmarc commented 3 weeks ago

Installed the update again. Host is pingable afterwards but connection on port 8123 is refused and SSH also not reachable. Power-cycled the Yellow and now it is online, and this time the USB devices (BLE dongle and Homematic RF dongle) are available.

  OS Version:               Home Assistant OS 12.3
  Home Assistant Core:      2024.5.2

  Home Assistant URL:       http://homeassistant.local:8123
  Observer URL:             http://homeassistant.local:4357
~ # lsusb
Bus 001 Device 001: ID 1d6b:0002
Bus 001 Device 004: ID 2fe3:000b
Bus 001 Device 002: ID 1a40:0101
Bus 001 Device 003: ID 1b1f:c020

Previously (in the error case) it only showed a single USB device, which was

Bus 001 Device 001: ID 1d6b:0002

if I remember correctly.
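
The comparison between the two lsusb listings can be scripted. The following is only a minimal sketch: the function name `count_non_root_usb` is made up for illustration, and it assumes the failure case always shows nothing but the root hub (vendor ID 1d6b, Linux Foundation).

```shell
# Count USB devices other than Linux root hubs in lsusb output.
# Reads lsusb-style lines on stdin, e.g. "Bus 001 Device 004: ID 2fe3:000b".
# Assumption: root hubs are the only entries with vendor ID 1d6b.
count_non_root_usb() {
    awk '$5 == "ID" && $6 !~ /^1d6b:/ { n++ } END { print n + 0 }'
}

# Usage: lsusb | count_non_root_usb
```

With the working listing above this prints 3. With only the root hub present it prints 0, which would match the error case as remembered.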

Attached the result of ha host logs --boot -1 --lines 20000; not sure if anything is visible in there: boot.log.gz

agners commented 3 weeks ago

> Host is pingable afterwards but connection on port 8123 is refused and SSH also not reachable.

Hm, sounds like Core did not get started then :thinking: ha supervisor logs might be helpful in this case.

Ideally the host dmesg would be helpful here, especially in the non-working case. It seems that too much got logged already: the log is not 2000 lines, and the first entry is a cleanup entry from journald :cry:

Seems 2fe3:000b is a Zephyr dev device? With this release we moved to Linux 6.6 for Yellow for the first time. We previously had quite a few problems with USB bus enumeration changes; however, from what I can tell, most of them have been reverted for Yellow as well (see https://github.com/home-assistant/operating-system/issues/2995 and https://github.com/home-assistant/operating-system/pull/3224).

If you can reproduce the problem, can you use dmesg in the SSH/Terminal?

mmarc commented 3 weeks ago

2fe3:000b is a Nordic DK with HCI firmware for use with BLE.

sairon commented 3 weeks ago

Coincidentally, I had an nRF DK with the HCI firmware lying around, so I tried booting my Yellow with that. However, out of ~30 boots so far, I encountered the issue only once, within the first couple of tries, and I can't trigger it again. The cause seems to be the same as in #2257: the USB hub is not enumerated because of an unhandled interrupt. Just like in https://github.com/raspberrypi/linux/issues/5064, it is a dwc2 USB interrupt:

[    6.598626] dwc2 fe980000.usb: irq 41, io mem 0xfe980000
(...)
[    7.331480] irq 41: nobody cared (try booting with the "irqpoll" option)
[    7.338199] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G         C         6.6.28-haos-raspi #1
[    7.346551] Hardware name: Raspberry Pi Compute Module 4 Rev 1.0 (DT)
[    7.350598] usb 1-1: new high-speed USB device number 2 using dwc2
[    7.352982] Call trace:
[    7.361598]  dump_backtrace+0xa0/0x100
[    7.365349]  show_stack+0x20/0x38
[    7.368659]  dump_stack_lvl+0x48/0x60
[    7.372319]  dump_stack+0x18/0x28
[    7.375628]  __report_bad_irq+0x40/0xf0
[    7.379461]  note_interrupt+0x330/0x388
[    7.383292]  handle_irq_event+0xa4/0xc0
[    7.387126]  handle_fasteoi_irq+0xac/0x240
[    7.391219]  generic_handle_domain_irq+0x34/0x58
[    7.395834]  gic_handle_irq+0x4c/0xd8
[    7.399491]  call_on_irq_stack+0x24/0x58
[    7.403411]  do_interrupt_handler+0x88/0x98
[    7.407591]  el1_interrupt+0x34/0x68
[    7.411162]  el1h_64_irq_handler+0x18/0x28
[    7.415255]  el1h_64_irq+0x64/0x68
[    7.418651]  default_idle_call+0x5c/0x170
[    7.422657]  do_idle+0x204/0x238
[    7.425884]  cpu_startup_entry+0x40/0x50
[    7.429804]  rest_init+0xec/0xf8
[    7.433028]  arch_call_rest_init+0x18/0x20
[    7.437122]  start_kernel+0x528/0x670
[    7.440780]  __primary_switched+0xbc/0xd0
[    7.444787] handlers:
[    7.447053] [<0000000048434357>] dwc2_handle_common_intr [dwc2]
[    7.452996] [<00000000d0dace6f>] dwc2_hsotg_irq [dwc2]
[    7.458147] [<00000000a7d505ef>] usb_hcd_irq
[    7.462417] Disabling IRQ #41

Attaching the full dmesg for reference: yellow-usb-fail-dmesg.txt
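
To check another boot for the same symptom, the "nobody cared" line can be pulled out of dmesg. This is just a sketch, and the helper name `unhandled_irqs` is hypothetical. It only looks for the message format shown in the trace above.

```shell
# Print the IRQ numbers the kernel reported as unhandled
# ("irq N: nobody cared") in dmesg-style text read from stdin.
unhandled_irqs() {
    sed -n 's/.*irq \([0-9][0-9]*\): nobody cared.*/\1/p' | sort -un
}

# Usage: dmesg | unhandled_irqs
```

For the trace above this prints 41. Empty output means no such message was found in the log that was read, not necessarily that USB is fine.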

@mmarc can you try connecting your Yellow to a PC with USB-C connector switched to the USB-UART mode (see Linux/Mac or Windows instructions) and checking the boot log and dmesg directly there?

(Update 15 boots later - the issue occurred again with the same stack trace. rmmod dwc2 && modprobe dwc2 made the hub and attached device available again.)
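
The module reload from the update above could be wrapped so that it only runs when the log actually shows the unhandled dwc2 interrupt. This is a rough sketch, not a tested fix: the function name and the `DRY_RUN` variable are invented here, and it assumes a shell that is allowed to unload and load kernel modules on the host.

```shell
# Reload dwc2 only if dmesg-style text on stdin shows an unhandled IRQ
# whose handler list mentions dwc2_handle_common_intr.
# DRY_RUN (hypothetical variable) defaults to 1 and only prints the command;
# set DRY_RUN=0 to really run it (needs permission to load modules).
reload_dwc2_if_irq_disabled() {
    log=$(cat)
    if printf '%s\n' "$log" | grep -q 'nobody cared' &&
       printf '%s\n' "$log" | grep -q 'dwc2_handle_common_intr'; then
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "rmmod dwc2 && modprobe dwc2"
        else
            rmmod dwc2 && modprobe dwc2
        fi
    else
        echo "no unhandled dwc2 interrupt found"
    fi
}

# Usage: dmesg | reload_dwc2_if_irq_disabled
```

The dry-run default is deliberate, since unloading dwc2 drops every device behind that controller until the module is loaded again.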

bkvargyas commented 3 weeks ago

I have two Yellows, both with the same Z-Wave stick inserted. I have not updated the second one yet, but the first one had the same issues as described here. Luckily for me, I have remote PoE power-cycle capability and was able to get it back online after a power cycle, with the Z-Wave stick working. I have 3 more Yellows in a box; I just need to assemble them and test in the lab.