home-assistant / operating-system

Home Assistant Operating System
Apache License 2.0

Watchdog can freeze ODROID-M1 and ODROID-XU4 (maybe others) #2675

Open agners opened 11 months ago

agners commented 11 months ago

Describe the issue you are experiencing

Since #2628, the watchdog is enabled in systemd.

Stressing the system with stress-ng can trigger a watchdog reset by systemd. Normally this leads to a reboot. However, some embedded boards appear to have bugs in their watchdog implementation or elsewhere, so the system freezes instead of rebooting.

This has been observed with ODROID-M1 and ODROID-XU4.

What operating system image do you use?

odroid-m1 (Hardkernel ODROID-M1)

What version of Home Assistant Operating System is installed?

11.0.dev20230803

Did you upgrade the Operating System?

Yes

Steps to reproduce the issue

  1. Boot the board with the dev version
  2. Run stress-ng --all 4 to put the system under heavy load
  3. Wait 5-10 minutes until the board stops responding
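
The stress-based reproduction above can be scripted so that the approximate time of the freeze can be recovered afterwards. This is a minimal sketch, assuming `stress-ng` is on the PATH; the `--timeout` value, the helper names, and the heartbeat file are illustrative assumptions, not part of the original report.

```python
import os
import subprocess
import time


def build_stress_cmd(workers=4, timeout="15m"):
    """Build the stress-ng command line from the reproduction steps.

    `--all 4` comes from the issue; the timeout is an assumption added so
    the run ends on its own if the board does not freeze.
    """
    return ["stress-ng", "--all", str(workers), "--timeout", timeout]


def append_heartbeat(path, now=None):
    """Append one timestamp line and force it to disk."""
    stamp = time.time() if now is None else now
    with open(path, "a") as fh:
        fh.write(f"{stamp:.0f}\n")
        fh.flush()
        os.fsync(fh.fileno())
    return stamp


def run_until_freeze(heartbeat_path, interval=5.0):
    """Start stress-ng and write heartbeats until it exits (or the board hangs)."""
    proc = subprocess.Popen(build_stress_cmd())
    while proc.poll() is None:
        append_heartbeat(heartbeat_path)
        time.sleep(interval)
    return proc.returncode
```

Calling `run_until_freeze()` with a file on persistent storage leaves a list of timestamps behind; if the board hangs instead of rebooting, the last entry roughly marks when it stopped responding.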

An easier way is to use a stable release (e.g. 10.4), which does not have the systemd watchdog enabled:

  1. Run cat /dev/watchdog to start the watchdog. This fails with cat: read error: Invalid argument, but the watchdog has already been started at that point because the device file was opened.
  2. Wait for a freeze to happen
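
The cat-based reproduction works because of how the Linux watchdog character device behaves: opening the node arms the timer, and closing it only disarms the timer if the "magic close" character `V` was written just before (and the driver supports it and is not configured as nowayout). `cat` opens and closes the device without writing anything, so the timer keeps counting. The sketch below shows what a well-behaved feeder does instead; the function name is hypothetical, and the path is a parameter so the logic can be tried against a regular file rather than real hardware.

```python
import os
import time


def feed_watchdog(path="/dev/watchdog", pings=3, interval=1.0, disarm=True):
    """Open the watchdog device, ping it, and optionally disarm it on exit.

    Opening the device starts the timer. Any write counts as a ping.
    Writing b"V" immediately before close requests a stop ("magic close");
    without it the kernel keeps the watchdog running after close.
    """
    fd = os.open(path, os.O_WRONLY)
    try:
        for _ in range(pings):
            os.write(fd, b"\0")  # keep-alive ping
            time.sleep(interval)
        if disarm:
            os.write(fd, b"V")  # magic close character
    finally:
        os.close(fd)
```

On an affected board, calling this with `disarm=False` should be roughly equivalent to the `cat` step above: the timer is left running and is expected to expire.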

Anything in the Supervisor logs that might be useful for us?

No

Anything in the Host logs that might be useful for us?

Some boards warn that the watchdog continues to run after catting the file:

Aug 08 21:03:35 ha-shelf2-om1 kernel: watchdog: watchdog0: watchdog did not stop!
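
A quick way to check a host log dump for this warning is sketched below. The regular expression is inferred from the single line quoted above, and the function name is hypothetical; other kernels or drivers may word the message differently.

```python
import re

# Matches kernel lines such as:
#   "... kernel: watchdog: watchdog0: watchdog did not stop!"
DID_NOT_STOP = re.compile(
    r"kernel: watchdog: (?P<dev>watchdog\d+): watchdog did not stop!"
)


def devices_left_running(log_lines):
    """Return the watchdog device names reported as still running."""
    found = []
    for line in log_lines:
        match = DID_NOT_STOP.search(line)
        if match and match.group("dev") not in found:
            found.append(match.group("dev"))
    return found
```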


System information

No response

Additional information

No response

mingzhangqun commented 11 months ago

Hi, everyone. I tried enabling BR2_INIT_BUSYBOX and disabling BR2_INIT_SYSTEMD, and the watchdog freeze still occurs. So I guess it has nothing to do with systemd.

watchdog.txt

mingzhangqun commented 11 months ago

Using the official source code (Linux 5.10), the watchdog works well. I'm now trying the newest official SDK (Linux 6.1).

mingzhangqun commented 11 months ago

I've tried the official SDK (Linux 6.1), and the watchdog works well. I replaced the kernel partition (dd if=kernel.squash of=/dev/sdx2) with haos, and the watchdog works too.

kernel_img.zip
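
When swapping kernels like this, it may help to record what each kernel reports about the watchdog device, so the working and freezing setups can be compared. This is a rough sketch that assumes the kernel exposes watchdog attributes under `/sys/class/watchdog/` (which depends on the kernel configuration); the function name and the chosen attribute list are illustrative.

```python
from pathlib import Path


def read_watchdog_attrs(sysfs_dir="/sys/class/watchdog/watchdog0",
                        names=("identity", "state", "timeout", "nowayout")):
    """Read selected watchdog sysfs attributes, skipping any that are absent."""
    base = Path(sysfs_dir)
    attrs = {}
    for name in names:
        try:
            attrs[name] = (base / name).read_text().strip()
        except OSError:
            continue  # attribute not exposed by this kernel or driver
    return attrs
```

Running it once per kernel and diffing the resulting dictionaries would show, for example, whether the driver identity or the default timeout differs between the two builds.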

github-actions[bot] commented 8 months ago

There hasn't been any activity on this issue recently. To keep our backlog manageable we have to clean old issues, as many of them have already been resolved with the latest updates. Please make sure to update to the latest Home Assistant OS version and check if that solves the issue. Let us know if that works for you by adding a comment 👍 This issue has now been marked as stale and will be closed if no further activity occurs. Thank you for your contributions.