gibman closed this issue 2 years ago
As you noted, it looks like the Docker daemon itself gets into trouble:
Mar 17 19:30:20 homeassistant dockerd[358]: fatal error: schedule: spinning with local work
...
Mar 17 19:30:20 homeassistant dockerd[358]: runtime stack:
Mar 17 19:30:20 homeassistant dockerd[358]: runtime.throw(0x30ad682, 0x22)
Mar 17 19:30:20 homeassistant dockerd[358]: runtime/panic.go:1117 +0x72
Mar 17 19:30:20 homeassistant dockerd[358]: runtime.schedule()
Mar 17 19:30:20 homeassistant dockerd[358]: runtime/proc.go:3129 +0x4cd
Mar 17 19:30:20 homeassistant dockerd[358]: runtime.park_m(0xc0014cc600)
Mar 17 19:30:20 homeassistant dockerd[358]: runtime/proc.go:3318 +0x9d
Mar 17 19:30:20 homeassistant dockerd[358]: runtime.mcall(0x800000)
Mar 17 19:30:20 homeassistant dockerd[358]: runtime/asm_amd64.s:327 +0x5b
Mar 17 19:30:20 homeassistant dockerd[358]: goroutine 1 [chan receive, 112 minutes]:
Mar 17 19:30:20 homeassistant dockerd[358]: main.(*DaemonCli).start(0xc0001bc840, 0xc000098900, 0x0, 0x0)
Mar 17 19:30:20 homeassistant dockerd[358]: github.com/docker/docker/cmd/dockerd/daemon.go:249 +0xc53
Mar 17 19:30:20 homeassistant dockerd[358]: main.runDaemon(...)
Mar 17 19:30:20 homeassistant dockerd[358]: github.com/docker/docker/cmd/dockerd/docker_unix.go:13
Mar 17 19:30:20 homeassistant dockerd[358]: main.newDaemonCommand.func1(0xc00021c2c0, 0xc0008cc2d0, 0x0, 0x9, 0x0, 0x0)
Mar 17 19:30:20 homeassistant dockerd[358]: github.com/docker/docker/cmd/dockerd/docker.go:34 +0x7d
Mar 17 19:30:20 homeassistant dockerd[358]: github.com/spf13/cobra.(*Command).execute(0xc00021c2c0, 0xc00004e0b0, 0x9, 0x9, 0xc00021c2c0, 0xc00004e0b0)
Mar 17 19:30:20 homeassistant dockerd[358]: github.com/spf13/cobra/command.go:850 +0x472
Mar 17 19:30:20 homeassistant dockerd[358]: github.com/spf13/cobra.(*Command).ExecuteC(0xc00021c2c0, 0x0, 0x0, 0x10)
Mar 17 19:30:20 homeassistant dockerd[358]: github.com/spf13/cobra/command.go:958 +0x375
Mar 17 19:30:20 homeassistant dockerd[358]: github.com/spf13/cobra.(*Command).Execute(...)
Mar 17 19:30:20 homeassistant dockerd[358]: github.com/spf13/cobra/command.go:895
Mar 17 19:30:20 homeassistant dockerd[358]: main.main()
Mar 17 19:30:20 homeassistant dockerd[358]: github.com/docker/docker/cmd/dockerd/docker.go:97 +0x185
Mar 17 19:30:20 homeassistant dockerd[358]: goroutine 42 [select]:
Mar 17 19:30:20 homeassistant dockerd[358]: go.opencensus.io/stats/view.(*worker).start(0xc000308280)
Mar 17 19:30:20 homeassistant dockerd[358]: go.opencensus.io/stats/view/worker.go:154 +0xcd
Mar 17 19:30:20 homeassistant dockerd[358]: created by go.opencensus.io/stats/view.init.0
Mar 17 19:30:20 homeassistant dockerd[358]: go.opencensus.io/stats/view/worker.go:32 +0x57
Mar 17 19:30:20 homeassistant dockerd[358]: goroutine 21 [select, 112 minutes]:
Mar 17 19:30:20 homeassistant dockerd[358]: github.com/docker/docker/libcontainerd/supervisor.(*remote).monitorDaemon(0xc0000e8800, 0x343adb8, 0xc00022a600)
Mar 17 19:30:20 homeassistant dockerd[358]: github.com/docker/docker/libcontainerd/supervisor/remote_daemon.go:318 +0xe89
Mar 17 19:30:20 homeassistant dockerd[358]: created by github.com/docker/docker/libcontainerd/supervisor.Start
Mar 17 19:30:20 homeassistant dockerd[358]: github.com/docker/docker/libcontainerd/supervisor/remote_daemon.go:90 +0x430
Mar 17 19:30:20 homeassistant dockerd[358]: goroutine 22 [syscall, 112 minutes]:
Mar 17 19:30:20 homeassistant dockerd[358]: syscall.Syscall6(0xf7, 0x1, 0x171, 0xc000587c88, 0x1000004, 0x0, 0x0, 0x0, 0x0, 0x0)
Mar 17 19:30:20 homeassistant dockerd[358]: syscall/asm_linux_amd64.s:43 +0x5
Mar 17 19:30:20 homeassistant dockerd[358]: os.(*Process).blockUntilWaitable(0xc000238cf0, 0x0, 0x0, 0x0)
Mar 17 19:30:20 homeassistant dockerd[358]: os/wait_waitid.go:32 +0x9e
Mar 17 19:30:20 homeassistant dockerd[358]: os.(*Process).wait(0xc000238cf0, 0x0, 0x0, 0x0)
Mar 17 19:30:20 homeassistant dockerd[358]: os/exec_unix.go:22 +0x39
Mar 17 19:30:20 homeassistant dockerd[358]: os.(*Process).Wait(...)
Mar 17 19:30:20 homeassistant dockerd[358]: os/exec.go:129
Mar 17 19:30:20 homeassistant dockerd[358]: os/exec.(*Cmd).Wait(0xc00011a160, 0x4, 0xc000587e60)
Mar 17 19:30:20 homeassistant dockerd[358]: os/exec/exec.go:507 +0x65
Mar 17 19:30:20 homeassistant dockerd[358]: github.com/docker/docker/libcontainerd/supervisor.(*remote).startContainerd.func1(0xc00011a160, 0xc0000e8800)
Mar 17 19:30:20 homeassistant dockerd[358]: github.com/docker/docker/libcontainerd/supervisor/remote_daemon.go:214 +0x45
Mar 17 19:30:20 homeassistant dockerd[358]: created by github.com/docker/docker/libcontainerd/supervisor.(*remote).startContainerd
Mar 17 19:30:20 homeassistant dockerd[358]: github.com/docker/docker/libcontainerd/supervisor/remote_daemon.go:212 +0x2f4
It feels to me as if there is some hardware issue (like a failing disk), but since you already tried that (and there is also no issue in dmesg), that seems unlikely.
Can you try OS 8.0 (on the stable channel as of today) and see if your problems persist?
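For anyone who wants to check their own previous-boot journal for the same kind of daemon crash, here is a minimal Python sketch. It assumes the journal was saved as plain text in the default short format shown in the trace above (for example from `journalctl -b -1`), and it only looks for lines whose message starts with `fatal error:`. The function name `find_fatal_errors` and the sample lines are illustrative, not part of any Home Assistant or Docker tooling.

```python
import re
from typing import Iterable, List, NamedTuple

# Short journal format as seen in the excerpt:
# "Mar 17 19:30:20 <host> <unit>[<pid>]: <message>"
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) "
    r"(?P<unit>[^\[\s]+)\[(?P<pid>\d+)\]: "
    r"(?P<msg>.*)$"
)

FATAL_PREFIX = "fatal error:"


class FatalError(NamedTuple):
    timestamp: str
    unit: str
    pid: int
    message: str


def find_fatal_errors(lines: Iterable[str], unit: str = "dockerd") -> List[FatalError]:
    """Return every 'fatal error:' line logged by the given unit."""
    hits = []
    for line in lines:
        match = LINE_RE.match(line.rstrip("\n"))
        if match is None or match["unit"] != unit:
            continue  # not a journal line in short format, or a different unit
        msg = match["msg"]
        if msg.startswith(FATAL_PREFIX):
            hits.append(
                FatalError(
                    timestamp=match["ts"],
                    unit=match["unit"],
                    pid=int(match["pid"]),
                    message=msg[len(FATAL_PREFIX):].strip(),
                )
            )
    return hits


if __name__ == "__main__":
    # Sample lines copied from the trace above; in practice, iterate over
    # an open file holding the saved journal output instead.
    sample = [
        "Mar 17 19:30:20 homeassistant dockerd[358]: fatal error: schedule: spinning with local work",
        "Mar 17 19:30:20 homeassistant dockerd[358]: runtime stack:",
        "Mar 17 19:30:20 homeassistant dockerd[358]: runtime.schedule()",
    ]
    for hit in find_fatal_errors(sample):
        print(f"{hit.timestamp} {hit.unit}[{hit.pid}]: {hit.message}")
```

A match here only shows that the daemon's Go runtime aborted. It does not by itself distinguish a software bug from faulty hardware.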
Hi Stefan,
Thanks for taking a look at this.
I bought a new NUC setup (Gigabyte BRIX GB-BMCE-4500C - N4500 chipset). No problems since then.
So it must have been a hardware problem on my previous setup (Supermicro X10SBA - J1900 chipset), probably the motherboard itself.
I think we can safely close this issue.
Describe the issue you are experiencing
I've been plagued by recurring crashes for half a year now. The system enters a state in which it becomes unreachable: no pings, no UI, nothing.
I have to reboot it. Then it works for a few hours, sometimes a week.
Home Assistant OS runs on an Intel NUC system consisting of:
(This system previously ran a Windows 10 setup without any issues.)
Along the way I have been steadily updating the OS as well as Home Assistant itself. Sadly the problem still remains.
I've tried the following hardware tricks:
1) Tried another power supply.
2) Tried other RAM modules.
3) Tried another SSD drive.
4) Installed Home Assistant OS on my powerful office PC (which usually runs Windows), restoring the backup from the NUC install.
Software-wise I have tried the following:
1) Reinstalled Home Assistant OS on the same NUC, restoring from backup.
2) Removed HACS and custom_components.
3) Removed each of the add-ons one at a time.
4) Removed each of the integrations one at a time.
5) Tried to rebuild the system from scratch.
Home infrastructure:
router: basic OpenWrt setup along with a few UniFi APs
10x Shelly devices
10x ESPHome devices with sensors
around 25 Zigbee battery-operated sensors
2x Harmony remote controls
several Android TV/NVIDIA Shield devices
RM4 Pro
Denon AVR
Roborock vacuums
Nanoleaf LEDs
ZHA (ConBee II)
The problem seems to persist regardless of what I do.
What operating system image do you use?
generic-x86-64 (Generic UEFI capable x86-64 systems)
What version of Home Assistant Operating System is installed?
7.4
Did you upgrade the Operating System?
No
Steps to reproduce the issue
No idea exactly what triggers the problem.
Anything in the Supervisor logs that might be useful for us?
Anything in the Host logs that might be useful for us?
System Health information
System Health
Home Assistant Community Store
GitHub API | ok
-- | --
GitHub Content | ok
GitHub Web | ok
GitHub API Calls Remaining | 4941
Installed Version | 1.23.0
Stage | running
Available Repositories | 1071
Downloaded Repositories | 9
Home Assistant Supervisor
host_os | Home Assistant OS 7.4
-- | --
update_channel | beta
supervisor_version | supervisor-2022.03.5
docker_version | 20.10.9
disk_total | 118.5 GB
disk_used | 11.8 GB
healthy | true
supported | true
board | generic-x86-64
supervisor_api | ok
version_api | ok
installed_addons | AppDaemon (0.8.2), Duck DNS (1.14.0), ESPHome (2022.2.6), Mosquitto broker (6.0.1), NGINX Home Assistant SSL proxy (3.1.1), Samba share (9.5.1), UniFi Network Application (2.1.0), SSH & Web Terminal (10.1.0), Loki (1.9.5), Promtail (1.9.3)
Lovelace
dashboards | 2
-- | --
resources | 5
views | 20
mode | yaml
Additional information
Here are some of the logs I retrieved using "journalctl -b -1". I have attached the complete log as a zip file.
logs.zip
Mar 17 19:30:20 homeassistant dockerd[358]: fatal error: schedule: spinning with local work
Mar 17 19:30:20 homeassistant ghcr.io/home-assistant/generic-x86-64-homeassistant:2022.3.3/homeassistant[358]: 2022-03-17 20:30:20 DEBUG (MainThread) [homeassistant.core] Bus:Handling <Event state_changed[L]: entity_id=light.gang_lys, old_state=<state light.gang_lys=on; supported_color_modes=['brightness'], color_mode=brightness, brightness=225, entity_id=['light.gang_lamper', 'light.gang_spots'], icon=mdi:lightbulb-group, friendly_name=Gang lys, supported_features=32 @ 2022-03-17T20:28:48.480609+01:00>, new_state=<state light.gang_lys=on; supported_color_modes=['brightness'], color_mode=brightness, brightness=227, entity_id=['light.gang_lamper', 'light.gang_spots'], icon=mdi:lightbulb-group,