Closed hasadi-nyu closed 2 years ago
I have the same issue. I'm also on Proxmox 7.1-5. One difference is that Hass lasts a bit longer for me (around 12h before crashing). I also can't access the VM via the console, and I always have to stop it to regain control.
I've just started debugging what I think may be the same issue. HA goes offline for a minute or two but then recovers (maybe 1 or 2 times a day, hours between the issue happening). On one occasion it completely crashed and I needed to reboot the host. I am running on an i5 NUC11 and Proxmox (pve-manager: 7.1-6). I also did the same thing and created a new VM but the issue still exists. No other VM on the machine has the issue.
The offending error in the host log is an ata1.00 failed command: READ FPDMA QUEUED. I just reset the machine before coming across this thread, so I will post the full log next time it crashes.
There are Proxmox instructions on the Home Assistant Community Forum thread Installing Home Assistant OS using Proxmox 7.
Home Assistant OS for virtual machines (ova) uses a rather recent vanilla Linux kernel. It's somewhat strange that this would cause such issues. If it's that particular version which somehow causes problems, it might be worth testing an older version or the recently released pre-release 7.0.rc1 (see https://github.com/home-assistant/operating-system/releases).
The error happened again today. The offending part of the log (which repeats for ~30 seconds before recovering):
[76408.902869] ata1.00: exception Emask 0x0 SAct 0xffffffff SErr 0x0 action 0x6 frozen
[76408.903306] ata1.00: failed command: WRITE FPDMA QUEUED
[76408.903585] ata1.00: cmd 61/08:00:00:0b:a3/00:00:00:00:00/40 tag 0 ncq dma 4096 out
[76408.903585] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[76408.904484] ata1.00: status: { DRDY }
I also noticed a similar log in Proxmox, so my issue may be different from the one posted and possibly on the Proxmox side. Strange that it has only just started happening in the last few weeks.
[Nov30 13:08] nvme nvme0: I/O 232 QID 6 timeout, aborting
[ +0.000010] nvme nvme0: I/O 256 QID 8 timeout, aborting
[ +0.000003] nvme nvme0: I/O 257 QID 8 timeout, aborting
[ +0.000002] nvme nvme0: I/O 258 QID 8 timeout, aborting
[ +0.000002] nvme nvme0: I/O 259 QID 8 timeout, aborting
[ +0.000002] nvme nvme0: I/O 260 QID 8 timeout, aborting
[ +0.000002] nvme nvme0: I/O 261 QID 8 timeout, aborting
[ +0.000003] nvme nvme0: I/O 262 QID 8 timeout, aborting
[Nov30 13:09] nvme nvme0: I/O 232 QID 6 timeout, reset controller
[ +0.032113] nvme nvme0: Abort status: 0x371
[ +0.000002] nvme nvme0: Abort status: 0x371
[ +0.000001] nvme nvme0: Abort status: 0x371
[ +0.000004] nvme nvme0: Abort status: 0x371
[ +0.000001] nvme nvme0: Abort status: 0x371
[ +0.000000] nvme nvme0: Abort status: 0x371
[ +0.000001] nvme nvme0: Abort status: 0x371
[ +0.000000] nvme nvme0: Abort status: 0x371
[ +0.038337] nvme nvme0: Shutdown timeout set to 10 seconds
[ +0.001521] nvme nvme0: 8/0/0 default/read/poll queues
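To compare how often the guest-side ATA timeouts and the host-side NVMe timeouts occur, the quoted log lines can be tallied per device. The following is a minimal Python sketch, not a tool from Home Assistant or Proxmox. The function name `summarize_timeouts` and both regular expressions are assumptions derived only from the lines quoted in this thread, so other kernel versions may format these messages differently.

```python
import re
from collections import Counter

# Patterns are assumptions based only on the log lines quoted in this thread.
ATA_FAILED = re.compile(r"(ata\d+\.\d+): failed command: (\w+ FPDMA QUEUED)")
NVME_TIMEOUT = re.compile(
    r"nvme (nvme\d+): I/O \d+ QID \d+ timeout, (aborting|reset controller)"
)

def summarize_timeouts(lines):
    """Count failed ATA commands and NVMe I/O timeouts per (device, event)."""
    counts = Counter()
    for line in lines:
        match = ATA_FAILED.search(line) or NVME_TIMEOUT.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts

# Sample lines copied from the excerpts in this thread.
sample = [
    "[76408.903306] ata1.00: failed command: WRITE FPDMA QUEUED",
    "[Nov30 13:08] nvme nvme0: I/O 232 QID 6 timeout, aborting",
    "[ +0.000010] nvme nvme0: I/O 256 QID 8 timeout, aborting",
    "[Nov30 13:09] nvme nvme0: I/O 232 QID 6 timeout, reset controller",
    "[ +0.032113] nvme nvme0: Abort status: 0x371",
]

summary = summarize_timeouts(sample)
for (device, event), count in sorted(summary.items()):
    print(f"{device}: {event} x{count}")
```

Feeding it the saved output of `dmesg` from both the VM and the Proxmox host would show whether the two sets of timeouts line up in count, which is one hint that the guest errors follow from the host's disk stalls.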
I have followed the guide @agners suggested, but using the original Whiskerz007 script. I will start a test with the new pre-release 7.0.rc1 before looking deeper into the Proxmox setup (SSD config etc., such as GRUB_CMDLINE_LINUX_DEFAULT="quiet nvme_core.default_ps_max_latency_us=0 pcie_aspm=off").
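If the kernel parameters mentioned above are tried on the Proxmox host, it helps to confirm after the reboot that they are actually active. Below is a minimal Python sketch under stated assumptions: the helper `missing_kernel_params` and the example command line are hypothetical and written for illustration, and only the two parameter strings come from this thread.

```python
def missing_kernel_params(cmdline, wanted):
    """Return the wanted parameters that do not appear in the kernel command line."""
    present = set(cmdline.split())
    return [param for param in wanted if param not in present]

# The two parameters quoted in the comment above.
WANTED = ["nvme_core.default_ps_max_latency_us=0", "pcie_aspm=off"]

# Hypothetical command line for illustration; on a real host, read it with
# open("/proc/cmdline").read() instead.
example_cmdline = "BOOT_IMAGE=/boot/vmlinuz root=/dev/mapper/pve-root ro quiet pcie_aspm=off"

missing = missing_kernel_params(example_cmdline, WANTED)
print(missing)
```

An empty list means both parameters reached the running kernel. A non-empty list suggests the bootloader configuration was not regenerated or the host was not rebooted after the change.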
Maybe a hardware problem? I guess the reason that you only see it with HA could be due to its activity (writing database entries, logs etc all the time). I think discussing this with other Proxmox users on Community is probably the better approach.
For me and the original issue, after an update, it seems the issue has gone away. Previously, it would consistently fail after 40 seconds of the VM being ordered to start and I would have to restart the Node to run into the issue again.
Now, with PVE at pve-manager/7.1-6/4e61e21c (running kernel: 5.13.19-1-pve), I am no longer running into the issue.
Thanks for the update. I'll consider it a PVE issue then.
Describe the issue you are experiencing
I have installed a hass os VM on my proxmox server (SimplyNUC Ruby R8).
Within the past ~4 days, the hass OS VM has been crashing within 40 seconds of startup (not only can I no longer access it via a web browser, but the VM console on Proxmox also fails to connect/freezes).
I have no issues with other VMs within my proxmox which is why I'm going to the HASS github instead of the proxmox github.
As a troubleshooting step, I used whiskerz's script (https://github.com/whiskerz007/proxmox_hassos_install) to create a brand new VM and install a fresh version of hass OS. Within about 40 seconds again (even on the install parts of the OS launch) the VM crashes. No amount of restarting, unplug/plug, etc. will resolve it. I can't get access to logs as it fails before a successful bootup. I can't determine the root cause and I'm not sure where to go.
My proxmox version: pve-manager/7.1-5/6fe299a0 (running kernel: 5.13.19-1-pve)
Shot in the dark, trying to see if others are seeing this issue or maybe it's something else? I didn't find a previous issue through keyword searches that fits the bill for the issue I've been having recently. Apologies if this has been brought up already.
What operating system image do you use?
ova (for Virtual Machines)
What version of Home Assistant Operating System is installed?
6.6
Did you upgrade the Operating System.
No
Steps to reproduce the issue
Anything in the Supervisor logs that might be useful for us?
Anything in the Host logs that might be useful for us?
System Health information
No response
Additional information
No response