Closed agners closed 1 year ago
Updating the Proxmox kernel can solve the issue, see https://github.com/home-assistant/operating-system/issues/1705#issuecomment-1411406865
Setting VirtIO SCSI Single / iothread=1 / aio=threads on all KVM guests can also help, see https://github.com/home-assistant/operating-system/issues/1705#issuecomment-1418808236.
I did both, issue is gone.
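For readers who prefer the Proxmox host shell over the web UI, the VirtIO settings mentioned above could be applied roughly as sketched below. This is a dry run that only prints the `qm set` commands. The VM ID `102` and the volume name `local-lvm:vm-102-disk-0` are assumptions for illustration; check `qm config <vmid>` for the real values.

```shell
#!/bin/bash
# Dry-run sketch: prints the qm commands instead of executing them.
# VMID and DISK_VOLUME are hypothetical; look them up with "qm config <vmid>".
VMID=102
DISK_VOLUME="local-lvm:vm-102-disk-0"

build_qm_commands() {
    local vmid=$1 volume=$2
    # Use the "VirtIO SCSI single" controller so each disk gets its own controller.
    echo "qm set ${vmid} --scsihw virtio-scsi-single"
    # Re-attach the existing volume with a dedicated I/O thread and thread-based async I/O.
    echo "qm set ${vmid} --scsi0 ${volume},iothread=1,aio=threads"
}

build_qm_commands "$VMID" "$DISK_VOLUME"
```

Note that re-specifying `--scsi0` replaces the disk's option string, so any other options already set on the disk (for example `discard=on`) would need to be repeated. The VM most likely needs a full stop and start, not just a reboot from inside the guest, before the new settings apply.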
Proxmox has an opt-in Linux 6.1 kernel; development of the edge kernels will temporarily pause with the 6.0 kernel.
https://forum.proxmox.com/threads/opt-in-linux-6-1-kernel-for-proxmox-ve-7-x-available.119483/
Switching to the pve 6.1 kernel re-introduced the issue. So far it seems that only the 6.0.15-edge kernel fixed the problem, and not the VirtIO SCSI Single / iothread=1 / aio=threads settings.
I was wrong: no matter which kernel or VirtIO settings I used, the HAOS VM crashed soon after start with version 9.5.
I downgraded to 9.4 and it has been stable for a while now, running with the edge kernel.
I'm facing the same problem (latest HAOS). I opened a thread on the Proxmox forum: https://forum.proxmox.com/threads/host-hangs-up-after-12-24-hours-unless-rcu_sched-kthread-gets-sufficient-cpu-time-oom-is-now-expected-behavior.122347/
Started the VM 14 hours ago with Linux intelnuc 6.1.10-1-pve and VirtIO SCSI Single / iothread=1 / aio=threads settings. Let's see...
HAOS 9.5 mainly comes with a new stable kernel release 5.15.90 (HAOS 9.4 was using 5.15.80). The current development build contains a newer kernel 5.15.93, that might be worth a try. It is typically fairly safe to upgrade to development builds and downgrade back to stable builds, but I still recommend taking a snapshot :smile:
ha supervisor options --channel dev
ha supervisor reload
ha supervisor update
ha os update
And to downgrade:
ha su options --channel stable
ha supervisor reload
ha os update --version 9.5
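On Proxmox, the snapshot recommended above could be taken from the host shell before switching channels. The sketch below only prints the `qm snapshot` command. The VM ID `102` and the `pre_haos_update` prefix are assumptions; as far as I know, snapshot names must start with a letter and may otherwise contain letters, digits, `_` and `-`.

```shell
#!/bin/bash
# Dry-run sketch: prints the snapshot command instead of executing it.
VMID=102   # hypothetical VM ID of the HAOS guest

snapshot_name() {
    # Joins a prefix and a timestamp into a name that starts with a letter.
    local prefix=$1 stamp=$2
    echo "${prefix}_${stamp}"
}

NAME="$(snapshot_name pre_haos_update "$(date +%Y%m%d_%H%M)")"
echo "qm snapshot ${VMID} ${NAME}"
```

If the update goes wrong, `qm rollback <vmid> <snapname>` returns the VM to that state.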
I'm seeing this same issue in a VM on an M2 Mac using UTM. Happens every couple days and throws off the VM's time/date (Sept 2059)
ha > [30157.023357] rcu: INFO: rcu_preempt self-detected stall on CPU
[30157.028355] rcu: 02-..: (2 ticks this GP) idle=361/1/0x4000000000000002 softirq=283006/283006 fqs=1
[30157.030223] rcu: rcu_preempt kthread starved for 288245252820 jiffies! g745309 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=3
[30157.030886] rcu: Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
[30157.031037] rcu: RCU grace-period kthread stack dump:
[30153.036108] rcu: Stack dump where RCU GP kthread last ran:
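To check whether a guest or host has logged stalls like the one above, the kernel log can be searched for the RCU stall headers. A minimal sketch, assuming the log text is piped in on stdin; the function name `count_rcu_stalls` is made up for this example, and the sample lines are taken from the log above.

```shell
#!/bin/bash
# Counts RCU stall reports in kernel log text read from stdin.
count_rcu_stalls() {
    # grep -c prints the number of matching lines; it exits non-zero when
    # there are none, so "|| true" keeps the function's exit status clean.
    grep -c -E 'rcu.*(self-detected stall on CPU|detected stalls on CPUs)' || true
}

sample_log='[30157.023357] rcu: INFO: rcu_preempt self-detected stall on CPU
[30157.031037] rcu: RCU grace-period kthread stack dump:'

printf '%s\n' "$sample_log" | count_rcu_stalls   # prints 1

# On a live system (may require root):
#   dmesg | count_rcu_stalls
#   journalctl -k -b -1 | count_rcu_stalls   # previous boot, if the journal is persistent
```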
I tried to update to the latest dev OS, which was 10.0, but HA didn't start with that version. All I got were OutOfMemory exceptions.
@hprotzek how much memory do you allocate to the system?
4 GB, ballooning disabled.
This issue is very strange. I have two identical hardware setups (HP t630), both running the same Proxmox version with Home Assistant. One installation runs fine; the other has this issue with 9.5. All other VMs and containers work fine. On the faulty one I also sometimes get the following error, but HA runs stably on 9.4 even with it:
[  752.692587] xhci_hcd 0000:02:1b.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 2 comp_code 4
@agners, I also updated to the 10.0 dev version, but no IP address was assigned. I had to set the IP address via the CLI in the Proxmox console.
I now also have an issue in 10.0 dev: my Mosquitto broker won't start anymore (because of 'Error: Unable to create websockets listener on port 1884'). Since this is crucial for me, I will downgrade back to 9.5.
I got the same error in HA OS 9.4 as well, for the first time. Uptime was 42 days. I previously tested HA OS 10 dev. It worked, but quickly crashed with the same problem.
I also have the same problems on version 9.5, after 2 hours on a fresh install. Now I have updated to OS version 11.0.dev20230328 and am waiting for trouble ;-)
Same problem here. I'm using version 9.5.
Getting this too. I'm on HAOS 9.5, Proxmox 7.4-3.
Same here... the stable 10.0 came through and only gave me the same issues. Now I'm trying the 11.0.dev20230420. Fingers crossed!
Update: 2 days later and the 11.0 dev version has the same issue. 🥴
@agners do you have any idea in which version this will be solved? Or do I need to update Proxmox?
Edit: Found a real solution by installing intel-microcode. See my later post.
Because coming home in the dark drove me crazy, I made a script that resets the haos vm if it misses a certain number of pings. Have it running on the proxmox host that haos is running on.
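#!/bin/bash
# Watchdog: resets the HAOS VM after FAILLEVEL consecutive missed pings.
FILE=errors.txt      # log file
TARGET=10.20.0.11    # IP address of the HAOS VM
VMID=102             # Proxmox ID of the HAOS VM
FAILLEVEL=20         # consecutive missed pings before a reset
ERRORCOUNTER=0
pinginterval=1       # seconds between pings
touch "$FILE"
while true; do
    DATE=$(date '+%d/%m/%Y %H:%M:%S')
    if ! ping -c 1 "$TARGET" &> /dev/null; then
        # Log only the first miss of an outage.
        if [[ $ERRORCOUNTER -eq 0 ]]; then
            echo "$DATE $TARGET down" >> "$FILE"
        fi
        #sed '${s/$/%/}' $FILE
        ((ERRORCOUNTER++))
        # Reset once, when the counter reaches the threshold.
        if [[ $ERRORCOUNTER -eq $FAILLEVEL ]]; then
            echo "$DATE $TARGET - Reset $VMID" >> "$FILE"
            qm reset "$VMID"
        fi
    else
        # -gt compares numerically; '>' inside [[ ]] compares strings.
        if [[ $ERRORCOUNTER -gt 0 ]]; then
            echo "$DATE $TARGET up again ($ERRORCOUNTER missed pings)" >> "$FILE"
        fi
        ERRORCOUNTER=0
    fi
    sleep "$pinginterval"
done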
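One property of the script above is worth knowing: because the counter is compared with `-eq`, the VM is reset only once per outage. If it is still unreachable after that reset, nothing further happens until a ping succeeds again. A possible variant, sketched here as an assumption rather than something tested in this thread, resets again after every further `FAILLEVEL` missed pings. The helper name `should_reset` is made up for this example.

```shell
#!/bin/bash
# Returns success (0) whenever the number of missed pings is a positive
# multiple of the fail level, so a reset is retried every FAILLEVEL misses.
should_reset() {
    local missed=$1 faillevel=$2
    (( missed > 0 && missed % faillevel == 0 ))
}

# Small demonstration with a fail level of 20.
for missed in 19 20 21 40; do
    if should_reset "$missed" 20; then
        echo "$missed missed pings: reset"
    else
        echo "$missed missed pings: wait"
    fi
done
```

In the watchdog loop this would replace the `-eq` test. With this variant `FAILLEVEL` should be large enough to cover the VM's boot time, otherwise the VM could be reset again while it is still starting.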
My VM was still on 2 cores. I just updated to 1 core. "Start the clock" 🤞
I'm having the same issue on proxmox 7.4-3 and HA OS 10.3.
Any update on this issue?
I solved my problems on my Proxmox cluster. In my test environment I ran into similar problems with OPNsense. In the end, the installation of the intel microcode on the Proxmox host proved to be the solution. I have an Intel N5105 processor that caused the problems. See below for more information.
https://forum.opnsense.org/index.php?topic=32406.msg156769#msg156769
https://forum.proxmox.com/threads/vm-freezes-irregularly.111494/page-30
Interesting, microcode update. It also seems related to cpuidle issues, I guess that can influence timers/timing and cause such RCU issues indeed :thinking:
@polter05 can you try this fix on your end?
It's been four days now and still no freezing VMs. Before updating the microcode I had to reset the HAOS VM 2+ times a day.
Fixed by installing the microcode on my Topton N6005
Step 1: Add the following to the file /etc/apt/sources.list:
deb http://ftp.se.debian.org/debian bullseye main contrib non-free
deb http://ftp.se.debian.org/debian bullseye-updates main contrib non-free
Step 2: apt update
Step 3: apt-get install intel-microcode
Step 4: Reboot the Proxmox system
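To confirm that the steps above actually changed anything, the microcode revision reported by the kernel can be compared before and after the reboot. A minimal sketch, assuming an x86 host where `/proc/cpuinfo` contains a `microcode` line; the function name `microcode_revision` and the sample revision value are made up for illustration.

```shell
#!/bin/bash
# Prints the first microcode revision found in /proc/cpuinfo-style text on stdin.
microcode_revision() {
    awk -F': *' '/^microcode/ { print $2; exit }'
}

# Demonstration with made-up sample data in the /proc/cpuinfo layout.
printf 'model name\t: Example CPU\nmicrocode\t: 0x24000023\n' | microcode_revision

# On the Proxmox host itself:
if [ -r /proc/cpuinfo ]; then
    microcode_revision < /proc/cpuinfo
fi
```

If the revision is unchanged after the reboot, the installed package may simply not contain a newer microcode for that CPU than the one the firmware already loads. `dmesg | grep -i microcode` shows what the kernel did at boot.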
Ok thanks for the information.
So I assume then that this can also be resolved that way for @polter05. Since there wasn't a change in OS I mark it as won't fix.
Originally posted by @polter05 in https://github.com/home-assistant/operating-system/issues/1705#issuecomment-1410238093
Home Assistant OS 9.5
Note: This is an issue which tracks CPU stalls on Proxmox. It is similar to #1705 which tracks such issues on VirtualBox, but likely a different culprit (since Proxmox and VirtualBox are different Hypervisors!). Please post to the appropriate issue.