jiangcuo / Proxmox-Port

Proxmox VE arm64 riscv64 loongarch64
GNU Affero General Public License v3.0

Random crashes #115

Closed Toastyyy3 closed 2 weeks ago

Toastyyy3 commented 1 month ago

Describe the bug
Every once in a while, Proxmox crashes. Afterwards I can't access any VM, LXC, or the host itself until I cut power and turn it back on. Sometimes it runs smoothly for two weeks and then crashes, and sometimes (like this week) it crashes twice in a week.

To Reproduce
No reproducible behavior observed. Crash times vary from midnight to midday, and the journals don't show any errors. Maybe one of the containers is corrupting the host? The Raspberry Pi sits on a media shelf near the router, modem, PlayStation 5, and Nintendo Switch. Or maybe something interferes with the Pi and causes bit flips, which then lead to a crash?
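
If flaky RAM is the suspicion, a userspace memory test can partially rule it out. A minimal sketch, assuming the memtester package from the Debian repositories; the size and pass count here are arbitrary:

    # Install the userspace RAM tester (Debian package)
    apt install memtester
    # Lock and test 512 MB for 3 passes; stop guests first to free a
    # larger region, and leave headroom for the running system
    memtester 512M 3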

VM: Home Assistant
LXC:

Another o

Expected behavior
The host should not crash unpredictably.

ENV (please complete the following information):

proxmox-ve: 8.1.0 (running kernel: 6.1.0-21-arm64)
pve-manager: 8.1.7 (running version: 8.1.7/ee1c3736ef6a6541)
proxmox-kernel-helper: 8.1.0
ceph-fuse: 17.2.6-pve1+3
corosync: 3.1.7-pve3
criu: 3.17.1-2
glusterfs-client: 10.3-5
ifupdown: residual config
ifupdown2: 3.2.0-1+pmx8
libjs-extjs: 7.0.0-4
libknet1: 1.28-pve1
libproxmox-acme-perl: 1.5.0
libproxmox-backup-qemu0: 1.4.0
libproxmox-rs-perl: 0.3.1
libpve-access-control: 8.1.3
libpve-apiclient-perl: 3.3.1
libpve-cluster-api-perl: 8.0.5
libpve-cluster-perl: 8.0.5
libpve-common-perl: 8.1.1
libpve-guest-common-perl: 5.0.6
libpve-http-server-perl: 5.0.5
libpve-network-perl: 0.9.6
libpve-rs-perl: 0.8.7
libpve-storage-perl: 8.1.2
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 5.0.2-4
lxcfs: 5.0.3-pve4
novnc-pve: 1.4.0-3
proxmox-backup-client: 3.1.2-1
proxmox-backup-file-restore: 3.0.4-1
proxmox-kernel-helper: 8.1.0
proxmox-mail-forward: 0.2.0
proxmox-mini-journalreader: 1.4.0
proxmox-offline-mirror-helper: 0.6.2
proxmox-widget-toolkit: 4.1.5
pve-cluster: 8.0.5
pve-container: 5.0.9
pve-docs: 8.1.4
pve-edk2-firmware: not correctly installed
pve-firewall: 5.0.3
pve-firmware: 3.8-1
pve-ha-manager: 4.0.3
pve-i18n: 3.2.1
pve-qemu-kvm: 8.1.5-4
pve-xtermjs: 5.3.0-3
qemu-server: 8.1.1+port2
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.2.3-pve1

Additional context
I saved the journal of every crash beginning from the 10th of June 2024, so I can post them if required; however, they don't show any errors. Two examples can be found below.
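
To pull the log of the boot that crashed, something like the following works; a sketch, assuming persistent journaling (Storage=persistent in /etc/systemd/journald.conf) so the journal survives the power cut:

    # List recorded boots; the crashed boot is usually -1 (the previous one)
    journalctl --list-boots
    # Jump to the end of the previous boot's journal, where a crash would show
    journalctl -b -1 -e
    # Or show only warnings and worse from that boot
    journalctl -b -1 -p warning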

The last few lines of the most recent crash:


Jul 27 19:01:50 rpi4 pveproxy[1004819]: worker exit
Jul 27 19:01:51 rpi4 pveproxy[1196]: worker 1004819 finished
Jul 27 19:01:51 rpi4 pveproxy[1196]: starting 1 worker(s)
Jul 27 19:01:51 rpi4 pveproxy[1196]: worker 1034404 started
Jul 27 19:17:01 rpi4 CRON[1038081]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Jul 27 19:17:01 rpi4 CRON[1038082]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
Jul 27 19:17:01 rpi4 CRON[1038081]: pam_unix(cron:session): session closed for user root
Jul 27 20:09:42 rpi4 systemd[1]: Starting systemd-tmpfiles-clean.service - Cleanup of Temporary Directories...
Jul 27 20:09:42 rpi4 systemd[1]: systemd-tmpfiles-clean.service: Deactivated successfully.
Jul 27 20:09:42 rpi4 systemd[1]: Finished systemd-tmpfiles-clean.service - Cleanup of Temporary Directories.
Jul 27 20:09:42 rpi4 systemd[1]: run-credentials-systemd\x2dtmpfiles\x2dclean.service.mount: Deactivated successfully.
Jul 27 20:17:01 rpi4 CRON[1052544]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Jul 27 20:17:01 rpi4 CRON[1052545]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
Jul 27 20:17:01 rpi4 CRON[1052544]: pam_unix(cron:session): session closed for user root
Jul 27 20:28:38 rpi4 pveproxy[1034404]: worker exit
Jul 27 20:28:38 rpi4 pveproxy[1196]: worker 1034404 finished
Jul 27 20:28:38 rpi4 pveproxy[1196]: starting 1 worker(s)
Jul 27 20:28:38 rpi4 pveproxy[1196]: worker 1055337 started
Jul 27 20:31:53 rpi4 pveproxy[1032014]: worker exit
Jul 27 20:31:53 rpi4 pveproxy[1196]: worker 1032014 finished
Jul 27 20:31:53 rpi4 pveproxy[1196]: starting 1 worker(s)
Jul 27 20:31:53 rpi4 pveproxy[1196]: worker 1056100 started
Jul 27 20:35:12 rpi4 pveproxy[1028805]: worker exit
Jul 27 20:35:12 rpi4 pveproxy[1196]: worker 1028805 finished
Jul 27 20:35:12 rpi4 pveproxy[1196]: starting 1 worker(s)
Jul 27 20:35:12 rpi4 pveproxy[1196]: worker 1056899 started
Jul 27 20:51:18 rpi4 pvestatd[1162]: auth key pair too old, rotating..
Jul 27 21:00:06 rpi4 pvescheduler[1062911]: starting task UPID:rpi4:00103800:0191B298:66A543B6:vzdump::root@pam:
Jul 27 21:00:06 rpi4 pvescheduler[1062912]: INFO: starting new backup job: vzdump --mailnotification failure --compress zstd --storage local --mode snapshot --all 1 --notes-template '{{guestname}}' --quiet 1 --prune-backups 'keep-last=5'
Jul 27 21:00:07 rpi4 pvescheduler[1062912]: INFO: Starting Backup of VM 100 (qemu)
Jul 27 21:08:40 rpi4 pvescheduler[1062912]: INFO: Finished Backup of VM 100 (00:08:34)
Jul 27 21:08:40 rpi4 pvescheduler[1062912]: INFO: Starting Backup of VM 101 (lxc)

The last few lines of another crash:


Jun 27 21:00:13 rpi4 pvescheduler[3900953]: starting task UPID:rpi4:003B861A:05367079:667DB6BD:vzdump::root@pam:
Jun 27 21:00:13 rpi4 pvescheduler[3900954]: INFO: starting new backup job: vzdump --prune-backups 'keep-last=5' --quiet 1 --storage local --compress zstd --notes-template '{{guestname}}' --all 1 --mailnotification failure --mode snapshot
Jun 27 21:00:13 rpi4 pvescheduler[3900954]: INFO: Starting Backup of VM 100 (qemu)
Jun 27 21:07:35 rpi4 pvescheduler[3900954]: INFO: Finished Backup of VM 100 (00:07:22)
Jun 27 21:07:35 rpi4 pvescheduler[3900954]: INFO: Starting Backup of VM 101 (lxc)
Jun 27 21:08:32 rpi4 pvescheduler[3900954]: INFO: Finished Backup of VM 101 (00:00:57)
Jun 27 21:08:32 rpi4 pvescheduler[3900954]: INFO: Starting Backup of VM 102 (lxc)
Jun 27 21:11:17 rpi4 pvescheduler[3900954]: INFO: Finished Backup of VM 102 (00:02:45)
Jun 27 21:11:17 rpi4 pvescheduler[3900954]: INFO: Starting Backup of VM 103 (lxc)
Jun 27 21:12:10 rpi4 pvescheduler[3900954]: INFO: Finished Backup of VM 103 (00:00:53)
Jun 27 21:12:10 rpi4 pvescheduler[3900954]: INFO: Starting Backup of VM 104 (lxc)
Jun 27 21:13:45 rpi4 pvescheduler[3900954]: INFO: Finished Backup of VM 104 (00:01:35)
Jun 27 21:13:45 rpi4 pvescheduler[3900954]: INFO: Starting Backup of VM 105 (lxc)
Jun 27 21:16:29 rpi4 pvescheduler[3900954]: INFO: Finished Backup of VM 105 (00:02:44)
Jun 27 21:16:29 rpi4 pvescheduler[3900954]: INFO: Starting Backup of VM 106 (lxc)
Jun 27 21:17:02 rpi4 CRON[3905541]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Jun 27 21:17:02 rpi4 CRON[3905542]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
Jun 27 21:17:02 rpi4 CRON[3905541]: pam_unix(cron:session): session closed for user root
Jun 27 21:18:03 rpi4 pvescheduler[3900954]: INFO: Finished Backup of VM 106 (00:01:34)
Jun 27 21:18:03 rpi4 pvescheduler[3900954]: INFO: Backup job finished successfully
Jun 27 21:22:04 rpi4 pveproxy[3878370]: worker exit
Jun 27 21:22:05 rpi4 pveproxy[1084]: worker 3878370 finished
Jun 27 21:22:05 rpi4 pveproxy[1084]: starting 1 worker(s)
Jun 27 21:22:05 rpi4 pveproxy[1084]: worker 3906891 started
Jun 27 21:46:29 rpi4 pveproxy[3885612]: worker exit
Jun 27 21:46:29 rpi4 pveproxy[1084]: worker 3885612 finished
Jun 27 21:46:29 rpi4 pveproxy[1084]: starting 1 worker(s)
Jun 27 21:46:29 rpi4 pveproxy[1084]: worker 3913339 started
Jun 27 21:58:05 rpi4 pveproxy[3891616]: worker exit
Jun 27 21:58:05 rpi4 pveproxy[1084]: worker 3891616 finished
Jun 27 21:58:05 rpi4 pveproxy[1084]: starting 1 worker(s)
Jun 27 21:58:05 rpi4 pveproxy[1084]: worker 3916398 started
Jun 27 22:17:01 rpi4 CRON[3921398]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Jun 27 22:17:01 rpi4 CRON[3921399]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
Jun 27 22:17:01 rpi4 CRON[3921398]: pam_unix(cron:session): session closed for user root
Jun 27 23:17:01 rpi4 CRON[3937313]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Jun 27 23:17:01 rpi4 CRON[3937314]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
Jun 27 23:17:01 rpi4 CRON[3937313]: pam_unix(cron:session): session closed for user root
Jun 27 23:19:24 rpi4 pveproxy[3916398]: worker exit
Jun 27 23:19:24 rpi4 pveproxy[1084]: worker 3916398 finished
Jun 27 23:19:24 rpi4 pveproxy[1084]: starting 1 worker(s)
Jun 27 23:19:24 rpi4 pveproxy[1084]: worker 3937974 started
Jun 27 23:19:39 rpi4 pveproxy[3913339]: worker exit
Jun 27 23:19:39 rpi4 pveproxy[1084]: worker 3913339 finished
Jun 27 23:19:39 rpi4 pveproxy[1084]: starting 1 worker(s)
Jun 27 23:19:39 rpi4 pveproxy[1084]: worker 3938018 started
Jun 27 23:25:58 rpi4 pveproxy[3906891]: worker exit
Jun 27 23:25:58 rpi4 pveproxy[1084]: worker 3906891 finished
Jun 27 23:25:58 rpi4 pveproxy[1084]: starting 1 worker(s)
Jun 27 23:25:58 rpi4 pveproxy[1084]: worker 3939690 started

Maybe someone can point me in the right direction? I'd be grateful for any kind of help!

jiangcuo commented 1 month ago

Didn't see any useful information

napper306 commented 1 month ago

Have you tried plugging the Pi into a monitor after it becomes unavailable, to see whether it has actually crashed or whether only network access is broken? Other possibilities:

- Resource allocation: too much RAM and/or CPU assigned to VMs/containers, so the system seizes up without adequate resources.
- Drive issues: the USB bus can't keep up with the required I/O, or the drive has bad sectors.
- Power issues: devices connected via USB drawing more power than the Pi can safely provide (a quick undervoltage check is sketched below).
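
For the power angle, undervoltage is easy to check; a sketch, assuming the Raspberry Pi firmware tools that provide vcgencmd are installed (stock Debian may not ship them):

    # Nonzero output means trouble: bit 0 = under-voltage right now,
    # bit 16 = under-voltage has occurred at some point since boot
    vcgencmd get_throttled
    # The kernel also logs undervoltage events
    dmesg | grep -i voltage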

Toastyyy3 commented 2 weeks ago

Sorry, I wanted to reply sooner.

So I believe I fixed it, and it wasn't a port issue. I had installed the Proxmox port on top of Raspberry Pi Debian, and then I found somewhere (unfortunately I can't find the post anymore) that, I believe, something in the Debian kernel had to be disabled to run Proxmox stably.

In other words, when installing Proxmox on top of Debian, something in the kernel or similar has to be disabled.

So, issue closed. Thanks for the help anyway!