krzysiek2788 opened 4 years ago
Hi, I've got a similar error too
could someone help me solve it? Thanks!
I had the same issue trying to install hassos this evening, using this script. While I don't know what caused the problem, updating Proxmox and rebooting the host solved the issue. Here is where I found the solution (I have no other hassos installs and no issues with my Proxmox setup). The install went ahead flawlessly and I am now in the Home Assistant web UI.
Is this still a problem?
I think it might still be an issue. I ran up a fresh proxmox install today, so the server has been rebooted. When using the script I get the following (data is zfs storage):
[INFO] Using 'data' for storage location.
[INFO] Container ID is 104.
Getting URL for latest Home Assistant disk image...
Downloading disk image...
Extracting disk image...
Creating VM...
[ERROR] 4@135 Unknown failure occured.
If I use local-lvm, all is well
@dmshimself I am unable to reproduce your results. Please run the following command in a Proxmox shell and post the entire output.
bash -xc "$(wget -qLO - https://github.com/whiskerz007/proxmox_hassos_install/raw/master/install.sh)"
Thanks for that and here is the output:
root@pve:~# bash -xc "$(wget -qLO - https://github.com/whiskerz007/proxmox_hassos_install/raw/master/install.sh)"
Please run the following and post the output.
pvesm alloc data 100 vm-100-disk-0 128; echo $?
A colleague did another fresh install, so the storage usage changed. However, I re-ran the first script and this second one, and the results are still consistent.
root@pve:~# bash -xc "$(wget -qLO - https://github.com/whiskerz007/proxmox_hassos_install/raw/master/install.sh)"
pvesm alloc local-zfs 100 vm-100-disk-0 128; echo $?
successfully created 'local-zfs:vm-100-disk-0'
0
I'm seeing the same problem, but only when I have multiple VMs running that use the same ZFS-backed datastore. It seems that others having the same problem are also using ZFS, and some have eventually succeeded after tricks like reboots or retrying several times.
My theory is that everyone experiencing this problem is using ZFS for the datastore and has either a slow disk subsystem or enough load on the disks to cause the "pvesm alloc" call to fail during the "zfs create" step. Proxmox enforces a 5 second timeout for zfs create, which is simply too low for some of us.
After removing the /dev/null redirection of the pvesm output, I get the following error message: "command 'zfs create -s -V 1024k rpool/data/vm-119-disk-0' failed: got timeout"
Trying to run "zfs create -s -V 1024k rpool/data/vm-119-disk-0" manually fails because the volume was actually created just fine; it simply wasn't created fast enough for pvesm to be happy.
Extending the timeout from 5 seconds to 15 seconds by patching the Proxmox code makes this work (for me), but that is obviously not a good solution, as the patch will be overwritten by Proxmox updates. Waiting for a fix from Proxmox might take a while, as this problem was first reported over 5 years ago.
If Whiskerz is willing to maintain a workaround for this Proxmox-side bug, it could be solved as part of install.sh. If "pvesm alloc" returns error code 4, enter a loop that sleeps for a couple of seconds, checks whether the new volume was created successfully despite the pvesm error, and repeats up to n times if necessary; only if the volume still doesn't exist should the script bail out with an error. Otherwise, continue to the next step of the install.
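That retry idea can be sketched as a small shell helper; `retry_until` and the `pvesm list`-based existence check are illustrative assumptions, not code from install.sh:

```shell
# Hypothetical sketch of the proposed workaround; the helper name and the
# volume-existence check are illustrative, not taken from install.sh.
retry_until() {            # retry_until <tries> <delay_seconds> <command...>
  tries=$1; delay=$2; shift 2
  i=0
  while [ "$i" -lt "$tries" ]; do
    "$@" && return 0       # check passed: volume exists, treat as success
    sleep "$delay"
    i=$((i + 1))
  done
  return 1                 # volume never appeared: genuine failure
}

# Usage idea: if 'pvesm alloc' exits with code 4, poll for the volume,
# since the zvol is often created fine despite the pvesm timeout:
#   pvesm alloc "$STORAGE" "$VMID" "$DISK" 128 || {
#     [ $? -eq 4 ] && retry_until 5 3 sh -c "pvesm list $STORAGE | grep -q $DISK"
#   } || exit 1
```

The point of checking for the volume rather than re-running `pvesm alloc` is that the first call usually succeeded on the ZFS side; only the 5 second wait expired.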
As said, this is actually an old and known bug: https://forum.proxmox.com/threads/zfs-plugin-timeouts.18882/#post-110786
https://github.com/proxmox/pve-storage/blob/74b724a6993fbe859a76d46dd5f2f9be395a2002/PVE/Storage/ZFSPoolPlugin.pm#L311
https://github.com/proxmox/pve-storage/blob/74b724a6993fbe859a76d46dd5f2f9be395a2002/PVE/Storage/ZFSPoolPlugin.pm#L168
zfs_create_zvol doesn't specify a timeout when calling zfs_request, which causes the timeout to default to 5 seconds. That isn't enough in all cases, which triggers the timeout.
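To check whether a particular host is actually exceeding that limit, one could time the underlying command directly; a rough sketch (the `time_ms` helper and the example pool/volume name are hypothetical):

```shell
# Rough diagnostic sketch: measure how long a command takes, in milliseconds,
# to see whether 'zfs create' would exceed the 5 second limit on this host.
# Uses GNU date's %N (nanoseconds), so this assumes a Linux host like Proxmox.
time_ms() {
  start=$(date +%s%N)
  "$@"
  end=$(date +%s%N)
  echo $(( (end - start) / 1000000 ))
}

# e.g. (placeholder pool/volume name; destroy the test zvol afterwards):
#   time_ms zfs create -s -V 1024k rpool/data/vm-999-disk-0
#   zfs destroy rpool/data/vm-999-disk-0
```

If the reported time approaches or exceeds 5000 ms under load, the host is in the failure window described above.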
Very useful information - many thanks
Try running the following and compare to the original script. Please report your findings.
bash -c "$(wget -qLO - https://github.com/whiskerz007/proxmox_hassos_install/raw/patch-slow-zfs/install.sh)"
It worked perfectly. I put the server under as much load as I could: lots of IO on the same zfs pool, lots of CPU use, and it sailed through. Specifically I got:
Getting URL for latest Home Assistant disk image...
Downloading disk image...
Extracting disk image...
Creating VM...
Adding serial port and configuring console...
Installing 'kpartx'...
[INFO] Completed Successfully! New VM ID is 120.
Unfortunately that doesn't solve the problem, since both "pvesm alloc" and "qm importdisk" call "zfs create" with the same 5 second timeout.
One can simulate how the script behaves with a slow zfs disk subsystem by playing with the timeout setting in ZFSPoolPlugin.pm and setting it to something really low, such as 100ms.
--- /usr/share/perl5/PVE/Storage/ZFSPoolPlugin.pm.orig 2020-07-14 14:58:24.000000000 +0300
+++ /usr/share/perl5/PVE/Storage/ZFSPoolPlugin.pm 2020-08-24 21:46:18.028085724 +0300
@@ -173,7 +173,7 @@
my $msg = '';
my $output = sub { $msg .= "$_[0]\n" };
- $timeout = PVE::RPCEnvironment->is_worker() ? 60*60 : 5 if !$timeout;
+ $timeout = PVE::RPCEnvironment->is_worker() ? 60*60 : 0.1 if !$timeout;
run_command($cmd, errmsg => "zfs error", outfunc => $output, timeout => $timeout);
Hi,
Same problem here, some of the time, on a zfs install.
I tried: bash -c "$(wget -qLO - https://github.com/whiskerz007/proxmox_hassos_install/raw/patch-slow-zfs/install.sh)"
but without success.
Finally, I got a successful install with a simple sleep 15
between qm create
and pvesm alloc
Hello, same problem on my zfs while another VM is running.
I did the following:
wget https://github.com/whiskerz007/proxmox_hassos_install/raw/master/install.sh
to download the script.
Firstly I got an error: "Unable to handle file extension 'zip'", so I did:
apt install unzip
sed '/*"gz") gunzip -f $FILE;;/a\ \ *"zip") unzip -o $FILE;;' install.sh > temp.sh
mv temp.sh install.sh
chmod 755 install.sh
Later I added a sleep 30 between qm create and pvesm alloc in the "VM Create" section:
nano install.sh
(add sleep 30 between qm create and pvesm alloc)
Then run the script: ./install.sh
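The manual nano edit in the steps above can also be scripted with sed; a sketch against a stand-in file, since the exact line content of install.sh is assumed rather than verified:

```shell
# Stand-in for the relevant lines of install.sh (content assumed, not verified).
printf '%s\n' 'qm create $VMID ...' 'pvesm alloc $STORAGE ...' > demo.sh

# Insert 'sleep 30' before the 'pvesm alloc' line, as in the manual edit above
# (GNU sed; check the pattern against your own copy of install.sh first).
sed -i '/pvesm alloc/i sleep 30' demo.sh
cat demo.sh
# -> qm create $VMID ...
#    sleep 30
#    pvesm alloc $STORAGE ...
```

Run against the real install.sh, this is equivalent to the nano edit, but easier to repeat after re-downloading the script.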
My output, before the script modification:
root@proxmox:~# bash -c "$(wget -qLO - https://github.com/whiskerz007/proxmox_hassos_install/raw/patch-slow-zfs/install.sh)"
[INFO] Using 'local-zfs' for storage location.
[INFO] Container ID is 103.
Getting URL for latest Home Assistant disk image...
Downloading disk image...
Extracting disk image...
Creating VM...
[ERROR] 4@138 Unknown failure occured.
After script modification:
root@proxmox:~# nano install.sh
root@proxmox:~# ./install.sh
[INFO] Using 'local-zfs' for storage location.
[INFO] Container ID is 103.
Getting URL for latest Home Assistant disk image...
Downloading disk image...
Extracting disk image...
Archive: haos_ova-6.2.vmdk.zip
inflating: haos_ova-6.2.vmdk
Creating VM...
Adding serial port and configuring console...
Installing 'kpartx'...
[INFO] Completed Successfully! New VM ID is 103.
Hi, I was trying to use your script but I got the error below:
Any idea why?
@edit:
It looks like this line is causing the error: