whiskerz007 / proxmox_hassos_install

MIT License

Error Creating VM #61

Open krzysiek2788 opened 4 years ago

krzysiek2788 commented 4 years ago

Hi, I was trying to use your script but I got the error below:

[screenshot]

Any idea why?

Edit:

It looks like this line is causing the error: [screenshot]

alesp24 commented 4 years ago

Hi, I've got a similar error too

[screenshot]

could someone help me solve it? Thanks!

Swell61 commented 4 years ago

I had the same issue trying to install hassos this evening, using this script. While I don't know what caused the problem, updating Proxmox and rebooting the host solved it. Here is where I found the solution (I have no other hassos installs and no issues with my Proxmox setup). The install then went ahead flawlessly and I am now in the Home Assistant web UI.

whiskerz007 commented 4 years ago

Is this still a problem?

dmshimself commented 4 years ago

I think it might still be an issue. I ran up a fresh proxmox install today, so the server has been rebooted. When using the script I get the following (data is zfs storage):

[INFO] Using 'data' for storage location.
[INFO] Container ID is 104.
Getting URL for latest Home Assistant disk image...
Downloading disk image...
Extracting disk image...
Creating VM...
[ERROR] 4@135 Unknown failure occured.

If I use local-lvm, all is well

whiskerz007 commented 4 years ago

@dmshimself I am unable to reproduce your results. Please run the following command in a Proxmox shell and post the entire output. bash -xc "$(wget -qLO - https://github.com/whiskerz007/proxmox_hassos_install/raw/master/install.sh)"

dmshimself commented 4 years ago

Thanks for that and here is the output:

root@pve:~# bash -xc "$(wget -qLO - https://github.com/whiskerz007/proxmox_hassos_install/raw/master/install.sh)"

whiskerz007 commented 4 years ago

Please run the following and post the output.

pvesm alloc data 100 vm-100-disk-0 128; echo $?
dmshimself commented 4 years ago

A colleague re-installed another fresh install, so the storage usage changed. However, I re-ran the first script and this second one, and the results are still consistent.

root@pve:~# bash -xc "$(wget -qLO - https://github.com/whiskerz007/proxmox_hassos_install/raw/master/install.sh)"

pvesm alloc local-zfs 100 vm-100-disk-0 128; echo $?
successfully created 'local-zfs:vm-100-disk-0'
0

HorstBoy commented 4 years ago

I'm seeing the same problem, but only when I have multiple VMs running that use the same ZFS-backed datastore. It seems that others having this problem are also using ZFS, and some have eventually succeeded after tricks like reboots or retrying several times.

My theory is that everyone experiencing this problem is using ZFS for the datastore and has either a slow disk subsystem or enough load on the disks to make the "pvesm alloc" call fail during the "zfs create" step. Proxmox enforces a 5-second timeout for zfs create, which is simply too low for some of us.

After removing the /dev/null redirection of the pvesm output, I get the following error message: "command 'zfs create -s -V 1024k rpool/data/vm-119-disk-0' failed: got timeout"

Running "zfs create -s -V 1024k rpool/data/vm-119-disk-0" manually afterwards fails, because the volume was actually created just fine, simply not fast enough for pvesm to be happy.

Extending the timeout from 5 seconds to 15 seconds by patching the Proxmox code makes this work (for me), but that is obviously not a good solution, as it will be overwritten by Proxmox updates. Waiting for a fix from Proxmox might take a while, as this problem was first reported over 5 years ago.

If Whiskerz is willing to maintain a workaround for this Proxmox-side bug, it could be solved as part of install.sh: if "pvesm alloc" returns error code 4, enter a loop that sleeps for a couple of seconds, checks whether the new volume was created successfully despite the pvesm error, repeats up to n times if necessary, and only bails out with an error if the volume still doesn't exist. Otherwise, continue to the next step of the install.
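The loop described above could be sketched roughly like this (the function name and argument layout are illustrative, not taken from install.sh; it assumes exit code 4 is the timeout case and that "pvesm list" shows the volume once it exists):

```shell
# Wrap 'pvesm alloc': on exit code 4 (the ZFS timeout case), poll for a
# while to see whether the volume was created anyway before giving up.
alloc_with_retry() {
  storage=$1; vmid=$2; disk=$3; size=$4
  pvesm alloc "$storage" "$vmid" "$disk" "$size" && return 0
  rc=$?
  [ "$rc" -ne 4 ] && return "$rc"   # a real failure, not the ZFS timeout
  for _ in 1 2 3; do                # poll a few times with a short sleep
    sleep 2
    # did the zvol appear despite the pvesm timeout?
    pvesm list "$storage" | grep -q "$disk" && return 0
  done
  return "$rc"                      # still missing; give up with the error
}
```

The install script would then call alloc_with_retry in place of the bare pvesm alloc, so slow ZFS pools only cost a few extra seconds instead of aborting the install.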

As said, this is actually an old and known bug: https://forum.proxmox.com/threads/zfs-plugin-timeouts.18882/#post-110786

https://github.com/proxmox/pve-storage/blob/74b724a6993fbe859a76d46dd5f2f9be395a2002/PVE/Storage/ZFSPoolPlugin.pm#L311 https://github.com/proxmox/pve-storage/blob/74b724a6993fbe859a76d46dd5f2f9be395a2002/PVE/Storage/ZFSPoolPlugin.pm#L168

zfs_create_zvol doesn't specify a timeout when calling zfs_request, which causes the timeout to default to 5 seconds. That isn't enough in all cases, which triggers the timeout.

dmshimself commented 4 years ago

Very useful information - many thanks

whiskerz007 commented 4 years ago

Try running the following and compare to the original script. Please report your findings.

bash -c "$(wget -qLO - https://github.com/whiskerz007/proxmox_hassos_install/raw/patch-slow-zfs/install.sh)"
dmshimself commented 4 years ago

It worked perfectly. I put the server under as much load as I could: lots of IO on the same zfs pool, lots of CPU use, and it sailed through. Specifically I got:

Getting URL for latest Home Assistant disk image...
Downloading disk image...
Extracting disk image...
Creating VM...
Adding serial port and configuring console...
Installing 'kpartx'...
[INFO] Completed Successfully! New VM ID is 120.

HorstBoy commented 4 years ago

Unfortunately, that doesn't solve the problem, since both "pvesm alloc" and "qm importdisk" call "zfs create" with the same 5-second timeout.

One can simulate how the script behaves with a slow zfs disk subsystem by playing with the timeout setting in ZFSPoolPlugin.pm and setting it to something really low, such as 100ms:

--- /usr/share/perl5/PVE/Storage/ZFSPoolPlugin.pm.orig  2020-07-14 14:58:24.000000000 +0300
+++ /usr/share/perl5/PVE/Storage/ZFSPoolPlugin.pm       2020-08-24 21:46:18.028085724 +0300
@@ -173,7 +173,7 @@
     my $msg = '';
     my $output = sub { $msg .= "$_[0]\n" };

-    $timeout = PVE::RPCEnvironment->is_worker() ? 60*60 : 5 if !$timeout;
+    $timeout = PVE::RPCEnvironment->is_worker() ? 60*60 : 0.1 if !$timeout;

     run_command($cmd, errmsg => "zfs error", outfunc => $output, timeout => $timeout);
Psychoses commented 3 years ago

Hi,

Same problem here sometimes on a zfs install.

I tried: bash -c "$(wget -qLO - https://github.com/whiskerz007/proxmox_hassos_install/raw/patch-slow-zfs/install.sh)" but without success.

Finally, I got a successful install with a simple sleep 15 between qm create and pvesm alloc.
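The modification amounts to a one-line pause in install.sh's "VM Create" section, roughly like this (a non-runnable sketch; the surrounding commands, their options, and the variable names are abbreviated stand-ins, not the exact script):

```shell
# ...inside the "VM Create" section of install.sh...
qm create "$VMID" ...                        # create the VM as before
sleep 15                                     # workaround: let slow ZFS settle
pvesm alloc "$STORAGE" "$VMID" "$DISK0" ...  # the call that was timing out
```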

mmychu commented 3 years ago

Hello, same problem on my zfs setup while running another VM.

Here is what I did. First, download the script:

wget https://github.com/whiskerz007/proxmox_hassos_install/raw/master/install.sh

At first I got an error: Unable to handle file extension 'zip', so I did:

apt install unzip
sed '/*"gz") gunzip -f $FILE;;/a\ \ *"zip") unzip -o $FILE;;' install.sh > temp.sh
mv temp.sh install.sh
chmod 755 install.sh

Then I added a sleep 30 between qm create and pvesm alloc: open the script with nano install.sh, add sleep 30 in the "VM Create" section, and run the script with ./install.sh.

My output before the script modification:

root@proxmox:~# bash -c "$(wget -qLO - https://github.com/whiskerz007/proxmox_hassos_install/raw/patch-slow-zfs/install.sh)"
[INFO] Using 'local-zfs' for storage location.
[INFO] Container ID is 103.
Getting URL for latest Home Assistant disk image...
Downloading disk image...
Extracting disk image...
Creating VM...
[ERROR] 4@138 Unknown failure occured.

After the script modification:

root@proxmox:~# nano install.sh
root@proxmox:~# ./install.sh
[INFO] Using 'local-zfs' for storage location.
[INFO] Container ID is 103.
Getting URL for latest Home Assistant disk image...
Downloading disk image...
Extracting disk image...
Archive: haos_ova-6.2.vmdk.zip
  inflating: haos_ova-6.2.vmdk
Creating VM...
Adding serial port and configuring console...
Installing 'kpartx'...
[INFO] Completed Successfully! New VM ID is 103.