home-assistant / operating-system

VMware disk size is not dynamically shrinking #1578

Closed bartgrefte closed 2 years ago

bartgrefte commented 3 years ago

Hardware Environment

Home Assistant OS release:

version core-2021.10.0
installation_type Home Assistant OS
dev false
hassio true
docker true
user root
virtualenv false
python_version 3.9.7
os_name Linux
os_version 5.10.62
arch x86_64
timezone Europe/Amsterdam
Home Assistant Cloud
logged_in false
can_reach_cert_server ok
can_reach_cloud_auth ok
can_reach_cloud ok

Home Assistant Supervisor
host_os Home Assistant OS 6.4
update_channel stable
supervisor_version supervisor-2021.09.6
docker_version 20.10.7
disk_total 30.8 GB
disk_used 4.8 GB
healthy true
supported true
board ova
supervisor_api ok
version_api ok
installed_addons Terminal & SSH (9.1.3), File editor (5.3.3), SQLite Web (3.1.0), Mosquitto broker (6.0.1), Glances (0.13.0)

Lovelace
dashboards 1
resources 0
views 4
mode storage

Description of problem: VMware Tools is missing or incomplete in the OVA image of HASS OS. Specifically, vmware-toolbox-cmd, the tool used to reclaim unused space and return it to the host, is missing. As a result, the virtual disk file keeps growing even though HASS OS is actually using less than half of the size of the virtual disk on the host.

Currently, HASS OS is using up about 3.7GB (according to Glances) or 4.8GB according to the info page, while the virtual disk on the host has grown over time to about 8.5GB in size.

If this were any other Linux VM, I could run "sudo vmware-toolbox-cmd disk shrink /" to return unused space to the host, but this does not work on HASS OS because the tool doesn't seem to be present.

agners commented 3 years ago

Why does every virtualization solution need to reinvent the wheel :angry:

The (Linux) block subsystem has a mechanism to inform the underlying block device about free space, called "discard" or "trim". The ext4 driver supports this, and blocks should also be freed regularly through fstrim.timer, which runs weekly.

Are you sure this standard mechanism is not supported by VMware? Can you maybe enable it in some settings? What kind of harddrive type/controller are you using?

bartgrefte commented 3 years ago

"Are you sure this standard mechanism is not supported by VMware?" No idea. I only know to run "sudo vmware-toolbox-cmd disk shrink /" inside the VM in the case of Linux; in the case of Windows there's the "clean up disks" option in the VMware Workstation VM->manage menu, to be clicked after shutting down the VM. As for how the vmware-toolbox-cmd command actually does its thing, my knowledge about VMs doesn't go that far.

"Can you maybe enable it in some settings?" As far as I know, there are no settings related to this. You just install VMware Tools, run the vmware-toolbox-cmd command inside the Linux VM, and wait a couple of seconds for it to finish. That's it.

"What kind of harddrive type/controller are you using?" SCSI, which is the default selection when adding a virtual hard drive. The same type was selected when I created an Ubuntu VM, though there vmware-toolbox-cmd is available.

bartgrefte commented 2 years ago

Is there any news about a potential solution?

agners commented 2 years ago

So VMware Tools are a commercial set of applications; we cannot include them in Home Assistant OS by default. There are open-vm-tools, but I am not sure if they would work. Also, integrating them needs to be done carefully so that they won't interfere with other supported VM platforms. At this point I don't have plans to work on it.

Furthermore, it seems very cumbersome that this solution requires running vmware-toolbox-cmd manually.

I still wonder if it isn't possible to make the "industry standard" fstrim work with VMware. We already have an automated mechanism which runs fstrim weekly (fstrim.timer). Fstrim issues discard requests for all unused blocks on the filesystem, which should allow VMware to shrink the underlying disk. It works flawlessly on KVM and other virtualization platforms.

Just googled a bit on my own, and at least these two resources seem to suggest that it does work: https://www.codyhosterman.com/2017/03/in-guest-unmap-fix-in-esxi-6-5-part-ii-linux/ https://communities.vmware.com/t5/VMware-Workstation-Pro/Trim-support-in-virtual-machines-of-VMware-Workstation-Player/td-p/2305963

Maybe you need to add disk.scsiUnmapAllowed in your virtual machine settings or use the "Clean up disk" command they are talking about.

bartgrefte commented 2 years ago

The clean up disk command is only visible with Windows VMs, which I mentioned earlier ;) and the compact disk command doesn't seem to have any effect.

I've added disk.scsiUnmapAllowed = "TRUE" to see if it does anything. Currently HASS OS uses 4.46GB in the VM and 9.87GB on the host.

@agners Is there any way to manually trigger the fstrim.timer? I tried to find it in the web interface, no luck.

bartgrefte commented 2 years ago

Still no change, not even after shutting down the VM and running defragment and compact again (thinking this would help). The VM still takes up more than twice as much storage on the host as it uses inside the VM...

Just tried: "C:\Program Files (x86)\VMware\VMware Workstation\vmware-vdiskmanager.exe" -k haos_ova-6.2.vmdk The output indicated the vmdk was being shrunk, but the size did not change.

bartgrefte commented 2 years ago

@agners Is there any way to manually trigger the fstrim.timer?

agners commented 2 years ago

@bartgrefte yes, use the login command to get OS shell access, then run systemctl start fstrim.service. Use systemctl status fstrim.service to verify that it ran successfully.
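The two commands above could be wrapped in a small helper. This is only a sketch: the function name trim_now is hypothetical, it assumes a root shell on the OS with systemd available, and the journalctl call is an addition that is not mentioned in the thread (it is simply where a service's output normally ends up).

```shell
#!/bin/sh
# Hypothetical helper: run the periodic trim right now and show the result.
# Assumes a root shell on the OS (not inside an add-on container) with systemd.
trim_now() {
    if ! command -v systemctl >/dev/null 2>&1; then
        echo "systemctl not found; this does not look like a systemd host shell" >&2
        return 1
    fi
    # Start the one-shot service that fstrim.timer would otherwise start on schedule.
    systemctl start fstrim.service || return 1
    # Show whether the last run succeeded.
    systemctl status fstrim.service --no-pager
    # Assumption: the per-mount "trimmed" lines are logged to the journal.
    journalctl -u fstrim.service -n 20 --no-pager
}
```

Calling trim_now from the OS shell should then show the most recent run together with its log lines.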

agners commented 2 years ago

The output indicated the vmdk was being shrunk, but the size did not change.

You mean the file size? How did you check the file size? The file might be sparse, and not show the effective size on disk. E.g. ls -lh might show a different size than du -h.
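A minimal sketch of the difference between the two sizes, assuming a Linux shell with GNU coreutils (truncate and stat -c). The file name and temporary directory are made up for the demonstration and have nothing to do with the actual vmdk.

```shell
#!/bin/sh
# Compare the apparent size and the allocated size of a sparse file.
# Assumes GNU coreutils (truncate, stat -c) on a Linux filesystem.
demo_dir=$(mktemp -d)
sparse_file="$demo_dir/sparse.img"

# Extend the file to 10 MiB without writing any data, leaving one big hole.
truncate -s 10M "$sparse_file"

# Apparent size in bytes: what ls -l and most file managers show.
apparent_bytes=$(stat -c %s "$sparse_file")

# Allocated size in bytes: allocated block count times the block unit stat reports.
allocated_bytes=$(( $(stat -c %b "$sparse_file") * $(stat -c %B "$sparse_file") ))

echo "apparent:  $apparent_bytes bytes"
echo "allocated: $allocated_bytes bytes"

rm -rf "$demo_dir"
```

On a filesystem that supports sparse files, the allocated figure should be far below the apparent one. On Windows, the "Size on disk" field in the file properties dialog plays the role of the allocated size.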

bartgrefte commented 2 years ago

@agners Ran the commands. It says it ran successfully, but only 13.4MB was trimmed on /tmp and 28.9MB on /var. Nothing about other directories, and nothing close to the difference between storage usage inside the VM and on the host.

"You mean the file size? How did you check the file size? The file might be sparse, and not show the effective size on disk. E.g. ls -lh might show a different size than du -h."

I checked the web interface (domainname:8123/hassio/system), which says "19.5 %" used; 19.5 % of 32GB (max size in VM settings) = 6.24GB. domainname:8123/config/info says (in Dutch) total storage space: 30.8 GB and used storage space: 6.2 GB. The actual size of the VMDK is currently 12GB in Windows Explorer. df -h on the HASS console confirms the 6.24GB from the web interface:

~ $ df -h
Filesystem      Size    Used  Available  Use%  Mounted on
overlay        30.8G    6.2G      23.3G   21%  /
tmpfs         480.5M       0     480.5M    0%  /sys/fs/cgroup
devtmpfs      478.7M       0     478.7M    0%  /dev
tmpfs         480.5M       0     480.5M    0%  /dev/shm
/dev/sda8      30.8G    6.2G      23.3G   21%  /ssl
/dev/sda8      30.8G    6.2G      23.3G   21%  /backup
/dev/sda8      30.8G    6.2G      23.3G   21%  /share
/dev/sda8      30.8G    6.2G      23.3G   21%  /data
/dev/sda8      30.8G    6.2G      23.3G   21%  /config
/dev/sda8      30.8G    6.2G      23.3G   21%  /addons
/dev/sda8      30.8G    6.2G      23.3G   21%  /media
/dev/sda8      30.8G    6.2G      23.3G   21%  /etc/asound.conf
/dev/sda8      30.8G    6.2G      23.3G   21%  /run/audio
tmpfs         192.2M  956.0K     191.3M    0%  /run/dbus
/dev/sda8      30.8G    6.2G      23.3G   21%  /etc/hosts
/dev/sda8      30.8G    6.2G      23.3G   21%  /etc/resolv.conf
/dev/sda8      30.8G    6.2G      23.3G   21%  /etc/hostname
tmpfs         480.5M       0     480.5M    0%  /dev/shm
/dev/sda8      30.8G    6.2G      23.3G   21%  /etc/pulse/client.conf
tmpfs         480.5M       0     480.5M    0%  /proc/asound
tmpfs         480.5M       0     480.5M    0%  /proc/acpi
devtmpfs      478.7M       0     478.7M    0%  /proc/kcore
devtmpfs      478.7M       0     478.7M    0%  /proc/keys
devtmpfs      478.7M       0     478.7M    0%  /proc/timer_list
tmpfs         480.5M       0     480.5M    0%  /proc/scsi
tmpfs         480.5M       0     480.5M    0%  /sys/firmware
~ $

And here's ls -lh:

$ ls -lh
total 104K
drwxr-xr-x    2 root root 4.0K Aug 15 12:12 addons
drwxr-xr-x    2 root root 4.0K Nov  6 17:40 backup
drwxr-xr-x    1 root root 4.0K Apr 28  2021 bin
drwxr-xr-x    8 root root 4.0K Nov 14 18:23 config
drwxr-xr-x    4 root root 4.0K Nov 14 18:23 data
drwxr-xr-x   16 root root 3.2K Nov 14 18:22 dev
drwxr-xr-x    1 root root 4.0K Nov 14 18:23 etc
drwxr-xr-x    2 root root 4.0K Apr 14  2021 home
-rwxr-xr-x    1 root root  389 Oct 20  2020 init
drwxr-xr-x    1 root root 4.0K Apr 30  2021 lib
drwxr-xr-x    2 root root 4.0K Apr 28  2021 libexec
drwxr-xr-x    2 root root 4.0K Aug 15 12:12 media
drwxr-xr-x    2 root root 4.0K Apr 14  2021 mnt
drwxr-xr-x    2 root root 4.0K Apr 14  2021 opt
dr-xr-xr-x  236 root root    0 Nov 14 18:23 proc
drwxr-xr-x    1 root root 4.0K Nov 14 18:23 root
drwxr-xr-x    1 root root 4.0K Nov 14 18:23 run
drwxr-xr-x    2 root root 4.0K Apr 14  2021 sbin
drwxr-xr-x    2 root root 4.0K Aug 15 12:12 share
drwxr-xr-x    2 root root 4.0K Apr 14  2021 srv
drwxr-xr-x    2 root root 4.0K Aug 15 18:47 ssl
dr-xr-xr-x   13 root root    0 Nov 14 18:23 sys
drwxrwxrwt    1 root root 4.0K Nov 14 18:23 tmp
drwxr-xr-x    1 root root 4.0K May 26  2021 usr
drwxr-xr-x    1 root root 4.0K Apr 30  2021 var
/ $

The output of du -h is thousands of lines long, saved it to https://www.ravenslair.nl/files/hass_du-h.log

agners commented 2 years ago

You wrote:

Just tried: "C:\Program Files (x86)\VMware\VMware Workstation\vmware-vdiskmanager.exe" -k haos_ova-6.2.vmdk The output indicated the vmdk was being shrunk, but the size did not change.

My question is, how did you check the size of that file? In Linux, there is a difference between the apparent file size and the effective size on disk. The latter can be much smaller if the file has "holes" in it (sparse files). From Wikipedia it seems that Windows supports that too. Check the effective size; it might be smaller than what you see in your file explorer.

bartgrefte commented 2 years ago

@agners I checked the size in the explorer window and properties window of the file.

On the properties window it says:
Size: 11,5 GB (12.395.675.648 bytes)
Size on disk: 11,5 GB (12.396.593.152 bytes)

agners commented 2 years ago

Ok, yeah, then it seems that VMware does not free up that space. I'd ask in a VMware forum, since they seem to support unmap/fstrim, but it does not seem to work entirely? Fstrim works fine on KVM/Proxmox and other virtualization environments, so it seems to be something VMware related.

bartgrefte commented 2 years ago

@agners Sorry for the delay, been a busy week. Just opened a topic on VMWare's forum: https://communities.vmware.com/t5/VMware-Workstation-Pro/disk-size-is-not-dynamically-shrinking/m-p/2881580#M172880 Now let's hope someone can figure this out.

bartgrefte commented 2 years ago

@agners Late update (put this aside for a while) and finally some progress :)

The VMDK has shrunk from 13.5GB to 6.9GB by running cat /dev/zero > zero.fill;sync;sleep 1;sync;rm -f zero.fill inside the VM, followed by vmware-vdiskmanager -k "C:\Users\Bart\Documents\Virtual Machines\Home Assistant OS 64-bit\haos_ova-6.2.vmdk" on the host after I shut down the VM.

So.... since that works, any idea why the built-in method doesn't seem to do what it's supposed to in VMware Workstation?
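The zero-fill step described above could be sketched as a small function. The name zero_fill_free_space and the optional size cap are hypothetical additions for illustration. Without the cap it behaves like the one-liner and fills the filesystem completely, which may disturb running services while the disk is full.

```shell
#!/bin/sh
# Hypothetical helper: overwrite free space with zeros, then delete the fill file.
# $1: directory on the filesystem to clean (default: current directory)
# $2: optional cap in MiB; omit it to fill until the filesystem is full
zero_fill_free_space() {
    target_dir=${1:-.}
    limit_mib=${2:-}
    fill_file="$target_dir/zero.fill"

    if [ -n "$limit_mib" ]; then
        dd if=/dev/zero of="$fill_file" bs=1M count="$limit_mib" 2>/dev/null
    else
        # Without a cap, dd stops with "No space left on device"; that is expected.
        dd if=/dev/zero of="$fill_file" bs=1M 2>/dev/null || true
    fi

    # Flush the zeros to the virtual disk before removing the file.
    sync
    rm -f "$fill_file"
    sync
}
```

For example, zero_fill_free_space /some/dir 100 would write and remove 100 MiB as a dry run. The shrink itself still happens on the host, with vmware-vdiskmanager -k on the vmdk after the VM has been shut down, as described above.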

agners commented 2 years ago

Moderator wila says at the above mentioned thread:

Whereas fstrim only works for virtual disks that support the discard command.

So I think it should work with fstrim, especially with SCSI disks. Maybe it did not work initially (before SCSI or disk.scsiUnmapAllowed was set)? Since trim is incremental, it could be that whatever blocks had been written before those changes did not get freed up until you zeroed out and deleted the blocks explicitly with the above method.
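One way to check from inside the guest whether the virtual disk advertises discard at all is to look at its queue attributes in sysfs. The sketch below is an assumption-laden helper, not something from the thread: the function name supports_discard is hypothetical, and it treats a discard_max_bytes value of 0 as "no discard support".

```shell
#!/bin/sh
# Hypothetical helper: report whether a block device advertises discard support.
# $1: device name such as sda
# $2: sysfs root (defaults to /sys/block; overridable so the check can be tried on fake data)
# Returns 0 if discard is advertised, 1 if not, 2 if the attribute cannot be read.
supports_discard() {
    dev=$1
    sysfs_root=${2:-/sys/block}
    attr="$sysfs_root/$dev/queue/discard_max_bytes"
    [ -r "$attr" ] || return 2
    # Assumption: a value of 0 means the device does not accept discard requests.
    [ "$(cat "$attr")" -gt 0 ]
}
```

Inside the VM this could be used as: supports_discard sda && echo "discard advertised". If the disk does not advertise discard, fstrim would have nothing to pass down to VMware, which would fit the behaviour seen earlier in this thread.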

github-actions[bot] commented 2 years ago

There hasn't been any activity on this issue recently. To keep our backlog manageable we have to clean old issues, as many of them have already been resolved with the latest updates. Please make sure to update to the latest Home Assistant OS version and check if that solves the issue. Let us know if that works for you by adding a comment 👍 This issue has now been marked as stale and will be closed if no further activity occurs. Thank you for your contributions.