Open luckylinux opened 7 months ago
> "TRIM"
It should be disabled on the host because it's ZFS on top of LUKS. That's the default behavior from what I understood at least.
However, `systemctl status fstrim.timer` reports on the Host:
● fstrim.timer - Discard unused blocks once a week
Loaded: loaded (/lib/systemd/system/fstrim.timer; enabled; preset: enabled)
Active: active (waiting) since Fri 2024-04-05 10:29:41 CEST; 4 days ago
Trigger: Mon 2024-04-15 01:37:09 CEST; 5 days left
Triggers: ● fstrim.service
Docs: man:fstrim
Apr 05 10:29:41 pve16 systemd[1]: Started fstrim.timer - Discard unused blocks once a week.
On the guest you might be right though
I always have SSD Emulation + Discard + IO thread enabled on all of my VMs.
But `zpool get autotrim` returns `off` both for the Host and the Guest VM.
`systemctl status fstrim.timer` reports on the Guest VM:
○ fstrim.timer - Discard unused blocks once a week
Loaded: loaded (/lib/systemd/system/fstrim.timer; disabled; preset: enabled)
Active: inactive (dead)
Trigger: n/a
Triggers: ● fstrim.service
Docs: man:fstrim
Any other command to check?
You misunderstood.
`fstrim` doesn't do anything for ZFS, and absent `autotrim`, ZFS is not going to issue discard requests without an explicit `zpool trim` in the guest, leaving space that was freed in the guest still marked as in use on the host.
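Concretely, the guest-side check and manual trim look something like this (a sketch assuming the guest pool is named `zdata`, as later in this thread, and that it needs root):

```shell
# Inside the Guest VM, against the guest's own pool:
zpool get autotrim zdata   # "off" means freed blocks are never discarded automatically
zpool trim zdata           # one-off manual TRIM; discards only propagate to the host
                           #   if the virtual disk has Discard enabled (as it is here)
zpool status -t zdata      # shows per-vdev trim progress and completion time
```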
And that's ZFS-specific? I mean, the ext4 partition for `/` on top of the ZVOL (like many other containers I have) does not really have this problem.
I think I read somewhere that `zpool trim` is kind of dangerous concerning data loss. Isn't it?
Yes, the command `zpool trim` is ZFS-specific.
There was an uncommon race with data mangling using any kind of TRIM that was fixed in 2.2 and 2.1.14.
If you're worried about space usage on the host when things are freed (but not discarded) in the guest, I wouldn't suggest running any filesystem inside a VM that you don't want to use TRIM with.
What do you mean exactly by your latest statement? That I should run `zpool trim`, or that I should not be running ZFS on top of a ZVOL?
Thanks.
Hopefully there won't be any regression of that bug :D.
However, nothing seems to be happening.
I issued `zpool trim zdata` on the Guest, and `zpool status -t`
reports on the Guest:
pool: zdata
state: ONLINE
status: Some supported and requested features are not enabled on the pool.
The pool can still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
the pool may no longer be accessible by software that does not support
the features. See zpool-features(7) for details.
scan: scrub repaired 0B in 00:00:07 with 0 errors on Sun Mar 10 00:24:09 2024
config:
NAME STATE READ WRITE CKSUM
zdata ONLINE 0 0 0
PODMAN ONLINE 0 0 0 (100% trimmed, completed at Wed 10 Apr 2024 12:52:57 AM CEST)
errors: No known data errors
`zfs list | grep "vm-103-disk-1"`
on the Host:
NAME USED AVAIL REFER MOUNTPOINT
rpool/data/vm-103-disk-1 67.8G 274G 989M -
Granted it could be running in the background. But for now there is absolutely no change.
You may note that there are 3 columns there, and "referenced" changed pretty substantially.
True. So I just need to destroy the old snapshots of that dataset on the Host.
Yep.
Now `zfs list | grep "vm-103-disk-1"` yields:
NAME USED AVAIL REFER MOUNTPOINT
rpool/data/vm-103-disk-1 989M 312G 989M -
Maybe the root filesystem of the Guest VM (ext4) has automatic TRIM (the `discard` mount option) enabled by default then?
That could explain the behavior ...
For reference, you could have seen whether that was going to happen before doing it, either with `zfs destroy -nv [list of snapshots]` or by looking at the four different "usedby" properties, which sum to USED.
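That breakdown can be sanity-checked numerically. A minimal sketch with hypothetical byte counts chosen to mirror the 67.8G / 989M case above (the real values come from `zfs get -p usedbysnapshots,usedbydataset,usedbychildren,usedbyrefreservation rpool/data/vm-103-disk-1` on the Host):

```shell
# USED = usedbysnapshots + usedbydataset + usedbychildren + usedbyrefreservation
# The byte counts below are hypothetical stand-ins for real `zfs get -p` output.
awk 'BEGIN {
  usedbysnapshots      = 71777218560   # ~66.8 GiB pinned by old snapshots
  usedbydataset        =  1037041664   # ~989 MiB actually referenced by the ZVOL
  usedbychildren       =           0
  usedbyrefreservation =           0
  used = usedbysnapshots + usedbydataset + usedbychildren + usedbyrefreservation
  printf "USED = %.1f GiB\n", used / 1024 / 1024 / 1024
}'
# prints: USED = 67.8 GiB
```

With numbers like these, almost all of USED is `usedbysnapshots`, which is exactly why destroying the old snapshots released the space.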
`zpool trim` is for ZFS. The root FS is ext4, so `fstrim` will do what you expect there.
Or, if it's safe enough, enable ZFS autotrim on the Guest VM? `zpool set autotrim=on zdata`
The bug wasn't specific to automatic or manual TRIM, AFAIR, so autotrim should be no less safe than a manual trim.
> There was an uncommon race with data mangling using any kind of TRIM that was fixed in 2.2 and 2.1.14.
There seems to be a new issue in 2.2.3: #16056
Just so you're aware
Kind of.
Note that #16056, from my quick reading, seems to be the result of hardware that lied and reported an invalid value for how big a TRIM can be, combined with a failure in the error-case handling for that, since that should not really ever happen. #16070 fixes the latter, but it's not entirely clear what the right thing to do about the former is, if I'm right about it.
System information
Host (Proxmox VE):
Guest VM (Debian GNU/Linux KVM):
Describe the problem you're observing
I am facing some EXTREME overhead when creating a ZFS zpool in a Proxmox VE VM Guest on top of a ZVOL on the Host.
The ZFS Pool on the Host System is also sitting on top of a LUKS / Cryptsetup Full-Disk-Encryption, although I do NOT think this is relevant for the issue described here (since the Host ZFS Pool is sitting on top of the DMCrypt / LUKS Device).
`zfs list` on the Host:
`zpool get all rpool` (Host Pool Properties):
`zfs get all rpool/data/vm-103-disk-1` (Host ZFS Properties):
`vm-103-disk-0` is ext4 on top of a ZVOL, for comparison purposes.
`vm-103-disk-1` is the ZFS Pool on top of a ZVOL, which has the issue.
`df -ah` for the Guest VM `/` filesystem, which is based on `vm-103-disk-0` (for comparison purposes): 5.7G used on the Guest and 8.27G on the Host. Overhead is roughly 45% ((8.27G/5.7G - 1) * 100%).
`zfs list` on the Guest VM for `vm-103-disk-1`, the Container Data Storage (Podman): overhead is roughly 8350% ((67.6G/0.8G - 1) * 100%) !!!
`zpool get all zdata` (Guest VM Pool Properties):
`zfs get all zdata` (Guest ZFS Properties):
Note: it is possible that this issue is caused by the block size / `volblocksize` or a similar parameter, since Podman / Docker containers can generate lots of small files.
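For clarity, the two overhead figures quoted above are just the ratio of host usage to guest usage; a quick sketch reproducing both numbers:

```shell
# Overhead = (space used on the Host / space used in the Guest - 1) * 100%
awk 'BEGIN {
  printf "ext4 on ZVOL: %.0f%%\n", (8.27 / 5.7 - 1) * 100    # vm-103-disk-0
  printf "ZFS on ZVOL:  %.0f%%\n", (67.6 / 0.8 - 1) * 100    # vm-103-disk-1
}'
# prints: ext4 on ZVOL: 45%
#         ZFS on ZVOL:  8350%
```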
That does not sound like much though ... On the GUEST, recordsize is set to 128K; that's probably a bit high, isn't it?
Regardless, just because of the number of files, a worst case of one full record per file would yield: 14475 x 128K = 1852800K = 1852.8 M = 1.85 G.
So it's probably causing some overhead inside the Guest, but nowhere near the level of the overhead between Guest and Host ...
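The worst-case bound above (every one of the ~14475 files occupying a full 128K record) can be reproduced directly:

```shell
# Worst case: all 14475 files each occupy one full 128K record.
awk 'BEGIN {
  files = 14475; recordsize_kb = 128
  total_kb = files * recordsize_kb
  printf "%d K = %.2f G\n", total_kb, total_kb / 1000 / 1000
}'
# prints: 1852800 K = 1.85 G
```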
Describe how to reproduce the problem
I don't think that a VM is necessary to replicate this.
It is probably sufficient to create a ZFS Pool on top of a ZVOL on a single system (the Host).
I disabled compression at the Guest level since it would only cause additional CPU load for no apparent benefit; therefore compression should NOT be the cause of this huge overhead.
Notes
The idea of having a ZFS Pool on top of the ZVOL is to have better control over ZFS Snapshots. In this case, after many other things have been configured correctly, the snapshot & backup plan of `rpool/data/vm-103-disk-1` could be performed by the Guest, as opposed to by the Host as for many other VMs. This can avoid backing up non-useful data (such as Container Images or Container Storage) and back up only useful / critical data (Container Configuration, Secrets, Data, Certificates, Volumes, ...), thus saving a lot of disk space on the Backup Server.
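As a sketch of what such a guest-side plan could look like (the dataset name `zdata/volumes` and the backup target `backupserver:backuppool/vm-103` are hypothetical, not taken from this issue):

```shell
# Inside the Guest: snapshot only the datasets holding critical data,
# skipping image/storage datasets, then replicate to a backup host.
# "zdata/volumes" and "backupserver"/"backuppool/vm-103" are made-up names.
SNAP="backup-$(date +%F)"
zfs snapshot -r "zdata/volumes@${SNAP}"
zfs send -R "zdata/volumes@${SNAP}" | \
  ssh backupserver zfs receive -u "backuppool/vm-103"
```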
Include any warning/errors/backtraces from the system logs