FransUrbo closed this issue 7 years ago
That doesn't seem to work. After a while, the server just dies… I don't know if it's the "stress" (my SAS controllers haven't been rock-solid - the driver isn't really any good).
I have to try to copy the data onto another ZVOL and destroy this one...
Looking through a bunch of my ZVOLs, I noticed that there are several that show the same strange behavior:
Negotia, which is NOT a VM, has /dev/sdc via iSCSI:
Negotia:~# df -h -text4
Filesystem Size Used Avail Use% Mounted on
/dev/sdb1 111G 15G 91G 14% /Machines
/dev/sdc1 504G 8.7G 470G 2% /Machines/Machines
However, the ZVOL shows:
celia# zfs list share/VirtualMachines/Negotia/Machines
NAME USED AVAIL REFER MOUNTPOINT
share/VirtualMachines/Negotia/Machines 83.5G 266G 83.5G -
Getting another list:
celia# zfs list -oname,volsize,used -tvolume -r share/VirtualMachines | egrep 'G$' | sort -n -k3 | tail
share/VirtualMachines/Ubuntu/Utopic/Server 15G 16.3G
share/VirtualMachines/Windows/7 15G 16.4G
share/VirtualMachines/Ubuntu/Vivid/Server 15G 17.0G
share/VirtualMachines/Ubuntu/Saucy/Server 15G 17.9G
share/VirtualMachines/Windows/Vista 15G 19.3G
share/VirtualMachines/Debian/Lenny/32_Source 25G 23.1G
share/VirtualMachines/Debian/Sid/Src 25G 23.3G
share/VirtualMachines/Debian/Lenny/32_Source2 40G 30.5G
share/VirtualMachines/Ubuntu/Trusty/Server 15G 51.7G
share/VirtualMachines/Negotia/Machines 512G 83.5G
All these (except the last one) are base installs only. Nothing fancy… The majority of them have gone over the volsize.
All my ZVOLs use volblocksize=512, except share/VirtualMachines/Negotia/Machines, which uses 8K.
But going back to the original ZVOL (share/VirtualMachines/Ubuntu/Trusty/Server), I created a new ZVOL (with the extension .new), mounted that in the VM, partitioned it (GPT) and put ext3 on it. I then mounted it and copied the data over (using find -mount | cpio -vpmd --preserve-modification-time). It is now STILL bigger than what the VM thinks:
UbuntuTrustyServer:/usr/src# df -h -t ext4 -t ext3
Filesystem Size Used Avail Use% Mounted on
/dev/sda1 15G 2.1G 12G 15% /
/dev/sdc1 50G 838M 46G 2% /usr/src
/dev/sdb1 14G 2.3G 11G 18% /mnt
(sda1 is the 'original', 'weird' ZVOL and sdb1 is the new one.)
Notice that those mismatch too - sda1 and sdb1 should be identical now!
And the ZVOLs:
celia# zfs list -oname,used share/VirtualMachines/Ubuntu/Trusty/Server share/VirtualMachines/Ubuntu/Trusty/Server.new
NAME USED
share/VirtualMachines/Ubuntu/Trusty/Server 51.9G
share/VirtualMachines/Ubuntu/Trusty/Server.new 4.11G
Almost twice what the VM thinks...
I decided to destroy share/VirtualMachines/Ubuntu recursively and start over… It seems like the simplest thing to do at this point.
I suspect volblocksize=512 leads to metadata inflation, but a factor of 4 for metadata (in case the zvol has been written with non-zero data completely once) seems a bit excessive to me.
My guess is that this is caused by the large number of indirect blocks required to manage a 512b ZVOL. But I'd need to dig in to it to say more for certain.
My guess is that this is caused by the large number of indirect blocks req
"indirect blocks req"?
Sorry about that, I accidentally hit commit mid sentence. I've updated the comment.
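For what it's worth, the indirect-block guess can be put on a back-of-the-envelope footing. Assuming the usual 128-byte block pointers packed into 128K indirect blocks (both assumptions about ZFS internals, not figures taken from this thread), a 512B volblocksize pays roughly 25% extra in level-1 indirect blocks alone, while 8K pays under 2% - noticeable, but well short of the factor-of-4 inflation discussed above:

```python
# Rough estimate of level-1 indirect-block overhead for a zvol.
# Assumes 128-byte block pointers packed into 128K indirect blocks;
# higher indirect levels shrink by another factor of 1024 per level
# and are ignored here.
def indirect_overhead(volblocksize, bp_size=128, indirect_bytes=128 * 1024):
    ptrs = indirect_bytes // bp_size        # 1024 pointers per indirect block
    data_covered = ptrs * volblocksize      # data addressed by one indirect block
    return indirect_bytes / data_covered

print(f"512B blocks: {indirect_overhead(512):.0%} metadata overhead")
print(f"8K blocks:   {indirect_overhead(8192):.2%} metadata overhead")
```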
I just hit this on current HEAD without additional patches, using the 9999 Gentoo ebuild and a 4k volblocksize. This is how it looks after overwriting it with 16GB of 50% random / 50% zero data chunks using fio:
NAME PROPERTY VALUE SOURCE
stuff_array/servers/iscsi_images/hydrar-desktop-swap-new type volume -
stuff_array/servers/iscsi_images/hydrar-desktop-swap-new creation fre mar 13 1:38 2015 -
stuff_array/servers/iscsi_images/hydrar-desktop-swap-new used 26.0G -
stuff_array/servers/iscsi_images/hydrar-desktop-swap-new available 2.41T -
stuff_array/servers/iscsi_images/hydrar-desktop-swap-new referenced 26.0G -
stuff_array/servers/iscsi_images/hydrar-desktop-swap-new compressratio 2.65x -
stuff_array/servers/iscsi_images/hydrar-desktop-swap-new reservation none default
stuff_array/servers/iscsi_images/hydrar-desktop-swap-new volsize 16G local
stuff_array/servers/iscsi_images/hydrar-desktop-swap-new volblocksize 4K -
stuff_array/servers/iscsi_images/hydrar-desktop-swap-new checksum on default
stuff_array/servers/iscsi_images/hydrar-desktop-swap-new compression lz4 local
stuff_array/servers/iscsi_images/hydrar-desktop-swap-new readonly off default
stuff_array/servers/iscsi_images/hydrar-desktop-swap-new copies 1 default
stuff_array/servers/iscsi_images/hydrar-desktop-swap-new refreservation none default
stuff_array/servers/iscsi_images/hydrar-desktop-swap-new primarycache all default
stuff_array/servers/iscsi_images/hydrar-desktop-swap-new secondarycache all default
stuff_array/servers/iscsi_images/hydrar-desktop-swap-new usedbysnapshots 0 -
stuff_array/servers/iscsi_images/hydrar-desktop-swap-new usedbydataset 26.0G -
stuff_array/servers/iscsi_images/hydrar-desktop-swap-new usedbychildren 0 -
stuff_array/servers/iscsi_images/hydrar-desktop-swap-new usedbyrefreservation 0 -
stuff_array/servers/iscsi_images/hydrar-desktop-swap-new logbias latency default
stuff_array/servers/iscsi_images/hydrar-desktop-swap-new dedup off default
stuff_array/servers/iscsi_images/hydrar-desktop-swap-new mlslabel none default
stuff_array/servers/iscsi_images/hydrar-desktop-swap-new sync disabled local
stuff_array/servers/iscsi_images/hydrar-desktop-swap-new refcompressratio 2.65x -
stuff_array/servers/iscsi_images/hydrar-desktop-swap-new written 26.0G -
stuff_array/servers/iscsi_images/hydrar-desktop-swap-new logicalused 16.0G -
stuff_array/servers/iscsi_images/hydrar-desktop-swap-new logicalreferenced 16.0G -
stuff_array/servers/iscsi_images/hydrar-desktop-swap-new snapdev hidden default
stuff_array/servers/iscsi_images/hydrar-desktop-swap-new context none default
stuff_array/servers/iscsi_images/hydrar-desktop-swap-new fscontext none default
stuff_array/servers/iscsi_images/hydrar-desktop-swap-new defcontext none default
stuff_array/servers/iscsi_images/hydrar-desktop-swap-new rootcontext none default
stuff_array/servers/iscsi_images/hydrar-desktop-swap-new redundant_metadata all default
Closing as stale.
If it's still an issue, feel free to reopen.
Hmm. I ran into the same trouble. Raidz2 with 8x8TB; physical sector size of the HDDs is 4096 bytes. ZFS is version 0.7.9 from Proxmox VE.
I created a volume with 10T. The volume is ext4-formatted and mounted. I simply dd /dev/urandom into a single file, and after a while the volume looks like this:
pool1/test type volume -
pool1/test creation Mon Aug 6 18:26 2018 -
pool1/test used 10.6T -
pool1/test available 39.9T -
pool1/test referenced 71.9G -
pool1/test compressratio 1.00x -
pool1/test reservation none default
pool1/test volsize 10T local
pool1/test volblocksize 4K -
pool1/test checksum on default
pool1/test compression off local
pool1/test readonly off default
pool1/test createtxg 977 -
pool1/test copies 1 default
pool1/test refreservation 10.6T local
pool1/test guid 98032921592636052 -
pool1/test primarycache all default
pool1/test secondarycache all default
pool1/test usedbysnapshots 0B -
pool1/test usedbydataset 71.9G -
pool1/test usedbychildren 0B -
pool1/test usedbyrefreservation 10.6T -
pool1/test logbias latency default
pool1/test dedup off default
pool1/test mlslabel none default
pool1/test sync standard default
pool1/test refcompressratio 1.00x -
pool1/test written 71.9G -
pool1/test logicalused 33.7G -
pool1/test logicalreferenced 33.7G -
pool1/test volmode default default
pool1/test snapshot_limit none default
pool1/test snapshot_count none default
pool1/test snapdev hidden default
pool1/test context none default
pool1/test fscontext none default
pool1/test defcontext none default
pool1/test rootcontext none default
pool1/test redundant_metadata all default
Writing into a dataset however works as it should:
NAME PROPERTY VALUE SOURCE
pool1/test2 type filesystem -
pool1/test2 creation Mon Aug 6 18:31 2018 -
pool1/test2 used 22.5G -
pool1/test2 available 29.3T -
pool1/test2 referenced 22.5G -
pool1/test2 compressratio 1.00x -
pool1/test2 mounted yes -
pool1/test2 quota none default
pool1/test2 reservation none default
pool1/test2 recordsize 128K default
pool1/test2 mountpoint /pool1/test2 default
pool1/test2 sharenfs off default
pool1/test2 checksum on default
pool1/test2 compression off local
pool1/test2 atime on default
pool1/test2 devices on default
pool1/test2 exec on default
pool1/test2 setuid on default
pool1/test2 readonly off default
pool1/test2 zoned off default
pool1/test2 snapdir hidden default
pool1/test2 aclinherit restricted default
pool1/test2 createtxg 1038 -
pool1/test2 canmount on default
pool1/test2 xattr on default
pool1/test2 copies 1 default
pool1/test2 version 5 -
pool1/test2 utf8only off -
pool1/test2 normalization none -
pool1/test2 casesensitivity sensitive -
pool1/test2 vscan off default
pool1/test2 nbmand off default
pool1/test2 sharesmb off default
pool1/test2 refquota none default
pool1/test2 refreservation none default
pool1/test2 guid 3232922925337121303 -
pool1/test2 primarycache all default
pool1/test2 secondarycache all default
pool1/test2 usedbysnapshots 0B -
pool1/test2 usedbydataset 22.5G -
pool1/test2 usedbychildren 0B -
pool1/test2 usedbyrefreservation 0B -
pool1/test2 logbias latency default
pool1/test2 dedup off default
pool1/test2 mlslabel none default
pool1/test2 sync standard default
pool1/test2 dnodesize legacy default
pool1/test2 refcompressratio 1.00x -
pool1/test2 written 22.5G -
pool1/test2 logicalused 22.5G -
pool1/test2 logicalreferenced 22.5G -
pool1/test2 volmode default default
pool1/test2 filesystem_limit none default
pool1/test2 snapshot_limit none default
pool1/test2 filesystem_count none default
pool1/test2 snapshot_count none default
pool1/test2 snapdev hidden default
pool1/test2 acltype off default
pool1/test2 context none default
pool1/test2 fscontext none default
pool1/test2 defcontext none default
pool1/test2 rootcontext none default
pool1/test2 relatime off default
pool1/test2 redundant_metadata all default
pool1/test2 overlay off default
Does anybody have an idea about this?
Best regards Manfred
@mheubach please post the exact command you used to create the zvol.
zfs create -V 10T -o volblocksize=4k -o compression=off pool1/test
pool has been created with this command:
zpool create -o ashift=12 -O compression=on pool1 raidz2 sda sdb sdc sdd sde sdf sdg sdh
@mheubach the accounting looks as expected, what exactly is your question?
Also, know that raidz2 with ashift=12 and volblocksize=4k is not space efficient.
In the meantime I read about raidz2 space efficiency. I will play around a bit. The idea behind volblocksize=4k was to use the same blocksize as ext4. I expected some overhead, but that the zvol consumes twice as much space as is actually written to it came as a surprise. I will verify this against a pool with raidz instead of raidz2.
Ok. I made some tests. The pool is raidz2 again (8 disks). I have ZVOLs with different volblocksizes. volblocksize=128k produces nearly no overhead, but will for sure cause a lot of IO when copy-on-write comes into action. Smaller volblocksizes cause more overhead: the overhead is huge with volblocksize < 16k, drops immediately to about 7% with volblocksize=16, 32 or 64k, and to 0 with volblocksize=128k. Is there any magic mathematics for calculating the best volblocksize depending on ashift, number of disks, raid level, ...? My use case here is archiving of already-compressed data over a "slow" WAN link, so performance and ZFS compression are not my concern and I will opt for the smallest overhead. Anyway, to me this behaviour is at least peculiar :-)
NAME VOLBLOCK LREFER REFER
pool1/test5 8K 16.7G 35.7G (213% overhead)
pool1/test2 16K 89.4G 95.6G (6.93% overhead)
pool1/test3 32K 89.9G 96.0G (6.78% overhead)
pool1/test4 64K 50.1G 53.5G (6.78% overhead)
pool1/test 128K 60.7G 60.7G (0% overhead)
200% overhead is correct and predictable.
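That prediction can be sketched numerically. The allocation below is modelled on the logic of ZFS's vdev_raidz_asize() (each stripe row carries nparity parity sectors, and the total is padded to a multiple of nparity + 1 sectors); the pool layout (8-disk raidz2, ashift=12) is taken from the report above. The further assumption - that the reported used/referenced figure is deflated by the ratio a full 128K block would see - is mine, not stated in this thread, but it reproduces the measured table (≈2.1x for 4K/8K, ≈7% for 16-64K, 0% for 128K) quite closely:

```python
import math

# Rough raidz allocation for one block, modelled on vdev_raidz_asize():
# data sectors, plus `nparity` parity sectors per stripe row, padded up
# to a multiple of (nparity + 1) sectors.
def raidz_alloc_sectors(block_bytes, ashift=12, ndisks=8, nparity=2):
    sector = 1 << ashift
    data = math.ceil(block_bytes / sector)        # data sectors
    rows = math.ceil(data / (ndisks - nparity))   # stripe rows used
    total = data + rows * nparity                 # plus parity
    quantum = nparity + 1                         # padding unit
    return math.ceil(total / quantum) * quantum

# Assumption: reported space is deflated by the ratio a 128K block sees,
# so full-stripe writes show ~0% overhead and small blocks keep the rest.
deflate = (128 * 1024 // 4096) / raidz_alloc_sectors(128 * 1024)

for vb in (4096, 8192, 16384, 65536, 131072):
    reported = raidz_alloc_sectors(vb) * deflate * 4096
    print(f"volblocksize={vb:>6}: reported {reported / vb:.2f}x of logical size")
```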
I accidentally noticed this a couple of minutes ago. I've created the zvol as a sparse volume with volsize=15G, but the USED entry says it's 67.2GB, which should be impossible...
on the host:
So the host thinks it only uses 2.3G, but ZFS thinks it's using 67G!
This is a ZVOL, shared via iSCSI (SCST) to another host, where it's connected to VBox which then 'shares' it to the host as a SATA device…
Writing zeros to a file on the host makes USED drop, and with a little luck it's going to be OK once I've filled the disk (on the host). But there seems to be a bug/issue here somewhere…
This is kernel 3.18.0-rc3 with spl/zfs GIT master (with a bunch of other bits and pieces added - https://github.com/FransUrbo/zfs/blob/FAVORITES/README.turbo)