seonwoolee opened this issue 6 years ago
Something like that happened to me at some point, due to one of two things:
1- Snapshots 2- Weirdly incompressible or highly compressible data
The problem might be a wrong calculation of used blocks vs. referenced blocks; I think zdb gives more details about this.
One inconsistency in ZFS is that metadata is always compressed, but not counted as part of the ratio.
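For reference, something like the following should show the logical vs. physical vs. allocated totals that those ratios are derived from (the pool and dataset names here are placeholders, not taken from this report):
# pool-wide block statistics: bp logical / physical / allocated and average block sizes
zdb -bb tank
# per-dataset accounting, in exact bytes, that compressratio can be compared against
zfs get -p used,logicalused,referenced,logicalreferenced,compressratio tank/dataset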
Probably a duplicate of https://github.com/zfsonlinux/zfs/issues/3641 (compression ratio is incorrect if ashift != 9)
zfs get all, zpool list -v?
Hi. Sorry, I'm not sure if I should interrupt this conversation, but I have the very same issue on Arch with ashift=12:
[root@big ~]# ls -lah /bigdata/test/
total 1.5K
drwxr-xr-x 2 root root 3 Mar 3 11:14 .
drwxr-xrwx 4 root root 4 Mar 3 11:13 ..
-rw-r--r-- 1 root root 4.0G Mar 3 11:14 zero.fil
[root@big ~]# df -h /bigdata/test/
Filesystem Size Used Avail Use% Mounted on
bigdata/test 139G 128K 139G 1% /bigdata/test
[root@big ~]# du -h /bigdata/test/
1.0K /bigdata/test/
[root@big ~]# uname -a
Linux big 4.20.12-arch1-1-ARCH #1 SMP PREEMPT Sat Feb 23 15:11:34 UTC 2019 x86_64 GNU/Linux
[root@big ~]# pacman -Qs zfs
local/spl-linux 0.7.12_4.20.12.arch1.1-1 (archzfs-linux)
Solaris Porting Layer kernel modules.
local/zfs-linux 0.7.12_4.20.12.arch1.1-1 (archzfs-linux)
Kernel modules for the Zettabyte File System.
local/zfs-linux-headers 0.7.12_4.20.12.arch1.1-1
Kernel headers for the Zettabyte File System.
local/zfs-utils 0.7.12-1
Userspace utilities for the Zettabyte File System.
[root@big ~]# zpool list -v
NAME SIZE ALLOC FREE EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT
bigdata 1,98T 1,78T 208G - 2% 89% 1.00x ONLINE -
sda1 1,98T 1,78T 208G - 2% 89%
[root@big ~]# zfs get all
NAME PROPERTY VALUE SOURCE
bigdata type filesystem -
bigdata creation Sat Mar 2 14:16 2019 -
bigdata used 1.79T -
bigdata available 139G -
bigdata referenced 96K -
bigdata compressratio 1.00x -
bigdata mounted yes -
bigdata quota none default
bigdata reservation none default
bigdata recordsize 128K default
bigdata mountpoint /bigdata default
bigdata sharenfs off default
bigdata checksum on default
bigdata compression lz4 local
bigdata atime on local
bigdata devices on default
bigdata exec on default
bigdata setuid on default
bigdata readonly off default
bigdata zoned off default
bigdata snapdir hidden default
bigdata aclinherit restricted default
bigdata createtxg 1 -
bigdata canmount on default
bigdata xattr on default
bigdata copies 1 default
bigdata version 5 -
bigdata utf8only off -
bigdata normalization none -
bigdata casesensitivity sensitive -
bigdata vscan off default
bigdata nbmand off default
bigdata sharesmb off default
bigdata refquota none default
bigdata refreservation none default
bigdata guid 5773117973477919529 -
bigdata primarycache all default
bigdata secondarycache all default
bigdata usedbysnapshots 0B -
bigdata usedbydataset 96K -
bigdata usedbychildren 1.79T -
bigdata usedbyrefreservation 0B -
bigdata logbias latency default
bigdata dedup off default
bigdata mlslabel none default
bigdata sync standard default
bigdata dnodesize legacy default
bigdata refcompressratio 1.00x -
bigdata written 96K -
bigdata logicalused 1.80T -
bigdata logicalreferenced 40K -
bigdata volmode default default
bigdata filesystem_limit none default
bigdata snapshot_limit none default
bigdata filesystem_count none default
bigdata snapshot_count none default
bigdata snapdev hidden default
bigdata acltype off default
bigdata context none default
bigdata fscontext none default
bigdata defcontext none default
bigdata rootcontext none default
bigdata relatime on local
bigdata redundant_metadata all default
bigdata overlay off default
bigdata/fs1 type filesystem -
bigdata/fs1 creation Sat Mar 2 14:34 2019 -
bigdata/fs1 used 1.79T -
bigdata/fs1 available 139G -
bigdata/fs1 referenced 1.79T -
bigdata/fs1 compressratio 1.00x -
bigdata/fs1 mounted yes -
bigdata/fs1 quota none default
bigdata/fs1 reservation none default
bigdata/fs1 recordsize 128K default
bigdata/fs1 mountpoint /bigdata/fs1 default
bigdata/fs1 sharenfs off default
bigdata/fs1 checksum on default
bigdata/fs1 compression lz4 inherited from bigdata
bigdata/fs1 atime on inherited from bigdata
bigdata/fs1 devices on default
bigdata/fs1 exec on default
bigdata/fs1 setuid on default
bigdata/fs1 readonly off default
bigdata/fs1 zoned off default
bigdata/fs1 snapdir hidden default
bigdata/fs1 aclinherit restricted default
bigdata/fs1 createtxg 222 -
bigdata/fs1 canmount on default
bigdata/fs1 xattr on default
bigdata/fs1 copies 1 default
bigdata/fs1 version 5 -
bigdata/fs1 utf8only off -
bigdata/fs1 normalization none -
bigdata/fs1 casesensitivity sensitive -
bigdata/fs1 vscan off default
bigdata/fs1 nbmand off default
bigdata/fs1 sharesmb off default
bigdata/fs1 refquota none default
bigdata/fs1 refreservation none default
bigdata/fs1 guid 14545282741868106458 -
bigdata/fs1 primarycache all default
bigdata/fs1 secondarycache all default
bigdata/fs1 usedbysnapshots 0B -
bigdata/fs1 usedbydataset 1.79T -
bigdata/fs1 usedbychildren 0B -
bigdata/fs1 usedbyrefreservation 0B -
bigdata/fs1 logbias latency default
bigdata/fs1 dedup off default
bigdata/fs1 mlslabel none default
bigdata/fs1 sync standard default
bigdata/fs1 dnodesize legacy default
bigdata/fs1 refcompressratio 1.00x -
bigdata/fs1 written 1.79T -
bigdata/fs1 logicalused 1.80T -
bigdata/fs1 logicalreferenced 1.80T -
bigdata/fs1 volmode default default
bigdata/fs1 filesystem_limit none default
bigdata/fs1 snapshot_limit none default
bigdata/fs1 filesystem_count none default
bigdata/fs1 snapshot_count none default
bigdata/fs1 snapdev hidden default
bigdata/fs1 acltype off default
bigdata/fs1 context none default
bigdata/fs1 fscontext none default
bigdata/fs1 defcontext none default
bigdata/fs1 rootcontext none default
bigdata/fs1 relatime on inherited from bigdata
bigdata/fs1 redundant_metadata all default
bigdata/fs1 overlay off default
bigdata/test type filesystem -
bigdata/test creation Sun Mar 3 11:13 2019 -
bigdata/test used 96K -
bigdata/test available 139G -
bigdata/test referenced 96K -
bigdata/test compressratio 1.00x -
bigdata/test mounted yes -
bigdata/test quota none default
bigdata/test reservation none default
bigdata/test recordsize 128K default
bigdata/test mountpoint /bigdata/test default
bigdata/test sharenfs off default
bigdata/test checksum on default
bigdata/test compression lz4 inherited from bigdata
bigdata/test atime on inherited from bigdata
bigdata/test devices on default
bigdata/test exec on default
bigdata/test setuid on default
bigdata/test readonly off default
bigdata/test zoned off default
bigdata/test snapdir hidden default
bigdata/test aclinherit restricted default
bigdata/test createtxg 35464 -
bigdata/test canmount on default
bigdata/test xattr on default
bigdata/test copies 1 default
bigdata/test version 5 -
bigdata/test utf8only off -
bigdata/test normalization none -
bigdata/test casesensitivity sensitive -
bigdata/test vscan off default
bigdata/test nbmand off default
bigdata/test sharesmb off default
bigdata/test refquota none default
bigdata/test refreservation none default
bigdata/test guid 5721573719522729350 -
bigdata/test primarycache all default
bigdata/test secondarycache all default
bigdata/test usedbysnapshots 0B -
bigdata/test usedbydataset 96K -
bigdata/test usedbychildren 0B -
bigdata/test usedbyrefreservation 0B -
bigdata/test logbias latency default
bigdata/test dedup off default
bigdata/test mlslabel none default
bigdata/test sync standard default
bigdata/test dnodesize legacy default
bigdata/test refcompressratio 1.00x -
bigdata/test written 96K -
bigdata/test logicalused 40K -
bigdata/test logicalreferenced 40K -
bigdata/test volmode default default
bigdata/test filesystem_limit none default
bigdata/test snapshot_limit none default
bigdata/test filesystem_count none default
bigdata/test snapshot_count none default
bigdata/test snapdev hidden default
bigdata/test acltype off default
bigdata/test context none default
bigdata/test fscontext none default
bigdata/test defcontext none default
bigdata/test rootcontext none default
bigdata/test relatime on inherited from bigdata
bigdata/test redundant_metadata all default
bigdata/test overlay off default
[root@big ~]#
As for non-default settings, I used https://wiki.archlinux.org/index.php/ZFS#Database; that seems to be all of them.
Thanks.
See also https://github.com/zfsonlinux/zfs/issues/8462, which may be the cause of this, depending on how the data was created.
Sorry for the late response
[seonwoo@seonwoo-nas ~]$ zpool list -v
NAME SIZE ALLOC FREE CKPOINT EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT
master 21.8T 15.3T 6.48T - - 7% 70% 1.00x ONLINE -
raidz2 21.8T 15.3T 6.48T - - 7% 70.2% - ONLINE
ata-TOSHIBA_HDWQ140_2898K1IBFPBE - - - - - - - - ONLINE
wwn-0x50014ee20f0fab61 - - - - - - - - ONLINE
wwn-0x50014ee2b9baa063 - - - - - - - - ONLINE
wwn-0x50014ee264651ffa - - - - - - - - ONLINE
wwn-0x50014ee264651abd - - - - - - - - ONLINE
wwn-0x50014ee264b0b3cb - - - - - - - - ONLINE
[seonwoo@seonwoo-nas ~]$ zfs get all master
NAME PROPERTY VALUE SOURCE
master type filesystem -
master creation Fri Jul 6 13:01 2018 -
master used 10.2T -
master available 3.86T -
master referenced 6.27M -
master compressratio 1.04x -
master mounted yes -
master quota none default
master reservation none default
master recordsize 1M local
master mountpoint /mnt/master local
master sharenfs rw=@192.168.1.0/24 local
master checksum on default
master compression lz4 received
master atime off received
master devices on default
master exec on default
master setuid on default
master readonly off default
master zoned off default
master snapdir hidden received
master aclinherit restricted default
master createtxg 1 -
master canmount on default
master xattr sa received
master copies 1 default
master version 5 -
master utf8only off -
master normalization none -
master casesensitivity sensitive -
master vscan off default
master nbmand off default
master sharesmb off local
master refquota none default
master refreservation none default
master guid 17594656326077232978 -
master primarycache all default
master secondarycache all default
master usedbysnapshots 0B -
master usedbydataset 6.27M -
master usedbychildren 10.2T -
master usedbyrefreservation 0B -
master logbias latency default
master objsetid 54 -
master dedup off default
master mlslabel none default
master sync standard default
master dnodesize legacy default
master refcompressratio 1.16x -
master written 6.27M -
master logicalused 10.6T -
master logicalreferenced 1.12M -
master volmode default default
master filesystem_limit none default
master snapshot_limit none default
master filesystem_count none default
master snapshot_count none default
master snapdev hidden default
master acltype posixacl received
master context none default
master fscontext none default
master defcontext none default
master rootcontext none default
master relatime off default
master redundant_metadata all default
master overlay off default
master encryption off default
master keylocation none default
master keyformat none default
master pbkdf2iters 0 default
master special_small_blocks 0 default
[seonwoo@seonwoo-nas ~]$ sudo zdb master
Cached configuration:
version: 5000
name: 'master'
state: 0
txg: 7351461
pool_guid: 5334150587146154780
errata: 0
hostname: 'seonwoo-nas'
com.delphix:has_per_vdev_zaps
vdev_children: 1
vdev_tree:
type: 'root'
id: 0
guid: 5334150587146154780
create_txg: 4
children[0]:
type: 'raidz'
id: 0
guid: 1269166408861884573
nparity: 2
metaslab_array: 256
metaslab_shift: 37
ashift: 12
asize: 24004625694720
is_log: 0
create_txg: 4
com.delphix:vdev_zap_top: 129
children[0]:
type: 'disk'
id: 0
guid: 12177181254910515140
path: '/dev/disk/by-id/ata-TOSHIBA_HDWQ140_2898K1IBFPBE-part1'
devid: 'ata-TOSHIBA_HDWQ140_2898K1IBFPBE-part1'
phys_path: 'pci-0000:00:17.0-ata-6'
whole_disk: 1
create_txg: 4
com.delphix:vdev_zap_leaf: 130
children[1]:
type: 'disk'
id: 1
guid: 16243617482609432507
path: '/dev/disk/by-id/wwn-0x50014ee20f0fab61-part1'
devid: 'usb-WD_My_Book_25EE_574343374B3041483130454B-0:0-part1'
phys_path: 'pci-0000:00:14.0-usb-0:5:1.0-scsi-0:0:0:0'
whole_disk: 1
create_txg: 4
com.delphix:vdev_zap_leaf: 131
children[2]:
type: 'disk'
id: 2
guid: 4932413843320470326
path: '/dev/disk/by-id/wwn-0x50014ee2b9baa063-part1'
devid: 'usb-WD_My_Book_25EE_574343374B32415359384458-0:0-part1'
phys_path: 'pci-0000:00:14.0-usb-0:4:1.0-scsi-0:0:0:0'
whole_disk: 1
create_txg: 4
com.delphix:vdev_zap_leaf: 132
children[3]:
type: 'disk'
id: 3
guid: 2471234805693787932
path: '/dev/disk/by-id/wwn-0x50014ee264651ffa-part1'
devid: 'usb-WD_My_Book_25EE_574343374B34415254353655-0:0-part1'
phys_path: 'pci-0000:00:14.0-usb-0:1:1.0-scsi-0:0:0:0'
whole_disk: 1
create_txg: 4
com.delphix:vdev_zap_leaf: 133
children[4]:
type: 'disk'
id: 4
guid: 11109201892835897328
path: '/dev/disk/by-id/wwn-0x50014ee264651abd-part1'
devid: 'usb-WD_My_Book_25EE_574343374B344152544E594C-0:0-part1'
phys_path: 'pci-0000:00:14.0-usb-0:3:1.0-scsi-0:0:0:0'
whole_disk: 1
create_txg: 4
com.delphix:vdev_zap_leaf: 134
children[5]:
type: 'disk'
id: 5
guid: 13835483519592227450
path: '/dev/disk/by-id/wwn-0x50014ee264b0b3cb-part1'
devid: 'usb-WD_My_Book_25EE_574343374B3745444658345A-0:0-part1'
phys_path: 'pci-0000:00:14.0-usb-0:6:1.0-scsi-0:0:0:0'
whole_disk: 1
create_txg: 4
com.delphix:vdev_zap_leaf: 135
features_for_read:
com.delphix:hole_birth
com.delphix:embedded_data
I thought this behavior was expected. The compressratio appears to be based on psize without accounting for padding. For example, if you're using ashift=12, the smallest possible block size on disk is 4K, but a block might compress to 1K or 512B. The overhead is particularly bad with small blocks on raidz pools.
Moreover, the used space calculation assumes 128K blocks. If you use 1M blocks, which are more efficient, you'll see that used is smaller than logicalused even on incompressible data or on datasets with compression=off. The difference roughly matches the expected difference in overhead between 128K and 1M blocks on an ashift=12 pool with my pool geometry. See RAID-Z parity cost.
As an example, here is the same dataset on three different pools. The average psize is around 15K after compression, but the average asize is 16.5K. The first is a mirror, the second is a pool that consists of two 12-disk raidz2 vdevs, and the third is a pool with a single 10-disk raidz2 vdev. All use ashift=12. The logical dataset size is not identical because the pool is active and the snapshots on the backup pools were taken at different times today. Still, the difference in effective compression is obvious.
Mirror:
rpool-server used 99.8G -
rpool-server logicalused 146G -
rpool-server compressratio 1.64x -
24 disks, 2 raidz2 vdevs:
data/zsimplesnap/192.168.1.200/rpool-server used 132G -
data/zsimplesnap/192.168.1.200/rpool-server logicalused 142G -
data/zsimplesnap/192.168.1.200/rpool-server compressratio 1.63x -
10 disks, single raidz2 vdev
bigbackup/simplesnap/10.0.0.1/rpool-server used 122G -
bigbackup/simplesnap/10.0.0.1/rpool-server logicalused 145G -
bigbackup/simplesnap/10.0.0.1/rpool-server compressratio 1.64x -
Stats for rpool-server:
bp count: 6238210
ganged count: 1754
bp logical: 179034959872 avg: 28699
bp physical: 95566884352 avg: 15319 compression: 1.87
bp allocated: 102940639232 avg: 16501 compression: 1.74
bp deduped: 0 ref>1: 0 deduplication: 1.00
SPA allocated: 102940639232 used: 12.82%
Here's the logical and used size of an incompressible video file using 1M blocks. Note that dsize is smaller than lsize; this ratio is consistent and matches the expected overhead difference between 1M and 128K blocks on this pool geometry.
Dataset data/Movies [ZPL], ID 2725, cr_txg 8330397, 15.5T, 24861 objects
Object lvl iblk dblk dsize dnsize lsize %full type
29241 3 128K 1M 51.7G 512 53.7G 100.00 ZFS plain file
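In case anyone wants to reproduce these numbers on their own pool: the block statistics and the per-object dump above come from zdb. Roughly (the pool, dataset, and object id here follow the examples above; adjust to your own names):
# pool-wide bp count / logical / physical / allocated, as in the "Stats for rpool-server" block
zdb -bb rpool-server
# lsize / dsize for a single object (object id as listed by a plain zdb -dd of the dataset)
zdb -dddd data/Movies 29241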
Guys, thanks for your answers. #8462 is about snapshots, and I didn't use snapshots. I also created a second pool on a standard disk, without setting ashift, and on it I also got the wrong compressratio (1.00x) on a test dataset holding one file of only zeroes...
Maybe I'm doing something wrong; anyway, it's close to the scenario in this issue (lz4 compression). Thanks.
If you don’t specify an ashift value, ZFS will use the reported physical block size of the disks in the pool. Most hard drives larger than 4TB now use a physical block size of 4K, and SSDs generally report 4K even if they use a larger physical block size as they are designed to perform well with 4K reads. In this case, ZFS would use ashift=12. If you have native 512 byte disks, it will use ashift=9. Setting ashift manually is recommended as some disks inaccurately report 512 byte native blocks when they use 4K blocks.
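For reference, something along these lines shows which ashift a pool actually ended up with, and how to force it at creation time (the device path below is just a placeholder):
# the ashift recorded in the pool's vdev tree
zdb -C bigdata | grep ashift
# force 4K alignment when creating a new pool
zpool create -o ashift=12 newpool /dev/disk/by-id/<disk-id>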
Also, when compression is enabled, a block of zeroes isn't stored as compressed data at all; it is stored as a hole.
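If you want to see actual compression (rather than the all-zero hole detection) show up in the accounting, non-zero but repetitive data works; a rough sketch, reusing the test dataset from earlier in the thread:
# non-zero, highly repetitive data: lz4 compresses it instead of punching a hole
yes "a fairly repetitive line of test data for lz4" | head -c 1G > /bigdata/test/text.fil
sync    # give the pending transaction group a chance to commit
du -h /bigdata/test/text.fil                          # allocated (post-compression) size
zfs get used,logicalused,compressratio bigdata/test   # the ratio should now be well above 1.00x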
> I thought this behavior was expected. The compressratio appears to be based on psize without accounting for padding. For example, if you're using ashift=12, the smallest possible block size on disk is 4K but a block might compress to 1K or 512B. The overhead is particularly bad with small blocks on raidz pools.

You might think it's expected behavior, but at the end of the day, as an end user of ZFS, it doesn't make sense to me for compressratio to report a value that doesn't equate to used/logicalused.
Whoops. That's because metadata (and maybe some other things; I don't understand the inner workings of ZFS as well as I would like) is accounted for in used. Duh.
The intent of compressratio is to express the benefit received by enabling compression. This is different from used / logicalused; if you prefer that metric, you may use it and ignore compressratio.
The difficulty with compressratio is its interaction with RAIDZ, which needs to allocate additional space for parity and padding. The compressratio currently ignores this RAIDZ space, which can be significant. Using the spreadsheet data in https://www.delphix.com/blog/delphix-engineering/zfs-raidz-stripe-width-or-how-i-learned-stop-worrying-and-love-raidz, we can see that with 6-wide RAIDZ2 and ashift=12, the allocated size is the following, in 4K sectors:
lsize or psize | asize |
---|---|
12 | 18 |
11 | 18 |
10 | 18 |
9 | 15 |
8 | 12 |
7 | 12 |
6 | 12 |
5 | 9 |
4 | 6 |
3 | 6 |
2 | 6 |
1 | 3 |
So, for example, if you have an 8-sector (32K) logical block which compresses to 6 sectors (24K), we will allocate 12 sectors (48K) whether or not we use compression. So we should say that this block's compression ratio is 12/12 = 1.0x, but since the ratio ignores RAIDZ, it says the ratio is 8/6 = 1.33x.
Note that the compressratio does take into account ashift padding (with the exception of compressed receive; see https://github.com/zfsonlinux/zfs/issues/8462), so if there is no RAIDZ and a 32K logical block compresses to 29K, ZFS realizes that it still needs 32K on disk and reports a ratio of 1.0x.
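For anyone who wants to sanity-check the table, the allocation rule it encodes can be reproduced with a few lines of shell. This is only a sketch of the raidz sizing arithmetic for this particular geometry (6-wide RAIDZ2, ashift=12, all figures in 4K sectors):
width=6; nparity=2
for d in $(seq 1 12); do
    # parity: nparity sectors per row of (width - nparity) data sectors, rounded up
    parity=$(( nparity * ((d + width - nparity - 1) / (width - nparity)) ))
    asize=$(( d + parity ))
    # pad the total to a multiple of (nparity + 1) so no unusably small gap is left behind
    asize=$(( (asize + nparity) / (nparity + 1) * (nparity + 1) ))
    printf '%2d -> %2d\n' "$d" "$asize"
done
The padding to a multiple of nparity + 1 is why several consecutive psizes map to the same asize, which is exactly the allocation cost that compressratio currently ignores.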
> Also, a string of zeroes isn't compressed when compression is enabled. It is stored as a hole.

Oh, good to know. So what's the best content for testing compressratio?

> The difficulty with compressratio is its interaction with RAIDZ, which needs to allocate additional space for parity and padding.

As for me, I have a simple single-disk pool, not raidz. This is just my first use of ZFS (and also my first issue comment on GitHub).
Anyway, for me it's just a game of trying a new filesystem, so I'm not really worried about this issue; I'm just trying to help, in case my report is useful for fixing some wrong behavior.
I'll look at comparing used/logicalused, and maybe that will be the solution for me.
This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.
Oh wow it is now 90 days, great!
This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.
I've been saying for a long time that compression should benefit the machine administrators rather than end users. IMHO the quota should cap data before compression; then tools such as du/df could just display the logical sizes, and most of these questions/misunderstandings would be gone (we get them pretty regularly in our setup). I get that it would be a huge departure from how things have been done, and it would probably be a breaking change for a lot of setups, but the peace of mind into the future, after we get over those initial issues... one can only dream, huh? :D
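For what it's worth, the mismatch being described is easy to demonstrate today; on a compressed dataset the same file already answers "how big is it?" in several different ways (paths reuse the test dataset from earlier in the thread):
ls -lh /bigdata/test/zero.fil                    # logical (apparent) file size
du -h  /bigdata/test/zero.fil                    # blocks actually allocated on disk
du -h --apparent-size /bigdata/test/zero.fil     # logical size again, via du
df -h  /bigdata/test                             # free space also reflects post-compression usage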
@snajpa Similar to logicalreferenced and logicalused, I am thinking about a logicalquota, but I haven't looked there recently and can't say why it is not there. Though it would bloat this already complicated area even more.
@amotin it's a good idea though
System information
Describe the problem you're observing
compressratio isn't correct for some of my datasets. EDIT: I'm using lz4 compression.
For example, master/Root-Banana-Pi is logically using 5.06GB and is actually using 4.13G. That should be a compressratio of 1.23x, but it is reporting 1.54x. The compressratio is also incorrect for master/Root-Desktop, master/Root-NAS, master/Root-Toshiba, and master/Root-Vultr. The strangest one is master/Root-Vultr which is logically using 1.28G, actually using 1.38G (so compression is actually causing it to take more space - not sure how that's happening), but it's reporting a compressratio of 2.01x!
It is notable that the compressratio is only wrong for the master/Root-* datasets. These are rsync backups for the root filesystem for my different computers (made daily), except for Root-Toshiba, which is done via zfs send since it runs ZFS as the root filesystem. They all run Arch Linux (the Banana Pi runs the ARM version). There might be something about that kind of data that causes the compressratio to be incorrect?
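For reference, the expected ratios above are just logicalused divided by used; with exact byte counts the discrepancy can be shown directly, along these lines:
# -p prints exact byte values; the expected ratio is simply logicalused / used
zfs get -p used,logicalused,compressratio master/Root-Banana-Pi | awk '
    $2 == "used"        { used = $3 }
    $2 == "logicalused" { logical = $3 }
    END { printf "expected compressratio: %.2fx\n", logical / used }'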
Describe how to reproduce the problem
Unsure. These datasets were built up over time.
Include any warning/errors/backtraces from the system logs
Not sure which if any system logs would be relevant.