openzfs / zfs

OpenZFS on Linux and FreeBSD
https://openzfs.github.io/openzfs-docs

Reported compressratio is incorrect #7639

Open seonwoolee opened 6 years ago

seonwoolee commented 6 years ago

System information

Type Version/Name
Distribution Name Arch Linux
Distribution Version N/A, rolling release
Linux Kernel 4.16.13-2-ARCH
Architecture x86_64
ZFS Version 0.7.0-1433_g1fac63e56
SPL Version 0.7.0-1433_g1fac63e56

Describe the problem you're observing

compressratio isn't correct for some of my datasets. EDIT: I'm using lz4 compression.

zfs get used,logicalused,compressratio -t filesystem
master                       used           8.53T  -
master                       logicalused    8.69T  -
master                       compressratio  1.02x  -
master/My Documents          used           226G   -
master/My Documents          logicalused    236G   -
master/My Documents          compressratio  1.05x  -
master/My Documents-Toshiba  used           240K   -
master/My Documents-Toshiba  logicalused    54K    -
master/My Documents-Toshiba  compressratio  1.00x  -
master/My Documents/Photo    used           128G   -
master/My Documents/Photo    logicalused    135G   -
master/My Documents/Photo    compressratio  1.05x  -
master/Photo RAW             used           419G   -
master/Photo RAW             logicalused    430G   -
master/Photo RAW             compressratio  1.02x  -
master/Root-Banana-Pi        used           4.13G  -
master/Root-Banana-Pi        logicalused    5.06G  -
master/Root-Banana-Pi        compressratio  1.54x  -
master/Root-Desktop          used           64.7G  -
master/Root-Desktop          logicalused    75.6G  -
master/Root-Desktop          compressratio  1.58x  -
master/Root-NAS              used           53.6G  -
master/Root-NAS              logicalused    65.8G  -
master/Root-NAS              compressratio  1.74x  -
master/Root-Toshiba          used           6.98G  -
master/Root-Toshiba          logicalused    8.32G  -
master/Root-Toshiba          compressratio  1.70x  -
master/Root-Toshiba/home     used           112M   -
master/Root-Toshiba/home     logicalused    117M   -
master/Root-Toshiba/home     compressratio  1.13x  -
master/Root-Vultr            used           1.38G  -
master/Root-Vultr            logicalused    1.28G  -
master/Root-Vultr            compressratio  2.01x  -
master/TV                    used           4.16T  -
master/TV                    logicalused    4.23T  -
master/TV                    compressratio  1.01x  -

For example, master/Root-Banana-Pi is logically using 5.06G but actually using 4.13G. That should be a compressratio of 1.23x, but it is reporting 1.54x. The compressratio is also incorrect for master/Root-Desktop, master/Root-NAS, master/Root-Toshiba, and master/Root-Vultr. The strangest one is master/Root-Vultr, which is logically using 1.28G but actually using 1.38G (so compression is apparently causing it to take more space; I'm not sure how that's happening), yet it's reporting a compressratio of 2.01x!
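The discrepancy can be checked directly from the reported numbers. A minimal sketch, using the figures from the zfs get output above (the naive ratio is simply logicalused divided by used):

```python
# Compare the compressratio ZFS reports against the naive
# logicalused / used ratio, using figures from the output above.

datasets = {
    # name: (used_gib, logicalused_gib, reported_ratio)
    "master/Root-Banana-Pi": (4.13, 5.06, 1.54),
    "master/Root-Desktop":   (64.7, 75.6, 1.58),
    "master/Root-NAS":       (53.6, 65.8, 1.74),
    "master/Root-Vultr":     (1.38, 1.28, 2.01),
}

for name, (used, logical, reported) in datasets.items():
    naive = logical / used
    print(f"{name}: naive {naive:.2f}x vs reported {reported:.2f}x")
```

For master/Root-Banana-Pi this prints a naive ratio of 1.23x against the reported 1.54x, and for master/Root-Vultr a naive ratio below 1.0x against the reported 2.01x.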

It is notable that the compressratio is only wrong for the master/Root-* datasets. These are rsync backups for the root filesystem for my different computers (made daily), except for Root-Toshiba, which is done via zfs send since it runs ZFS as the root filesystem. They all run Arch Linux (the Banana Pi runs the ARM version). There might be something about that kind of data that causes the compressratio to be incorrect?

Describe how to reproduce the problem

Unsure. These datasets were built up over time.

Include any warning/errors/backtraces from the system logs

Not sure which if any system logs would be relevant.

msLinuxNinja commented 6 years ago

Something like that happened to me at some point, due to one of two things:

1. Snapshots
2. Weird incompressible data, or super compressible data

The problem might be a wrong calculation of used blocks vs. referenced blocks; I think zdb gives more details about this.

DeHackEd commented 6 years ago

One inconsistency in ZFS is that metadata is always compressed, but not counted as part of the ratio.

ahrens commented 5 years ago

Probably a duplicate of https://github.com/zfsonlinux/zfs/issues/3641 (compression ratio is incorrect if ashift != 9)

ahrens commented 5 years ago

#3641 was actually fixed before 0.7.0, so it can't be the same issue. @seonwoolee, can you share some more data about these filesystems? E.g. zfs get all, zpool list -v?

iam468 commented 5 years ago

Hi. Sorry if I'm interrupting, but I have the very same issue on Arch with ashift=12.

[root@big ~]# ls -lah /bigdata/test/
total 1.5K
drwxr-xr-x 2 root root    3 Mar  3 11:14 .
drwxr-xrwx 4 root root    4 Mar  3 11:13 ..
-rw-r--r-- 1 root root 4.0G Mar  3 11:14 zero.fil
[root@big ~]# df -h /bigdata/test/
Filesystem      Size  Used Avail Use% Mounted on
bigdata/test    139G  128K  139G   1% /bigdata/test
[root@big ~]# du -h /bigdata/test/
1.0K    /bigdata/test/

[root@big ~]# uname -a
Linux big 4.20.12-arch1-1-ARCH #1 SMP PREEMPT Sat Feb 23 15:11:34 UTC 2019 x86_64 GNU/Linux

[root@big ~]# pacman -Qs zfs
local/spl-linux 0.7.12_4.20.12.arch1.1-1 (archzfs-linux)
    Solaris Porting Layer kernel modules.
local/zfs-linux 0.7.12_4.20.12.arch1.1-1 (archzfs-linux)
    Kernel modules for the Zettabyte File System.
local/zfs-linux-headers 0.7.12_4.20.12.arch1.1-1
    Kernel headers for the Zettabyte File System.
local/zfs-utils 0.7.12-1
    Userspace utilities for the Zettabyte File System.

[root@big ~]# zpool list -v
NAME   SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
bigdata  1,98T  1,78T   208G         -     2%    89%  1.00x  ONLINE  -
  sda1  1,98T  1,78T   208G         -     2%    89%

[root@big ~]# zfs get all
NAME          PROPERTY              VALUE                  SOURCE
bigdata       type                  filesystem             -
bigdata       creation              Sat Mar  2 14:16 2019  -
bigdata       used                  1.79T                  -
bigdata       available             139G                   -
bigdata       referenced            96K                    -
bigdata       compressratio         1.00x                  -
bigdata       mounted               yes                    -
bigdata       quota                 none                   default
bigdata       reservation           none                   default
bigdata       recordsize            128K                   default
bigdata       mountpoint            /bigdata               default
bigdata       sharenfs              off                    default
bigdata       checksum              on                     default
bigdata       compression           lz4                    local
bigdata       atime                 on                     local
bigdata       devices               on                     default
bigdata       exec                  on                     default
bigdata       setuid                on                     default
bigdata       readonly              off                    default
bigdata       zoned                 off                    default
bigdata       snapdir               hidden                 default
bigdata       aclinherit            restricted             default
bigdata       createtxg             1                      -
bigdata       canmount              on                     default
bigdata       xattr                 on                     default
bigdata       copies                1                      default
bigdata       version               5                      -
bigdata       utf8only              off                    -
bigdata       normalization         none                   -
bigdata       casesensitivity       sensitive              -
bigdata       vscan                 off                    default
bigdata       nbmand                off                    default
bigdata       sharesmb              off                    default
bigdata       refquota              none                   default
bigdata       refreservation        none                   default
bigdata       guid                  5773117973477919529    -
bigdata       primarycache          all                    default
bigdata       secondarycache        all                    default
bigdata       usedbysnapshots       0B                     -
bigdata       usedbydataset         96K                    -
bigdata       usedbychildren        1.79T                  -
bigdata       usedbyrefreservation  0B                     -
bigdata       logbias               latency                default
bigdata       dedup                 off                    default
bigdata       mlslabel              none                   default
bigdata       sync                  standard               default
bigdata       dnodesize             legacy                 default
bigdata       refcompressratio      1.00x                  -
bigdata       written               96K                    -
bigdata       logicalused           1.80T                  -
bigdata       logicalreferenced     40K                    -
bigdata       volmode               default                default
bigdata       filesystem_limit      none                   default
bigdata       snapshot_limit        none                   default
bigdata       filesystem_count      none                   default
bigdata       snapshot_count        none                   default
bigdata       snapdev               hidden                 default
bigdata       acltype               off                    default
bigdata       context               none                   default
bigdata       fscontext             none                   default
bigdata       defcontext            none                   default
bigdata       rootcontext           none                   default
bigdata       relatime              on                     local
bigdata       redundant_metadata    all                    default
bigdata       overlay               off                    default
bigdata/fs1   type                  filesystem             -
bigdata/fs1   creation              Sat Mar  2 14:34 2019  -
bigdata/fs1   used                  1.79T                  -
bigdata/fs1   available             139G                   -
bigdata/fs1   referenced            1.79T                  -
bigdata/fs1   compressratio         1.00x                  -
bigdata/fs1   mounted               yes                    -
bigdata/fs1   quota                 none                   default
bigdata/fs1   reservation           none                   default
bigdata/fs1   recordsize            128K                   default
bigdata/fs1   mountpoint            /bigdata/fs1           default
bigdata/fs1   sharenfs              off                    default
bigdata/fs1   checksum              on                     default
bigdata/fs1   compression           lz4                    inherited from bigdata
bigdata/fs1   atime                 on                     inherited from bigdata
bigdata/fs1   devices               on                     default
bigdata/fs1   exec                  on                     default
bigdata/fs1   setuid                on                     default
bigdata/fs1   readonly              off                    default
bigdata/fs1   zoned                 off                    default
bigdata/fs1   snapdir               hidden                 default
bigdata/fs1   aclinherit            restricted             default
bigdata/fs1   createtxg             222                    -
bigdata/fs1   canmount              on                     default
bigdata/fs1   xattr                 on                     default
bigdata/fs1   copies                1                      default
bigdata/fs1   version               5                      -
bigdata/fs1   utf8only              off                    -
bigdata/fs1   normalization         none                   -
bigdata/fs1   casesensitivity       sensitive              -
bigdata/fs1   vscan                 off                    default
bigdata/fs1   nbmand                off                    default
bigdata/fs1   sharesmb              off                    default
bigdata/fs1   refquota              none                   default
bigdata/fs1   refreservation        none                   default
bigdata/fs1   guid                  14545282741868106458   -
bigdata/fs1   primarycache          all                    default
bigdata/fs1   secondarycache        all                    default
bigdata/fs1   usedbysnapshots       0B                     -
bigdata/fs1   usedbydataset         1.79T                  -
bigdata/fs1   usedbychildren        0B                     -
bigdata/fs1   usedbyrefreservation  0B                     -
bigdata/fs1   logbias               latency                default
bigdata/fs1   dedup                 off                    default
bigdata/fs1   mlslabel              none                   default
bigdata/fs1   sync                  standard               default
bigdata/fs1   dnodesize             legacy                 default
bigdata/fs1   refcompressratio      1.00x                  -
bigdata/fs1   written               1.79T                  -
bigdata/fs1   logicalused           1.80T                  -
bigdata/fs1   logicalreferenced     1.80T                  -
bigdata/fs1   volmode               default                default
bigdata/fs1   filesystem_limit      none                   default
bigdata/fs1   snapshot_limit        none                   default
bigdata/fs1   filesystem_count      none                   default
bigdata/fs1   snapshot_count        none                   default
bigdata/fs1   snapdev               hidden                 default
bigdata/fs1   acltype               off                    default
bigdata/fs1   context               none                   default
bigdata/fs1   fscontext             none                   default
bigdata/fs1   defcontext            none                   default
bigdata/fs1   rootcontext           none                   default
bigdata/fs1   relatime              on                     inherited from bigdata
bigdata/fs1   redundant_metadata    all                    default
bigdata/fs1   overlay               off                    default
bigdata/test  type                  filesystem             -
bigdata/test  creation              Sun Mar  3 11:13 2019  -
bigdata/test  used                  96K                    -
bigdata/test  available             139G                   -
bigdata/test  referenced            96K                    -
bigdata/test  compressratio         1.00x                  -
bigdata/test  mounted               yes                    -
bigdata/test  quota                 none                   default
bigdata/test  reservation           none                   default
bigdata/test  recordsize            128K                   default
bigdata/test  mountpoint            /bigdata/test          default
bigdata/test  sharenfs              off                    default
bigdata/test  checksum              on                     default
bigdata/test  compression           lz4                    inherited from bigdata
bigdata/test  atime                 on                     inherited from bigdata
bigdata/test  devices               on                     default
bigdata/test  exec                  on                     default
bigdata/test  setuid                on                     default
bigdata/test  readonly              off                    default
bigdata/test  zoned                 off                    default
bigdata/test  snapdir               hidden                 default
bigdata/test  aclinherit            restricted             default
bigdata/test  createtxg             35464                  -
bigdata/test  canmount              on                     default
bigdata/test  xattr                 on                     default
bigdata/test  copies                1                      default
bigdata/test  version               5                      -
bigdata/test  utf8only              off                    -
bigdata/test  normalization         none                   -
bigdata/test  casesensitivity       sensitive              -
bigdata/test  vscan                 off                    default
bigdata/test  nbmand                off                    default
bigdata/test  sharesmb              off                    default
bigdata/test  refquota              none                   default
bigdata/test  refreservation        none                   default
bigdata/test  guid                  5721573719522729350    -
bigdata/test  primarycache          all                    default
bigdata/test  secondarycache        all                    default
bigdata/test  usedbysnapshots       0B                     -
bigdata/test  usedbydataset         96K                    -
bigdata/test  usedbychildren        0B                     -
bigdata/test  usedbyrefreservation  0B                     -
bigdata/test  logbias               latency                default
bigdata/test  dedup                 off                    default
bigdata/test  mlslabel              none                   default
bigdata/test  sync                  standard               default
bigdata/test  dnodesize             legacy                 default
bigdata/test  refcompressratio      1.00x                  -
bigdata/test  written               96K                    -
bigdata/test  logicalused           40K                    -
bigdata/test  logicalreferenced     40K                    -
bigdata/test  volmode               default                default
bigdata/test  filesystem_limit      none                   default
bigdata/test  snapshot_limit        none                   default
bigdata/test  filesystem_count      none                   default
bigdata/test  snapshot_count        none                   default
bigdata/test  snapdev               hidden                 default
bigdata/test  acltype               off                    default
bigdata/test  context               none                   default
bigdata/test  fscontext             none                   default
bigdata/test  defcontext            none                   default
bigdata/test  rootcontext           none                   default
bigdata/test  relatime              on                     inherited from bigdata
bigdata/test  redundant_metadata    all                    default
bigdata/test  overlay               off                    default
[root@big ~]#

And here are the non-defaults (I followed https://wiki.archlinux.org/index.php/ZFS#Database):

  1. ashift=12
  2. zfs set relatime=on
  3. compression=on (lz4)

That seems to be all.

Thanks

ahrens commented 5 years ago

See also https://github.com/zfsonlinux/zfs/issues/8462, which may be the cause of this, depending on how the data was created.

seonwoolee commented 5 years ago

Sorry for the late response

[seonwoo@seonwoo-nas ~]$ zpool list -v 
NAME                                   SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
master                                21.8T  15.3T  6.48T        -         -     7%    70%  1.00x    ONLINE  -
  raidz2                              21.8T  15.3T  6.48T        -         -     7%  70.2%      -  ONLINE  
    ata-TOSHIBA_HDWQ140_2898K1IBFPBE      -      -      -        -         -      -      -      -  ONLINE  
    wwn-0x50014ee20f0fab61                -      -      -        -         -      -      -      -  ONLINE  
    wwn-0x50014ee2b9baa063                -      -      -        -         -      -      -      -  ONLINE  
    wwn-0x50014ee264651ffa                -      -      -        -         -      -      -      -  ONLINE  
    wwn-0x50014ee264651abd                -      -      -        -         -      -      -      -  ONLINE  
    wwn-0x50014ee264b0b3cb                -      -      -        -         -      -      -      -  ONLINE  
[seonwoo@seonwoo-nas ~]$ zfs get all master
NAME    PROPERTY              VALUE                  SOURCE
master  type                  filesystem             -
master  creation              Fri Jul  6 13:01 2018  -
master  used                  10.2T                  -
master  available             3.86T                  -
master  referenced            6.27M                  -
master  compressratio         1.04x                  -
master  mounted               yes                    -
master  quota                 none                   default
master  reservation           none                   default
master  recordsize            1M                     local
master  mountpoint            /mnt/master            local
master  sharenfs              rw=@192.168.1.0/24     local
master  checksum              on                     default
master  compression           lz4                    received
master  atime                 off                    received
master  devices               on                     default
master  exec                  on                     default
master  setuid                on                     default
master  readonly              off                    default
master  zoned                 off                    default
master  snapdir               hidden                 received
master  aclinherit            restricted             default
master  createtxg             1                      -
master  canmount              on                     default
master  xattr                 sa                     received
master  copies                1                      default
master  version               5                      -
master  utf8only              off                    -
master  normalization         none                   -
master  casesensitivity       sensitive              -
master  vscan                 off                    default
master  nbmand                off                    default
master  sharesmb              off                    local
master  refquota              none                   default
master  refreservation        none                   default
master  guid                  17594656326077232978   -
master  primarycache          all                    default
master  secondarycache        all                    default
master  usedbysnapshots       0B                     -
master  usedbydataset         6.27M                  -
master  usedbychildren        10.2T                  -
master  usedbyrefreservation  0B                     -
master  logbias               latency                default
master  objsetid              54                     -
master  dedup                 off                    default
master  mlslabel              none                   default
master  sync                  standard               default
master  dnodesize             legacy                 default
master  refcompressratio      1.16x                  -
master  written               6.27M                  -
master  logicalused           10.6T                  -
master  logicalreferenced     1.12M                  -
master  volmode               default                default
master  filesystem_limit      none                   default
master  snapshot_limit        none                   default
master  filesystem_count      none                   default
master  snapshot_count        none                   default
master  snapdev               hidden                 default
master  acltype               posixacl               received
master  context               none                   default
master  fscontext             none                   default
master  defcontext            none                   default
master  rootcontext           none                   default
master  relatime              off                    default
master  redundant_metadata    all                    default
master  overlay               off                    default
master  encryption            off                    default
master  keylocation           none                   default
master  keyformat             none                   default
master  pbkdf2iters           0                      default
master  special_small_blocks  0                      default
[seonwoo@seonwoo-nas ~]$ sudo zdb master

Cached configuration:
        version: 5000
        name: 'master'
        state: 0
        txg: 7351461
        pool_guid: 5334150587146154780
        errata: 0
        hostname: 'seonwoo-nas'
        com.delphix:has_per_vdev_zaps
        vdev_children: 1
        vdev_tree:
            type: 'root'
            id: 0
            guid: 5334150587146154780
            create_txg: 4
            children[0]:
                type: 'raidz'
                id: 0
                guid: 1269166408861884573
                nparity: 2
                metaslab_array: 256
                metaslab_shift: 37
                ashift: 12
                asize: 24004625694720
                is_log: 0
                create_txg: 4
                com.delphix:vdev_zap_top: 129
                children[0]:
                    type: 'disk'
                    id: 0
                    guid: 12177181254910515140
                    path: '/dev/disk/by-id/ata-TOSHIBA_HDWQ140_2898K1IBFPBE-part1'
                    devid: 'ata-TOSHIBA_HDWQ140_2898K1IBFPBE-part1'
                    phys_path: 'pci-0000:00:17.0-ata-6'
                    whole_disk: 1
                    create_txg: 4
                    com.delphix:vdev_zap_leaf: 130
                children[1]:
                    type: 'disk'
                    id: 1
                    guid: 16243617482609432507
                    path: '/dev/disk/by-id/wwn-0x50014ee20f0fab61-part1'
                    devid: 'usb-WD_My_Book_25EE_574343374B3041483130454B-0:0-part1'
                    phys_path: 'pci-0000:00:14.0-usb-0:5:1.0-scsi-0:0:0:0'
                    whole_disk: 1
                    create_txg: 4
                    com.delphix:vdev_zap_leaf: 131
                children[2]:
                    type: 'disk'
                    id: 2
                    guid: 4932413843320470326
                    path: '/dev/disk/by-id/wwn-0x50014ee2b9baa063-part1'
                    devid: 'usb-WD_My_Book_25EE_574343374B32415359384458-0:0-part1'
                    phys_path: 'pci-0000:00:14.0-usb-0:4:1.0-scsi-0:0:0:0'
                    whole_disk: 1
                    create_txg: 4
                    com.delphix:vdev_zap_leaf: 132
                children[3]:
                    type: 'disk'
                    id: 3
                    guid: 2471234805693787932
                    path: '/dev/disk/by-id/wwn-0x50014ee264651ffa-part1'
                    devid: 'usb-WD_My_Book_25EE_574343374B34415254353655-0:0-part1'
                    phys_path: 'pci-0000:00:14.0-usb-0:1:1.0-scsi-0:0:0:0'
                    whole_disk: 1
                    create_txg: 4
                    com.delphix:vdev_zap_leaf: 133
                children[4]:
                    type: 'disk'
                    id: 4
                    guid: 11109201892835897328
                    path: '/dev/disk/by-id/wwn-0x50014ee264651abd-part1'
                    devid: 'usb-WD_My_Book_25EE_574343374B344152544E594C-0:0-part1'
                    phys_path: 'pci-0000:00:14.0-usb-0:3:1.0-scsi-0:0:0:0'
                    whole_disk: 1
                    create_txg: 4
                    com.delphix:vdev_zap_leaf: 134
                children[5]:
                    type: 'disk'
                    id: 5
                    guid: 13835483519592227450
                    path: '/dev/disk/by-id/wwn-0x50014ee264b0b3cb-part1'
                    devid: 'usb-WD_My_Book_25EE_574343374B3745444658345A-0:0-part1'
                    phys_path: 'pci-0000:00:14.0-usb-0:6:1.0-scsi-0:0:0:0'
                    whole_disk: 1
                    create_txg: 4
                    com.delphix:vdev_zap_leaf: 135
        features_for_read:
            com.delphix:hole_birth
            com.delphix:embedded_data
jwittlincohen commented 5 years ago

I thought this behavior was expected. The compressratio appears to be based on psize without accounting for padding. For example, if you're using ashift=12, the smallest possible block size on disk is 4K, but a block might compress to 1K or 512B. The overhead is particularly bad with small blocks on raidz pools.

Moreover, the used space calculation assumes 128K blocks. If you use 1M blocks, which are more efficient, you'll see that used is smaller than logicalused even on incompressible data or on datasets with compression=off. The difference roughly matches the expected difference in overhead between 128K and 1M blocks on an ashift=12 pool with my pool geometry. See RAID-Z parity cost

As an example, here is the same dataset on three different pools. The average psize is around 15K after compression, but the average asize is 16.5K. The first is a mirror, the second is a pool of two 12-disk raidz2 vdevs, and the third is a pool with a single 10-disk raidz2 vdev. All use ashift=12. The logical dataset sizes are not identical because the pool is active and the snapshots on the backup pools were taken at different times today. Still, the difference in effective compression is obvious.

Mirror:

rpool-server                                                     used           99.8G  -
rpool-server                                                     logicalused    146G   -
rpool-server                                                     compressratio  1.64x  -

24 disks, 2 raidz2 vdevs:

data/zsimplesnap/192.168.1.200/rpool-server                      used           132G   -
data/zsimplesnap/192.168.1.200/rpool-server                      logicalused    142G   -
data/zsimplesnap/192.168.1.200/rpool-server                      compressratio  1.63x  -

10 disks, single raidz2 vdev:

bigbackup/simplesnap/10.0.0.1/rpool-server                           used           122G   -
bigbackup/simplesnap/10.0.0.1/rpool-server                           logicalused    145G   -
bigbackup/simplesnap/10.0.0.1/rpool-server                           compressratio  1.64x  -

Stats for rpool-server:

        bp count:         6238210
        ganged count:        1754
        bp logical:    179034959872      avg:  28699
        bp physical:   95566884352      avg:  15319     compression:   1.87
        bp allocated:  102940639232      avg:  16501     compression:   1.74
        bp deduped:             0    ref>1:      0   deduplication:   1.00
        SPA allocated: 102940639232     used: 12.82%
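The two compression figures in the zdb stats can be reproduced from the raw byte counts. A sketch, using the numbers above (1.87 is the psize-based ratio that compressratio reflects; 1.74 is the asize-based ratio that includes parity and padding):

```python
# Byte counts from the zdb block-pointer stats above.
bp_logical   = 179_034_959_872  # uncompressed (logical) bytes
bp_physical  =  95_566_884_352  # bytes after compression, before allocation
bp_allocated = 102_940_639_232  # bytes actually allocated (parity/padding included)

psize_ratio = bp_logical / bp_physical   # zdb's "compression: 1.87"
asize_ratio = bp_logical / bp_allocated  # zdb's "compression: 1.74"

print(f"psize-based: {psize_ratio:.2f}x, asize-based: {asize_ratio:.2f}x")
```

The gap between the two ratios is exactly the RAIDZ allocation overhead being discussed.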

Here's the logical and used size of an incompressible video file using 1M blocks. Note that dsize is smaller than lsize. This ratio is consistent and matches the expected overhead differential between 1M and 128K blocks on this pool geometry.

Dataset data/Movies [ZPL], ID 2725, cr_txg 8330397, 15.5T, 24861 objects
    Object  lvl   iblk   dblk  dsize  dnsize  lsize   %full  type
     29241    3   128K     1M  51.7G     512  53.7G  100.00  ZFS plain file
iam468 commented 5 years ago

Guys, thanks for your answers. #8462 is about snapshots, and I didn't use snapshots. I also created a second pool on a standard disk, without setting ashift, and on it I also got a wrong compressratio (1.00x) on a test dataset with one file containing only zeroes...

Maybe I'm doing something wrong; anyway, it's close to the topic scenario (LZ4 compression). Thanks

jwittlincohen commented 5 years ago

If you don’t specify an ashift value, ZFS will use the reported physical block size of the disks in the pool. Most hard drives larger than 4TB now use a physical block size of 4K, and SSDs generally report 4K even if they use a larger physical block size as they are designed to perform well with 4K reads. In this case, ZFS would use ashift=12. If you have native 512 byte disks, it will use ashift=9. Setting ashift manually is recommended as some disks inaccurately report 512 byte native blocks when they use 4K blocks.

Also, a string of zeroes isn’t compressed when compression is enabled. It is stored as a hole.
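For reference, ashift is the base-2 logarithm of the pool's minimum allocation size, so the values mentioned above map to sector sizes as follows (this snippet just decodes the numbers; setting the property itself is done at pool creation with `zpool create -o ashift=...`):

```python
# ashift is log2 of the pool's minimum allocation size.
for ashift in (9, 12, 13):
    print(f"ashift={ashift} -> {1 << ashift}-byte sectors")
# ashift=9 corresponds to native 512-byte disks, ashift=12 to 4K disks.
```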

seonwoolee commented 5 years ago

I thought this behavior was expected. The compressratio appears to be based on psize without accounting for padding. For example, if you're using ashift=12, the smallest possible block size on disk is 4K, but a block might compress to 1K or 512B. The overhead is particularly bad with small blocks on raidz pools.

You might think it's expected behavior, but at the end of the day, as an end user of ZFS, it doesn't make sense to me for compressratio to report a value that doesn't equate to used/logicalused.
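Using the numbers from the `zfs get` output in the original report, the used/logicalused ratio diverges sharply from the reported compressratio on some datasets. A sketch of that comparison:

```python
# (dataset, used GiB, logicalused GiB, reported compressratio)
# Values copied from the `zfs get` output in the original report.
datasets = [
    ("master/Root-NAS",     53.6, 65.8, 1.74),
    ("master/Root-Desktop", 64.7, 75.6, 1.58),
]
for name, used, logical, reported in datasets:
    ratio = round(logical / used, 2)
    print(f"{name}: logicalused/used = {ratio}x vs compressratio {reported}x")
# Root-NAS comes out at 1.23x and Root-Desktop at 1.17x, well below
# the 1.74x and 1.58x that compressratio reports.
```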

seonwoolee commented 5 years ago

Whoops. That's because accounting metadata (and maybe some other things; I don't understand the inner workings of ZFS as well as I would like) is included in used. Duh.

ahrens commented 5 years ago

The intent of compressratio is to express the benefit received by enabling compression. This is different from used / logicalused; if you prefer that metric, you may use it and ignore compressratio.

The difficulty of compressratio is with its interaction with RAIDZ, which needs to allocate additional space for parity and padding. The compressratio currently ignores this RAIDZ space, which can be significant. Using the spreadsheet data in https://www.delphix.com/blog/delphix-engineering/zfs-raidz-stripe-width-or-how-i-learned-stop-worrying-and-love-raidz, we can see that with 6-wide RAIDZ2, ashift=12, the allocated size is the following, in 4K sectors:

| lsize or psize | asize |
|----------------|-------|
| 12             | 18    |
| 11             | 18    |
| 10             | 18    |
| 9              | 15    |
| 8              | 12    |
| 7              | 12    |
| 6              | 12    |
| 5              | 9     |
| 4              | 6     |
| 3              | 6     |
| 2              | 6     |
| 1              | 3     |

So for example, if you have a 8-sector (32K) logical block which compresses to 6 sectors (24K), we will allocate 12 sectors (48K) whether we used compression or not. So we should say that this block's compression ratio is 12/12=1.0x, but since the ratio ignores RAIDZ, it says the ratio is 8/6=1.33x.
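The table above can be reproduced with the usual RAIDZ allocation rule: add parity sectors per stripe of data columns, then round the total up to a multiple of nparity+1 so no unusably small gaps are left behind. This is a sketch for 6-wide RAIDZ2 at ashift=12, not the actual ZFS code:

```python
import math

def raidz_asize(psize_sectors, ndisks=6, nparity=2):
    """Approximate RAIDZ allocated size (in sectors) for one block.

    Parity is added for each stripe of (ndisks - nparity) data sectors,
    and the total is rounded up to a multiple of (nparity + 1).
    """
    data_cols = ndisks - nparity
    parity = math.ceil(psize_sectors / data_cols) * nparity
    total = psize_sectors + parity
    mult = nparity + 1
    return math.ceil(total / mult) * mult

# Reproduce the table above (psize from 12 down to 1, in 4K sectors):
print([raidz_asize(p) for p in range(12, 0, -1)])
# [18, 18, 18, 15, 12, 12, 12, 9, 6, 6, 6, 3]
```

In particular, an 8-sector block and its 6-sector compressed form both land on 12 allocated sectors, which is exactly the 1.0x-vs-1.33x discrepancy described above.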

Note that the compressratio does take into account ashift padding (with the exception of compressed receive - see https://github.com/zfsonlinux/zfs/issues/8462), so if there is no RAIDZ, and a 32K logical block compresses to 29K, ZFS realizes that it actually still needs 32K on disk and reports a ratio of 1.0x.
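That ashift rounding can be sketched in a few lines (assuming ashift=12, i.e. 4K sectors):

```python
def allocated_after_padding(psize_bytes, ashift=12):
    """Round a compressed block up to the pool's minimum allocation unit."""
    sector = 1 << ashift                       # 4096 bytes for ashift=12
    return -(-psize_bytes // sector) * sector  # ceiling division

lsize = 32 * 1024  # 32K logical block
psize = 29 * 1024  # ...which compresses to 29K
print(allocated_after_padding(psize))          # 32768: still 32K on disk
print(lsize / allocated_after_padding(psize))  # 1.0, the reported ratio
```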

iam468 commented 5 years ago

Also, a string of zeroes isn’t compressed when compression is enabled. It is stored as a hole

Oh, good to know. So what's the best content to test compressratio with?

The difficulty of compressratio is with its interaction with RAIDZ, which needs to allocate additional space for parity and padding

As for me, I have a simple volume, not RAIDZ. It's just my first use of ZFS (and also the first issue I've commented on on GitHub).

Anyway, for me it's a game to try a new filesystem, so I'm not really worried about this issue; I'm just trying to help in case my report is useful for fixing something.

I'll compare used/logicalused instead; maybe that will be the solution for me.

stale[bot] commented 4 years ago

This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.

stale[bot] commented 3 years ago

This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.

InfamousUser commented 3 years ago

Oh wow it is now 90 days, great!

stale[bot] commented 2 years ago

This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.

snajpa commented 1 year ago

I've been saying for a long time that compression should benefit the machine administrators rather than the end users. IMHO the quota should cap data before compression; then tools such as du/df could just display the logical sizes, and most of these questions/misunderstandings would be gone (we get them pretty regularly in our setup). I get that it would be a huge departure from how things have been done, and it would probably be a breaking change for a lot of setups, but the peace of mind going forward, after we got past those initial issues... one can only dream, huh? :D

amotin commented 1 year ago

@snajpa Similar to logicalreferenced and logicalused, I am thinking about a logicalquota, but I haven't looked there recently and can't say why it is not there. Though it would bloat this already complicated area even more.

snajpa commented 1 year ago

@amotin it's a good idea though