kilobyte / compsize

btrfs: find compression type/ratio on a file or set of files

compsize shows 1.6G uncompressed despite compress-force=lzo being set on the btrfs fs #24

devZer0 closed this 5 years ago

devZer0 commented 5 years ago

Out of curiosity: how can compsize show 1.6G of uncompressed ("none") data when btrfs is mounted with compress-force=lzo?

https://btrfs.wiki.kernel.org/index.php/Compression#Can_I_force_compression_on_a_file_without_using_the_compress_mount_option.3F

"There is a simple decision logic: if the first portion of data being compressed is not smaller than the original, the compression of the file is disabled -- unless the filesystem is mounted with -o compress-force. In that case it'll be compressed always regardless of the compressibility of the file. This is not optimal and subject to optimizations and further development. "

# mount|grep btrfs
/dev/sdb1 on /btrfspool type btrfs (rw,relatime,compress-force=lzo,nossd,space_cache,subvolid=5,subvol=/,_netdev)

# compsize /btrfspool/backup/adminstation
Processed 164408 files, 167122 regular extents (167122 refs), 90204 inline.
Type       Perc     Disk Usage   Uncompressed Referenced
TOTAL       66%      4.1G         6.2G         6.2G
none       100%      1.6G         1.6G         1.6G
lzo         54%      2.5G         4.5G         4.5G
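A quick check of the table above: the Perc column appears to be Disk Usage divided by Uncompressed, which is why a `none` row always reads 100%. Using the rounded sizes as printed:

```latex
\text{Perc} = \frac{\text{Disk Usage}}{\text{Uncompressed}}:\qquad
\frac{4.1}{6.2} \approx 66\%,\qquad
\frac{1.6}{1.6} = 100\%,\qquad
\frac{2.5}{4.5} \approx 56\%
```

The lzo row prints 54% rather than 56%, presumably because compsize works from exact byte counts while the sizes are displayed rounded to 0.1G; the same rounding explains why 1.6G + 4.5G is shown as a 6.2G total.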
kilobyte commented 5 years ago

Compression is always attempted, but the compressed data is only written to disk if it saves at least a page. Otherwise, any incompressible extent would take more space than it would uncompressed: any lossless algorithm that can shrink some blocks must also make other blocks larger, by at least a single bit. One extra bit → one extra byte → one extra page.
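The step "an algorithm that can shrink some blocks must grow others" is a counting (pigeonhole) argument. A sketch for lossless compressors on bit strings, writing |x| for the length of x:

```latex
% Assume C is lossless (injective) and never lengthens its input: |C(x)| <= |x|.
% Let S_n = \{x : |x| \le n\}. Then C maps S_n into S_n, and S_n is finite:
|S_n| = \sum_{k=0}^{n} 2^k = 2^{n+1} - 1
\quad\Longrightarrow\quad C \text{ restricted to } S_n \text{ is a bijection } S_n \to S_n .
% By induction on n, C maps S_{n-1} onto S_{n-1}, so by injectivity every
% x with |x| = n must land in S_n \setminus S_{n-1}, i.e. |C(x)| = n = |x|.
% Hence a compressor that never expands any input never shrinks one either;
% equivalently, if |C(x)| < |x| for some x, then |C(y)| > |y| for some y.
```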

As you can see, most of your data does get compressed (4.5G vs 1.6G), which looks plausible and OK.

You can run compsize on individual files to see if they indeed look incompressible.

devZer0 commented 5 years ago

Yes, but if there is automatic detection of whether a file should be compressed, what is the purpose of "compress-force" when it doesn't actually force anything? At least the kernel.org wiki is providing misleading information, then...

kilobyte commented 5 years ago

Without -force, once the file is marked as incompressible, any further writes won't even try compression. And yeah, the docs are somewhat confusing.

devZer0 commented 5 years ago

I can't follow. What is the purpose of compress-force if it does not force compression? What makes it different from the compress= option?

compress-force= control BTRFS file data compression. Type may be specified as "zlib" "lzo" or "no" (for no compression, used for remounting). If no type is specified, zlib is used. If compress-force is specified, all files will be compressed, whether or not they compress well. If compression is enabled, nodatacow and nodatasum are disabled.

kilobyte commented 5 years ago

The difference is that -force removes the heuristic that, if the file's first extent is incompressible, no further compression is attempted at all (neither for subsequent extents nor for rewrites).

However, any individual extent that fails to compress is still written uncompressed instead of keeping the failed compressed version. That is, it takes 128KB rather than 128KB+1B, which would be stored as 132KB (rounded up to the next 4KB page).
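To make the difference between the two mount options concrete, here is a toy model in POSIX shell. It is a sketch that follows the description in this thread, not the actual btrfs code, and it assumes 4 KiB pages, the "saves at least a page" rule, and a per-file incompressible mark that is only set without -force; the kernel's pre-compression heuristic is ignored. The names `pages`, `write_extent` and `nocompress` are made up for illustration.

```shell
# Toy model of the decision described above; NOT the kernel's code.
# Assumptions: 4 KiB pages; the compressed copy is kept only if it saves
# at least one whole page; the kernel's pre-compression heuristic is ignored.
PAGE=4096
force=0       # 1 = mounted with compress-force=..., 0 = plain compress=...
nocompress=0  # per-file "incompressible" mark (never set when force=1)

# pages BYTES: number of pages needed to store BYTES (rounded up)
pages() {
  echo $(( ($1 + PAGE - 1) / PAGE ))
}

# write_extent RAW_BYTES COMPRESSED_BYTES
# Sets $result to "compressed", "uncompressed" or "not attempted".
write_extent() {
  if [ "$force" -eq 0 ] && [ "$nocompress" -eq 1 ]; then
    result="not attempted"   # file already marked incompressible
  elif [ "$(pages "$2")" -lt "$(pages "$1")" ]; then
    result="compressed"      # saves at least one page
  else
    result="uncompressed"    # compression was tried, data stored as-is
    [ "$force" -eq 1 ] || nocompress=1
  fi
}

# Plain compress: one failed extent stops all later attempts on the file.
force=0; nocompress=0
write_extent 131072 131073; echo "$result"   # uncompressed
write_extent 131072 40000;  echo "$result"   # not attempted
# compress-force: every extent is tried, failures are still stored as-is.
force=1; nocompress=0
write_extent 131072 131073; echo "$result"   # uncompressed
write_extent 131072 40000;  echo "$result"   # compressed
# 128KB+1B rounds up to 33 pages, i.e. 132KB on disk.
echo "$(( $(pages 131073) * PAGE / 1024 ))KB"   # 132KB
```

`write_extent` reports through a global variable rather than stdout so that the `nocompress` mark survives between calls; a `$(...)` command substitution would run it in a subshell and lose the update.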

Try this:

dd if=/dev/urandom of=foo bs=1024 count=128   # 128 KiB of random, incompressible data
xz -9 <foo >foo.xz
gzip -9 <foo >foo.gz
lzop <foo >foo.lzo
ls -l foo*   # compare sizes: each compressed copy is larger than foo

In every case the file's length increases. Storing such data compressed would take more space and pointlessly waste CPU cycles on every read.
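The same experiment can be scripted so that it reports the overhead directly. This is an illustrative sketch: `overhead` is a hypothetical helper name, and compressors that are not installed are skipped rather than treated as errors.

```shell
# overhead CMD...: how many bytes CMD adds when compressing 128 KiB of
# random data. CMD must read stdin and write the compressed stream to stdout.
overhead() {
  tmp=$(mktemp) || return 1
  dd if=/dev/urandom of="$tmp" bs=1024 count=128 2>/dev/null
  before=$(wc -c < "$tmp")
  after=$("$@" < "$tmp" | wc -c)
  rm -f "$tmp"
  echo $(( after - before ))   # positive = the "compressed" output grew
}

for c in "xz -9 -c" "gzip -9 -c" "lzop -c"; do
  tool=${c%% *}
  if command -v "$tool" >/dev/null 2>&1; then
    echo "$tool: +$(overhead $c) bytes"   # $c is split into command + flags
  else
    echo "$tool: not installed, skipped"
  fi
done
```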

In other words, compress-force forces compressing (to see if it helps) but not storing.

devZer0 commented 5 years ago

Then the documentation is misleading and compsize told the truth. I will file an issue there.

florensie commented 2 years ago

Is it possible to show uncompressed and incompressible (compression attempted) usage separately? Does the filesystem keep that sort of information?

kilobyte commented 2 years ago

Even if such info is kept, it is not exposed to userspace.

Zygo commented 2 years ago

The info is not kept, but there are clues left behind.

A failed compression attempt under compress-force often produces uncompressed extents of exactly 512K. A larger extent indicates that compression was not attempted at all.
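That rule of thumb can be written down as a small classifier. This is a hypothetical helper (`extent_clue` is not part of btrfs or compsize) that only restates the two clues above. The extent length has to come from elsewhere, for example `filefrag -v`, whose length column counts filesystem blocks, so multiply by the block size it reports.

```shell
# extent_clue BYTES: interpret the length of an UNcompressed extent using
# the rule of thumb above (a clue, not proof).
extent_clue() {
  limit=$(( 512 * 1024 ))
  if [ "$1" -eq "$limit" ]; then
    echo "512K: consistent with a failed compress-force attempt"
  elif [ "$1" -gt "$limit" ]; then
    echo ">512K: compression was not attempted"
  else
    echo "<512K: no clue either way"
  fi
}

extent_clue $(( 128 * 4096 ))   # 512K: consistent with a failed compress-force attempt
extent_clue $(( 256 * 4096 ))   # >512K: compression was not attempted
```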

When heuristics indicated compressibility but compression was not successful, btrfs will set the NOCOMPR flag, which is visible in lsattr as the m attribute.
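Checking for that flag can be scripted by looking for `m` in the attribute field of `lsattr` output. A minimal sketch: `has_nocompress` is a hypothetical helper, and it assumes lsattr's default short format of one `<flags> <path>` line per file.

```shell
# has_nocompress LINE: succeed if the flags field of one line of lsattr
# output contains "m" (the "don't compress" attribute).
has_nocompress() {
  flags=${1%% *}           # text before the first space
  case $flags in
    *m*) return 0 ;;
    *)   return 1 ;;
  esac
}

# Usage on a btrfs file (the path is a placeholder):
#   line=$(lsattr /btrfspool/path/to/file) &&
#     has_nocompress "$line" && echo "marked incompressible"
```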