kilobyte / compsize

btrfs: find compression type/ratio on a file or set of files

Is the extent number accurate for telling file's fragmentation for compressed files? #47

Closed asyncth closed 2 years ago

asyncth commented 2 years ago

An entry in the FAQ for Fedora's Btrfs compression initiative says that when filefrag is run on a compressed file, some of the reported extents may in reality be contiguous on disk, so filefrag is not a reliable way to tell a file's fragmentation.

I've noticed that compsize also reports far fewer extents for uncompressed files than for compressed files, which suggests that compsize is affected by the same issue, but I wanted to confirm anyway.

kilobyte commented 2 years ago

compsize reports the count of extents, be they contiguous on the disk or not.

The data returned by BTRFS_IOC_TREE_SEARCH_V2 includes information about how the extents are placed inside btrfs' linear address space, so it would be possible to count and display that. The linear address space doesn't reflect the physical placement even on a single disk, but as block group sizes are in the gigabytes, the inaccuracy is OK for most purposes.

kilobyte commented 2 years ago

Hmm, the data doesn't include enough information for inline extents. They tend to be the only extent in a file, though.

kilobyte commented 2 years ago

If you want frag count to be added, please tell me:

Zygo commented 2 years ago

Inline extents are always separate fragments. They are stored in metadata blocks so they can never be contiguous with any data block. It is possible for a file to have an inline extent and regular extents, but there must be a hole in between of at least one byte (if there isn't, then it gets written as a normal extent).

Why would a file in one piece count as 2 fragments?

kilobyte commented 2 years ago

Duh, I meant whether a file in two pieces counts as 1 or 2. I.e., a count of discontiguous blocks vs. a measure of suboptimal fragmentation only.

kilobyte commented 2 years ago

(that's mostly asking what colour to paint a bike shed, but I can't decide 😉)

Zygo commented 2 years ago

OK, two pieces is a more sensible question ;)

[edit to match the behavior of filefrag discovered below]

Two extent references A and B are contiguous if the logical end of A is the same as the logical start of B, and the physical end of A is the same as the physical start of B. If there's a hole between A and B, then use the hole's logical end instead of A's logical end (i.e. skip over the hole without breaking contiguity).

So these are discontiguous:

That would most closely match filefrag behavior, except it would account for the compressed and uncompressed length of a compressed extent when calculating contiguity.
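The contiguity rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not compsize's or filefrag's code: the `ExtentRef` record and its field names are hypothetical, offsets are in bytes, and `physical_len` is assumed to be the on-disk (compressed) size of the extent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExtentRef:
    # Hypothetical record; field names are assumptions for this sketch.
    logical_start: int   # byte offset in the file
    logical_len: int     # uncompressed bytes this reference covers
    physical_start: int  # byte offset in btrfs' linear address space
    physical_len: int    # bytes occupied on disk (compressed size if compressed)


def count_fragments(refs):
    """Count runs of references that sit back to back on disk, skipping holes."""
    ordered = sorted(refs, key=lambda r: r.logical_start)
    fragments = 0
    prev = None
    for ref in ordered:
        # A hole between prev and ref does not break contiguity, so only
        # the physical positions decide whether a new fragment starts here.
        if prev is None or prev.physical_start + prev.physical_len != ref.physical_start:
            fragments += 1
        prev = ref
    return fragments


# Two 128 KiB compressed extents, each 40 KiB on disk, stored back to back.
a = ExtentRef(0, 131072, 1_000_000, 40960)
b = ExtentRef(131072, 131072, 1_040_960, 40960)
print(count_fragments([a, b]))
```

Comparing against the uncompressed length instead would make `a` and `b` look discontiguous, which is the over-counting the original question was about.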

Zygo commented 2 years ago

There are some corner cases, like what if extent B is overwriting part of extent A in the middle? That could look like:

In that case I'd count it as 3 fragments (and so does filefrag) since a naive sequential read would seek 3 times as it reads the first part of A, all of extent B, then the second part of A. Note that compsize would count that as 2 extents and 3 refs.
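To make the three numbers concrete, here is a small Python sketch of that overwrite case. The layout is made up for illustration (uncompressed extents, sizes in 4 KiB blocks, hypothetical `Ref` tuple); it is not taken from compsize.

```python
from collections import namedtuple

# Hypothetical layout, in 4 KiB blocks, for uncompressed extents.
Ref = namedtuple("Ref", "extent offset length extent_start")

refs = [
    Ref("A", 0, 4, 1000),  # file blocks 0..3: head of A
    Ref("B", 0, 2, 5000),  # file blocks 4..5: B, written over the middle of A
    Ref("A", 6, 4, 1000),  # file blocks 6..9: tail of A
]


def summarize(refs):
    """Return (extents, refs, fragments) for references given in file order."""
    extents = len({r.extent for r in refs})
    fragments = 0
    prev_end = None
    for r in refs:
        start = r.extent_start + r.offset  # where this piece begins on disk
        if start != prev_end:
            fragments += 1
        prev_end = start + r.length
    return extents, len(refs), fragments


print(summarize(refs))  # (2, 3, 3)
```

The tail of A starts at block 1006, not where B ended, so a sequential reader seeks a third time even though only two extents exist.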

kilobyte commented 2 years ago

I'd count a hole as contiguous: as far as disk head movements go, anyone reading the file would request extent B immediately after A.

Zygo commented 2 years ago

true, but filefrag doesn't count them that way.

Zygo commented 2 years ago

welp, I'm wrong about that...

# > test ; dd bs=4k seek=10 count=8 if=/boot/vmlinuz conv=notrunc of=test; dd bs=4k seek=100 count=8 if=/boot/vmlinuz conv=notrunc of=test; sync -f .; filefrag -v test
8+0 records in
8+0 records out
32768 bytes (33 kB, 32 KiB) copied, 0.000127871 s, 256 MB/s
8+0 records in
8+0 records out
32768 bytes (33 kB, 32 KiB) copied, 7.809e-05 s, 420 MB/s
Filesystem type is: 9123683e
File size of test is 442368 (108 blocks of 4096 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:       10..      17: 1924070403..1924070410:      8:         10:
   1:      100..     107: 1924070411..1924070418:      8:             last,eof
test: 1 extent found

Zygo commented 2 years ago

The summary line counts the physical blocks on both sides of the hole as one extent, but the -v output still lists two separate extent records.
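The difference between the two numbers can be reproduced from the transcript itself. The Python sketch below parses the two `-v` rows shown above and merges rows whose physical ranges are adjacent; it mimics the observed output only and makes no claim about how filefrag is implemented. The helper names are hypothetical.

```python
import re

# Matches a data row of `filefrag -v`: index, logical range, physical range, length.
ROW = re.compile(r"^\s*\d+:\s*(\d+)\.\.\s*(\d+):\s*(\d+)\.\.\s*(\d+):\s*(\d+):")

SAMPLE = """\
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:       10..      17: 1924070403..1924070410:      8:         10:
   1:      100..     107: 1924070411..1924070418:      8:             last,eof
"""


def parse_rows(text):
    """Return (logical_first, logical_last, physical_first, physical_last) per row."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            rows.append(tuple(int(g) for g in m.groups()[:4]))
    return rows


def merged_count(rows):
    """Count rows, merging any row that starts right after the previous one on disk."""
    count = 0
    prev_last = None
    for _, _, phys_first, phys_last in rows:
        if prev_last is None or phys_first != prev_last + 1:
            count += 1
        prev_last = phys_last
    return count


rows = parse_rows(SAMPLE)
print(len(rows), merged_count(rows))  # prints "2 1"
```

The second row begins at physical block 1924070411, one past the end of the first, so the two records collapse into a single extent in the total despite the logical gap between blocks 17 and 100.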

asyncth commented 2 years ago

> If you want frag count to be added

Not really, I just wanted to confirm whether compsize can be used to tell a compressed file's fragmentation. I definitely wouldn't mind it, though.

asyncth commented 2 years ago

I ran the latest commit on a freshly defragmented file, and the number of fragments seems to match the number of extents. Does this mean that they're not contiguous?

kilobyte commented 2 years ago

In this particular case, it only means that I'm an idiot :/ Please pull for a less buggy version.

It's still an initial stab that doesn't understand partial extents.

asyncth commented 2 years ago

Pulled, and it works, but it shows that a file has about 7x more fragments on the SSD than a file with exactly the same contents on the HDD, even after defragmenting both of them (the file was copied from the HDD to the SSD). Probably not a bug, but some kind of Btrfs behavior?

asyncth commented 2 years ago

I guess this can be closed now?