SEARCH_V2 buffer is too big, slows compsize down

Zygo commented 3 years ago

I found that compsize was slower than btrfs fi du, which shouldn't happen because btrfs fi du uses FIEMAP while compsize uses TREE_SEARCH_V2.

# time btrfs fi du -s .
     Total   Exclusive  Set shared  Filename
   6.92GiB   351.20MiB     4.67GiB  .

real    0m2.414s
user    0m0.177s
sys     0m2.189s
# time compsize .
Processed 96083 files, 170199 regular extents (202723 refs), 39927 inline.
Type       Perc     Disk Usage   Uncompressed Referenced
TOTAL       52%      3.6G         6.9G         7.0G
none       100%      2.2G         2.2G         2.0G
zlib        32%      1.3G         4.1G         4.5G
zstd        14%       75M         505M         478M

real    0m7.719s
user    0m0.105s
sys     0m7.532s

Some stracing found this:

ioctl(6, BTRFS_IOC_TREE_SEARCH_V2, {key={tree_id=0, min_objectid=17466447, max_objectid=17466447, ..., buf_size=16777216} => {key={nr_items=170},

This is zeroing out a 16 MB memory buffer to read one inode, and it slows compsize down to the point where even glacial FIEMAP calls can catch up.

Changing the buffer size to 64K makes it run much faster:

# time compsize-64k .
Processed 96083 files, 170199 regular extents (202723 refs), 39927 inline.
Type       Perc     Disk Usage   Uncompressed Referenced
TOTAL       52%      3.6G         6.9G         7.0G
none       100%      2.2G         2.2G         2.0G
zlib        32%      1.3G         4.1G         4.5G
zstd        14%       75M         505M         478M

real    0m1.089s
user    0m0.102s
sys     0m0.950s

but the loop retry condition in do_file has to be fixed. I would suggest setting max_object = min_objectid + 1, then bail out of the loop as soon as you see an objectid > the inode you're looking at, or zero nr_items on ioctl return. Then it runs even faster:

# time compsize-64k-plus-one .
Processed 96083 files, 170199 regular extents (202723 refs), 39927 inline.
Type       Perc     Disk Usage   Uncompressed Referenced
TOTAL       52%      3.6G         6.9G         7.0G
none       100%      2.2G         2.2G         2.0G
zlib        32%      1.3G         4.1G         4.5G
zstd        14%       75M         505M         478M

real    0m0.933s
user    0m0.086s
sys     0m0.800s

64K is the largest possible metadata page size. TREE_SEARCH_V2 will never need space for an item that large, so you don't have to worry about an item being too large for the buffer.

kilobyte commented 3 years ago

The first part, reducing buffer size, works even far more impressively on my test filesystem. That's pure awesomesity!

On the other hand, using max_object beyond the st_ino, makes it slower by ~11%, on the same filesystem.

kilobyte commented 3 years ago

Branch with both changes applied: https://github.com/kilobyte/compsize/tree/loop_retry

Zygo commented 3 years ago

On the other hand, using max_object beyond the st_ino, makes it slower by ~11%, on the same filesystem.

That's interesting. Any idea why?

Hmmm...maybe the other max_ fields should be set differently. It will be fetching two inodes' complete extent maps. Maybe set max_type to BTRFS_INODE_ITEM, so it stops when it reaches the next inode, but doesn't dump out its extent list?

Zygo commented 3 years ago

That would be:

min = (st_ino, BTRFS_EXTENT_DATA_KEY, 0), max = (st_ino + 1, BTRFS_INODE_ITEM_KEY, 0)

That should make the search stop as soon as it finds the next inode, without including the inode's extents (or directory entries, if the next inode is a directory). At the end of the subvol it will have to do another 0-item loop, but usually there are a lot more inodes than subvols.

kilobyte commented 3 years ago

This helped, but is still slower than continuing if previous run had >512.

Self-Perfection commented 3 years ago

Guys this is super cool! compsize run on one sample directory on my system is just 0.15s with loop_retry branch while it takes 5.5s with version from package manager. loop_retry branch allowed me to calculate which directories are actually take space!

Why is this not merged already? I suspect you'd like to get it perfect before merging. Is it possible to make release with main improvement and ponder about fine tuning afterwards?

kilobyte commented 3 years ago

@Self-Perfection: what version are you comparing with? 1,5 should have the buffer size speedup.

If the alternate retry condition makes such a large difference that's good to hear — but, I suspect your distro release has a version older than 1.5. Care to check?

Self-Perfection commented 3 years ago

@kilobyte you are right. My distro has older compsize version. Thank you for pointing this out.

kilobyte / compsize

SEARCH_V2 buffer is too big, slows compsize down #38