knorrie / python-btrfs

Python Btrfs module
GNU Lesser General Public License v3.0
112 stars · 22 forks

bblu --kthxbye #29

Open knorrie opened 3 years ago

knorrie commented 3 years ago

So, there's btrfs-balance-least-used, or bblu as we might call it. This example program was created to try to defragment free space as efficiently and quickly as possible. I needed this to fight, or to recover from, situations in which the old -o ssd allocator was being used.

So what's the tool still good for now? Well, users still regularly ask for something they can run periodically to prevent getting into unexpected ENOSPC situations for whatever other reason. bblu could of course be used for this, by telling it to compact block groups until they are all at some % of usage. But that would likely mean it's often doing a lot of unnecessary work.

It would be interesting to make it a bit smarter, so that it executes the minimal amount of work necessary, with the goal of making sure there's actually usable unallocated raw disk space present. How hard can it be? Well, for example, if we have 100G of unallocated disk space, but it's all on 1 disk out of 2 and the target profile is RAID1... fail.
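The 1-disk-out-of-2 failure above can be sketched as a small calculation. This is not python-btrfs API, just a standalone estimate of how much raw space a RAID1 chunk allocator can actually consume, given per-device unallocated bytes; the function name is made up:

```python
# Sketch (not python-btrfs API): estimate raw space usable for RAID1
# chunk allocations, given unallocated bytes per device. RAID1 needs
# room on two different devices for every chunk, so the usable amount
# is limited by how much space exists outside the single biggest device.

def raid1_allocatable(unallocated):
    """Raw bytes allocatable as RAID1 chunks from per-device free space."""
    total = sum(unallocated)
    biggest = max(unallocated, default=0)
    # Every chunk occupies space on 2 devices, and the mirror of anything
    # placed on the biggest device must live on some other device.
    return 2 * min(total // 2, total - biggest)
```

With 100G unallocated on one disk and nothing on the other, this returns 0, which is exactly the "fail" case: plenty of unallocated raw space, none of it usable for RAID1.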

What I'm thinking about is some fire-and-forget mode to run it in, jokingly called --kthxbye in the title, but maybe something like --auto. It should use a clear set of rules that we think need to be met.

Now, the python-btrfs library already has the fs_usage module which provides a large amount of interesting information that can be used: https://python-btrfs.readthedocs.io/en/stable/btrfs.html#btrfs.fs_usage.FsUsage The btrfs-usage-report tool simply displays almost everything it can tell you.

Next: but how do we figure out which block groups exactly need to be fed to balance to fix the unbalanced situation?
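As a starting point, the selection bblu already does can be sketched like this: pick block groups below a usage threshold and feed them to balance least-used first. The `BlockGroup` tuple here is a stand-in for illustration, not the python-btrfs class; in the real tool the data would come from iterating the filesystem's block groups instead:

```python
# Sketch of block group selection for balance, least-used first.
# BlockGroup is a simplified stand-in, not the python-btrfs object.
from collections import namedtuple

BlockGroup = namedtuple('BlockGroup', ['vaddr', 'used', 'length'])

def balance_candidates(block_groups, max_used_pct):
    """Block groups below the usage threshold, least-used first."""
    picked = [bg for bg in block_groups
              if bg.used * 100 // bg.length < max_used_pct]
    # Feeding the emptiest ones first moves the least data per freed chunk.
    return sorted(picked, key=lambda bg: bg.used / bg.length)
```

The open question in the comment above is how to stop this loop as early as possible, i.e. how to pick only as many candidates as needed to restore usable unallocated space.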

knorrie commented 3 years ago

Oh actually, free_(meta)data is not the right one, since it also includes free space in existing chunk allocations, and the ENOSPC happens when the filesystem wants to force new chunk allocations.

Instead, estimated_allocatable_virtual_(meta)data can be used, which tells us how much actual (meta)data can be stored in completely new allocations that can still be done. (edited above)
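The difference between the two numbers can be shown with toy figures. This is a simplification with made-up values, not the real FsUsage math; only the variable names mirror the attributes mentioned above:

```python
# Toy illustration (made-up numbers, simplified math): why free_data is
# the wrong input for an ENOSPC check on a 2-copy RAID1 filesystem.
GiB = 2**30

free_in_existing_chunks = 5 * GiB   # slack inside already allocated data BGs
unallocated_raw = 20 * GiB          # raw disk space with no chunks on it yet
raid1_ratio = 2                     # RAID1 stores every byte twice

# Virtual (usable) data that fits in chunk allocations still to be done;
# this is the quantity that matters for avoiding forced-allocation ENOSPC:
estimated_allocatable_virtual_data = unallocated_raw // raid1_ratio

# free_data also counts the slack in existing chunks, so it looks bigger,
# even though that slack does not help when a new chunk must be allocated:
free_data = free_in_existing_chunks + estimated_allocatable_virtual_data
```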

Zygo commented 3 years ago

Things I learned from the "Century Balance" bug miner:

How much metadata space do you need?

e.g. a 2-disk raid1 filesystem with 4 GB of metadata BGs and 3.5 GB of metadata used must have enough room for 8 GB of total metadata (3.5 GB used + 0.5 GB reserve = 4 GB; 4 GB * 1.25 + 2 GB for disks + 1 GB for balance = 8 GB).

So far, all metadata ENOSPC failures I've seen have occurred when used metadata space * 1.12 > (allocated + available) space. 1.25 is slightly larger; it is basically a fudge factor to estimate how many snapshot metadata pages are going to get CoWed within their lifetime.
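The rule of thumb above can be written down as two small functions. This is a sketch of my reading of the worked example, not something from python-btrfs; in particular, interpreting "2 GB for disks" on a 2-disk filesystem as 1 GiB per disk is an assumption:

```python
# Sketch of the metadata headroom rule of thumb described above.
# Assumption: "2 GB for disks" on a 2-disk fs means 1 GiB per disk.
GiB = 2**30

def metadata_needed(used, num_disks, reserve=GiB // 2):
    """Total metadata space to aim for, in bytes: fudge-factored used
    space, plus per-disk headroom, plus 1 GiB of room for balance."""
    return (used + reserve) * 1.25 + num_disks * GiB + GiB

def enospc_risk(used, allocated, available):
    """True for the pattern seen in all observed metadata ENOSPC cases:
    used metadata * 1.12 exceeds allocated + available space."""
    return used * 1.12 > allocated + available
```

Plugging in the example numbers (3.5 GiB used, 2 disks) reproduces the 8 GiB target from the comment above.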

Forza-tng commented 3 years ago

I like the idea of having a daemon doing regular balance across all the devices in the filesystem to even things out. Set and forget seems perfect. Possibly with email reports :)

knorrie commented 2 years ago

..oO(Yes, or use the kernel trace point in the chunk allocator as trigger to wake up and quickly look around if something needs to be done.)