knorrie / python-btrfs

Python Btrfs module
GNU Lesser General Public License v3.0

btrfs-balance-least-used: add metadata and system block groups #22

Closed GrahamCobb closed 4 years ago

GrahamCobb commented 5 years ago

For my own use, I had to teach btrfs-balance-least-used to balance metadata block groups as well as data block groups. I added system as well even though I did not need them.

This adds three command line options: -d/--data, -m/--meta and -s/--system. -d is the default. Combinations of the options work.
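As an illustration of the option handling described above, here is a minimal sketch using argparse. It is not the code from this PR: the function name `parse_block_group_types` is hypothetical, and the tool's other arguments (such as the mountpoint) are left out.

```python
import argparse


def parse_block_group_types(argv):
    """Return the set of block group types selected on the command line.

    Hypothetical helper, not taken from btrfs-balance-least-used itself.
    """
    parser = argparse.ArgumentParser(prog="btrfs-balance-least-used")
    parser.add_argument("-d", "--data", action="store_true",
                        help="balance data block groups (default)")
    parser.add_argument("-m", "--meta", action="store_true",
                        help="balance metadata block groups")
    parser.add_argument("-s", "--system", action="store_true",
                        help="balance system block groups")
    args = parser.parse_args(argv)

    selected = {name for name in ("data", "meta", "system")
                if getattr(args, name)}
    # With no type option given, fall back to data only.
    return selected or {"data"}
```

For example, `parse_block_group_types(["-d", "-m"])` selects both data and metadata, while an empty argument list selects data only.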

Help and man page are updated.

PR created in case you think this is a useful feature to add to your version.

knorrie commented 5 years ago

Hi! Thanks!

May I ask you what your 'own use' scenario is? In what scenario do you want to balance metadata?

I wrote this one as part of my research into the pre-4.14 kernel (often implicit) -o ssd option that left a huge amount of unused space behind in already allocated data block groups. So, the purpose is to compact data again in the fastest way. After fixing that and using newer kernels, I actually don't really need this tool any more. But, it can be useful to compact used space after deleting a huge amount of random stuff. And, moreover, it is an example of how to write your own user-space defined balance algorithm using the library functions.
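The "least used first" idea can be sketched in a few lines. This is only an illustration under stated assumptions: the `BlockGroup` class and `least_used_first` function are hypothetical stand-ins and do not use the python-btrfs API, which would supply the real block group addresses and usage numbers.

```python
from dataclasses import dataclass


@dataclass
class BlockGroup:
    """Hypothetical stand-in for a block group: address, size, bytes used."""
    vaddr: int
    length: int
    used: int

    @property
    def used_pct(self):
        return 100 * self.used / self.length


def least_used_first(block_groups, max_used_pct):
    """Return block groups at or below max_used_pct, emptiest first.

    Relocating the emptiest block groups first frees the most allocated
    space for the least amount of data that has to be rewritten.
    """
    candidates = [bg for bg in block_groups if bg.used_pct <= max_used_pct]
    return sorted(candidates, key=lambda bg: bg.used_pct)
```

A real tool would then ask the kernel to relocate each block group in the returned order, re-reading usage as it goes, since relocated data lands in other block groups and changes their fill level.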

I can imagine that a scenario for balancing metadata could be converting to another profile, but this tool does not have any convert options (should it?).

Just adding the options for sake of completeness (in a technical way) is something that I'm not in favor of. There should be a story behind it, that tells why it was a good idea.

I especially cannot think of any reason for needing to balance system block groups in a way where the least used one goes first.

Usually it's not recommended to balance metadata with the goal of compacting used space. Metadata writes are really chaotic and have a very different IO pattern than data writes. Metadata writes like to find empty space in the vicinity of where they are writing at the moment.

Sorry to ask questions and probably be annoying. I really value your feedback.

Hans

GrahamCobb commented 5 years ago

Hi,

On my main data disk (with heavy email processing) I had problems some time ago with running out of metadata space. Because of that, I set up several cron jobs which do frequent balances of both data and metadata space. For each, I balance at 0% and then 20% twice a week and have been doing it for a few years.

This works and I have had no further problems with space. It is quite possible that more recent kernels have reduced or even removed the need to do it, but I am not particularly inclined to find out.

One problem is that the frequent balances are quite heavy and have quite a performance impact on the system. Due to the impact, I stop mail processing while the balances are happening. I have fairly recently switched to using balance-least-used for the data balances and found that the progressive approach seemed to work faster, which means processing is halted for less time.

The 20% metadata balance is still taking a while, so I decided to try the progressive approach to that as well. In order to do that, I needed to add the metadata option to balance-least-used. I don't yet have any results to know whether it has made any significant impact but thought the feature might be generally useful.

I have not used the system option - I only added it because the btrfs balance command supports it! In fact, the only time I have ever done an explicit system block balance was when I needed to force conversion of the system profile.

I am actually in the process of rewriting my "balance-slowly" scripts to integrate them with balance-least-used (so that I can put time limits on balance operations). Once that is done, I don't suppose I will be using balance-least-used itself any more (although it is still the heart of the new balance-slowly tool).
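A time limit of the kind mentioned above could look roughly like the following sketch. It is an assumption about the general shape, not the actual balance-slowly code: `balance_with_time_limit`, the `relocate` callback and the `clock` parameter are all hypothetical names.

```python
import time


def balance_with_time_limit(block_groups, relocate, budget_seconds,
                            clock=time.monotonic):
    """Relocate block groups in the given order until the budget runs out.

    relocate is a caller-supplied callback that moves one block group.
    The deadline is only checked between block groups, so a relocation
    that has already started is allowed to finish.
    """
    deadline = clock() + budget_seconds
    done = []
    for bg in block_groups:
        if clock() >= deadline:
            break
        relocate(bg)
        done.append(bg)
    return done
```

Because the check happens between block groups, the total run time can exceed the budget by the duration of one relocation.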

Feel free to accept or ignore the PR as you see fit.

knorrie commented 5 years ago

Hi, thanks for sharing all of this.

When 'running out of metadata space', did you run out of unallocated disk space? If not, then it's one of those bugs where metadata code wants to reserve more space but doesn't want to trigger new block group allocations. Compacting metadata will only make it easier to get into this situation. (Very recently, improvements to space reservation handling have been made, but we won't see those backported to existing older kernels [1].)

I think you should actually be 'inclined to find out' if the original cause of your problem is still present. I'm not aware of bugs that actually trigger over-allocation of metadata blockgroups (except for using the ssd_spread mount option, but then you're explicitly asking for it).

[1] https://www.spinics.net/lists/linux-btrfs/msg92874.html

Zygo commented 5 years ago

Metadata should be balanced in exactly two scenarios:

  1. You are changing RAID profile:
    • moving from dup on 1 disk to 2+ disks
    • moving from a RAID profile on 2+ disks to dup on 1 disk
    • moving from single, raid0, raid5 or raid6 profiles (which should never be used for metadata due to the risk of total filesystem data loss) to dup, raid1, or raid10
    • deleting empty single profile metadata block groups created by older versions of mkfs
    All of the above take the form -m soft,convert=dup or raid1 or raid10 depending on the number of disks in the filesystem. Removing or shrinking a device also relocates metadata in a similar (acceptable) way.
  2. You have hit a kernel bug (there are too many to list here) and as a result you have lots of empty metadata block groups that have not been cleaned up automatically: -m usage=0 or -m usage=1.
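The two acceptable cases in the list above map onto two shapes of `btrfs balance start` invocation. The sketch below only builds the argument lists and does not run anything; the function names are hypothetical, and the set of allowed profiles simply follows the comment above rather than everything btrfs supports.

```python
# Metadata profiles named as acceptable targets in the comment above.
ALLOWED_METADATA_PROFILES = {"dup", "raid1", "raid10"}


def metadata_convert_command(mountpoint, profile):
    """Build the argv for a metadata profile conversion (case 1).

    The soft filter skips block groups that already have the target
    profile, so only the ones that need converting are rewritten.
    """
    if profile not in ALLOWED_METADATA_PROFILES:
        raise ValueError(f"unsuitable metadata profile: {profile}")
    return ["btrfs", "balance", "start",
            f"-mconvert={profile},soft", mountpoint]


def metadata_cleanup_command(mountpoint, usage=0):
    """Build the argv for removing (nearly) empty metadata block groups (case 2)."""
    if usage not in (0, 1):
        raise ValueError("only usage=0 or usage=1 is intended here")
    return ["btrfs", "balance", "start", f"-musage={usage}", mountpoint]
```

Which profile is appropriate depends on the number of devices, so the caller is expected to choose it; the sketch deliberately refuses any usage value above 1.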

In all other cases, it is best to never balance metadata. It is slow, heavily write-intensive (especially for SSDs), and if it is done regularly it is likely to cause problems in the long term as the filesystem fills up.

If you release unused metadata block group space to the filesystem, e.g. with balance start -musage=20, that space may be allocated to data block groups. Later, when more metadata space is required again (sometimes quite surprising amounts of it), no space will be available, and the filesystem will be forced read-only. It can be quite difficult to recover the filesystem to a read-write state when this happens--in the worst cases it will require additional storage and/or kernel patches.

Balancing data block groups incrementally helps prevent the metadatapocalypse, but this will not work if the filesystem fills up to the last block group and free space cannot be further defragmented (common on small filesystems). In such cases it is best to leave any space allocated to metadata alone, so the filesystem will always have it available when needed.

knorrie commented 4 years ago

To conclude this, no, balance-least-used will only do data.