tytso / e2fsprogs

Ext2/3/4 file system utilities
http://ext4.wiki.kernel.org

Correct smallest size estimation for `resize2fs` shrink #174

Open dilyanpalauzov opened 7 months ago

dilyanpalauzov commented 7 months ago

https://github.com/tytso/e2fsprogs/blob/master/resize/resize2fs.8.in#L164 says:

KNOWN BUGS: The minimum size of the file system as estimated by resize2fs may be incorrect, especially for file systems with 1k and 2k blocksizes.

Please improve resize2fs to create optimally small file systems. I want to create IMG files for Raspberry Pi and then burn them to SD cards. Once the operating system boots on the hardware, it expands the file system to fill all the available space on the SD card.

The larger the image to burn, the longer the burning takes and the more complicated distributing the files gets. I could create sparse files, e.g. by calling LIBGUESTFS_BACKEND=direct virt-sparsify -v --inplace file.img, but I am not sure whether this helps with the burning duration. Ideally the image should be as small as possible, and to achieve this resize2fs should be able to calculate the exact smallest size of the image.
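One thing I already do is truncate the image file after the shrink, since resize2fs only shrinks the file system, not the image file that contains it. Roughly like this (the awk field positions assume dumpe2fs's usual "Block count:" / "Block size:" output lines):

resize2fs -M file.img
# dumpe2fs -h prints "Block count:" and "Block size:" lines;
# truncate the image to exactly that many blocks.
blocks=$(dumpe2fs -h file.img 2>/dev/null | awk '/^Block count:/ {print $3}')
bsize=$(dumpe2fs -h file.img 2>/dev/null | awk '/^Block size:/ {print $3}')
truncate -s $((blocks * bsize)) file.img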

tytso commented 7 months ago

If it were easy to do this, don't you think I would have done it already? Unfortunately, calculating the exact amount of space needed to make sure we have enough room for both the metadata and data blocks is a bit tricky, since the number of metadata blocks required depends on the overall size of the file system, and as the file system is packed tighter and tighter, the data blocks can get more fragmented, requiring more metadata blocks.

And as much as users will occasionally whine and complain and kvetch that resize2fs's calculation of the minimum size might be too large, they would whine and complain and kvetch even more if the estimate were too small, leading to resize2fs failing and leaving the file system damaged.

For example, suppose that you are trying to package up about 124M of data files:

% du -sh ~/imap/INBOX
124M    /home/tytso/imap/INBOX

This is a bad way to create a minimal file system:

% mke2fs -Fq -b 4k -t ext4 -d ~/imap/INBOX /tmp/foo.img 2T
Creating regular file /tmp/foo.img
% resize2fs -M /tmp/foo.img
resize2fs 1.47.0 (5-Feb-2023)
Resizing the filesystem on /tmp/foo.img to 553577 (4k) blocks.
The filesystem on /tmp/foo.img is now 553577 (4k) blocks long.
% dumpe2fs -h /tmp/foo.img | egrep "(Block count)|(Free blocks)"
dumpe2fs 1.47.0 (5-Feb-2023)
Block count:              553577
Free blocks:              245089

That is 553577 4k blocks (about 2.1 GiB) for 124M of data, with nearly half of the blocks sitting free. Here is a smarter way to do the same thing:

% mke2fs -Fq -b 4k -t ext4 -d ~/imap/INBOX /tmp/foo.img 512M
Creating regular file /tmp/foo.img
% resize2fs -M /tmp/foo.img
resize2fs 1.47.0 (5-Feb-2023)
Resizing the filesystem on /tmp/foo.img to 38560 (4k) blocks.
The filesystem on /tmp/foo.img is now 38560 (4k) blocks long.
% dumpe2fs -h /tmp/foo.img | egrep "(Block count)|(Free blocks)"
dumpe2fs 1.47.0 (5-Feb-2023)
Block count:              38560
Free blocks:              1857

That is only 38560 blocks (about 151 MiB), with under 5% of the blocks free. Unfortunately, if you pick a file system size that is too small to allow for the metadata blocks as well as the data blocks, mke2fs might fail:

% mke2fs -Fq -b 4k -t ext4 -d ~/imap/INBOX /tmp/foo.img 128M
Creating regular file /tmp/foo.img
__populate_fs: Could not allocate block in ext2 filesystem while writing file "1539183246_25.13823.cwcc,U=335989,FMD5=7e33429f656f1e6e9d79b29c3f82c57e:2,"
mke2fs: Could not allocate block in ext2 filesystem while populating file system

Getting the sizes right is tricky; if you think you can do better, please feel free to send me a patch. (And then we will need to make sure the patch won't cause failures, which would cause users to complain and whine.)
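In the meantime, the brute-force workaround is to start from a size derived from the data and grow it until mke2fs succeeds, then run resize2fs -M. A rough sketch, not anything e2fsprogs computes for you (the 10% headroom and the retry step are guesses):

#!/bin/sh
# Rough sketch: start near the data size plus guessed headroom,
# and grow the image until mke2fs -d manages to fit everything.
SRC=$HOME/imap/INBOX
IMG=/tmp/foo.img
size=$(du -sm "$SRC" | cut -f1)    # data size in MiB
size=$((size + size / 10 + 8))     # guessed headroom for metadata
until rm -f "$IMG" && mke2fs -Fq -b 4k -t ext4 -d "$SRC" "$IMG" "${size}M"
do
    size=$((size + size / 10))     # too small: add 10% and retry
done
resize2fs -M "$IMG"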

dilyanpalauzov commented 7 months ago

My understanding from the above is that shrinking the file system can increase fragmentation and thus require more space for metadata. I have no understanding of file systems, so I am not going to provide patches.

Why is it not feasible to first move the data to the beginning of the file system, in a way that keeps fragmentation minimal, then calculate the optimal size assuming no further fragmentation will happen, and document that this algorithm produces a lot of I/O and takes more time?

tytso commented 7 months ago

Indeed, fragmentation is one of the problems (but not the only one). A full defragmentation is not actually that simple, because you might need to move data blocks around to keep things contiguous. Again, if you think it's simple, please provide patches; otherwise, please accept the word of someone who does know file systems that it's not so simple. In particular, making sure that the user's data is not lost if the system (or resize2fs) crashes in the middle of the defragmentation is not trivial, especially if you are also trying to ensure that the defragmentation doesn't take a super long time.
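(If you want a rough picture of how tightly packed the result is, you can inspect the image without mounting it: e2freefrag reports free-space fragmentation, and debugfs's filefrag command shows the extents of an individual file. The file path below is made up:)

% e2freefrag /tmp/foo.img
% debugfs -R "filefrag -v /path/to/some/file" /tmp/foo.img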

If you are building an embedded image and you can more accurately estimate the size of the requested file system, then resize2fs -M is going to work well. The real problem, as far as I'm concerned, is lazy embedded-systems builders who want to be able to create a gigantic file system (say, 2TB) and then shrink it down to its final size (which might only be a few hundred megabytes). Doctor, doctor, it hurts when I do that. Then don't do that!

Or send patches to make it do better, or pay someone to do that work if you can make the business case to your employer that it's worth it to make complex changes to e2fsprogs instead of fixing your build system....