littlefs-project / littlefs

A little fail-safe filesystem designed for microcontrollers
BSD 3-Clause "New" or "Revised" License

Superblock expansion does not happen when the filesystem is more than 50% full #901

Open gbolgradov opened 7 months ago

gbolgradov commented 7 months ago

When the filesystem is more than 50% full, superblock expansion cannot occur because of an explicit check in lfs.c. Is there a way to refine this precondition so that it checks whether the size of the directory to be relocated is less than the free space, rather than comparing used space against the entire FS size?

This case is reached when we start with a filesystem that is more than 50% full with ~5 files (e.g. initial settings or static content), and then perform a large number of overwrites (>1000) on a small file (LFS_F_INLINE). The directory revision reaches block_cycles, but because the above check fails, the directory relocation (with the file content) does not occur. As a result, the same two blocks (the directory block pair) of the FS wear out.

Just for a test, I changed the condition from ((lfs_size_t)size < lfs->block_count/2) to (dir->count < (lfs->block_count - (lfs_size_t)size)), and in my case everything works fine.
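
For reference, a minimal sketch of that experiment, in the same form as the existing check in lfs.c (this only mirrors the reporter's workaround, not a proposed fix):

    // original check: only expand while less than half of the
    // filesystem's blocks are in use
    if ((lfs_size_t)size < lfs->block_count/2) {

    // experimental change from the report above: compare the directory's
    // entry count against the number of free blocks instead
    if (dir->count < (lfs->block_count - (lfs_size_t)size)) {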

rojer commented 7 months ago

@geky can you take a look at this please? Georgi is one of our engineers, and this seems quite serious if we are wearing out flash on filesystems that are more than half-full, because most of our devices are like that.

BrianPugh commented 7 months ago

It appears that @gbolgradov is talking about this check here; the surrounding context is pasted below for completeness:

    if (lfs_dir_needsrelocation(lfs, dir)
            && lfs_pair_cmp(dir->pair, (const lfs_block_t[2]){0, 1}) == 0) {
        // oh no! we're writing too much to the superblock,
        // should we expand?
        lfs_ssize_t size = lfs_fs_rawsize(lfs);
        if (size < 0) {
            return size;
        }

        // do we have extra space? littlefs can't reclaim this space
        // by itself, so expand cautiously
        if ((lfs_size_t)size < lfs->block_count/2) {
            LFS_DEBUG("Expanding superblock at rev %"PRIu32, dir->rev);
            int err = lfs_dir_split(lfs, dir, attrs, attrcount,
                    source, begin, end);
            if (err && err != LFS_ERR_NOSPC) {
                return err;
            }

            if (err) {
                // welp, we tried, if we ran out of space there's not much
                // we can do, we'll error later if we've become frozen
                LFS_WARN("Unable to expand superblock");
            } else {
                end = begin;
            }
        }
    }

It's a bit unclear to me why this check is necessary. Sleuthing through the commit history, it seems like it might be a holdover from when expanding was done differently? I'm not sure.

geky commented 7 months ago

Hi all, sorry about the late response.

First thing to note, this should only be happening to the root directory. The root directory starts out in blocks 0x{0,1} (the same as the superblock) until the first superblock extension occurs. If the filesystem fills up before the first superblock extension it makes sense that this could happen, which is a problem.

Is there a way to refine this precondition to check the size of the directory to be relocated

Writing to the root directory is the most common way to trigger a superblock extension, but writes to any directory can trigger it after enough relocations of that directory. So I think the specific directory involved is a bit of a red herring.

It's a bit unclear to me why this check is necessary.

This is a specific case of a general copy-on-write filesystem issue. Since you need to make copies in order to write, if the filesystem becomes completely full, it gets stuck: it can't accept writes, even writes that would delete files.

This is especially bad for small filesystems. For example, if a superblock extension happened in an 8-block filesystem with 6 blocks in use, the filesystem is stuck.

littlefs also can't reclaim superblocks, which is why this check is so aggressive.
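
To spell out the arithmetic of that example (a rough sketch; it assumes the extension allocates one new metadata pair while the old pair {0,1} is kept as the superblock):

    8-block filesystem, 6 blocks already in use
    superblock extension allocates a new metadata pair: 6 + 2 = 8 blocks in use
    0 free blocks remain, so any later update that needs to allocate a block
    fails with LFS_ERR_NOSPC, and since the superblock pair is never reclaimed,
    deleting files can't recover those 2 blocks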


That being said, this check is probably too aggressive.

A quick fix is to change the extension threshold to something more like ~88% full. I believe this limit is used by both ZFS and btrfs, though they have quite a few more blocks to play with:

    if ((lfs_size_t)size < (lfs->block_count*7)/8) {

Will need to figure out how to rephrase that to avoid overflow issues...
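
To illustrate the concern with a made-up number (lfs_size_t is a 32-bit type, so the multiplication can wrap before the division happens):

    // block_count*7 wraps around uint32_t once block_count exceeds ~613M blocks,
    // so the computed threshold ends up far too small on very large filesystems
    lfs_size_t block_count = 700000000;          // hypothetical, unrealistically large
    lfs_size_t threshold   = (block_count*7)/8;  // 4.9e9 wraps to ~75M instead of ~612M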

rojer commented 7 months ago

thanks @geky ! fwiw, all our access is to the root directory (we don't even have directories as such, everything lives in the root).

geky commented 7 months ago

Storing things in a directory is more or less functionally the same as storing things in the root after a superblock extension. Maybe we should add an option to pre-allocate a certain number of superblocks during format...

geky commented 7 months ago

Also one way to avoid overflow issues is to check what is available (this is mostly a note to self). Reserving ~6% for example:

    if (lfs->block_count - size > lfs->block_count/16) {
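
Spelled out as a tiny helper (the name and the explicit reserve parameter are illustrative only, not littlefs API): the subtraction form never multiplies block_count, so it can't overflow, and the reserve amount could later become user-configurable:

    // true if more than `reserve` blocks would remain free with `used` blocks in use
    static inline bool fs_has_reserve(lfs_size_t block_count, lfs_size_t used,
            lfs_size_t reserve) {
        // block_count - used can't overflow as long as used <= block_count
        return block_count - used > reserve;
    }

    // the check above is then fs_has_reserve(lfs->block_count, size, lfs->block_count/16)
    // (a ~6% reserve); lfs->block_count/8 would give a ~12% reserve instead
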
geky commented 7 months ago

I do want to rip out this check entirely though. It's a kludge, and a hidden filesystem traversal that any metadata commit can trigger is not great.

I think the way forward is to move this logic into the block allocator:

  1. Keep track of roughly how many blocks are in use.
  2. Mark each block allocation as critical/non-critical.
  3. Fail non-critical allocations early if usage exceeds a user-configurable reserved amount.

The hard part will be figuring out which block allocations are critical/non-critical. But the rest can be implemented relatively easily, and we can mark the superblock extension as the only "non-critical" allocation for the time being.
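
A rough sketch of what that could look like (everything here is hypothetical: the struct, the field names, and the early-failure policy are illustrative, not the actual littlefs allocator; lfs_size_t, lfs_block_t, and LFS_ERR_NOSPC come from lfs.h):

    #include "lfs.h"

    // 1. keep rough track of how many blocks are in use, plus a reserve
    typedef struct alloc_state {
        lfs_size_t block_count;   // total blocks in the filesystem
        lfs_size_t used_approx;   // approximate number of blocks in use
        lfs_size_t reserved;      // user-configurable reserve, e.g. block_count/8
    } alloc_state_t;

    // 2. callers mark each allocation as critical or non-critical
    static int alloc_block(alloc_state_t *a, bool critical, lfs_block_t *block) {
        // 3. fail non-critical allocations early once usage eats into the reserve
        if (!critical && a->block_count - a->used_approx <= a->reserved) {
            return LFS_ERR_NOSPC;
        }
        // ... the normal allocator scan would go here and fill in *block ...
        (void)block;
        a->used_approx += 1;
        return 0;
    }

For the time being only the superblock extension would pass critical=false; everything else stays critical and keeps the current behavior.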

rojer commented 6 months ago

sounds like a good idea but a big change to make. could we start with adjusting the threshold perhaps? 6% sounds reasonable.

geky commented 6 months ago

My head space was that it would be a relatively quick changeset since I'm already working on some unrelated allocator changes. But fair point, a quick fix is a good idea.

I'm also thinking of sticking with ~12% unless more information becomes available (a deeper analysis would be valuable). The reserved blocks are still available for wear-leveling, and we have a lot more to worry about around block alignment than ZFS/btrfs (reserving 12% is going to be a tiny bit less than 512MiB here).

I think in general littlefs should prefer reliability over disk utilization, and when this eventually becomes user configurable, users can opt-in to a smaller margin of error.

But I'm open to feedback. I'll put up a PR.

geky commented 6 months ago

My original thinking was that starting to reserve blocks when a filesystem is >=8 blocks feels right, but there's not really a lot to go on here...

geky commented 6 months ago

PR here: https://github.com/littlefs-project/littlefs/pull/910

geky commented 6 months ago

I'm still unsure about ~88% vs ~94% vs other, but at the very least this is easy to change in the future. Merging...