littlefs-project / littlefs

A little fail-safe filesystem designed for microcontrollers
BSD 3-Clause "New" or "Revised" License

lfs_file_sync (lfs_file_close) failure, lfs_dir_commitattr returning LFS_ERR_NOSPC; #478

Open Karlhead opened 4 years ago

Karlhead commented 4 years ago

Hello,

I'm facing some issues which I'm having a hard time understanding. I've been using lfs for some time now without ever hitting this problem, but lately it has surfaced more than once.

I'm downloading data into several files, one at a time: around 10 files with approx. 80 KB of data in each. The first 9 files are successfully filled with data and closed correctly, but when I try to close the last file, the lfs_file_close function returns the LFS_ERR_NOSPC error code, and the same thing happens if I call lfs_file_sync before lfs_file_close. The problem persists until I re-format the filesystem.

While debugging the lfs_dir_commitattr function I can see that off + dsize is larger than end, with the following values at the failing check:

dsize = 16
commit->block = 7712
commit->off = 490
commit->begin = 0
commit->end = 504

if (commit->off + dsize > commit->end) {
    return LFS_ERR_NOSPC;
}

However, there is no way that my device is out of space. There are a total of 8388608 blocks (512 B each) at lfs's disposal. Calling lfs_fs_size, I can verify that only 1960 blocks are in use and 8386648 blocks are free.

Any help would be greatly appreciated.

geky commented 3 years ago

Hi @Karlhead, sorry for the late response

The fact that NOSPC is coming from lfs_dir_commitattr suggests that the file's total metadata can't fit in the metadata block. Do you have fairly large custom attributes attached to the files? The combined total of the custom attributes, the file name, and a bit of extra metadata all needs to fit in a single metadata block.

One option is to increase the block size to a multiple of the block device's block size. So 1KiB, 2KiB, 4KiB, etc. With 8M blocks this would also slightly improve the allocator performance.
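As a rough sketch of what that change might look like in the mount configuration (field names come from lfs.h's struct lfs_config; the sizes assume the 4 GiB device described above and are illustrative, and the I/O callbacks are omitted):

```c
// Sketch: larger logical blocks on top of 512 B device sectors.
// Values assume a 4 GiB device (8388608 x 512 B sectors); adjust to
// your hardware. Not a drop-in config: callbacks are omitted.
const struct lfs_config cfg = {
    // low-level I/O can still happen in 512 B units
    .read_size      = 512,
    .prog_size      = 512,
    // present 4 KiB logical blocks to littlefs
    .block_size     = 4096,          // multiple of the 512 B sector size
    .block_count    = 8388608 / 8,   // same total capacity, 8x fewer blocks
    .cache_size     = 512,
    .lookahead_size = 128,
    .block_cycles   = 500,
    // .read, .prog, .erase, .sync callbacks omitted
};
```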

Karlhead commented 3 years ago

Hi @geky, no worries.

I have no custom attributes attached to the files, but as you pointed out, the metadata doesn't fit in the block anyway. I will consider increasing the block size to a multiple.

Thanks!

If you have the time: 8M blocks would slightly improve the allocator performance, how so?

geky commented 3 years ago

8M blocks would slightly improve the allocator performance, how so?

Huh, I don't know what I meant by "8M", maybe that was a typo.

With larger (1/2/4 KiB) blocks, the performance of the allocator slightly improves because there are fewer blocks in the filesystem. When the allocator runs, it doesn't actually read each block, but it needs to read the metadata referencing the blocks. So fewer blocks == less metadata == a faster allocator.

The tradeoff is that the filesystem may waste more space. LittleFS has inline files, but if your file is larger than the inline size (cache size), it will use full blocks for the file.

Other filesystems do something similar for similar reasons: https://support.microsoft.com/en-us/help/140365/default-cluster-size-for-ntfs-fat-and-exfat

Karlhead commented 3 years ago

Thanks!

remfan77 commented 3 years ago

Hello, I can reproduce a similar scenario on my system. The error is triggered from lfs_dir_commitattr.

static int lfs_dir_commitattr(lfs_t *lfs, struct lfs_commit *commit,
        lfs_tag_t tag, const void *buffer) {
    // check if we fit
    lfs_size_t dsize = lfs_tag_dsize(tag);
    if (commit->off + dsize > commit->end) {
        return LFS_ERR_NOSPC;
    }

This happens on a cleanly formatted partition, so I can exclude power-loss related bugs. I'm using littlefs-fuse. The littlefs partition is 128 MB.

To trigger the problem I copy a folder from my PC to my ARM target over a SAMBA connection. I always see the problem. The folder size is about 18 MB. The values at the failing check are:

sector size = 512
dsize = 16
commit->off = 490
commit->end = 504

If I use a sector size of 1024 (as @geky suggested), I do not see the problem.

Is this a real solution?

The find -ls output is attached: folder.txt

The problem is related to copying one of these (maybe the creation of ./plc/TestFastcat_data):

128 1 drwxr-xr-x 1 root root 512 Jan 13 09:03 ./plc/TestFastcat_data
129 1 drwxr-xr-x 1 root root 512 Jan 13 09:03 ./plc/TestFastcat_data/Alarms
130 1 -rwxr-xr-x 1 root root 832 Dec 22 12:05 ./plc/TestFastcat_data/Alarms/Log.a

Thanks for the support and for this great project. Best regards,

Paolo

geky commented 3 years ago

Hmm, do you have any custom attributes? If so, how many bytes of custom attributes do you have on each file?

In theory, if the size of the file name + custom attributes for a single file is < 1/2 the block size, you shouldn't see this. The filesystem should split metadata blocks until, in the worst case, each file gets its own metadata block. It's possible there is a bug that leads to the filesystem not splitting metadata blocks when it needs to.

Other info that would help:

remfan77 commented 3 years ago

Hello @geky,

block_size=512
cache_size=512 (I kept the default block_size)

I do not change anything about custom attributes. To tell the truth, I do not know exactly what they are; I only glanced at the code. Are they used to store some custom values (for example date and time)?

void TRIGGER_BUG(void)
{
        printf("TRIGGER_BUG\n");
}

static int lfs_dir_commitattr(lfs_t *lfs, struct lfs_commit *commit,
        lfs_tag_t tag, const void *buffer) {
    // check if we fit
    lfs_size_t dsize = lfs_tag_dsize(tag);
    printf("%s : commit->off=%d dsize=%d commit->end=%d commit->block=%d\n",
           __FUNCTION__, commit->off, dsize, commit->end, commit->block);
    if (commit->off + dsize > commit->end) {
        TRIGGER_BUG();
        return LFS_ERR_NOSPC;
    }
gdb --args lfs /dev/mmcblk3p3 /data2 -f
(gdb) b TRIGGER_BUG
(gdb) r

(gdb) backtrace
#0  0x0000b8cc in TRIGGER_BUG ()
#1  0x0000b910 in lfs_dir_commitattr ()
#2  0x0000cd66 in lfs_dir_compact ()
#3  0x0000d2b6 in lfs_dir_commit ()
#4  0x0000df0e in lfs_mkdir ()
#5  0xb6fbcfae in fuse_fs_mkdir (fs=0x1a718, path=0x2aa58 "/plc/TestFastcat_data", mode=493) at fuse.c:2224
#6  0xb6fbf358 in fuse_lib_mkdir (req=0x1a520, parent=2, name=0xb6e5b038 "TestFastcat_data", mode=<optimized out>) at fuse.c:2945
#7  0xb6fc24f2 in do_mkdir (req=<optimized out>, nodeid=<optimized out>, inarg=<optimized out>) at fuse_lowlevel.c:1126
#8  0xb6fc293c in fuse_ll_process_buf (data=<optimized out>, buf=0xbefffbe8, ch=<optimized out>) at fuse_lowlevel.c:2443
#9  0xb6fc443a in fuse_session_process_buf (se=se@entry=0x1a4f8, buf=buf@entry=0xbefffbe8, ch=<optimized out>) at fuse_session.c:87
#10 0xb6fc07ee in fuse_session_loop (se=0x1a4f8) at fuse_loop.c:40
#11 0xb6fbc4ea in fuse_loop (f=f@entry=0x1a618) at fuse.c:4322
#12 0xb6fc55b0 in fuse_main_common (argc=<optimized out>, argv=<optimized out>, op=<optimized out>, op_size=<optimized out>, user_data=user_data@entry=0x0, compat=compat@entry=0) at helper.c:371
#13 0xb6fc5640 in fuse_main_real (argc=<optimized out>, argv=<optimized out>, op=<optimized out>, op_size=<optimized out>, user_data=0x0) at helper.c:383
#14 0x00009860 in main ()

I hope this helps. Thanks

remfan77 commented 3 years ago

One other small piece of information... I made an archive containing the files/folders that show the problem. If I uncompress it on the target (using tar xvfz ...), it works correctly.

If the files/folders are written by smbd (the samba daemon), I see the problem.

remfan77 commented 3 years ago

I tried adding a mutex around all the fuse functions, so that each call into lfs is serialized. I still see the same problem.

remfan77 commented 3 years ago

Today I tried downgrading littlefs while keeping the same littlefs-fuse. I found that

remfan77 commented 3 years ago

In my case, commit 0d4c0b1 introduces the problem.

I tried different versions. If I revert this commit, I do not see the problem anymore.

I do not understand the internals of littlefs well, so reverting may not be the real solution; this is only based on experience acquired through brute-force attempts.

remfan77 commented 3 years ago

Now I'm able to reproduce the problem very easily on a Linux PC. I clone https://github.com/littlefs-project/littlefs-fuse (v2.4) and run make.

In the same folder as the generated lfs binary, I put the following shell script in a file, for example go.sh:

mkdir mnt
dd if=/dev/zero of=lfs.img bs=256K count=1
losetup /dev/loop0 lfs.img  
./lfs /dev/loop0 --format  
./lfs /dev/loop0 mnt  
cd mnt
for i in $(seq 1 8192)
do
        if ! mkdir $i; then
                echo error mkdir $i
                exit 1
        fi
        if ! touch _$i; then
                echo error touch _$i
                exit 1
        fi
done
echo all is OK!

It simply creates a 256 K image, formats it, and mounts it. Then it creates:

1 (directory) _1 (file, length 0)
2 (directory) _2 (file, length 0)
3 (directory) _3 (file, length 0)
4 (directory) _4 (file, length 0)
...and so on.

If a mkdir or touch command fails, it stops with a message.

I run go.sh and see this message:

mkdir: cannot create directory '66': No space left on device

This is the bug! There is free space.

If I now manually type mkdir _66, it works.

Now also mkdir 66 works.

mrchristian6161 commented 2 years ago

I continue to see this issue with version 2.4.1. Has there been any progress in resolving this issue?

Additional information: if I set read_size and prog_size equal to block_size, this issue occurs. If I reduce read_size and prog_size, the error does not occur. The documentation in the source says that read_size and prog_size must be a "factor" of block_size; even though being equal is technically a factor, I found that I must make them smaller.
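A sketch of that workaround in the configuration (field names from lfs.h's struct lfs_config; the values are illustrative assumptions, not the actual config from this report, and the I/O callbacks are omitted):

```c
// Workaround sketch: read_size/prog_size strictly smaller than
// block_size. Illustrative values; both must still evenly divide
// block_size. Not a drop-in config: callbacks are omitted.
const struct lfs_config cfg = {
    .read_size      = 64,      // was equal to block_size; now smaller
    .prog_size      = 64,      // was equal to block_size; now smaller
    .block_size     = 4096,
    .block_count    = 1048576,
    .cache_size     = 64,      // a multiple of read_size and prog_size
    .lookahead_size = 128,
    .block_cycles   = 500,
    // .read, .prog, .erase, .sync callbacks omitted
};
```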