littlefs-project / littlefs

A little fail-safe filesystem designed for microcontrollers
BSD 3-Clause "New" or "Revised" License

NOR-Flash: frequent bad blocks, corrupted directory pairs after file write and directory operations #986

Closed SBembedded closed 1 month ago

SBembedded commented 1 month ago

Hi,

I'm using littlefs together with a custom driver for the MX25L3233F 4 MB NOR flash on an STM32L4R7 (4 kB sectors, 1-byte wrap-around read/write, running at 120 MHz).

  1. Creating many files in the root directory "/" results in a corrupted file system (current maximum: 662-674 files).
  2. Writing about 300 files to a freshly erased, littlefs-formatted and mounted fs results in bad blocks.
  3. Directory operations (mkdir/rmdir, writing to a file in a sub-directory) result in corruption or bad blocks.
  4. Unit tests of my flash driver point to a perfectly working flash chip: full flash writes and erase cycles work correctly.

One thing I have found so far is that mount, unmount and format operations should not be done frequently. There seems to be no self-check for whether a fs is already mounted; double-mount operations immediately lead to a corrupted fs. The occurrence of bad blocks and corrupted directories is directly influenced by the settings in the cfg struct below. So far, I don't see what is wrong with my configuration.
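A double mount can be caught in the application's own wrapper layer. The following is a minimal sketch, assuming the mount and unmount calls are routed through a small guard. The names fs_guard_t, fs_guard_mount, fs_guard_unmount, FS_ERR_ALREADY_MOUNTED and fs_stub_count_call are hypothetical and not part of the littlefs API. In real code the two callbacks would be thin wrappers around lfs_mount and lfs_unmount.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical error code for this sketch; not a littlefs constant. */
#define FS_ERR_ALREADY_MOUNTED (-1000)

/* Callback that performs the real operation, e.g. a thin wrapper
 * around lfs_mount() or lfs_unmount(). Returns 0 on success. */
typedef int (*fs_op_fn)(void *ctx);

typedef struct {
    bool     mounted;     /* true between a successful mount and the next unmount */
    fs_op_fn do_mount;
    fs_op_fn do_unmount;
    void    *ctx;         /* passed through to the callbacks */
} fs_guard_t;

/* Mounts only if not already mounted; a second mount is rejected
 * before it can reach the filesystem. */
int fs_guard_mount(fs_guard_t *g)
{
    assert(g != NULL && g->do_mount != NULL);
    if (g->mounted) {
        return FS_ERR_ALREADY_MOUNTED;
    }
    int err = g->do_mount(g->ctx);
    if (err == 0) {
        g->mounted = true;
    }
    return err;
}

/* Unmounts only if currently mounted; a redundant unmount is a no-op. */
int fs_guard_unmount(fs_guard_t *g)
{
    assert(g != NULL && g->do_unmount != NULL);
    if (!g->mounted) {
        return 0;
    }
    int err = g->do_unmount(g->ctx);
    g->mounted = false;
    return err;
}

/* Stand-in callback for host testing: counts calls, always succeeds. */
static int fs_stub_count_call(void *ctx)
{
    *(int *)ctx += 1;
    return 0;
}
```

With this guard in place, a second fs_guard_mount call returns FS_ERR_ALREADY_MOUNTED without touching the flash.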

Littlefs configuration: (v2.9.3)

static lfs_t          lfs;
static lfs_file_t     file;
static struct lfs_config cfg;  // lfs_config is a struct tag, not a typedef
static uint8_t        lfs_readBuf[1024];
static uint8_t        lfs_progBuf[1024];
static uint8_t        lfs_lookaheadBuf[128];

int FS_Init(void)
{
    cfg.context = NULL;

    // block device operations
    cfg.read = _readBlock;
    cfg.prog = _programBlock;
    cfg.erase = _eraseBlock;
    cfg.sync = _syncFS; // does nothing. Data is directly written to chip

    // block device configuration
    cfg.read_size = 1;     // no limit actually 1 - max-capacity
    cfg.prog_size = 1;     // no limit actually 1-256 per command
    cfg.block_size = 4096;
    cfg.block_count = 1024;
    cfg.block_cycles = 500;
    cfg.lookahead_size = 128;
    cfg.cache_size = 1024;
    cfg.compact_thresh = 0;

    cfg.read_buffer = lfs_readBuf;
    cfg.prog_buffer = lfs_progBuf;
    cfg.lookahead_buffer = lfs_lookaheadBuf;

    // defaults
    cfg.name_max = 0;
    cfg.file_max = 0;
    cfg.attr_max = 0;
    cfg.metadata_max = 0;
    cfg.inline_max = 0; 

    int result = FS_mount();

    return result;
}

Debug output of file-write test: (700 files, with 256 bytes of data each)

Test:
(...)
    // write files until no longer possible and print out the number
    for (uint32_t i = 0; i<fileLimit; i++){
        // create new filename
        snprintf(filenameBuffer, 32, "writeTestFile_%d", numberOfFiles);
        result = FS_write(filenameBuffer, pageBuffer, 256, false);
        if( result == LFS_ERR_OK ){
            numberOfFiles++;
        }else {
            break;
        }
    }
(...)

Wrapped Write Function:
----
int FS_write(const char* filepath, const char* buffer, uint32_t size, bool append)
{
    int          result = LFS_ERR_IO;
    int          lfsOpenFlags = LFS_O_RDWR | LFS_O_CREAT | LFS_O_APPEND; // read/write, create if missing;
                                                                           // LFS_O_APPEND moves every write to the end of the file

    // refuse to write to a directory; info is only valid if lfs_stat succeeded
    struct lfs_info info;
    result = lfs_stat(&lfs, filepath, &info);

    if ( result == LFS_ERR_OK && info.type == LFS_TYPE_DIR )
    {
        return LFS_ERR_ISDIR;
    }

    // open file
    result = lfs_file_open(&lfs, &file, filepath, lfsOpenFlags);

    if ( result != LFS_ERR_OK )
    {
        return result;
    }

    // write
    if ( !append )
    {
        lfs_file_rewind(&lfs, &file);
    }
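    lfs_ssize_t  written = lfs_file_write(&lfs, &file, buffer, size);
    result = ( written < 0 ) ? (int)written : LFS_ERR_OK;

    // close; data is flushed here, so a close error must not be dropped
    int          closeResult = lfs_file_close(&lfs, &file);
    if ( result == LFS_ERR_OK )
    {
        result = closeResult;
    }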

    return result;
}

../Middlewares/Third_Party/LittleFs/lfs.c:5994:trace: lfs_stat(0x20038dc8, "writeTestFile_286", 0x2009fc6c)
../Middlewares/Third_Party/LittleFs/lfs.c:5998:trace: lfs_stat -> -2
../Middlewares/Third_Party/LittleFs/lfs.c:6059:trace: lfs_file_open(0x20038dc8, 0x20038e48, "writeTestFile_286", 903)
../Middlewares/Third_Party/LittleFs/lfs.c:6065:trace: lfs_file_open -> 0
../Middlewares/Third_Party/LittleFs/lfs.c:6214:trace: lfs_file_rewind(0x20038dc8, 0x20038e48)
../Middlewares/Third_Party/LittleFs/lfs.c:6218:trace: lfs_file_rewind -> 0
../Middlewares/Third_Party/LittleFs/lfs.c:6147:trace: lfs_file_write(0x20038dc8, 0x20038e48, 0x2009fdb0, 256)
../Middlewares/Third_Party/LittleFs/lfs.c:6153:trace: lfs_file_write -> 256
../Middlewares/Third_Party/LittleFs/lfs.c:6096:trace: lfs_file_close(0x20038dc8, 0x20038e48)
../Middlewares/Third_Party/LittleFs/lfs.c:2072:debug: Bad block at 0x34c
../Middlewares/Third_Party/LittleFs/lfs.c:2444:debug: Relocating {0x34d, 0x34c} -> {0x396, 0x34d}
../Middlewares/Third_Party/LittleFs/lfs.c:6101:trace: lfs_file_close -> 0

...
../Middlewares/Third_Party/LittleFs/lfs.c:5994:trace: lfs_stat(0x20038dc8, "writeTestFile_673", 0x2009fc6c)
../Middlewares/Third_Party/LittleFs/lfs.c:5998:trace: lfs_stat -> -2
../Middlewares/Third_Party/LittleFs/lfs.c:6059:trace: lfs_file_open(0x20038dc8, 0x20038e48, "writeTestFile_673", 903)
../Middlewares/Third_Party/LittleFs/lfs.c:6065:trace: lfs_file_open -> 0
../Middlewares/Third_Party/LittleFs/lfs.c:6214:trace: lfs_file_rewind(0x20038dc8, 0x20038e48)
../Middlewares/Third_Party/LittleFs/lfs.c:6218:trace: lfs_file_rewind -> 0
../Middlewares/Third_Party/LittleFs/lfs.c:6147:trace: lfs_file_write(0x20038dc8, 0x20038e48, 0x2009fdb0, 256)
../Middlewares/Third_Party/LittleFs/lfs.c:6153:trace: lfs_file_write -> 256
../Middlewares/Third_Party/LittleFs/lfs.c:6096:trace: lfs_file_close(0x20038dc8, 0x20038e48)
../Middlewares/Third_Party/LittleFs/lfs.c:6101:trace: lfs_file_close -> 0
../Middlewares/Third_Party/LittleFs/lfs.c:5994:trace: lfs_stat(0x20038dc8, "writeTestFile_674", 0x2009fc6c)
../Middlewares/Third_Party/LittleFs/lfs.c:1369:error: Corrupted dir pair at {0x1, 0x0}
../Middlewares/Third_Party/LittleFs/lfs.c:5998:trace: lfs_stat -> -84
../Middlewares/Third_Party/LittleFs/lfs.c:6059:trace: lfs_file_open(0x20038dc8, 0x20038e48, "writeTestFile_674", 903)
../Middlewares/Third_Party/LittleFs/lfs.c:1369:error: Corrupted dir pair at {0x1, 0x0}
../Middlewares/Third_Party/LittleFs/lfs.c:6065:trace: lfs_file_open -> -84

I tried to get it stable, so far without success. In some instances the device ends up bricked, idling in an assert, if you don't immediately erase and reformat after a directory corruption. For production software rolled out to embedded devices, littlefs seems far too unstable and unreliable to be of use?!

Using directory operations with these settings leads to even faster directory corruption and tons of bad blocks.

I wonder how blocks can be marked as bad when there has been at most one write to them. Does littlefs not work correctly with NOR flash?

As far as I understand it, the flash space is evenly divided into blocks, so the start address of every block should be

uint32_t address = block * c->block_size;
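Note that the read and prog callbacks in struct lfs_config also receive a byte offset within the block, so the flash address has to include it. A minimal sketch of that calculation follows, with bounds checks added. The helper name bd_flash_address is hypothetical, and the parameters stand in for the block, off, block_size and block_count values the callback sees.

```c
#include <assert.h>
#include <stdint.h>

/* Byte address on the flash chip for a (block, offset) pair.
 * The asserts catch out-of-range requests before they reach the driver. */
static inline uint32_t bd_flash_address(uint32_t block, uint32_t off,
                                        uint32_t block_size,
                                        uint32_t block_count)
{
    assert(block < block_count);
    assert(off < block_size);
    return block * block_size + off;
}
```

With the configuration from this post (block_size 4096, block_count 1024), block 0x34c at offset 0 maps to byte address 0x34c000, and the last byte of the last block maps to 0x3FFFFF.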

Perhaps someone has an idea what I'm doing wrong here.

SBembedded commented 1 month ago

In the end it turned out to be a timing and driver related issue.

On STM32 with OSPI communication, the auto-polling timeout for memory-ready is configured in CPU cycles instead of elapsed time. Depending on the CPU load and other background tasks, this timeout may work under normal application conditions but fail under test conditions without background tasks.

In this case, depending on the load, the driver would sporadically run into a timeout (e.g. while the chip is busy with an erase), and subsequent write/erase operations were discarded by the chip. As a result, littlefs operations did not complete correctly, hence the corrupted filesystem and bad blocks.
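One way to make the ready-wait independent of CPU load is to bound it by elapsed time from a monotonic millisecond tick rather than by loop or cycle counts. This is a minimal sketch of that idea, not the actual fix used here. All names (flash_wait_ready, tick_ms_fn, is_busy_fn, sim_chip_t and its helpers) are hypothetical. On STM32 the tick callback could wrap HAL_GetTick, and the busy callback would read the chip's write-in-progress status bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns a free-running millisecond counter (may wrap around). */
typedef uint32_t (*tick_ms_fn)(void *ctx);
/* Returns true while the flash chip is still busy. */
typedef bool (*is_busy_fn)(void *ctx);

typedef enum { FLASH_READY = 0, FLASH_TIMEOUT = -1 } flash_wait_t;

/* Polls until the chip is ready or timeout_ms of wall-clock time elapsed. */
flash_wait_t flash_wait_ready(tick_ms_fn now_ms, is_busy_fn is_busy,
                              void *ctx, uint32_t timeout_ms)
{
    assert(now_ms != NULL && is_busy != NULL);
    const uint32_t start = now_ms(ctx);

    while (is_busy(ctx)) {
        /* Unsigned subtraction stays correct when the tick counter wraps. */
        if ((uint32_t)(now_ms(ctx) - start) >= timeout_ms) {
            /* Final check so a chip that finished right at the deadline
             * is not reported as a timeout. */
            return is_busy(ctx) ? FLASH_TIMEOUT : FLASH_READY;
        }
    }
    return FLASH_READY;
}

/* Simulated chip for host testing: every tick read advances time by 1 ms,
 * and the chip is busy until ready_at_ms. */
typedef struct {
    uint32_t now_ms;
    uint32_t ready_at_ms;
} sim_chip_t;

static uint32_t sim_tick(void *ctx)
{
    sim_chip_t *c = (sim_chip_t *)ctx;
    return c->now_ms++;
}

static bool sim_busy(void *ctx)
{
    const sim_chip_t *c = (const sim_chip_t *)ctx;
    return c->now_ms < c->ready_at_ms;
}
```

The caller must not issue the next command when FLASH_TIMEOUT is returned, since the chip may still be busy and would discard it.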

Currently, more than 970 files @ 4 kB can be written and erased. It seems to be stable now.

geky commented 1 week ago

Hi @SBembedded, glad you figured out the issue. What an interesting bug.

Flash erase operations can take a surprising amount of time to complete, so it makes sense that this is what tripped it up.

If you detect the timeout and return LFS_ERR_IO in your bd functions, the error will be propagated up through littlefs's APIs should this happen again in the future.
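As a rough sketch of that suggestion, the block device callback can translate any driver failure into the I/O error code. The driver status type, the function-pointer signature and the stubs below are hypothetical. BD_OK and BD_ERR_IO mirror the values of LFS_ERR_OK (0) and LFS_ERR_IO (-5) so that the sketch compiles without lfs.h. Real code would include lfs.h, use the littlefs constants directly, and use the erase callback signature from struct lfs_config.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same numeric values as LFS_ERR_OK and LFS_ERR_IO in lfs.h. */
enum { BD_OK = 0, BD_ERR_IO = -5 };

/* Hypothetical status codes of the flash driver. */
typedef enum { DRV_OK = 0, DRV_TIMEOUT, DRV_BUS_ERROR } drv_status_t;

/* Hypothetical driver entry point: erases the sector at a byte address
 * and reports whether the chip completed the operation. */
typedef drv_status_t (*drv_erase_fn)(uint32_t address);

/* Body of an erase callback: any driver failure, including a timeout,
 * is reported as an I/O error instead of being silently dropped. */
int bd_erase_block(drv_erase_fn drv_erase, uint32_t block, uint32_t block_size)
{
    assert(drv_erase != NULL);
    drv_status_t st = drv_erase(block * block_size);
    return (st == DRV_OK) ? BD_OK : BD_ERR_IO;
}

/* Stubs for host testing. */
static drv_status_t stub_erase_ok(uint32_t address)
{
    (void)address;
    return DRV_OK;
}

static drv_status_t stub_erase_timeout(uint32_t address)
{
    (void)address;
    return DRV_TIMEOUT;
}
```

The read and prog callbacks can follow the same pattern, so a failed operation surfaces as a negative return value from the littlefs call that triggered it.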