littlefs-project / littlefs

A little fail-safe filesystem designed for microcontrollers
BSD 3-Clause "New" or "Revised" License
4.92k stars 774 forks source link

littlefs writes only to the first erasable sector until this sector becomes full, corrupting its own filesystem. Unable to format the drive if prog_size=read_size=cache_size<64 Bytes. #883

Closed pawelstraszak closed 8 months ago

pawelstraszak commented 8 months ago

Hardware

STM32L496ZGT<==QSPI==>W25Q80DV. I use this development board.

Boot count demo application

I am running a "boot counter demo" application, in which during every reboot, boot count value is read from a file, placed in a local variable on a stack, incremented, written-back to the same file (the file is overwritten) and, finally, a message containing the boot count value is sent via a serial port. Boot count value is of an uint32_t type (4 bytes long). Reboot is generated by resetting the MCU or interrupting the power supply (I tested both methods and both seem to work well - correct boot count value is held in the flash memory as expected).

What is wrong

1) I am unable to format the drive when prog_size=read_size=cache_size<64 Bytes[^1]. 2) Formatting succeeds only if I set the prog_size=read_size=cache_size>=64 Bytes.[^2] 3) Application seems to work as expected. Every time I reset the MCU or turn-off-and-on a power switch, the boot count is incremented (hooray!). But... 4) When overwriting the file containing the boot count, littlefs uses only one memory erasable-sector (is this an expected behavior?) 5) When this sector is full, littlefs searches for a free space only in this sector over-and-over (littlefs goes crazy). 6) Then littlefs performs a "self-destruction" and corrupts the filesystem. 7) After the "self-destruction", drive needs to be re-formatted. 8) After re-formatting the drive, everything happens one more time (goto: point 3).

[^1]:Very rarely and randomly, it is possible to format the drive with prog_size=read_size=cache_size=32 Bytes. But just after the only sector used by littlefs becomes full, littlefs corrupts the filesystem and even re-formatting of the drive fails. [^2]:When prog_size=read_size=cache_size=256 Bytes, filesystem can be mounted, but sometimes file cannot be opened.

Memory characteristics and littlefs configuration

W25Q80DV characteristics:

With the 1st configuration drive formatting usually fails (32 bytes):

const struct lfs_config cfg = {
      .read  = fs_flash_read,
      .prog  = fs_flash_prog,
      .erase = fs_flash_erase,
      .sync  = fs_flash_sync,

      .read_size =  32,    // cache_size = read_size = prog_size [Bytes]
      .prog_size =  32,    // cache_size = read_size = prog_size [Bytes]
      .block_size =     4096,  // = erasable_sector_size [Bytes]
      .block_count =    256,   // = flash_capacity / erasable_sector_size
      .cache_size =     32,    // cache_size = read_size = prog_size [Bytes]
      .lookahead_size = 64,    // = block_count/8 (minimum, must be a multiple of 8) [Bytes]
      .block_cycles =   100    // [Cycles]
};

Very rarely and randomly, however, it is possible to create a valid filesystem with the 1st configuration. But always exactly after every 124 or 125 (I don't remember the exact value) consecutive successful write operations, filesystem corrupts itself and even re-formatting fails.

Mounting the drive
../Drivers/littlefs/lfs.c:1346:error: Corrupted dir pair at {0x1, 0x0}
Error: -84
Attempting to re-format the drive:
../Drivers/littlefs/lfs.c:2049:debug: Bad block at 0x1
../Drivers/littlefs/lfs.c:2054:warn: Superblock 0x1 has become unwritable
Error: -28
Unable to re-format the drive

With the 2nd configuration drive formatting usually succeeds (64 bytes):

const struct lfs_config cfg = {
      .read  = fs_flash_read,
      .prog  = fs_flash_prog,
      .erase = fs_flash_erase,
      .sync  = fs_flash_sync,

      .read_size =  64,    // cache_size = read_size = prog_size [Bytes]
      .prog_size =  64,    // cache_size = read_size = prog_size [Bytes]
      .block_size =     4096,  // = erasable_sector_size [Bytes]
      .block_count =    256,   // = flash_capacity / erasable_sector_size
      .cache_size =     64,    // cache_size = read_size = prog_size [Bytes]
      .lookahead_size = 64,    // = block_count/8 (minimum, must be a multiple of 8) [Bytes]
      .block_cycles =   100    // [Cycles]
};

This configuration usually works (nearly) as expected. Always exactly after every 61 consecutive successful write operations, filesystem corrupts itself (boot count value of 62 fails on write), but re-formatting usually succeeds.

With the 3rd configuration drive formatting usually succeeds (256 bytes):

const struct lfs_config cfg = {
      .read  = fs_flash_read,
      .prog  = fs_flash_prog,
      .erase = fs_flash_erase,
      .sync  = fs_flash_sync,

      .read_size =  256,    // cache_size = read_size = prog_size [Bytes]
      .prog_size =  256,    // cache_size = read_size = prog_size [Bytes]
      .block_size =     4096,  // = erasable_sector_size [Bytes]
      .block_count =    256,   // = flash_capacity / erasable_sector_size
      .cache_size =     256,    // cache_size = read_size = prog_size [Bytes]
      .lookahead_size = 64,    // = block_count/8 (minimum, must be a multiple of 8) [Bytes]
      .block_cycles =   100    // [Cycles]
};

This configuration usually works (nearly) as expected. Always exactly after every 14 consecutive successful write operations, filesystem corrupts itself (when attempting to write boot count 15), but re-formatting usually succeeds. Sometimes, if there was a filesystem present in the flash memory, filesystem is successfully mounted, but existing file cannot be opened (why?).

Glue/wrapper functions

int fs_init(void)
{
  qspi_init();

  return 0;
}

int fs_flash_read(const struct lfs_config *cfg, lfs_block_t block,
          lfs_off_t off, void *buffer, lfs_size_t size)
{

  uint32_t address;

  assert(off  % cfg->read_size == 0);
  assert(size % cfg->read_size == 0);
  assert(block < cfg->block_count);

#ifdef LFS_PORT_DEBUG
  myprintf("Read block %lu at offset %lu\r\n", block, off);
#endif

  address = block*cfg->read_size + off;

  qspi_read((uint32_t)address, (uint8_t*)buffer, size);

  return 0;
}

int fs_flash_prog(const struct lfs_config *cfg, lfs_block_t block,
          lfs_off_t off, const void *buffer, lfs_size_t size)
{

  uint32_t address;

  assert(off  % cfg->prog_size == 0);
  assert(size % cfg->prog_size == 0);
  assert(block < cfg->block_count);

#ifdef LFS_PORT_DEBUG
  myprintf("Write block %lu at offset %lu\r\n", block, off);
#endif

  address = block*cfg->prog_size + off;

  qspi_write((uint32_t)address, (uint8_t*)buffer, size);

  return 0;
}

int fs_flash_erase(const struct lfs_config *cfg, lfs_block_t block)
{

  uint32_t address;

  assert(block < cfg->block_count);

#ifdef LFS_PORT_DEBUG
  myprintf("Erase block %lu\r\n", block);
#endif

  address = block*cfg->block_size;

  qspi_erase_sector((uint32_t)address);

  return 0;
}

int fs_flash_sync(const struct lfs_config *cfg)
{

#ifdef LFS_PORT_DEBUG
  myprintf("Flash sync\r\n");
#endif

  return 0;
}

main.c - the boot count demo application

lfs_t lfs;
lfs_file_t file;
const char* filename = "boot_cntr.txt";

qspi_init(); // added just to be on the safe-side (not necessary)

memset(&lfs, 0x00, sizeof(lfs)); // added just to be on the safe-side

// mount the filesystem
myprintf("Mounting the drive\r\n");
int err = lfs_mount(&lfs, &cfg);

// reformat if we can't mount the filesystem
// this should __only__ happen on the first boot
if (err) {
    myprintf("Error: %d\r\n", err);
    myprintf("Attempting to re-format the drive:\r\n");

    err = lfs_format(&lfs, &cfg);

    if (err) {
    myprintf("Error: %d\r\n",err);
    myprintf("Unable to re-format the drive\r\n");
    while(1); // we have some serious problem...
    }

    myprintf("Successfully re-formatted the drive,\r\n");
    myprintf("attempting to mount the drive\r\n");

    // try to mount the filesystem
    err = lfs_mount(&lfs, &cfg);

    if (err) {
    myprintf("Error: %d\r\n",err);
    myprintf("Unable to mount the drive\r\n");
    while(1); // we have some serious problem...
    }
}

myprintf("Successfully mounted the drive.\r\n") ;

// open the file
myprintf("Attempting to open %s file\r\n", filename);
err = lfs_file_open(&lfs, &file, filename, LFS_O_RDWR | LFS_O_CREAT);

if (err) {
    myprintf("Error: %d\r\n",err);
    myprintf("Unable to open %s file\r\n", filename);
    while(1);
}

myprintf("Successfully opened the %s file.\r\n", filename) ;
myprintf("Attempting to read the %s file\r\n", filename);

lfs_ssize_t n = 0; // number of bytes read or written
uint32_t boot_count = 0; // holds current boot count value on a stack

// read current count
n = lfs_file_read(&lfs, &file, &boot_count, sizeof(boot_count));

if (n == 0) {

    uint32_t temp = 0; // initial boot count, stack variable

    myprintf("%s file is empty or unable to read the %s file\r\n", filename, filename);
    myprintf("Initial boot count set to: 0\r\n");
    myprintf("Attempting to write the %s file with initial boot count\r\n", filename);

    // initialize the boot count with 0 (the file is probably empty)
    n = lfs_file_write(&lfs, &file, &temp, sizeof(boot_count));

    if (n == 0) {
    myprintf("Unable to write the %s file\r\n", filename);
    while(1);
    }

    myprintf("Successfully wrote %d bytes to %s file\r\n", n, filename);

    myprintf("Closing %s file\r\n", filename);
    err = lfs_file_close(&lfs, &file);
    if(err)
      {
    myprintf("Error: %d\r\n",err);
    myprintf("Unable to close the %s file\r\n", filename);
    while(1);
      }

    myprintf("Re-opening %s file\r\n", filename);
    err = lfs_file_open(&lfs, &file, filename, LFS_O_RDWR | LFS_O_CREAT);
    if (err) {
    myprintf("Error: %d\r\n",err);
    myprintf("Unable to open the %s file\r\n", filename);
    while(1);
    }

    myprintf("Rewinding %s file\r\n", filename);
    err = lfs_file_rewind(&lfs, &file);
    if (err) {
    myprintf("Error: %d\r\n",err);
    myprintf("Unable to rewind the %s file\r\n", filename);
    while(1);
    }

    // check if everything is correct
    n = lfs_file_read(&lfs, &file, &boot_count, sizeof(boot_count));
    if (n == 0) {
    myprintf("Unable to read the %s file\r\n", filename);
    while(1);
    }

    myprintf("File read successfully\r\n");
    myprintf("Initial boot count set to: %d\r\n",boot_count);

} else {
    myprintf("File read successfully\r\n");
    myprintf("Current boot count: %d\r\n", boot_count);
}

myprintf("Updating boot count:\r\n");

// update boot count
boot_count += 1;

// rewind the file to the beginning
myprintf("1) Rewinding the %s file\r\n", filename);
err = lfs_file_rewind(&lfs, &file);
if (err) {
    myprintf("Error: %d\r\n",err);
    myprintf("Unable to rewind the %s file\r\n", filename);
    while(1);
}

// write updated boot count to the file
myprintf("2) Writing to the %s file...\r\n", filename);
n = lfs_file_write(&lfs, &file, &boot_count, sizeof(boot_count));
if (n == 0) {
    myprintf("Unable to write to the %s file\r\n", filename);
    while(1);
}

// print the written boot count
myprintf("boot count: %d written to %s \r\n", boot_count, filename);

// remember the storage is not updated until the file is closed successfully
myprintf("Closing the %s file\r\n", filename);
err = lfs_file_close(&lfs, &file);
if(err)
  {
    myprintf("Error: %d\r\n",err);
    myprintf("Unable to close the %s file\r\n", filename);
    while(1);
  }

// re-open the file
myprintf("Attempting to re-open %s file\r\n", filename);
err = lfs_file_open(&lfs, &file, filename, LFS_O_RDWR | LFS_O_CREAT);
if (err) {
    myprintf("Error code %d\r\n",err);
    myprintf("Unable to re-open %s file\r\n", filename);
    while(1);
}

myprintf("Successfully reopened %s file\r\n", filename);

// read updated file
n = lfs_file_read(&lfs, &file, &boot_count, sizeof(boot_count));
if(n == 0)
  {
    myprintf("File is empty or unable to read the %s file\r\n", filename);
    while(1);
  }

myprintf("Read boot count: %d\r\n",boot_count);

// close the file
myprintf("Closing the file\r\n");
err = lfs_file_close(&lfs, &file);
if(err)
  {
    myprintf("Error: %d\r\n",err);
    myprintf("Unable to close the %s file\r\n", filename);
    while(1);
  }

// release any resources we were using
myprintf("Unmounting the drive\r\n");
err = lfs_unmount(&lfs);
if(err)
  {
    myprintf("Error: %d\r\n",err);
    myprintf("Unable to unmount the drive\r\n");
    while(1);
  }

myprintf("\r\n\r\n\r\n");

while(1)
  {
    ; // stuck here until reboot
  }

Results

With this code (main.c, glue-functions and 2nd configuration) I obtain following resuts:

Mounting the drive
Successfully mounted the drive.
Attempting to open boot_cntr.txt file
Successfully opened the boot_cntr.txt file.
Attempting to read the boot_cntr.txt file
File read successfully
Current boot count: 1
Updating boot count:
1) Rewinding the boot_cntr.txt file
2) Writing to the boot_cntr.txt file...
boot count: 2 written to boot_cntr.txt 
Closing the boot_cntr.txt file
Attempting to re-open boot_cntr.txt file
Successfully reopened boot_cntr.txt file
Read boot count: 2
Closing the file
Unmounting the drive

Mounting the drive
Successfully mounted the drive.
Attempting to open boot_cntr.txt file
Successfully opened the boot_cntr.txt file.
Attempting to read the boot_cntr.txt file
File read successfully
Current boot count: 2
Updating boot count:
1) Rewinding the boot_cntr.txt file
2) Writing to the boot_cntr.txt file...
boot count: 3 written to boot_cntr.txt 
Closing the boot_cntr.txt file
Attempting to re-open boot_cntr.txt file
Successfully reopened boot_cntr.txt file
Read boot count: 3
Closing the file
Unmounting the drive

Mounting the drive
Successfully mounted the drive.
Attempting to open boot_cntr.txt file
Successfully opened the boot_cntr.txt file.
Attempting to read the boot_cntr.txt file
File read successfully
Current boot count: 3
Updating boot count:
1) Rewinding the boot_cntr.txt file
2) Writing to the boot_cntr.txt file...
boot count: 4 written to boot_cntr.txt 
Closing the boot_cntr.txt file
Attempting to re-open boot_cntr.txt file
Successfully reopened boot_cntr.txt file
Read boot count: 4
Closing the file
Unmounting the drive

and so on...

Mounting the drive
Successfully mounted the drive.
Attempting to open boot_cntr.txt file
Successfully opened the boot_cntr.txt file.
Attempting to read the boot_cntr.txt file
File read successfully
Current boot count: 58
Updating boot count:
1) Rewinding the boot_cntr.txt file
2) Writing to the boot_cntr.txt file...
boot count: 59 written to boot_cntr.txt 
Closing the boot_cntr.txt file
Attempting to re-open boot_cntr.txt file
Successfully reopened boot_cntr.txt file
Read boot count: 59
Closing the file
Unmounting the drive

Mounting the drive
Successfully mounted the drive.
Attempting to open boot_cntr.txt file
Successfully opened the boot_cntr.txt file.
Attempting to read the boot_cntr.txt file
File read successfully
Current boot count: 59
Updating boot count:
1) Rewinding the boot_cntr.txt file
2) Writing to the boot_cntr.txt file...
boot count: 60 written to boot_cntr.txt 
Closing the boot_cntr.txt file
Attempting to re-open boot_cntr.txt file
Successfully reopened boot_cntr.txt file
Read boot count: 60
Closing the file
Unmounting the drive

Mounting the drive
Successfully mounted the drive.
Attempting to open boot_cntr.txt file
Successfully opened the boot_cntr.txt file.
Attempting to read the boot_cntr.txt file
File read successfully
Current boot count: 60
Updating boot count:
1) Rewinding the boot_cntr.txt file
2) Writing to the boot_cntr.txt file...
boot count: 61 written to boot_cntr.txt 
Closing the boot_cntr.txt file
Attempting to re-open boot_cntr.txt file
Successfully reopened boot_cntr.txt file
Read boot count: 61
Closing the file
Unmounting the drive

Mounting the drive
Successfully mounted the drive.
Attempting to open boot_cntr.txt file
Successfully opened the boot_cntr.txt file.
Attempting to read the boot_cntr.txt file
File read successfully
Current boot count: 61
Updating boot count:
1) Rewinding the boot_cntr.txt file
2) Writing to the boot_cntr.txt file...
boot count: 62 written to boot_cntr.txt 
Closing the boot_cntr.txt file
../Drivers/littlefs/lfs.c:2049:debug: Bad block at 0x0
../Drivers/littlefs/lfs.c:2054:warn: Superblock 0x0 has become unwritable
Error: -28
Unable to close the boot_cntr.txt file

After another reset, the drive cannot be mounted and needs to be reformatted:

Mounting the drive
../Drivers/littlefs/lfs.c:1346:error: Corrupted dir pair at {0x0, 0x1}
Error: -84
Attempting to re-format the drive:
Successfully re-formatted the drive,
attempting to mount the drive
Successfully mounted the drive.
Attempting to open boot_cntr.txt file
Successfully opened the boot_cntr.txt file.
Attempting to read the boot_cntr.txt file
boot_cntr.txt file is empty or unable to read the boot_cntr.txt file
Initial boot count set to: 0
Attempting to write the boot_cntr.txt file with initial boot count
Successfully wrote 4 bytes to boot_cntr.txt file
Closing boot_cntr.txt file
Re-opening boot_cntr.txt file
Rewinding boot_cntr.txt file
File read successfully
Initial boot count set to: 0
Updating boot count:
1) Rewinding the boot_cntr.txt file
2) Writing to the boot_cntr.txt file...
boot count: 1 written to boot_cntr.txt 
Closing the boot_cntr.txt file
Attempting to re-open boot_cntr.txt file
Successfully reopened boot_cntr.txt file
Read boot count: 1
Closing the file
Unmounting the drive

If resets happen too frequently (or maybe it is simply a coincidence, not a rule), it is possible to break the filesystem even earlier, when the boot count is < 61 (i.e. about 40).

Results --verbose

I also attach a comprehensive log, including littlefs traces, block numbers, offset values and addresses of every IO operation. The first log covers boot counts from 2 to 11 (just after filesystem has started). output_verbose_boot_count_2_to_11.txt The second log covers boot counts from 56 to 61 (just before filesystem has failed) output_verbose_boot_count_56_to_61.txt The last log covers filesystem crash itself. output_verbose_fail_boot_count_62.txt There were so many entries in the log that I omit some, which is marked by:

...
...
...

What I have tested so far

The problem may lie in the:

I tested addresses, block numbers and offsets during read/write/erase operations in glue/wrapper functions. It seems OK.

I also tested underlying low-level functions (qspi_xxxxxxx()). These functions are supplied by the vendor of the development board and internally call STM32 HAL QSPI library functions.

I made to them small improvements and bug fixes (because i.e. erase_sector function wasn't waiting for erase operation to complete, there was wrong ERASE_SECTOR opcode, etc.).

I tested QSPI memory raw write, raw erase and raw readout. Before and after every test I uploaded whole affected memory to see what has changed. These tests prove that I can successfully write at an arbitrary address an arbitrary number of bytes (even if it requires programming multiple pages and slicing input data to fit to the page boundaries, because data are programmed in pages of 256 Bytes) without any issues. I tested also the erase function (whether it erases only the locations that are intended to be erased). It looks OK. By the way, if low-level functions were totally broken, littlefs would not have any chance to successfully format, mount, read and write data to the flash memory about 60 times...

Conclusion

Probably, there is something bad happening with littlefs itself. It is very suspicious that only the first sector (4KiB) is used by littlefs. When it becomes full, erase operation is performed and the filesystem becomes corrupted. However, I do not definitively rule out errors beyond littlefs itself.

pawelstraszak commented 8 months ago

Close this issue, cfg->prog_size and cfg->read_size should be replaced with cfg->block_size.