littlefs-project / littlefs

A little fail-safe filesystem designed for microcontrollers

Clarify underlying flash device behavior model #354


jimmo commented 4 years ago

I'm involved with the MicroPython project, which has just added support for littlefs as the on-device filesystem. We'd like to write some integration tests, which will include simulating power-loss events. Before I go down a long rabbit hole, it would be really good to get a clarification of the model of flash device behavior that littlefs assumes, so that we can make assertions about the resulting state.

MicroPython also allows arbitrary block devices, so we'd also like to understand what guarantees they need to provide in order to behave the way littlefs expects a flash device to work (beyond the basic stuff like erase vs. write).

From my reading of the documentation in the repo, it would seem that the model is something along the lines of "any write operation will either succeed completely or at worst only modify the bytes currently being written, even if only a subset of a page is being written".

Conversely, it would also help to understand what circumstances littlefs can handle, and what is definitely out of the question. i.e. I don't expect that arbitrary block corruption should lead to a mountable filesystem (but should I expect that littlefs can always detect that corruption occurred?). Or, more subtly, what if the entire block is corrupted during a power-loss event while a small (sub-page) write was in progress?

Thanks, and sorry if I've missed something in the docs. I looked at the tests too, but the "power cycle" testing mostly seemed to focus on clean unmount + remount cycles.

geky commented 4 years ago

Hi @jimmo, excited to see littlefs adopted in MicroPython! This is interesting timing, since I am also working on revamping the testing in littlefs to more aggressively find power-loss bugs.

There are some power-loss tests in Mbed OS, which is where littlefs originally came from. I think relying only on these tests was a mistake, as they're pretty intertwined with Mbed's Greentea testing framework, which is a bit difficult to get into and requires hardware. As such those tests have kinda fallen by the wayside and only get run occasionally. https://github.com/ARMmbed/mbed-littlefs/tree/master/TESTS/filesystem_recovery

The test strategy I like, and will be porting to the littlefs core tests, is to have a reentrant test case and simulate a program reset at every prog/erase operation in the test. Though if you have any other suggestions I'm all ears.
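
As a sketch of what that simulation could look like for MicroPython's integration tests: the block device hook signatures below come from lfs.h, but the RAM backing, the `ops_until_powerloss` counter, and the longjmp "reset" are just one hypothetical way to wire it up, not how the littlefs core tests do it.

```c
// Sketch of a power-loss-injecting block device for reentrant tests.
// The hook signatures are littlefs's real API (lfs.h); the RAM backing,
// ops_until_powerloss, and the longjmp "reset" are hypothetical test plumbing.
#include <setjmp.h>
#include <stdint.h>
#include <string.h>
#include "lfs.h"

#define BLOCK_SIZE  4096
#define BLOCK_COUNT 64

static uint8_t disk[BLOCK_COUNT][BLOCK_SIZE];
static int ops_until_powerloss = -1;  // -1 means "never lose power"
static jmp_buf powerloss_jmp;         // set up by the test harness

static void maybe_powerloss(void) {
    if (ops_until_powerloss > 0 && --ops_until_powerloss == 0) {
        longjmp(powerloss_jmp, 1);  // abandon the operation mid-flight, like a reset
    }
}

static int bd_read(const struct lfs_config *c, lfs_block_t block,
        lfs_off_t off, void *buffer, lfs_size_t size) {
    memcpy(buffer, &disk[block][off], size);
    return 0;
}

static int bd_prog(const struct lfs_config *c, lfs_block_t block,
        lfs_off_t off, const void *buffer, lfs_size_t size) {
    maybe_powerloss();  // a fancier harness could also write a partial buffer here
    memcpy(&disk[block][off], buffer, size);
    return 0;
}

static int bd_erase(const struct lfs_config *c, lfs_block_t block) {
    maybe_powerloss();
    memset(disk[block], 0xff, BLOCK_SIZE);
    return 0;
}

static int bd_sync(const struct lfs_config *c) {
    return 0;  // nothing is buffered in this RAM-backed sketch
}
```

A harness would then run the same scenario with `ops_until_powerloss` set to 1, 2, 3, ..., `setjmp()` around the scenario, and after every simulated reset mount a fresh `lfs_t` on the same `disk` array and assert that it sees either the state from before the interrupted operation or the state from after, never something in between.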


MicroPython also allows arbitrary block devices, so we'd also like to understand what guarantees they need to provide in order to behave the way littlefs expects a flash device to work.

The key thing here is that littlefs has high expectations for the block device:

- A prog interrupted by power-loss may leave the prog_size region being programmed in an undefined state, but bytes outside that region must survive untouched.
- Likewise, an erase interrupted by power-loss may only affect the block being erased.
- Once sync returns, everything written so far must actually be on disk.

On the flip side, littlefs can work with a large range of geometries, so you should make sure the geometry you configure matches the power-loss guarantees your device can actually provide.

It's also fine for block device operations to resolve out-of-order, for example if you have a cache or a background thread for disk operations. This is where the sync callback becomes important, as this is how littlefs tells the block device when it needs all operations to be realized on disk. Note that SD cards may have a cache in hardware that should be flushed on sync.
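
For reference, the geometry and the sync hook all live in the same `struct lfs_config`. A minimal sketch, with the field names taken from lfs.h and everything else (the sizes, `flush_hw_cache()`) standing in for whatever a real device needs:

```c
// Sketch of the geometry and sync hook littlefs is configured with.
// The struct lfs_config fields are littlefs's real API; the sizes and
// flush_hw_cache() are placeholders for a particular device.
#include "lfs.h"

extern int flush_hw_cache(void);  // hypothetical: flush any write-behind/hardware cache

static int bd_sync(const struct lfs_config *c) {
    // called when littlefs needs everything written so far to be on disk
    return (flush_hw_cache() == 0) ? 0 : LFS_ERR_IO;
}

const struct lfs_config cfg = {
    // block device callbacks (.read, .prog, .erase omitted in this sketch)
    .sync = bd_sync,

    // geometry: must match what the device guarantees across power-loss
    .read_size   = 16,    // smallest readable unit
    .prog_size   = 256,   // must cover everything an interrupted prog can disturb
    .block_size  = 4096,  // smallest erasable unit
    .block_count = 256,

    // other required configuration
    .cache_size     = 256,  // multiple of read_size/prog_size, factor of block_size
    .lookahead_size = 16,
    .block_cycles   = 500,  // erase cycles before littlefs moves metadata for wear-leveling
};
```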

The approach littlefs takes to provide power-resilience is that whenever the user requests a littlefs operation, it either happens or it doesn't. If littlefs returns successfully, the state on disk reflects the new state. If a power-loss occurs, the state on disk reflects either the new state or the old state.
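
In API terms, that guarantee looks like this from the caller's side (this mirrors the boot_count example in the littlefs README, using a `cfg` like the one sketched above):

```c
// The boot_count example from the littlefs README, annotated for power-loss.
#include <stdint.h>
#include "lfs.h"

extern const struct lfs_config cfg;  // the configuration sketched above

int main(void) {
    lfs_t lfs;
    lfs_file_t file;

    // mount, formatting on first boot (or if nothing recognizable is on disk)
    int err = lfs_mount(&lfs, &cfg);
    if (err) {
        lfs_format(&lfs, &cfg);
        lfs_mount(&lfs, &cfg);
    }

    // read the current count, bump it, write it back
    uint32_t boot_count = 0;
    lfs_file_open(&lfs, &file, "boot_count", LFS_O_RDWR | LFS_O_CREAT);
    lfs_file_read(&lfs, &file, &boot_count, sizeof(boot_count));

    boot_count += 1;
    lfs_file_rewind(&lfs, &file);
    lfs_file_write(&lfs, &file, &boot_count, sizeof(boot_count));

    // nothing is committed to disk until close (or lfs_file_sync) returns
    lfs_file_close(&lfs, &file);

    lfs_unmount(&lfs);
    return 0;
}
```

If power is lost anywhere before `lfs_file_close` returns, the next mount sees the previous boot_count; if it is lost after, it sees the new one. It never sees a torn or half-written value.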


Conversely, it would also help to understand what circumstances littlefs can handle, and what is definitely out of the question.

Oh boy, let's see how much of this I can answer. Exactly what littlefs can handle gets a bit subtle.

what if the entire block is corrupted during a power-loss event while a small (sub-page) write was in progress.

littlefs assumes this can't happen. If it does, prog_size should be increased to include the full area a program operation can impact, even temporarily.

I've yet to see a storage device where this strategy doesn't work, but would be interested to know if there is one.

I don't expect that arbitrary block corruption should lead to a mountable filesystem (but should I expect that littlefs can always detect that corruption occurred?).

There are a number of different types of corruption that can occur. It can happen because of flash wear, bugs in the filesystem, or external users touching the storage. Here are some forms of corruption littlefs can/can't/should handle:

  1. Write corruption: This can happen if a flash block is bad or if there are issues with the storage bus.

    littlefs double checks all programs to disk with a sync followed by a read. So if anything is corrupted at write-time, littlefs will consider the erase block as "bad" and move the data residing on the block to a different block. If all blocks are bad, littlefs will report LFS_ERR_NOSPC.

    littlefs does not currently track bad blocks persistently, though this is a requested feature https://github.com/ARMmbed/littlefs/issues/316.

    A note on wear: Surprisingly, wear corruption doesn't only occur at write time. The physics of flash involves shoving a bunch of electrons behind a sort of insulated wall that breaks down over time. Even new parts can leak a few electrons here and there.

    Wear manifests as a decrease in data retention as blocks are erased. Keep in mind it's a slow process, but it exists: [graph: data retention falling off with erase cycles]

    Graph from Toshiba's TC58 parts. I couldn't find any documents with actual units, but most NOR parts list ~20 years of data retention and ~100,000 erase cycles, so I think those are the bounds in that chart.

    littlefs does NOT handle wear-corruption by itself. It relies on the block device reporting concerning wear as bad blocks. This can be implemented on a block device by adding ECC and reporting bad blocks when the number of bit errors exceeds a concerning amount (see the sketch after this list).

    I did look into providing this sort of ECC monitoring directly in the filesystem. But it's not a good architectural match. It's much easier and more flexible to add ECC at the block device layer, and most NAND devices provide ECC in hardware.

    It's also worth considering whether you really need protection from wear-corruption. The wear-leveling in littlefs will delay wear-corruption issues for as long as blocks still have usable erase cycles, so the device will likely be at its end-of-life by the time wear-corruption starts being a problem.

  2. External corruption: I would say this is any accidental corruption littlefs is not aware of until mount time, such as external users touching the block device, bugs in littlefs, loss of data retention, or anything else that changes the filesystem to be unrecognizable.

    The goal is to at least detect this sort of corruption, report LFS_ERR_CORRUPT, and halt. But littlefs is not there yet. It may hit an internal assert, or in the worst case go careening into erroneous behavior.

    This is also part of the motivation behind revamping the test framework: introduce some sort of fuzzing so we can build up better coverage for detecting corrupted filesystems.

  3. Malicious corruption: The concept of protecting littlefs against a "malicious user" has come up several times before (especially when talking about when littlefs should reformat a disk). But I'm of the opinion that this is out-of-scope for littlefs.

    At some point a malicious user can craft a near perfect littlefs image with tweaks that break either littlefs or some system at a higher level.

    If you need this sort of protection, the best solution is to encrypt/auth each block in your block device with some sort of CMAC or HMAC. Because littlefs doesn't rely on the intermediary state of erases, this can be done quite easily with a bit of block device addressing math and setting prog_size = the size of your cipher block.
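
To make the ECC point from item 1 concrete, here is a rough sketch of a read hook that reports a block as "bad" once the corrected bit-error count crosses a threshold. The hook signature and the LFS_ERR_* codes come from lfs.h; `nand_read_with_ecc()` and `ECC_REFRESH_THRESHOLD` are hypothetical stand-ins for whatever the real driver provides:

```c
// Sketch: surface "too many corrected bit errors" to littlefs as a bad block.
// The read hook signature and LFS_ERR_CORRUPT come from lfs.h;
// nand_read_with_ecc() and ECC_REFRESH_THRESHOLD are hypothetical driver details.
#include "lfs.h"

#define ECC_REFRESH_THRESHOLD 4  // corrected bits before we call the block concerning

// hypothetical driver call: reads and ECC-corrects data, returns how many bits
// it had to correct, or a negative value if the data was uncorrectable
extern int nand_read_with_ecc(lfs_block_t block, lfs_off_t off,
        void *buffer, lfs_size_t size);

static int bd_read(const struct lfs_config *c, lfs_block_t block,
        lfs_off_t off, void *buffer, lfs_size_t size) {
    int corrected = nand_read_with_ecc(block, off, buffer, size);
    if (corrected < 0 || corrected >= ECC_REFRESH_THRESHOLD) {
        // report the block rather than silently returning decaying data;
        // during littlefs's write-time read-back this makes the block look
        // bad, so the data gets moved to a different block (as described above)
        return LFS_ERR_CORRUPT;
    }
    return 0;
}
```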


Hopefully that helps answer some questions.