littlefs-project / littlefs

A little fail-safe filesystem designed for microcontrollers

Minimizing latency #891

Open hattesen opened 8 months ago

hattesen commented 8 months ago

I have a large number of projects that use littlefs for storing configuration and logging.

Some projects use internal NAND flash, others use external (QSPI) NOR flash for storage.

Keeping worst-case write latency to a minimum will minimize the risk of buffer overruns and resulting data corruption.

Q1: In littlefs, what are the (normally occurring) situations that result in high latency during write operations?

Q2: Are there any general recommendations on how to reduce worst-case write latency in littlefs?

Q3: Is there any type of housekeeping that can be performed by an idle-loop task to reduce the worst-case write latency of littlefs?

Thank you in advance for any guidance in this matter.

PS: I am in the process of building a USB MTP (Media Transfer Protocol) device handler for TinyUSB, allowing a "live" littlefs file system to be accessed by a USB host without risking file system corruption associated with accessing a "live" (FAT) file system using a USB MSD (Mass Storage Device) connection. It will make it possible to update configuration files, download log files, etc., completely transparently, from any USB host (although macOS will need an MTP driver, as its default implementation is restricted to downloading photos and videos).

geky commented 8 months ago

Hi @hattesen, littlefs does have a couple of cases where latency can spike. And to make things worse, these latency spikes could be described as "extreme" due to poor scalability. This is something I'm working on improving[^1][^2], but it's how things are right now.

Causes of latency spikes:

  1. Metadata Compaction

    littlefs stores metadata in append-only logs that need to be periodically compacted. This scales $O(B^2)$ where $B$ is the configured block size.

    In theory you can decrease the compaction overhead by decreasing the block size, but this is of course limited by the block device's physical geometry.

    As a temporary workaround for scalability issues, metadata_max was added, which artificially limits how big metadata logs can get before being split. metadata_max can improve compaction overhead, but comes at a storage cost.

  2. Block allocation scans

    littlefs doesn't know exactly when blocks go out of scope. To know what blocks can be safely allocated, littlefs periodically scans the filesystem to populate a RAM-backed bitmap of known unused blocks called the "lookahead buffer". This scales $O(F)$ where $F$ is the size of the filesystem.

    You can decrease how often this scan needs to be done by increasing the size of the lookahead buffer. But this comes at a RAM cost, which runs counter to the goals of littlefs.

    Note: Even if the lookahead buffer contains the entire block device, littlefs still needs to periodically scan to update its knowledge of what blocks are unused.

    You can also decrease the cost of this scan by increasing the block size. Larger blocks mean fewer blocks for a given filesystem. Unfortunately this makes write granularity worse and increases metadata compaction overhead. (Both lookahead_size and block_size, along with metadata_max from the previous item, are set in the filesystem configuration; see the sketch after this list.)
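For reference, all of these knobs live in struct lfs_config. Here is a minimal sketch, assuming a 16 MiB NOR part with 4 KiB sectors and hypothetical bd_* block device callbacks; the values are illustrative, not recommendations:

```c
#include "lfs.h"

// hypothetical block device callbacks provided elsewhere
extern int bd_read(const struct lfs_config *c, lfs_block_t block,
        lfs_off_t off, void *buffer, lfs_size_t size);
extern int bd_prog(const struct lfs_config *c, lfs_block_t block,
        lfs_off_t off, const void *buffer, lfs_size_t size);
extern int bd_erase(const struct lfs_config *c, lfs_block_t block);
extern int bd_sync(const struct lfs_config *c);

const struct lfs_config cfg = {
    .read  = bd_read,
    .prog  = bd_prog,
    .erase = bd_erase,
    .sync  = bd_sync,

    .read_size      = 16,
    .prog_size      = 256,
    .block_size     = 4096,     // must match the physical erase size
    .block_count    = 4096,     // 16 MiB part in this example
    .block_cycles   = 500,
    .cache_size     = 256,

    // cause 1: cap metadata log size to bound compaction cost
    .metadata_max   = 2048,     // bytes, trades storage for lower latency

    // cause 2: a larger lookahead buffer means fewer allocation scans
    .lookahead_size = 64,       // bytes, i.e. 512 blocks tracked per scan
};
```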

Q3: Is there any type of housekeeping that can be performed by an idle-loop task to reduce the worst-case write latency of littlefs?

This is the intention of the recently added lfs_fs_gc function. Calling lfs_fs_gc in your idle loop should move high latency tasks there as much as is currently possible.

Currently lfs_fs_gc only performs the block allocation scan, improving block allocation latency. It doesn't improve the metadata compaction latency yet. But the plan is to add any future janitorial work to this function.
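As a sketch, the idle-loop hook can be as simple as the following; only lfs_fs_gc itself is littlefs API, the idle_task name and how your scheduler calls it are up to your application:

```c
#include "lfs.h"

// called by the scheduler whenever nothing time-critical is pending
void idle_task(lfs_t *lfs) {
    // currently this performs the block allocation scan, keeping it out
    // of the time-critical write path
    int err = lfs_fs_gc(lfs);
    if (err < 0) {
        // negative return values are littlefs error codes; log or handle here
    }
}
```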

[^1]: There is a new metadata data-structure in the works that should be able to reduce compaction overhead from $O(B^2)$ down to $O(B\log{B})$.

[^2]: There is a plan to add an optional disk-backed block map that should remove the block allocation scan completely.

geky commented 8 months ago

PS: I am in the process of building a USB MTP (Media Transfer Protocol) device handler for TinyUSB, allowing a "live" littlefs file system to be accessed by a USB host without risking file system corruption associated with accessing a "live" (FAT) file system using a USB MSD (Mass Storage Device) connection.

That sounds quite neat! And what a weird Mac limitation... I guess you just need to name your firmware images firmware.png, they are "images" right? ;)

hattesen commented 8 months ago

PS: I am in the process of building a USB MTP (Media Transfer Protocol) device handler for TinyUSB, allowing a "live" littlefs file system to be accessed by a USB host without risking file system corruption associated with accessing a "live" (FAT) file system using a USB MSD (Mass Storage Device) connection.

That sounds quite neat! And what a weird Mac limitation... I guess you just need to name your firmware images firmware.png, they are "images" right? ;)

I believe it is part of Apple's "don't expose the file system of an iPhone" philosophy, which has caused every reputable camera manufacturer (and Google's Android File Transfer for Android phones) to develop their own – more or less standards-compliant – MTP driver for macOS as part of a "connect your device to a Mac" kit.

Commercial applications also exist for the Mac: MacDroid

hattesen commented 8 months ago

Thank you so much for your detailed response!

  1. Metadata Compaction littlefs stores metadata in append-only logs that need to be periodically compacted. This scales $O(B^2)$ where $B$ is the configured block size. In theory you can decrease the compaction overhead by decreasing the block size, but this is of course limited by the block device's physical geometry. As a temporary workaround for scalability issues, metadata_max was added, which artificially limits how big metadata logs can get before being split. metadata_max can improve compaction overhead, but comes at a storage cost.

Is there any way to force metadata compaction (at a time with no time-critical activities)? Any way to determine how soon a metadata compaction will be required (maybe on a 0.0~1.0 scale with 1.0 being NOW)?

  1. Block allocation scans ... This is the intention of the recently added lfs_fs_gc function. Calling lfs_fs_gc in your idle loop should move high latency tasks there as much as is currently possible. Currently lfs_fs_gc only performs the block allocation scan, improving block allocation latency. It doesn't improve the metadata compaction latency yet. But the plan is to add any future janitorial work to this function.

Well, that is taken care of, then :smile:

To what extent will calls to lfs_fs_gc() lock file system resources, potentially causing additional latency to the main file system activities?

Most of my applications run on ARM Cortex-M0+ MCUs, driven by a cooperative scheduler that intercepts the yield() method, and thus also delay(millis) when running on top of an Arduino Core, allowing well-behaved libraries to busy-wait using those functions. I know I could check the source myself, but ... does a long-running lfs_fs_gc() call yield() or delay(millis) at regular intervals? Otherwise I would have to move over to using a preemptive scheduler (FreeRTOS).

I assume reentrancy is not a problem when calling lfs_fs_gc(), but I have learnt not to assume anything when it comes to concurrency 😉.

Footnotes

  1. There is a new metadata data-structure in the works that should be able to reduce compaction overhead from $O(B^2)$ down to $O(B\log{B})$.
  2. There is a plan to add an optional disk-backed block map that should remove the block allocation scan completely.

Great - I'll have a rummage through those :smile:

hattesen commented 8 months ago

Sorry for the spam ...

I am currently working on a low-cost data logger, based on an RP2040 (Cortex-M0+) MCU, and would like to add an external serial (SPI) NOR flash to be used exclusively for a littlefs file system filled with (rolling) log files. An issue I can see blowing up in my face with that combination is the long erase cycle time for NOR flash (30 ms for a 4 kB sector, 120 ms for a 32 kB block).

Is there any way for an application to "assist" littlefs by performing sector erases ahead of time, to minimize the risk that littlefs will ever have to perform a flash erase itself prior to a requested flash-program operation? Or is that something that is already handled by lfs_fs_gc()?

geky commented 8 months ago

Is there any way to force metadata compaction (at a time with no time-critical activities)?

Not at the moment. My thinking was to add this to lfs_fs_gc with an optional threshold. So you could configure lfs_fs_gc to compact metadata blocks that are more than x% full. But this is not implemented yet.

Any way to determine how soon a metadata compaction will be required (maybe on a 0.0~1.0 scale with 1.0 being NOW)?

No, though this is theoretically possible. Keep in mind there are potentially many metadata logs, so the result would be different from file to file.

I think the above threshold for lfs_fs_gc solves the use case you have in mind though.

To what extent will calls to lfs_fs_gc() lock file system resources, potentially causing additional latency to the main file system activities?

lfs_fs_gc locks and traverses the entire filesystem. So it does risk significant latency.

Thinking about it, lfs_fs_gc should probably set a flag so it only runs if anything has changed since the last lfs_fs_gc call. It currently doesn't do this...

I do have plans to add an incremental version of lfs_fs_gc, but it gets a bit tricky since any writes to the filesystem void any block allocation progress. The incremental API is also a bit low priority with everything else at the moment.

I assume reentrancy is not a problem when calling lfs_fs_gc(), but I have learnt not to assume anything when it comes to concurrency 😉.

You do need to define LFS_THREADSAFE and provide lock/unlock functions, but then everything should lock correctly. lfs_fs_gc does lock for the entire traversal though.
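As a sketch of what that looks like: littlefs must be built with LFS_THREADSAFE defined, and the config then carries lock/unlock callbacks. The my_mutex_* calls and fs_mutex below are placeholders for whatever your scheduler or RTOS provides, not littlefs API:

```c
// build littlefs with -DLFS_THREADSAFE so struct lfs_config has these fields
#include "lfs.h"

extern my_mutex_t fs_mutex;        // hypothetical mutex type from your RTOS

static int fs_lock(const struct lfs_config *c) {
    (void)c;
    my_mutex_lock(&fs_mutex);      // hypothetical blocking lock
    return 0;
}

static int fs_unlock(const struct lfs_config *c) {
    (void)c;
    my_mutex_unlock(&fs_mutex);    // hypothetical unlock
    return 0;
}

const struct lfs_config cfg = {
    // ... block device callbacks and geometry as in the earlier sketch ...
    .lock   = fs_lock,
    .unlock = fs_unlock,
};
```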

Most of my applications run on ARM Cortex-M0+ MCUs, driven by a cooperative scheduler that intercepts the yield() method, and thus also delay(millis) when running on top of an Arduino Core, allowing well-behaved libraries to busy-wait using those functions. I know I could check the source myself, but ... does a long-running lfs_fs_gc() call yield() or delay(millis) at regular intervals?

What I would suggest is putting yield in your block device's read/prog/erase functions. Waiting for bus transactions is by far the most costly part of any filesystem operation.
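For example, an erase callback for a slow SPI NOR part might look roughly like this; the flash_* driver calls and yield() are assumptions about your hardware driver and scheduler, only the callback signature comes from littlefs:

```c
#include "lfs.h"

// erase callback registered via struct lfs_config
static int bd_erase(const struct lfs_config *c, lfs_block_t block) {
    // hypothetical driver call that starts the sector erase
    flash_sector_erase_start((uint32_t)block * c->block_size);

    // a 4 kB NOR sector erase can take ~30 ms; yield instead of spinning
    while (flash_is_busy()) {    // hypothetical driver status poll
        yield();                 // hand control back to the cooperative scheduler
    }
    return 0;
}
```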

This wouldn't help lfs_fs_gc blocking though... I suppose there's a relatively simple enhancement where lfs_fs_gc could check for pending threads and cancel the current traversal. I'll put this on the TODO pile.

Sorry for the spam ...

No worries! It was perfect timing.

Is there any way for an application to "assist" littlefs by performing sector erases ahead of time, to minimize the risk that littlefs will ever have to perform a flash erase itself prior to a requested flash-program operation? Or is that something that is already handled by lfs_fs_gc()?

No. Unfortunately, littlefs does not currently have a way to track erased blocks. This is a pretty heavily requested feature.

My current plan is to roll this up with the block map feature, since they are fairly related.

But it could be done without the block map by expanding each block in the lookahead buffer to two bits.
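Purely as an illustration of that layout (not littlefs code): with two bits per block the lookahead map could distinguish a third, "free and already erased" state, at twice the RAM cost of the current bitmap.

```c
#include <stdint.h>

// hypothetical per-block states, two bits each
enum block_state {
    BLOCK_USED        = 0,
    BLOCK_FREE        = 1,
    BLOCK_FREE_ERASED = 2,   // free and known to be erased already
};

// four blocks packed per byte of the lookahead map
static inline unsigned get_block_state(const uint8_t *map, uint32_t block) {
    return (map[block / 4] >> ((block % 4) * 2)) & 0x3u;
}

static inline void set_block_state(uint8_t *map, uint32_t block, unsigned state) {
    uint32_t shift = (block % 4) * 2;
    map[block / 4] = (uint8_t)((map[block / 4] & ~(0x3u << shift))
            | ((state & 0x3u) << shift));
}
```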

As you may have guessed by now, the current TODO list is a bit long...

geky commented 6 months ago

This should be improving soon: https://github.com/littlefs-project/littlefs/pull/913