littlefs-project / littlefs

A little fail-safe filesystem designed for microcontrollers
BSD 3-Clause "New" or "Revised" License

Proactive garbage collection discussion. #791

Open evpopov opened 1 year ago

evpopov commented 1 year ago

... as mentioned in #783. Also, @GaryJSA noted that #610 may be addressing something similar...

I have implemented a garbage collector that aims at substantially speeding up LittleFS. I'm starting this thread in order to figure out what would be the best way to share my code with the community. I'm currently considering creating a PR, but if the maintainers and/or the community don't want any of that, please let me know and I will simply share my code outside of this repo with whoever is interested.

General Description

The slowest part of using LittleFS is waiting for blocks to be erased, so this is what I'm trying to optimize. I create a cache that holds the state of all blocks. A block can be:

  - used: currently part of the filesystem
  - unused and clean: not in use, and verified to be fully erased
  - unused and dirty: not in use, but possibly holding stale data

Having this information allows the erase() callback to return success immediately when called on a block it already knows is clean, saving substantial amounts of time. Unused-and-dirty blocks can be located and erased in the background, and as long as the application saves data more slowly than the garbage collector cleans up unused blocks, erase() can always return immediately.
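The erase fast path described above can be sketched roughly as follows. This is a minimal illustration, not code from the actual implementation: `cached_erase`, `BLOCK_COUNT`, and the driver call are all hypothetical names.

```c
#include <stdint.h>

#define BLOCK_COUNT 4096  // example geometry, not from the real device

// 1 bit per block: 1 = verified erased ("clean")
static uint8_t clean_bitmap[(BLOCK_COUNT + 7) / 8];

static int bitmap_get(const uint8_t *bm, uint32_t block) {
    return (bm[block / 8] >> (block % 8)) & 1;
}

static void bitmap_set(uint8_t *bm, uint32_t block, int val) {
    if (val) bm[block / 8] |= (uint8_t)(1u << (block % 8));
    else     bm[block / 8] &= (uint8_t)~(1u << (block % 8));
}

// Wrapper around the block device's erase: if the cache already knows the
// block is clean, return immediately instead of waiting for the flash.
int cached_erase(uint32_t block) {
    if (bitmap_get(clean_bitmap, block)) {
        return 0; // known-erased: skip the slow physical erase
    }
    // ... call the driver's blocking erase here (placeholder) ...
    bitmap_set(clean_bitmap, block, 1);
    return 0;
}
```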

Implementation Details

Please keep in mind that I am very new to this codebase and so my solution is completely parallel to LittleFS's code and does not involve any changes to the code that I'm so new to. That also means that my solution can probably be optimized, but more on that later. I have created 2 bitmaps, each of which is of size (lfs->cfg->block_count + 7) / 8. This allows me to keep state information for every block in the file system. The first bitmap holds the "used" blocks. The second bitmap holds the "clean" blocks. A clean block is a block that is not currently used by LittleFS and the garbage collector has verified that the block is completely erased. If a block is neither "used" nor "clean", it is considered "unused and dirty". Upon successful mounting of the file system, my code does the following:
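The three states fall out of the two bitmaps like this (a sketch of the scheme described above; the enum and function names are illustrative, not part of littlefs):

```c
#include <stdint.h>

enum block_state { BLOCK_USED, BLOCK_CLEAN, BLOCK_DIRTY };

// Classify a block from the two bitmaps: "used" wins, then "clean",
// and anything flagged in neither is "unused and dirty".
static enum block_state block_state(const uint8_t *used_bm,
                                    const uint8_t *clean_bm,
                                    uint32_t block) {
    if ((used_bm[block / 8] >> (block % 8)) & 1) {
        return BLOCK_USED;
    }
    if ((clean_bm[block / 8] >> (block % 8)) & 1) {
        return BLOCK_CLEAN; // unused and verified erased
    }
    return BLOCK_DIRTY; // unused, but may hold stale data
}
```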

  1. Initializes the "used" bitmap by traversing the filesystem and recording which blocks are used.
  2. Initializes the "clean" bitmap by marking all blocks as "dirty".
  3. Goes over all blocks of the file system. If a block is already flagged in the "used" bitmap, nothing is done. If the block is not flagged in the "used" bitmap, the block is read and verified. If the entire block reads 0xFF, that block number is flagged as clean in the "clean" bitmap.
  4. Next, the code looks at lfs->free.off to find out which block LittleFS will attempt to use next. The garbage collector is instructed to start from there.
  5. I run the garbage collection code X times to ensure that there are at least a few verified clean blocks standing at the ready for LittleFS to use them.
  6. At this point I consider the file system fully initialized and allow the rest of my system to start up.
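Step 1 maps naturally onto littlefs's existing lfs_fs_traverse() API, which calls back once for every block in use. A minimal sketch, with lfs_block_t stubbed locally so the fragment stands alone; in real code you would include lfs.h instead:

```c
#include <stdint.h>

// Stand-in for the typedef in lfs.h, so this sketch compiles on its own.
typedef uint32_t lfs_block_t;

// Callback with the shape lfs_fs_traverse() expects; `data` is our
// "used" bitmap.
static int mark_used_cb(void *data, lfs_block_t block) {
    uint8_t *used_bm = data;
    used_bm[block / 8] |= (uint8_t)(1u << (block % 8));
    return 0;
}

// Step 1 would then be roughly:
//   memset(used_bm, 0, bitmap_size);
//   int err = lfs_fs_traverse(&lfs, mark_used_cb, used_bm);
```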

The garbage collection code itself is pretty straightforward. It keeps a counter that represents the next block to be examined. When called, the code checks whether the current block is used; if so, nothing else is done, and the counter is incremented (wrapping around when it exceeds the number of blocks). If the current block is not flagged in the "used" bitmap, the block is read and verified; if it holds any non-0xFF data, it is erased. Finally, the block is marked as clean in the "clean" bitmap.
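One step of that collector might look like the sketch below. The RAM-backed `flash` array stands in for the real block device so the fragment is self-contained; the geometry and every name here are illustrative.

```c
#include <stdint.h>
#include <string.h>
#include <stdbool.h>

#define BLOCK_COUNT  8   // toy geometry for the sketch
#define BLOCK_SIZE   16

// RAM stand-in for the flash, plus the two bitmaps and the GC cursor.
static uint8_t  flash[BLOCK_COUNT][BLOCK_SIZE];
static uint8_t  used_bm[(BLOCK_COUNT + 7) / 8];
static uint8_t  clean_bm[(BLOCK_COUNT + 7) / 8];
static uint32_t gc_cursor; // next block to examine

static bool block_is_erased(uint32_t b) {
    for (int i = 0; i < BLOCK_SIZE; i++) {
        if (flash[b][i] != 0xFF) return false;
    }
    return true;
}

// One invocation examines exactly one block, as in the description above.
void gc_step(void) {
    uint32_t b = gc_cursor;
    gc_cursor = (gc_cursor + 1) % BLOCK_COUNT; // increment and wrap

    if ((used_bm[b / 8] >> (b % 8)) & 1) {
        return; // block is in use: nothing to do
    }
    if (!block_is_erased(b)) {
        memset(flash[b], 0xFF, BLOCK_SIZE); // stand-in for the real erase
    }
    clean_bm[b / 8] |= (uint8_t)(1u << (b % 8)); // now verified clean
}
```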

That garbage collection function is called from a very low-priority task a few times a second until it has gone through all blocks. After the initial traversal on mount(), keeping track of newly used blocks is easily handled in the write() callback: whatever block LittleFS writes to is added to the "used" bitmap. Every once in a while the code also traverses the file system to un-mark blocks that are no longer used. This is something I'd like to improve upon someday, especially if someone familiar with the codebase can help me figure out when LittleFS stops using a block. Having a callback for this would remove the need for a run-time traversal of the filesystem just to find blocks that are no longer used.
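The write-path bookkeeping is a two-line update that would live in the block device's prog callback. A sketch, with hypothetical names; note a freshly programmed block must also drop out of the "clean" set:

```c
#include <stdint.h>

#define BLOCK_COUNT 4096  // example geometry

static uint8_t used_bm[(BLOCK_COUNT + 7) / 8];
static uint8_t clean_bm[(BLOCK_COUNT + 7) / 8];

// Called from the lfs_config prog callback: whatever block littlefs
// writes to is now in use, and certainly no longer erased.
void note_block_programmed(uint32_t block) {
    used_bm[block / 8]  |= (uint8_t)(1u << (block % 8));
    clean_bm[block / 8] &= (uint8_t)~(1u << (block % 8));
}
```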

The performance improvement from this modification is substantial. I'm using a W25Q128JV QSPI NOR flash. I used to get ~75 kB/s of write speed. After implementing this code, the write speed goes up about 2.5x; I don't remember the exact numbers, but it was very close to the actual max write speed of the chip. Of course, if one keeps writing and outruns the garbage collector, the erase() function eventually runs out of verified clean blocks and has to wait for blocks to be erased, causing the write speed to slow back down to ~75 kB/s.

Questions

geky commented 1 year ago

Hi @evpopov, thanks for creating a discussion.

I think based on feedback alone it would be well worth adding this sort of feature.

I mentioned in https://github.com/littlefs-project/littlefs/issues/783 that this solves half the problem with garbage in littlefs. The other half being metadata compaction. But don't let that stop you as these don't need to come in at the same time. Garbage collection is probably easy to improve over time without an API change.

It sounds like you've already managed to get this system working outside of littlefs. If you're interested in sharing it never hurts to throw the code in a repo with an MIT/BSD/etc license if you think other people will find it useful. (I really need to get on top of making a place for links to community projects...)

It sounds like it would also be valuable to bring this into mainline littlefs, but in an effort to keep the codebase small and stable that may be slower.


This proposal sounds quite similar to how the current "lookahead buffer" works, using a bitmask to keep track of the next $n$ unused blocks. That's a good sign, as it may fit quite easily into littlefs: https://github.com/littlefs-project/littlefs/blob/6a53d76e90af33f0656333c1db09bd337fa75d23/lfs.c#L557

Ok so the main concern: determining that a block has been erased by checking if it reads all 0xffs.

This is currently not a requirement for littlefs. The way littlefs views erase is that it puts the block into an "unknown-but-consistent" state until it is programmed. This nicely supports things such as encryption and non-flash storage.

It would be nice if garbage collection could maintain this property somehow.

Some ideas:

  1. Store whether a block is erased in a RAM-backed bitmap. Assume all blocks are unerased during mount, and erase a block before placing it on the "erased" bitmap.

    This carries a notable risk of excessive erases of the first couple of unused blocks if a device is mounted often without being written to.

    Maybe the block device implementation could perform the check for 0xffs in the erase function itself. Though this still has erase risk for simpler block device implementations / non-0xff block devices.

  2. Store whether a block is erased in a disk-backed bitmap persisted across mounts. This could be accomplished by storing such a bitmap in littlefs "gstate", perhaps as a pointer to a normal CoW file acting as the bitmap.

    The main issue is that this would be a much bigger undertaking. But an on-disk bitmap would open the door to future garbage collector improvements.

    I think this is the way to go long-term, but I understand it would be quite a bit of additional work.
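The mitigation mentioned under idea 1 (doing the 0xff check inside the block device's own erase) could be sketched as below. This only helps 0xff-erasing devices; everything here is a placeholder, not a real driver:

```c
#include <stdint.h>
#include <string.h>
#include <stdbool.h>

#define BLOCK_SIZE 16  // toy size for the sketch

// RAM stand-in for one flash block, so the fragment is self-contained.
static uint8_t flash_block[BLOCK_SIZE];

// Block-device erase that reads first and skips the physical erase when
// the block already reads all 0xff, avoiding repeated erases of the
// first few unused blocks across mounts.
int checking_erase(void) {
    bool erased = true;
    for (int i = 0; i < BLOCK_SIZE; i++) {
        if (flash_block[i] != 0xFF) { erased = false; break; }
    }
    if (!erased) {
        memset(flash_block, 0xFF, BLOCK_SIZE); // stand-in for the real erase
    }
    return 0;
}
```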


Some other notes:

Every once in a while the code also traverses the file system to un-mark blocks that are no longer used. This is something I'd like to improve upon someday, especially if someone familiar with the codebase can help me in figuring out when LittleFS stops using a block. Having a callback for this will nullify the need for run-time traversal of the filesystem for the purpose of finding blocks that are no longer used.

It's interesting you mention this. LittleFS itself doesn't really know when it stops using a block. This strategy of traversing the filesystem is how the current block allocator works, and it's an open question if it is the best solution.

On one hand it's expensive (https://github.com/littlefs-project/littlefs/issues/75). On the other hand, by leaning heavily into garbage collection for block allocation, littlefs can avoid quite a bit of bookkeeping and 1. save code size and 2. be more robust and not "leak blocks" if there are power-loss issues.

This question hangs unanswered at the moment and needs more investigation. Accepting garbage collection for block allocation would also open the door for some interesting novel features (https://github.com/littlefs-project/littlefs/pull/557).

So, uh, traversing the filesystem for block usage is the best option at the moment.

I used to get ~75 kB/s of write speed. After implementing this code, the write speed goes up about 2.5x; I don't remember the exact numbers, but it was very close to the actual max write speed of the chip. Of course, if one keeps writing and outruns the garbage collector, the erase() function eventually runs out of verified clean blocks and has to wait for blocks to be erased, causing the write speed to slow back down to ~75 kB/s.

Very interesting to see these numbers!

The benefit of adding my code to lfs.c is that this way, I can integrate the allocation and freeing of the bitmaps into the usual init/deinit functions.

If added to lfs.c, make sure to allow these buffers to be user-provided as static buffers, similar to the current lookahead_buffer in lfs_config: https://github.com/littlefs-project/littlefs/blob/6a53d76e90af33f0656333c1db09bd337fa75d23/lfs.h#L240-L243
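For reference, this is roughly what the existing static-buffer pattern looks like. The struct below is a minimal stand-in for the two relevant fields of littlefs's real struct lfs_config (in real code, include lfs.h); the buffer size is just an example:

```c
#include <stdint.h>

// Stand-in mirroring the lookahead fields of struct lfs_config in lfs.h,
// so this fragment compiles on its own.
struct lfs_config_sketch {
    uint32_t lookahead_size;
    void    *lookahead_buffer;
};

// littlefs expects the lookahead buffer size to be a multiple of 8 bytes.
static uint8_t lookahead_buf[16];

static const struct lfs_config_sketch cfg = {
    .lookahead_size   = sizeof(lookahead_buf),
    .lookahead_buffer = lookahead_buf,
};
```

A GC bitmap could follow the same pattern: a size field plus an optional user-provided buffer, falling back to malloc when the pointer is NULL.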

...

Merging both bitmaps into a 2-bit map would allow an additional state (0=unused, 1=used, 2=erased, 3=?), which could be used to mark "bad blocks". This is a request in another issue somewhere. We should look into this if the on-disk option is pursued.
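The merged 2-bit map could be packed four blocks to a byte, along these lines (a sketch; the state encoding and names are illustrative):

```c
#include <stdint.h>

// 2 bits per block: the fourth state is free for "bad block" or similar.
enum bstate { B_UNUSED = 0, B_USED = 1, B_ERASED = 2, B_BAD = 3 };

#define BLOCK_COUNT 4096  // example geometry

static uint8_t state_map[(BLOCK_COUNT + 3) / 4]; // 4 blocks per byte

enum bstate get_state(uint32_t block) {
    return (enum bstate)((state_map[block / 4] >> ((block % 4) * 2)) & 3);
}

void set_state(uint32_t block, enum bstate s) {
    unsigned shift = (block % 4) * 2;
    state_map[block / 4] =
        (uint8_t)((state_map[block / 4] & ~(3u << shift))
                  | ((unsigned)s << shift));
}
```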

...

There is a whole other can of worms around how garbage collection will interact with multithreaded systems. The traversal is one of the most expensive read-only operations in littlefs, and it makes sense that in certain circumstances you may want to be able to interrupt a long-running garbage collection to write urgent data (@TjoBitDk's use case in https://github.com/littlefs-project/littlefs/issues/783#issuecomment-1484628381 being an excellent example).

Though this may be out-of-scope for a first implementation and can be added later.


Sorry about the late/long post; there's a lot to unpack on this topic.

geky commented 11 months ago

I think it's relevant that lfs_fs_gc should be able to compact metadata soon: https://github.com/littlefs-project/littlefs/pull/913

After this, the only missing piece of garbage collection is persistent block allocation, but that will require a new on-disk data structure.