littlefs-project / littlefs

A little fail-safe filesystem designed for microcontrollers
BSD 3-Clause "New" or "Revised" License

LFS takes time to mount #870

Open ajaybhargav opened 10 months ago

ajaybhargav commented 10 months ago

Hi @geky, I am facing an issue with LFS. When I mount LFS on a freshly formatted SPI NOR, everything works fine and fast. After a few hours of continuous write operations, lfs_mount and lfs_fs_size take a long time.

During mount, the system goes into this while loop in the lfs_rawmount function:

https://github.com/littlefs-project/littlefs/blob/130790fa915d104b5ca19524e86d9618fdcac848/lfs.c#L4317

Usually this might not be an issue on a normal microcontroller system. However, I am using this filesystem on an LTE module (with an RTOS) where a long-running loop can trigger the WDT, and I do not have control over the WDT of the LTE SoC. As a workaround, if I add a sleep (to allow other tasks to run) in that while loop, the system takes time but does not cause a WDT reset.

The same is the case with lfs_fs_size/traverse: the loop causes the system to reset.

Can we add some system-level hook for while loops, especially for RTOS-based systems where these loops prevent other tasks from running and can trigger WDT resets?

geky commented 10 months ago

Hi @ajaybhargav, thanks for raising an issue, this is an interesting problem.

I'm curious, do you know how many files there are in the filesystem, how many directories, and how many iterations the lfs_mount loop takes?

I have some plans in the works to provide an incremental version of traverse/garbage collection for concerns like this, but it wouldn't help lfs_mount as I didn't realize it could be a problematic bottleneck. lfs_mount only needs to visit each metadata log, whereas traverse/gc need to visit every block (and it's not super clear what an incremental lfs_mount API would look like).

In theory, if lfs_mount can trigger a watchdog, large files can as well. I realize this is application specific, but it may mean the lfs_mount loop isn't the best place for a watchdog reset.


Have you considered adding the sleep to the block device read/prog/erase functions? These are the "slow" functions when dealing with storage, since that's where the MCU twiddles its thumbs waiting for hardware, and usually the target for yield/sleep in async/coroutine systems.
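A minimal sketch of that suggestion, under stated assumptions: the yield is placed in a wrapper around the block device read callback, whose signature follows the `read` callback of `struct lfs_config`. The typedefs and the one-member `struct lfs_config` below are stand-ins so the sketch compiles without `lfs.h`, and `rtos_yield`, `spi_nor_read`, `bd_read_yielding`, and the RAM-backed `fake_flash` are hypothetical names, not part of littlefs or any RTOS.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-ins for the littlefs types so this compiles on its own.
 * In a real port these come from lfs.h. */
typedef uint32_t lfs_block_t;
typedef uint32_t lfs_off_t;
typedef uint32_t lfs_size_t;
struct lfs_config { void *context; };

#define BLOCK_SIZE  4096u
#define BLOCK_COUNT 4u

/* RAM buffer standing in for the SPI NOR flash (assumption). */
static uint8_t fake_flash[BLOCK_COUNT * BLOCK_SIZE];

/* Hypothetical RTOS hook. A real port would call the scheduler's
 * yield/sleep here; this one only counts how often it ran. */
static unsigned yield_calls;
static void rtos_yield(void) { yield_calls++; }

/* Hypothetical raw driver read, with no yielding. */
static int spi_nor_read(const struct lfs_config *c, lfs_block_t block,
        lfs_off_t off, void *buffer, lfs_size_t size) {
    (void)c;
    assert(block < BLOCK_COUNT && off + size <= BLOCK_SIZE);
    memcpy(buffer, &fake_flash[block * BLOCK_SIZE + off], size);
    return 0;
}

/* Wrapper meant to be installed as the config's read callback:
 * let other tasks run, then perform the actual read. */
static int bd_read_yielding(const struct lfs_config *c, lfs_block_t block,
        lfs_off_t off, void *buffer, lfs_size_t size) {
    rtos_yield();
    return spi_nor_read(c, block, off, buffer, size);
}

/* Demo: three reads through the wrapper; returns the number of yields,
 * or 0 if the data read back is not what was stored. */
unsigned demo_reads(void) {
    struct lfs_config cfg = { 0 };
    uint8_t buf[16];
    fake_flash[BLOCK_SIZE + 2] = 0xA5;
    yield_calls = 0;
    for (int i = 0; i < 3; i++) {
        int err = bd_read_yielding(&cfg, 1, 0, buf, sizeof buf);
        assert(err == 0);
    }
    return (buf[2] == 0xA5) ? yield_calls : 0;
}
```

Because every pass of the mount or traverse loop has to read from the block device, a wrapper like this gives other tasks a chance to run on each iteration without touching lfs.c. Yielding on every read may cost more than needed; a counter that yields every N calls is one possible refinement.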

ajaybhargav commented 10 months ago

Hi @geky

Thanks for taking the time to answer.

> I'm curious, do you know how many files there are in the filesystem, how many directories, and how many iterations the lfs_mount loop takes?

The file count is high: there are 5 directories, and the number of files can go up to around 3000 in total. Each file is smaller than one block (I have kept this size limit since I reported the file corruption issue during truncate). Keeping files under the 4K block size improves overall application performance as well.

> Have you considered adding the sleep to the block device read/prog/erase functions? These are the "slow" functions when dealing with storage, since that's where the MCU twiddles its thumbs waiting for hardware, and usually the target for yield/sleep in async/coroutine systems.

Not yet, but I can try (and it could be a better option than modifying lfs). I feel that during mount, lfs mostly uses read operations, which are relatively fast. My first thought was to try a sleep in the main loop as mentioned, which did fix the critical issue that happened on a field device. LTE modules do not give control over a lot of things, and you always have to work your way around the limitations of the system.

Adding a yield callback to the lfs configuration would be a good option for RTOS systems, and lfs could call it wherever necessary. I would be happy to test anything if needed.
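To make the request concrete, here is a rough sketch of what such a hook could look like. Everything in it is hypothetical: `struct fs_config`, its `yield`, `yield_ctx`, and `yield_every` members, and `scan_entries` are illustrative names only and are not an existing littlefs API. The loop stands in for a long scan such as the one in mount.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical configuration with an optional yield hook.
 * This is NOT the real struct lfs_config; illustration only. */
struct fs_config {
    void (*yield)(void *ctx);  /* may be NULL if no hook is wanted */
    void *yield_ctx;           /* passed back to the hook unchanged */
    unsigned yield_every;      /* call the hook once per this many iterations */
};

/* Call the hook on every yield_every-th iteration, if one is configured. */
static void maybe_yield(const struct fs_config *cfg, unsigned iteration) {
    if (cfg->yield != NULL && cfg->yield_every != 0
            && iteration % cfg->yield_every == 0) {
        cfg->yield(cfg->yield_ctx);
    }
}

/* Stand-in for a long-running scan loop: visits `count` entries. */
unsigned scan_entries(const struct fs_config *cfg, unsigned count) {
    unsigned visited = 0;
    for (unsigned i = 1; i <= count; i++) {
        visited++;            /* real code would process one entry here */
        maybe_yield(cfg, i);
    }
    return visited;
}

/* Example hook: counts invocations. An RTOS port would sleep or
 * yield to the scheduler here instead. */
static void count_yield(void *ctx) { (*(unsigned *)ctx)++; }

/* Demo: returns how many times the hook fired for a given scan. */
unsigned demo_yield_count(unsigned entries, unsigned every) {
    unsigned yields = 0;
    struct fs_config cfg = { count_yield, &yields, every };
    unsigned visited = scan_entries(&cfg, entries);
    assert(visited == entries);
    return yields;
}
```

With around 3000 entries and a hook interval of 100, the hook would fire 30 times during the scan. Leaving the callback NULL or the interval at 0 disables it, so bare-metal users would pay nothing for the feature.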