littlefs-project / littlefs

A little fail-safe filesystem designed for microcontrollers

Zephyr OS littlefs file system write timing issue #935

Open dhruvshah1997 opened 5 months ago

dhruvshah1997 commented 5 months ago

Hi, I am working on a medical product using an MT29F8G01ADBFD12 NAND flash (1 GB). I am using an nRF52840 MCU with Zephyr RTOS. I have ported the littlefs file system on top of the NAND flash, and I can mount the file system and create files on it.

Now my requirement is to write ECG data into flash. The ECG data arrives at a 10 ms rate, IMU data arrives at a 5 ms rate, and temperature data at a 1 second rate.

I have created 3 files, one for ECG, one for IMU, and one for temperature, and I append all incoming data to the respective files. I use 3 separate buffers, each accumulating 4096 bytes before writing to the file system. Additionally, I am using a queue of 5 chunks (5 * 4096 bytes), roughly as in the sketch below. But still, some of the IMU and ECG data goes missing, which suggests the file system append call sometimes takes a long time.
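
For reference, a minimal sketch of the buffering scheme described above (names are illustrative, simplified from my actual code): sensor contexts push filled 4096-byte chunks, a single writer thread drains them into littlefs, and data is lost exactly when the writer cannot keep up and the queue overflows.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>
#include "lfs.h"

#define CHUNK_SIZE  4096
#define QUEUE_DEPTH 5

// single-producer/single-consumer chunk queue
struct chunk_queue {
    uint8_t chunks[QUEUE_DEPTH][CHUNK_SIZE];
    volatile uint32_t head, tail; // head: producer, tail: consumer
};

// producer side (sensor context): returns false when the queue is
// full, i.e. the writer thread fell behind and this chunk is dropped
static bool queue_push(struct chunk_queue *q, const uint8_t *buf) {
    if (q->head - q->tail >= QUEUE_DEPTH) {
        return false; // overflow: this is where samples go missing
    }
    memcpy(q->chunks[q->head % QUEUE_DEPTH], buf, CHUNK_SIZE);
    q->head++;
    return true;
}

// consumer side (writer thread): appends queued chunks to the file
static int queue_drain(struct chunk_queue *q, lfs_t *lfs, lfs_file_t *f) {
    while (q->tail != q->head) {
        lfs_ssize_t res = lfs_file_write(lfs, f,
                q->chunks[q->tail % QUEUE_DEPTH], CHUNK_SIZE);
        if (res < 0) {
            return (int)res; // propagate littlefs error
        }
        q->tail++;
    }
    return 0;
}
```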

Please find my littlefs configuration:

```c
const struct lfs_config cfg = {
    .context = NULL,
    .read = sh_lfsread,
    .prog = sh_lfsprog,
    .erase = sh_lfserase,
    .sync = sh_lfssync,

#ifdef LFS_THREADSAFE
    .lock = sh_lfslock,
    .unlock = sh_lfsunlock,
#endif

    .read_size = 4096,
    .prog_size = 4096,
    .block_size = 0x40000,
    .block_count = 4096,
    .block_cycles = 500,
    .cache_size = 4096,
    .lookahead_size = 4096,
};
```
geky commented 5 months ago

Hi @dhruvshah1997, thanks for opening an issue.

littlefs currently has some severe scalability issues, especially on "large" storage such as NAND flash. There are several open issues tracking this, with multiple contributing factors: https://github.com/littlefs-project/littlefs/issues?q=is%3Aopen+is%3Aissue+label%3Aperformance

This is being worked on, but it's not a simple thing to improve.

You may want to consider setting metadata_max to artificially limit how large metadata logs can get, to work around some of the scalability issues, though this trades off storage space: https://github.com/littlefs-project/littlefs/blob/f53a0cc961a8acac85f868b431d2f3e58e447ba3/lfs.h#L269-L273
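
For illustration, a minimal sketch of that workaround applied to the config above (4096 is just an example cap, not a tuned recommendation; smaller values compact faster but waste more of each metadata block):

```c
// same geometry as the config above, but with metadata logs capped
// at 4096 bytes instead of the full 256 KiB block
const struct lfs_config cfg = {
    // ... same block device operations as above ...
    .read_size = 4096,
    .prog_size = 4096,
    .block_size = 0x40000,
    .block_count = 4096,
    .block_cycles = 500,
    .cache_size = 4096,
    .lookahead_size = 4096,
    .metadata_max = 4096, // must be <= block_size
};
```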

But even then the throughput you're looking for may not be reachable with littlefs's current design.

CSC-Sendance commented 3 weeks ago

Hi,

Do you have more details on this issue? We have a similar need but use a Winbond W25N01G NAND flash via SPI. It has a prog/page size of 2048 bytes with 128 KiB erase blocks. We did some benchmarks on appending to a single file with 300-byte file-write calls, repeated 5k, 100k, and 400k times (the latter amounts to nearly the entire 1 Gbit chip and is still in progress; I'll post it as soon as I have the data).

It seems that with increasing occupation (or file size?), higher (file-)write times become more frequent, while the vast majority sit in the ~7 ms range for the 2048-byte page/prog-sized writes.

littlefs parameters (no metadata_max limit!):

```c
const struct lfs_config cfg = {
    // block device operations
    // ...
    // block device configuration
    .read_size = 2048,
    .prog_size = 2048,
    .block_size = 2048 * 64, // 128 KiB erase blocks
    .block_count = 1024,
    .block_cycles = 500,
    .cache_size = 2048, // "multiple of read and prog size, and a factor of block size"
    .lookahead_size = 2048,
    .compact_thresh = 0,
    .read_buffer = NULL,
    .prog_buffer = NULL,
    .lookahead_buffer = NULL,
    .name_max = 0,
    .file_max = 0,
    .attr_max = 0,
    .metadata_max = 0,
    .inline_max = 0,
};
```
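
For completeness, the benchmark itself is essentially just timing each append call. A simplified sketch (`time_us()` is a stand-in for whatever timestamp source is used, e.g. the DWT cycle counter on the nRF52840; the logging is illustrative):

```c
#include <stdint.h>
#include <stdio.h>
#include "lfs.h"

#define WRITE_SIZE 300
#define N_WRITES   5000

extern uint32_t time_us(void); // hypothetical microsecond timestamp

void benchmark(lfs_t *lfs, lfs_file_t *file) {
    static uint8_t buf[WRITE_SIZE]; // dummy payload
    for (uint32_t i = 0; i < N_WRITES; i++) {
        uint32_t t0 = time_us();
        lfs_ssize_t res = lfs_file_write(lfs, file, buf, WRITE_SIZE);
        uint32_t dt = time_us() - t0;
        if (res < 0) {
            printf("write %lu failed: %ld\n", (unsigned long)i, (long)res);
            return;
        }
        if (dt > 10000) { // report outliers over 10 ms
            printf("%lu -> %lu us\n", (unsigned long)i, (unsigned long)dt);
        }
    }
    lfs_file_sync(lfs, file);
}
```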
Detailed NAND (LFS) benchmark without ECC, 5000x 300-byte buffer writes

![image](https://github.com/littlefs-project/littlefs/assets/102303875/2b2463a5-842e-4f5a-920f-5e1357d28683)

```
Min:  0.0
Max:  25.39
Mean: 1.0249369873974794
Std:  2.4980383884536765
95% CI: [0.9556655405754659, 1.0942084342194929]
```

Outlier analysis over 10 ms per write call (write call index -> duration in ms):

```
3    -> 25.39
873  -> 11.72
1747 -> 14.65
2621 -> 11.72
3495 -> 17.58
4368 -> 11.72
```

![image](https://github.com/sendance/grid-firmware/assets/102303875/31e105e0-4b71-4bef-8a6d-90a16f4d26b5)

Detailed NAND (LFS) benchmark without ECC, 100k x 300-byte buffer writes

![image](https://github.com/littlefs-project/littlefs/assets/102303875/ded0f8c2-b368-4692-868b-42021b70ebee)

```
Total: 101115.77999996635
Min:   0.0
Max:   29.3
Mean:  1.0111679116791168
Std:   2.458219120299975
95% CI: [0.9959316549089265, 1.026404168449307]
```

Outlier analysis over 10 ms per write call (write call index -> duration in ms):

```
3     -> 25.39
873   -> 11.72
1747  -> 14.65
2621  -> 11.72
3495  -> 17.58
4368  -> 11.72
5242  -> 14.65
6116  -> 11.72
6990  -> 20.51
7863  -> 11.72
8737  -> 14.65
9611  -> 11.72
10485 -> 17.58
11358 -> 11.72
12232 -> 14.65
13106 -> 11.72
13980 -> 23.44
14853 -> 11.72
15727 -> 14.65
16601 -> 11.72
17475 -> 17.58
18349 -> 11.72
19222 -> 14.65
20096 -> 11.72
20970 -> 20.51
...
96987 -> 11.72
97861 -> 23.44
98734 -> 11.72
99608 -> 14.65
```

![image](https://github.com/littlefs-project/littlefs/assets/102303875/5791ba95-4df9-48bf-a93b-73394de04c72)

Detailed NAND (LFS) benchmark without ECC, 400k x 300-byte buffer writes

![image](https://github.com/littlefs-project/littlefs/assets/102303875/53c9f6ab-57de-40dd-863f-446cb5407941)

```
Total: 407096.9100003858
Min:   0.0
Max:   35.16
Mean:  1.0177448193620489
Std:   2.475232007985664
95% CI: [1.0100740944624653, 1.0254155442616324]
```

Outlier analysis over 10 ms per write call (write call index -> duration in ms):

```
3      -> 25.39
873    -> 11.72
1747   -> 14.65
2621   -> 11.72
3495   -> 17.58
4368   -> 11.72
5242   -> 14.65
6116   -> 11.72
6990   -> 20.51
7863   -> 11.72
8737   -> 14.65
9611   -> 11.72
10485  -> 18.55
11358  -> 11.72
12232  -> 14.65
13106  -> 11.72
13980  -> 23.44
14853  -> 11.72
15727  -> 14.65
16601  -> 11.72
17475  -> 17.58
18349  -> 11.72
19222  -> 14.65
20096  -> 11.72
20970  -> 20.51
...
396687 -> 14.65
397560 -> 11.72
398434 -> 20.51
399308 -> 11.72
```

![image](https://github.com/littlefs-project/littlefs/assets/102303875/34f7dcd3-2df1-46ba-b96b-d05eec0c96cb)

Detailed NAND (LFS) benchmark without ECC, 5k x 2048 (page/read/prog-sized)-byte buffer writes

![image](https://github.com/littlefs-project/littlefs/assets/102303875/4da5e30c-f769-4c0b-88ed-3d3cbc3ed9a8)

```
Total: 37061.46999999942
Min:   6.84
Max:   26.37
Mean:  7.413776755351072
Std:   0.8442777623269407
95% CI: [7.390364648320866, 7.437188862381277]
```

Outlier analysis over 10 ms per write call (write call index -> duration in ms):

```
127  -> 11.72
255  -> 14.65
383  -> 11.72
511  -> 17.58
639  -> 11.72
767  -> 14.65
895  -> 11.72
1023 -> 20.51
1151 -> 11.72
1279 -> 14.65
1407 -> 11.72
1535 -> 17.58
1663 -> 11.72
1791 -> 14.65
1919 -> 11.72
2047 -> 23.44
2175 -> 11.72
2303 -> 14.65
2431 -> 11.72
2559 -> 17.58
2687 -> 11.72
2815 -> 14.65
2943 -> 11.72
3071 -> 20.51
3199 -> 11.72
...
4607 -> 17.58
4735 -> 11.72
4863 -> 14.65
4991 -> 11.72
```

![image](https://github.com/littlefs-project/littlefs/assets/102303875/516fb38a-bade-4e81-b76b-3c2ec688cbc0)

It might be more insightful if I redid the tests with the 2 KiB page write size, but 300 bytes was closer to our use case. We (currently) do no other buffering or queueing; I may add that later though. I suppose you can also see that the longer write times (>10 ms) are relatively rare and follow a distinct pattern, so there is definitely something that can be optimized ;)

--> I added a page-sized write-call benchmark with 5k samples. One can clearly see that write times spike on every second block. Could this be related to the lfs_gc call? These time spikes also seem to grow in maximum duration the larger the file gets. Limiting metadata_max to the suggested values did not have an effect.

edit: the >10 ms writes definitely happen when the end of some block(s) is reached [and something else also happens?] (ca. every 873 write calls: it takes ~6.83 of the 300-byte write calls to fill a 2048-byte page (i.e. trigger a prog call), and there are 64 pages in a block, so 6.83 * 64 ≈ 436.9 write calls per block, which makes 873 write calls two full blocks).

edit2: added the 400k x 300-byte benchmark. Same behavior. Max write time is at around 35 ms now. Initial tests with 5k samples at 300 bytes with metadata_max at 8 kB have shown literally no difference at all. Will test now with 100k samples.

edit3: setting metadata_max to 2048 also had no effect for 100k samples. The results are a very close reproduction of the run with metadata_max set to 0 (i.e. the block size). Outliers appear predictably at the block borders.

p.s.: benchmarks were performed with FreeRTOS on an nRF52840

geky commented 1 week ago

Hi @CSC-Sendance, sorry for the late response.

My first thought was that you're now hitting lookahead buffer scans, but it's very interesting, and a bit wild, that you can clearly see the topology of the CTZ skip-list in your performance measurements.

It would be interesting to know the breakdown of read vs prog vs erase operations; this might be easy to measure with some simple global counters (example).
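
A minimal sketch of the idea, in case it helps (this is not the linked example; the `sh_lfs*` names are borrowed from the config earlier in this thread, substitute your own callbacks):

```c
#include <stdint.h>
#include "lfs.h"

// existing block device callbacks, defined elsewhere
extern int sh_lfsread(const struct lfs_config *c, lfs_block_t block,
        lfs_off_t off, void *buffer, lfs_size_t size);
extern int sh_lfsprog(const struct lfs_config *c, lfs_block_t block,
        lfs_off_t off, const void *buffer, lfs_size_t size);
extern int sh_lfserase(const struct lfs_config *c, lfs_block_t block);

// global counters for the read/prog/erase breakdown
static uint32_t read_count, prog_count, erase_count;

static int counting_read(const struct lfs_config *c, lfs_block_t block,
        lfs_off_t off, void *buffer, lfs_size_t size) {
    read_count++;
    return sh_lfsread(c, block, off, buffer, size);
}

static int counting_prog(const struct lfs_config *c, lfs_block_t block,
        lfs_off_t off, const void *buffer, lfs_size_t size) {
    prog_count++;
    return sh_lfsprog(c, block, off, buffer, size);
}

static int counting_erase(const struct lfs_config *c, lfs_block_t block) {
    erase_count++;
    return sh_lfserase(c, block);
}
```

Point `.read`/`.prog`/`.erase` at the wrappers, then snapshot and reset the counters around each write call.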

The best explanation I can think of is that this is simply the overhead necessary to build each CTZ skip-list node.

The way our skip-lists work is that every $n$th block contains $\text{ctz}(n)+1$ pointers. This forms the spiky pattern you are seeing, with the tallest spikes being limited to $\log_2 n$.
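
To make that concrete, blocks $n = 1, \dots, 8$ contain $\text{ctz}(n)+1 = 1, 2, 1, 3, 1, 2, 1, 4$ pointers respectively: every second block carries an extra pointer, every fourth another, and so on. That lines up with the spikes you see at every second block, and with the maxima growing slowly as the file grows.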

What may be happening is that building each of these pointers with 2 KiB reads ends up spending a lot of time reading unused data just to find the relevant block address.


If this is the case I'm not sure there's an easy solution that doesn't require significant changes to littlefs. Multi-block caching would help avoid redundant reads, but brings its own complexity and has been low priority. At the very least B-trees would avoid this bottleneck, which is one thing I'm currently experimenting with as a CTZ skip-list replacement.

You could try increasing the block size, since fewer, larger blocks means fewer pointers. This would make metadata compaction performance worse, but that may just mean metadata_max is more useful.

CSC-Sendance commented 1 week ago

Hi,

No problem! We have internally finished this "project" for now, since we came to the conclusion that littlefs with a NAND flash would be more suitable for our use case than our current NOR + FAT approach, despite the limitations observed here. However, it will not make it into active development until we have finalized a new major hardware revision.

> It would be interesting to know the breakdown of read vs prog vs erase operations; this might be easy to measure with some simple global counters (example).

I am still willing to help with this. If you want to provide a modified littlefs version that includes this (or a detailed description of what exactly you'd need) and the benchmark you'd like to see run, I can execute it for you.

> ... You could try increasing the block size, since fewer, larger blocks means fewer pointers. This would make metadata compaction performance worse, but that may just mean metadata_max is more useful.

Interesting, but isn't the maximum block size defined by the flash itself (and its block erase operations etc.)? Or would you handle that more independently and just account for it in, e.g., the erase callback, performing e.g. 2x erases if the "virtual" block size is doubled?

geky commented 1 week ago

> Interesting, but isn't the maximum block size defined by the flash itself (and its block erase operations etc.)? Or would you handle that more independently and just account for it in, e.g., the erase callback, performing e.g. 2x erases if the "virtual" block size is doubled?

Ah yeah. It's not well documented, but there's nothing stopping you from defining block_size as a multiple of the flash's actual erase size, and emulating the larger erase with multiple operations. This can be useful for working around issues with the block allocator scaling poorly with too many blocks, but usually also runs into issues with metadata compaction, metadata_max, etc.
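
For illustration, a minimal sketch of that emulation for the W25N01G discussed above, assuming a 2x virtual block (the driver function name and the factor are hypothetical):

```c
#include <stdint.h>
#include "lfs.h"

#define PHYS_BLOCK_SIZE (2048 * 64) // 128 KiB physical erase unit
#define VIRT_FACTOR     2           // virtual block = 2 physical blocks

// hypothetical driver call that erases one physical 128 KiB block
extern int w25n01g_erase_block(uint32_t phys_block);

// littlefs is configured with block_size = VIRT_FACTOR*PHYS_BLOCK_SIZE
// and block_count = 1024/VIRT_FACTOR, so each littlefs erase maps to
// VIRT_FACTOR physical erases
static int virt_erase(const struct lfs_config *c, lfs_block_t block) {
    (void)c;
    for (uint32_t i = 0; i < VIRT_FACTOR; i++) {
        int err = w25n01g_erase_block(block * VIRT_FACTOR + i);
        if (err) {
            return err;
        }
    }
    return 0;
}
```

The read and prog callbacks would translate addresses the same way, mapping (block, off) to physical block `block * VIRT_FACTOR + off / PHYS_BLOCK_SIZE` at offset `off % PHYS_BLOCK_SIZE`.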

I've been considering adding an additional erase_size config option, and doing this emulation littlefs-side, to make this easier for users, but haven't been sure if it's worth the code cost.

> I am still willing to help with this. If you want to provide a modified littlefs version that includes this (or a detailed description of what exactly you'd need) and the benchmark you'd like to see run, I can execute it for you.

Thanks! But it was more for curiosity's sake, to confirm the above assumption, if it was little extra work.

Long term, the plan is to move away from CTZ skip-lists, so I don't think this will lead to anything actionable short-term.