joltwallet / esp_littlefs

LittleFS port for ESP-IDF
MIT License

LittleFS file caching #18

Open atanisoft opened 3 years ago

atanisoft commented 3 years ago

I've encountered an odd case where the caching in LittleFS seems to be causing an issue that I'm not seeing on SD or SPIFFS.

Using a code pattern similar to the one below, the second file handle will not see the data written through the first file handle:

#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

uint8_t buf1[1] = {'a'};
uint8_t buf2[1] = {0};
int fd1 = open("/fs/file.bin", O_RDWR);
assert(fd1 >= 0);
int fd2 = open("/fs/file.bin", O_RDWR);
assert(fd2 >= 0);
lseek(fd1, 0, SEEK_SET);
write(fd1, buf1, sizeof(buf1));   /* write through the first handle */
fsync(fd1);                       /* flush the first handle */
lseek(fd2, 0, SEEK_SET);
read(fd2, buf2, sizeof(buf2));    /* read through the second handle */
assert(buf1[0] == buf2[0]);       /* fails on LittleFS: fd2 sees stale data */

Using two handles for the same file doesn't make a ton of sense in most cases, especially within the same function, but this pattern works on other FS types.

Any ideas or suggestions on config tweaks to reduce the chances of the two file handles being out of sync?

BrianPugh commented 3 years ago

investigating

BrianPugh commented 3 years ago

So I'm working on branch:

https://github.com/joltwallet/esp_littlefs/tree/multi-fd-sync

I added your code snippet as a unit test, and there is some non-determinism going on. It usually works the first time but fails the second time. I'm trying to figure out whether it's a problem in my port or an upstream issue.

atanisoft commented 3 years ago

Interesting. Is it using a cache per file handle, or per file in the FS? If it is per handle, is it possible the caches are getting out of sync?

BrianPugh commented 3 years ago

so within my port, the function lfs_file_open will be called twice, and two unique file descriptors will be returned. This means there are also two independent lfs_file_t structures. I'm looking through upstream lfs code right now, and I really can't figure out why this works sometimes; I can't see why, the first time this code snippet is run, the (relatively) independent second lfs_file_t structure gets updated with the new file length. Still looking.
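
At the raw littlefs level, the two opens boil down to something like the sketch below: two independent lfs_file_t structures on the same path (assuming an already-mounted lfs_t named lfs):

#include "lfs.h"

lfs_file_t f1, f2;
uint8_t wbuf[1] = {'a'}, rbuf[1] = {0};

lfs_file_open(&lfs, &f1, "file.bin", LFS_O_RDWR);
lfs_file_open(&lfs, &f2, "file.bin", LFS_O_RDWR);

lfs_file_write(&lfs, &f1, wbuf, sizeof(wbuf));
lfs_file_sync(&lfs, &f1);                      /* flushes f1's cache to flash... */

lfs_file_read(&lfs, &f2, rbuf, sizeof(rbuf));  /* ...but f2 may still serve stale
                                                  data from its own cache */
lfs_file_close(&lfs, &f1);
lfs_file_close(&lfs, &f2);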

BrianPugh commented 3 years ago

operating under the assumption that upstream littlefs cannot handle the same file being open multiple times (waiting to hear back from its authors), I can add this functionality to my port relatively easily. However, with the easy way to implement this, opening the same file multiple times would return the same small-int-value file descriptor. Would this be acceptable? For example:

uint8_t buf1[1] = {'a'};
uint8_t buf2[1] = {0};
int fd1 = open("/fs/file.bin", O_RDWR);     // let's say this returns 3
assert(fd1 >= 0);
int fd2 = open("/fs/file.bin", O_RDWR);     // also returns 3
assert(fd2 >= 0);
lseek(fd1, 0, SEEK_SET);
write(fd1, buf1, sizeof(buf1));
fsync(fd1);
lseek(fd2, 0, SEEK_SET);
read(fd2, buf2, sizeof(buf2));
assert(buf1[0] == buf2[0]);

close(fd1);           // each descriptor still needs to be closed; really, "close" just needs to be called the same number of times on any of the descriptors
close(fd2);

atanisoft commented 3 years ago

Perhaps track the lfs_file_t instance and return the same one from your API in littlefs_vfs_open()? If that is what you are proposing, I think it should be fine to return the same FD back to the caller (the ESP VFS). The ESP VFS layer will assign a unique FD and translate it back to the VFS impl FD automatically.

I'd suggest tracking N usages so your VFS adapter holds onto the lfs_file_t until the last FD has been closed.
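
Something along these lines (a rough sketch; the struct layout and the path_hash helper are hypothetical, not the actual port API):

#include "lfs.h"

typedef struct vfs_littlefs_file {
    lfs_file_t file;                 /* the single shared littlefs handle */
    uint32_t   hash;                 /* hash of the file's path */
    int        ref_count;            /* opens minus closes */
    struct vfs_littlefs_file *next;  /* linked list of open files */
} vfs_littlefs_file_t;

/* On open: reuse the existing entry if this path is already open. */
vfs_littlefs_file_t *acquire(vfs_littlefs_file_t *head, const char *path)
{
    uint32_t h = path_hash(path);    /* assumed helper */
    for (vfs_littlefs_file_t *f = head; f != NULL; f = f->next) {
        if (f->hash == h) { f->ref_count++; return f; }
    }
    /* Not found: allocate an entry, lfs_file_open() it, set ref_count = 1,
     * and push it onto the list (allocation path elided). */
    return NULL;
}

/* On close: only really close once the last user goes away. */
void release(vfs_littlefs_file_t *f)
{
    if (--f->ref_count == 0) {
        /* lfs_file_close(...), unlink from the list, free */
    }
}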

BrianPugh commented 3 years ago

that's exactly how I was going to implement it 👍. I can work on that more this evening.

It seems simpler to have a count in the vfs_littlefs_file_t struct rather than having a unique FD and searching the linked list to see whether another instance of the same file is open. Actually, the more I think about it, it might not be too hard to do this "properly" with unique file descriptors. I'll investigate that as well.

atanisoft commented 3 years ago

Once you have something ready to test, I can test it in my application and see how it performs. If you have a CAN-USB adapter for your PC and a CAN transceiver, you may be able to test it locally yourself as well.

BrianPugh commented 3 years ago

@atanisoft I implemented (and merged) the lazy solution where the same small-integer FD is returned. It required significantly less code change, and doing it the "proper" way may have some performance impact on people who really push the file system. Let me know if this solution works for you (it passes the unit test). Thanks!

See: https://github.com/joltwallet/esp_littlefs/pull/19

atanisoft commented 3 years ago

I put one comment on #19, but it won't impact my testing of the fix. I'll do some testing in the full setup and confirm there are no issues.

atanisoft commented 3 years ago

This appears to have fixed the caching issue I was seeing. More work will be needed in the dependency that is opening the same file multiple times, so that it can be consolidated to a single file handle for all flows, but that will take time.

BrianPugh commented 3 years ago

I can't see your comment on the PR, are you sure you submitted it?

Also, after sleeping on it, I realized that this fix doesn't totally solve your issue unless your use case is exactly reflected in that snippet. With the current solution, each individual file descriptor doesn't have its own position in the file. Thinking of a way around this...
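
For example, with the shared-descriptor workaround (offsets hypothetical):

int fd1 = open("/fs/file.bin", O_RDWR);  // returns, say, 3
int fd2 = open("/fs/file.bin", O_RDWR);  // also returns 3
lseek(fd1, 100, SEEK_SET);               // silently moves fd2's position too,
read(fd2, buf, 10);                      // so this reads from offset 100, not 0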

atanisoft commented 3 years ago

With the current solution, each individual file descriptor doesn't have its own position in the file. Thinking of a way around this...

Yeah, I was thinking of the same and was considering commenting on it as well.

I can't see your comment on the PR, are you sure you submitted it?

Somehow it got stuck in a Pending state...

BrianPugh commented 3 years ago

alright, I have a greater understanding of the upstream lfs codebase now, and I also realized my test would pass the first time due to artifacts left behind by a previous failed test. Now that I know nothing magical is going on behind the scenes (like the lfs file struct magically getting updated), I can tackle this a bit better.

BrianPugh commented 3 years ago

Alright, here is my new attack plan:

  1. Create a new struct like:
typedef struct {
    lfs_file_t file;
    uint8_t open_count;
} vfs_littlefs_file_wrapper_t;

With this new setup, every open gets a unique FD (as it should), but multiple FDs can point back to the same vfs_littlefs_file_wrapper_t.

  2. Replace vfs_littlefs_file_t.file with a pointer to the struct described in (1). Add a new attribute lfs_off_t pos to vfs_littlefs_file_t. This will keep track of this file descriptor's position in the file.
  3. Before every file operation, run lfs_file_seek(..., vfs_littlefs_file_t.pos, LFS_SEEK_SET) if it mismatches the lfs_file_t.pos attribute. This function just flushes any writes that are in memory and then sets the internal pos attribute, so it should be a very cheap operation (see the sketch after this list).
  4. After every file operation, copy lfs_file_t.pos over to vfs_littlefs_file_t.pos.
  5. As currently, on every close, decrement open_count and deallocate the vfs_littlefs_file_wrapper_t when it reaches 0.
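
A minimal sketch of steps 3 and 4 (the vfs_littlefs_fd_t name and layout are hypothetical, not the port's actual API):

#include "lfs.h"

typedef struct {
    lfs_file_t file;      /* shared littlefs state for this path */
    uint8_t open_count;   /* how many FDs point here */
} vfs_littlefs_file_wrapper_t;

typedef struct {
    vfs_littlefs_file_wrapper_t *wrapper;
    lfs_off_t pos;        /* this descriptor's own file position */
} vfs_littlefs_fd_t;      /* hypothetical per-descriptor struct */

static lfs_ssize_t fd_read(lfs_t *lfs, vfs_littlefs_fd_t *fd,
                           void *buf, lfs_size_t len)
{
    lfs_file_t *f = &fd->wrapper->file;
    /* Step 3: re-seek if another descriptor moved the shared position. */
    if ((lfs_off_t)lfs_file_tell(lfs, f) != fd->pos)
        lfs_file_seek(lfs, f, fd->pos, LFS_SEEK_SET);

    lfs_ssize_t n = lfs_file_read(lfs, f, buf, len);

    /* Step 4: remember where this descriptor ended up. */
    if (n >= 0)
        fd->pos = lfs_file_tell(lfs, f);
    return n;
}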

How does this plan sound?

Possible downsides:

  1. Each fopen now requires an additional malloc, which would increase memory usage per FD by 4 bytes for the pointer, a few bytes for the additional malloc itself (not sure how much overhead esp-idf's malloc has), and then 4 more bytes for the aligned open_count. All in all, let's say this would increase memory overhead by about 12 bytes per FD.

atanisoft commented 3 years ago

I think overall it sounds good, but do add locking of some sort to ensure no two threads directly modify the lfs_file_t concurrently.

BrianPugh commented 3 years ago

all of the above operations are done while the filesystem mutex is held, so there should be no issues.

BrianPugh commented 3 years ago

@X-Ryl669 do you have any thoughts on this? You have a keen eye for these sorts of things.

X-Ryl669 commented 3 years ago

Honestly, this is a lot of change for a barely-valid usage pattern (so it adds a lot of potential bugs and multitasking issues). If I were you, I would:

  1. Make sure the filenames are stored in the structures (this depends on some build flags, IIRC), so an open file can be found by its path.
  2. Prevent opening an already-open file by searching for it: return -1 and set errno to EBUSY when opening a file that is already open. This means adding a hash check against the table of open files (an acceptable cost, IMHO). No bugs, no multitasking issues. I would prefer this one; a sketch follows.
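
A rough sketch of option 2 (the efs fields and the compute_hash helper are assumptions, not the port's actual internals):

#include <errno.h>
#include <string.h>

static int vfs_littlefs_open_checked(esp_littlefs_t *efs, const char *path)
{
    uint32_t h = compute_hash(path);               /* assumed helper */
    for (vfs_littlefs_file_t *f = efs->file; f != NULL; f = f->next) {
        if (f->hash == h && strcmp(f->path, path) == 0) {
            errno = EBUSY;                         /* already open: refuse */
            return -1;
        }
    }
    /* ...fall through to the normal lfs_file_open() path... */
    return 0;
}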

Please note that the file might not be opened with the same attributes (for example, once in read mode and later in read/write mode). In that case, you can't reuse the file descriptor or the file_t object. What could you do in that case? Close the file and reopen it with the new mode (or a combined mode)? If you do that, suddenly a read-only file descriptor will be allowed to write!

So unless it's fixed upstream, there is no valid solution that's not a hack like the proposed implementation above, IMHO.

atanisoft commented 3 years ago

Note that this is a perfectly valid use case: multiple file handles on the same file from different threads. It works on other FS types (including SPIFFS).

But you are right: the multiple file handles could have different mode flags, which would need to be accounted for in the read/write implementations.

X-Ryl669 commented 3 years ago

I never had the need for this. Anyway, I don't think the HAL is the right layer to implement this if the underlying driver does not, since in all cases it'll lead to bugs that are almost impossible to solve here (I'm not even talking about partial writes, truncate, and all the other oddities). Hopefully, Brian has already reported the bug to the littlefs developers, so it'll likely be fixed at the right place IMHO, that is, the filesystem code. Meanwhile, you can already query, in your code, for an existing file descriptor via its path, and open the file only if that fails.

atanisoft commented 3 years ago

I never had the need for this.

Yeah, this would definitely be considered a corner case.

I don't think the HAL is the right layer to implement this if the underlying driver does not

I suspect the underlying FS code does support this, but it may depend on using the same lfs_file_t or on enforcing some sort of cache synchronization.

Meanwhile, you can already query, in your code, for an existing file descriptor via its path, and open the file only if that fails.

If that is how this is implemented, LittleFS will no longer be an option for a number of projects I know of. If there were a way to disable or reduce the cache size, that might be a better option than throwing an error when the same file is opened from multiple code paths.

X-Ryl669 commented 3 years ago

I suspect the underlying FS code does support this, but it may depend on using the same lfs_file_t or on enforcing some sort of cache synchronization.

No, this is not possible, since the open options can be different, the current position in the file will have to be different, and so on (think about ftruncate, for example).

If that is how this is implemented, LittleFS will no longer be an option for a number of projects I know of.

I don't know about your project. I understand how much simpler it seems not to care about the file path when calling open. Yet there is a wormhole hiding in this simplicity, and I really doubt SPIFFS acts correctly in that case (and FatFS certainly does not, even if it seems to work for a dumb test case).

In your example it seems to work, but it'll break as soon as the operations happen in different tasks, unless you have a big lock/mutex around all your FS operations; that's not how ESP-IDF is implemented, since it would really hurt performance.

This is because the underlying write to flash is not atomic once it touches multiple pages. So you might have a situation where Task X writes "A..B..C" and Task Y reads "A.." or "A..B" or "A..B..C" or the unexpected "A.. ..C", because writing the B page has not finished yet. What happens if one task seeks to B while the other task truncates to A?

Honestly, I don't know about your software, but if it relies on atomic operations on the FS, I don't think it'll work correctly on ESP-IDF anyway, whatever the FS.

BrianPugh commented 3 years ago

these are good points. Alright, I think my action plan on this (for now) is to:

  1. Revert PR #19 on master in the meantime, since it's a half-baked attempt that doesn't really solve the problem.
  2. Continue working on PR #21. It's a mostly working solution right now; I just want to polish it up a bit and test it more. As @X-Ryl669 said, this won't properly handle modes. I could add that in the same way I handled file position, but I also agree that these features shouldn't be added in my layer. That said, going through the LFS codebase, I think the chances are slim that this is/will be properly supported at the littlefs level. I don't think files really know of each other at the root lfs layer.
  3. Reorganize (2) a little bit so this feature can be enabled/disabled.

atanisoft commented 3 years ago

The issue of atomicity is not a problem in the code I'm using; it works correctly on SPIFFS and FatFS (SD), as well as on other platforms (mainly Linux).

I'll take a look at PR #21; as long as it allows the file to be opened multiple times, it should be fine. It can be up to the consumer code to ensure atomicity is preserved across tasks if/when needed.

geky commented 3 years ago

Hello, I left some comments in the littlefs thread (https://github.com/littlefs-project/littlefs/issues/483#issuecomment-727597870). TLDR: littlefs files are transactional in the sense that each open file gets an independent snapshot of the file's contents.

What is the use case for opening the same file multiple times? Is it primarily a usability thing to keep track of multiple file positions?

atanisoft commented 3 years ago

@geky it's a small bug in a library I'm using, but it works on all FS types except LittleFS from what I've tested so far. It seems there should be a way to flush the caches even across multiple file handles, so that data written through one handle can be read through another after a flush operation.

BrianPugh commented 3 years ago

So it's hard to determine what the best path forward here is. There are multiple possible solutions, but they're all sort of hacky/weird.

geky commented 3 years ago

It is an exciting case where existing practices don't fit the embedded world well.

It's interesting to note that SPIFFS uses an internal file descriptor table: https://github.com/pellepl/spiffs/blob/9d12e8f47b0e611f277a0de8d39dc9089971ce20/src/spiffs.h#L385-L386

Also interesting to note that Chan's FatFS prohibits this: http://elm-chan.org/fsw/ff/doc/appnote.html#dup

atanisoft commented 3 years ago

Yeah, it is an interesting issue for sure. FatFS on ESP32 does work when using an SD card for storage, as long as you flush after writes (something we are fixing in the library as well). But there will be a couple of cases that still open two handles to the same file for different purposes, with access from different threads.

geky commented 3 years ago

~Oh, it looks like ESP is working around this in a similar manner to @BrianPugh's solution: https://github.com/espressif/esp-idf/blob/357a2776032299b8bc4044900a8f1d6950d7ce89/components/fatfs/vfs/vfs_fat.c#L30~

geky commented 3 years ago

Wait, no, I misread that. I wonder if this is actually an undefined area for FatFS, and you risk data corruption if the two file descriptors' caches overlap each other.

atanisoft commented 3 years ago

It's possible when using flash for storage, but a bit unlikely on SD storage.

johnnytolengo commented 3 years ago

Hello @BrianPugh, I got an error compiling your multi-fd branch:

esp_littlefs.c:204:9: error: unknown field 'pwrite_p' specified in initializer

It seems that pwrite_p is for some reason duplicated from write_p. After removing .pwrite_p and .pread_p it compiles :)

What a pity that's not documented. In fact, I am hoping that this branch is already thread-safe for ESP32, especially when the same file is opened multiple times.

  1. Is it already thread-safe?
  2. What are the improvements over the default branch?

Thanks

BrianPugh commented 3 years ago

So I abandoned that effort; it was too big of a hack and didn't actually solve the fundamental problem.

This littlefs driver is thread-safe, but you cannot safely write and read from the same file using multiple file descriptors.

johnnytolengo commented 3 years ago

ok, so if I understand correctly, the master branch is thread-safe when you are accessing multiple different files, but it's not safe when you are accessing the same file from two different cores at the same time. Is that correct?

johnnytolengo commented 3 years ago

Hello @BrianPugh, I was reading the esp_littlefs code for several hours. I am looking for some solution to avoid crashes when writing and reading the same file simultaneously. Any idea how to do that? Some workaround, at least for now? Why did you delete the "multi-fd-sync" branch? Was something wrong with it?

Thanks

BrianPugh commented 3 years ago

It's an upstream littlefs limitation. See this upstream PR draft:

https://github.com/littlefs-project/littlefs/pull/513

I should just delete that branch; it was an attempt at a workaround, but it ended up just making a mess and not really solving the underlying problem.

X-Ryl669 commented 3 years ago

Meanwhile, in this specific usage case, you can also do the following whenever you write to a file (or call fsync):

  1. Maintain a counter.
  2. Walk the efs->cache looking for your file's name hash, incrementing the counter upon finding a match.
  3. If the counter is 2 or higher, walk the efs->cache again, close all matching files, then reopen them. You may want to remember the current position in each file (via ftell) and restore it once reopened.

This requires storing the file's path (so no CONFIG_LITTLEFS_USE_ONLY_HASH).

This scheme will obviously not work if you use ftruncate to shrink the file below any other reader's current position.
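
Roughly, and heavily hedged (the efs->cache layout and field names below are assumptions about the port's internals, not its actual API):

void resync_other_handles(esp_littlefs_t *efs, uint32_t path_hash)
{
    int matches = 0;
    for (size_t i = 0; i < efs->cache_size; i++)       /* assumed fields */
        if (efs->cache[i] && efs->cache[i]->hash == path_hash)
            matches++;
    if (matches < 2) return;                           /* only one handle open */

    for (size_t i = 0; i < efs->cache_size; i++) {
        vfs_littlefs_file_t *f = efs->cache[i];
        if (!f || f->hash != path_hash) continue;
        lfs_soff_t pos = lfs_file_tell(efs->fs, &f->file);      /* remember position */
        lfs_file_close(efs->fs, &f->file);
        lfs_file_open(efs->fs, &f->file, f->path, LFS_O_RDWR);  /* reopen fresh */
        lfs_file_seek(efs->fs, &f->file, pos, LFS_SEEK_SET);    /* restore position */
    }
}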

johnnytolengo commented 3 years ago

@X-Ryl669 at first look your proposal sounds reasonable, bearing in mind that MCUs will never have too many files being read and written simultaneously. In my case, I have a task on core 1 saving values to "logs.log" and a web-service task on core 0 continuously reading that same "logs.log" file. Without control of that overlap, the FS crashes, and the only way to recover is to format it. I think this is a super common use case :)

BrianPugh commented 3 years ago

@johnnytolengo can you use a circular buffer in memory instead? Or possibly cycle through a few log files (log_0.txt, log_1.txt, etc.)? For a lot of filesystems, the behavior you are describing is (I'm pretty sure) undefined.
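
For example, a minimal sketch of cycling through log files (names, sizes, and the rotation policy are illustrative): the writer appends to one file at a time, so a reader can safely open any file that is not the current write target.

#include <stdio.h>

#define NUM_LOG_FILES 3
#define MAX_LOG_SIZE  (16 * 1024)

static int current_log = 0;

FILE *open_current_log(void)
{
    char path[32];
    snprintf(path, sizeof(path), "/fs/log_%d.txt", current_log);
    FILE *f = fopen(path, "a");
    if (f == NULL) return NULL;
    fseek(f, 0, SEEK_END);
    if (ftell(f) >= MAX_LOG_SIZE) {
        /* Current file is full: rotate to the next slot and truncate it. */
        fclose(f);
        current_log = (current_log + 1) % NUM_LOG_FILES;
        snprintf(path, sizeof(path), "/fs/log_%d.txt", current_log);
        f = fopen(path, "w");
    }
    return f;
}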

X-Ryl669 commented 3 years ago

Append-only mode is where littlefs performs least efficiently. In your situation, I would:

  1. Use a mutex that prevents writing while a read is in progress.
  2. Upon a read, open the file, read it completely, and close it.
  3. Release the mutex and let any pending writes go through.

Alternatively, you could use a single file descriptor (still with a mutex): when a read is requested, ftell the current write position, fseek to the part to read, do the read, and seek back to the write position.
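
A sketch of that single-descriptor pattern using POSIX calls and a FreeRTOS mutex (log_fd and log_mutex are assumed to be created elsewhere):

#include <unistd.h>
#include "freertos/FreeRTOS.h"
#include "freertos/semphr.h"

extern int log_fd;                  /* the one shared descriptor */
extern SemaphoreHandle_t log_mutex; /* guards all access to log_fd */

ssize_t log_read_at(off_t offset, void *buf, size_t len)
{
    xSemaphoreTake(log_mutex, portMAX_DELAY);
    off_t write_pos = lseek(log_fd, 0, SEEK_CUR);  /* remember the writer's position */
    lseek(log_fd, offset, SEEK_SET);               /* jump to the part to read */
    ssize_t n = read(log_fd, buf, len);
    lseek(log_fd, write_pos, SEEK_SET);            /* seek back for the writer */
    xSemaphoreGive(log_mutex);
    return n;
}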

On ESP32 we have plenty of memory; you can spare 64 kB for the live logs and, as Brian says, swap the live buffer once it's full by writing it all at once.

johnnytolengo commented 3 years ago

What a pity that littlefs is not file-descriptor safe :)
I think a good alternative is to use a RAM buffer (when available) to keep the logs in; otherwise, just use a mutex to protect the FS while writing.

It seems like a simple thing, but it's far more complicated than it looks..

BrianPugh commented 3 years ago

just to be clear, this isn't an issue with the filesystem being thread-safe. The issue is having multiple independent file descriptors open at the same time, pointing at the same file, where at least one of them is writing.

johnnytolengo commented 3 years ago

well.. that's your point of view. In fact, if you are running on different threads, the filesystem is only partially safe; sadly, the chances of corrupting it are very high.

BrianPugh commented 3 years ago

Could you be a bit more precise about how you think corruption occurs? You can have multiple file descriptors in multiple threads, all doing writes, and everything will be fine as long as they aren't pointing to the same filename.

johnnytolengo commented 3 years ago

you are right, just with multiple file descriptors :) But we have to take care in case the same file is used at the same time. It would be nice if the FS could store the paths for a while to check whether a file is already taken by another task, and at least return an error, or take a lock to wait until the file is safely available.

BrianPugh commented 3 years ago

You can share the same file descriptor across threads and it will work. Once again, the issue is multiple file descriptors pointing to the same file where at least one of them is writing.

johnnytolengo commented 3 years ago

mmm.. I never thought about that possibility, but I have no clue how to use the same file descriptor in multiple threads. The only function I know that takes and returns the fd is e.g. File fdlog = fs.open("mylog.log");. How do I share fdlog, and what happens once fdlog.close() is called in the other thread?

X-Ryl669 commented 3 years ago

The code that creates the tasks should open the descriptor and close it. So you don't care about sharing here, since it's not the tasks' responsibility to do that, and you'll be safe. You'll have something like:

main()
{
   int fd = open("log.log", O_RDWR);
   createTask(&task1Handle, task1, (void*)&fd);
   createTask(&task2Handle, task2, (void*)&fd);
   while (isRunning) {}
   close(fd);
}

Or you can use a mutex for the file operations:

int fd = -1;
Mutex m;

main()
{
   createTask(&task1Handle, task1, (void*)&fd);
   createTask(&task2Handle, task2, (void*)&fd);
   while (isRunning) {}
}

task1()
{
   [...]
   {
      ScopeMutex scope(m);              // RAII-style scoped lock (pseudocode)
      if (fd == -1) fd = open("whatever", O_RDWR);
      // process fd
      read(fd, ...) or write(fd, ...)
   }
}

task2()
{
   [...]
   {
      ScopeMutex scope(m);
      if (fd == -1) fd = open("whatever", O_RDWR);
      // process fd
      read(fd, ...) or write(fd, ...)
      close(fd); fd = -1;
   }
}

In that case, the mutex protects the fd variable, not what the descriptor points to (which has its own mutex/semaphore anyway).