atanisoft opened this issue 3 years ago
investigating
So I'm working on branch:
https://github.com/joltwallet/esp_littlefs/tree/multi-fd-sync
I added your code snippet as a unit test, and there is some non-determinism going on. It usually works the first time, but fails the second time. Trying to figure out if it's a problem in my port, or if it's an upstream issue.
Interesting, is it using a cache per file handle or is it per file in the FS? If it is per handle, is it possibly the caches are getting out of sync?
So within my port, the function lfs_file_open will be called twice, and two unique file descriptors will be returned. This means there are also two independent lfs_file_t structures. I'm looking through upstream lfs code right now, and I really can't figure out why this works sometimes. I can't figure out why, the first time this code snippet is run, the (relatively) independent second lfs_file_t structure gets updated with the new file length. Still looking.
Operating under the assumption that upstream littlefs cannot handle the same file being open multiple times (waiting to hear back from their authors), I can add this functionality to my port relatively easily. However, in the easy way to implement this, opening the same file multiple times would return the same small-int-value file descriptor. Would this be acceptable? For example:
uint8_t buf1[1] = {'a'};
uint8_t buf2[1] = {0};
int fd1 = open("/fs/file.bin", O_RDWR); // let's say this returns 3
assert(fd1 >= 0);
int fd2 = open("/fs/file.bin", O_RDWR); // also returns 3
assert(fd2 >= 0);
lseek(fd1, 0, SEEK_SET);
write(fd1, &buf1, sizeof(buf1));
fsync(fd1);
lseek(fd2, 0, SEEK_SET);
read(fd2, &buf2, sizeof(buf2));
assert(buf1[0] == buf2[0]);
close(fd1); // each descriptor still needs to be closed, or really "close" just needs to be called the same number of times on any of the descriptors
close(fd2);
Perhaps track the lfs_file_t instance and return the same in your API as part of littlefs_vfs_open()? If that is what you are proposing, I think it should be fine to return the same FD back to the caller (ESP VFS). The ESP VFS layer will assign a unique FD and translate it back to the VFS IMPL FD automatically.
I'd suggest tracking N usages so your VFS adapter holds onto the lfs_file_t until the last FD has been closed.
That's exactly how I was going to implement it 👍. I can work on that more this evening.
It seems simpler to have a count in the vfs_littlefs_file_t struct rather than having a unique fd and searching along the linked list if another instance of the same file is open. Actually, the more I think about it, it might not be too hard to just do this "properly" with unique file descriptors. I'll investigate that as well.
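For illustration, the count-based approach could look roughly like this. The helper names and exact struct layout are just a sketch (flags hard-coded for brevity), not the actual esp_littlefs implementation:

#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include "lfs.h"

typedef struct {
    lfs_file_t file;        /* single underlying littlefs handle shared by all FDs */
    uint8_t    open_count;  /* how many open() calls currently map to this entry */
    char      *path;        /* kept so a second open of the same path can be detected */
} vfs_littlefs_file_t;

/* Hypothetical helpers; both are assumed to run with the filesystem mutex held. */
static int file_open_shared(lfs_t *lfs, vfs_littlefs_file_t *entry, const char *path) {
    if (entry->open_count == 0) {                 /* first open creates the lfs handle */
        int err = lfs_file_open(lfs, &entry->file, path, LFS_O_RDWR | LFS_O_CREAT);
        if (err < 0) return err;
        entry->path = strdup(path);
    }
    entry->open_count++;                          /* every later open reuses the same FD */
    return 0;
}

static int file_close_shared(lfs_t *lfs, vfs_littlefs_file_t *entry) {
    if (--entry->open_count == 0) {               /* last close releases the lfs handle */
        free(entry->path);
        return lfs_file_close(lfs, &entry->file);
    }
    return 0;
}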
Once you have something ready to test I can test it in my application and see how it performs. If you have a CAN-USB adapter for your PC and a CAN transceiver you can test it locally yourself possibly as well.
@atanisoft I implemented (and merged) the lazy solution where the same small-integer FD is returned. It required significantly less code change, and doing it the "proper" way may have some performance impact on people that really push the file system. Let me know if this solution works for you (it passes the unit test). Thanks!
I put one comment on #19, but it won't impact my testing of the fix. I'll do some testing in the full setup and confirm there are no issues.
This appears to have fixed the caching issue I was seeing. More work will be needed in the dependency that is opening the same file multiple times, so that it can be consolidated to a single file handle for all flows, but that will take some time.
I can't see your comment on the PR, are you sure you submitted it?
Also, after sleeping on it, I realized that this fix doesn't totally solve your issue, unless your use case is exactly reflected in that snippet. With the current solution, each individual file descriptor doesn't have its own position in the file. Thinking of a way around this...
> With the current solution, each individual file descriptor doesn't have its own position in the file. Thinking of a way around this...
Yeah, I was thinking of the same and was considering commenting on it as well.
> I can't see your comment on the PR, are you sure you submitted it?
Somehow it got stuck in a Pending state...
Alright, I have a greater understanding of the upstream lfs codebase now, and I also realized my test would pass the first time due to artifacts left behind from a previous failed test. Now that I know nothing magical is going on behind the scenes (like the lfs file struct magically getting updated), I can tackle this a bit better.
Alright, here is my new attack plan:

1. Create a wrapper struct:

typedef struct {
    lfs_file_t file;
    uint8_t open_count;
} vfs_littlefs_file_wrapper_t;

2. With this new setup, every open gets a unique FD (as it should), but multiple FDs can point back to the same vfs_littlefs_file_wrapper_t.
3. Replace vfs_littlefs_file_t.file with a pointer to the struct described in (1). Add a new attribute lfs_off_t pos into vfs_littlefs_file_t. This will keep track of this file descriptor's position in the file.
4. On a read/write, call lfs_file_seek(..., vfs_littlefs_file_t.pos, LFS_SEEK_SET) if it mismatches the lfs_file_t.pos attribute. This function just flushes any writes that are in memory and then sets the internal pos attribute, so it should be a very cheap operation (see the sketch below).
5. After the read/write, copy lfs_file_t.pos to vfs_littlefs_file_t.pos.
6. On close, decrement open_count and deallocate the vfs_littlefs_file_wrapper_t when it reaches 0.

How does this plan sound?
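Roughly, steps 3-5 would look something like this; the struct layout and the fd_read helper are only a sketch, not code from the branch:

typedef struct {
    vfs_littlefs_file_wrapper_t *wrapper;  /* shared lfs_file_t + open_count from step 1 */
    lfs_off_t pos;                         /* this descriptor's own position in the file */
} vfs_littlefs_file_t;

static lfs_ssize_t fd_read(lfs_t *lfs, vfs_littlefs_file_t *fd, void *buf, lfs_size_t len) {
    lfs_file_t *f = &fd->wrapper->file;
    /* Step 4: re-seek only if another descriptor moved the shared handle. */
    if (lfs_file_tell(lfs, f) != (lfs_soff_t)fd->pos) {
        lfs_soff_t res = lfs_file_seek(lfs, f, (lfs_soff_t)fd->pos, LFS_SEEK_SET);
        if (res < 0) return (lfs_ssize_t)res;
    }
    lfs_ssize_t nread = lfs_file_read(lfs, f, buf, len);
    /* Step 5: remember where this descriptor ended up. */
    if (nread >= 0) fd->pos = (lfs_off_t)lfs_file_tell(lfs, f);
    return nread;
}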
Possible downsides:

- It requires an additional malloc, which would increase memory usage per FD by 4 bytes for the pointer, a few bytes (not sure how much overhead esp-idf's malloc has) for the additional malloc, and then 4 more bytes for the aligned open_count. All in all, let's say this would increase the memory overhead by 12 bytes per FD.

I think overall it sounds good, but do add some locking of some sort to ensure no two threads are directly modifying the lfs_file_t concurrently.
All the above operations would be done while the filesystem mutex is held, so there should be no issues.
@X-Ryl669 do you have any thoughts on this? You have a keen eye for these sorts of things.
Honestly, this is a lot of change for a borderline-invalid usage pattern (so it adds a lot of potential bugs and multitasking issues). If I were you, I would:
Please notice that the file might not be opened with the same attributes (for example, once in read mode and later in read/write mode).
In that case, you can't reuse the file descriptor nor the file_t object.
What could you do in that case ?
Close the file and reopen with the new mode (or a combined mode) ?
If you do that, suddenly a read only file descriptor will be allowed to write!
So unless it's fixed upstream, there is no valid solution that's not a hack like the proposed implementation above, IMHO.
Note that this is a perfectly valid use case, multiple file handles on the same file from different threads. It works on other FS types (including SPIFFS).
But you are right, the multiple file handles would potentially have different mode flags which would need to be accounted for in the read/write implementations.
I never had the need for this. Anyway, I don't think the HAL is the right layer to implement this if the underlying driver does not, since in all cases it'll lead to bugs that are almost impossible to solve here (I'm not even talking about partial writes, truncate, and all the other oddities). Hopefully, Brian has already reported the bug to the LittleFS developers, so it'll likely be fixed at the right place IMHO, that is, the filesystem code. Meanwhile, you can already query, in your code, for an existing file descriptor via its path and, if that fails, open it.
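For example, that lookup could be a tiny path-to-descriptor table in your application. shared_open/shared_close and the table size below are made-up names for illustration, not esp_littlefs API, and the sketch ignores the differing-open-flags problem mentioned above:

#include <fcntl.h>
#include <string.h>
#include <unistd.h>

typedef struct {
    char path[64];
    int  fd;
    int  refs;
} shared_fd_t;

static shared_fd_t s_table[8];                  /* small fixed table of already-open files */

int shared_open(const char *path, int flags) {
    for (int i = 0; i < 8; i++) {               /* reuse an existing descriptor if found */
        if (s_table[i].refs && strcmp(s_table[i].path, path) == 0) {
            s_table[i].refs++;
            return s_table[i].fd;
        }
    }
    for (int i = 0; i < 8; i++) {               /* otherwise open the file normally */
        if (s_table[i].refs == 0) {
            int fd = open(path, flags);
            if (fd < 0) return fd;
            strncpy(s_table[i].path, path, sizeof(s_table[i].path) - 1);
            s_table[i].fd = fd;
            s_table[i].refs = 1;
            return fd;
        }
    }
    return -1;                                  /* table full */
}

void shared_close(int fd) {
    for (int i = 0; i < 8; i++) {
        if (s_table[i].refs && s_table[i].fd == fd && --s_table[i].refs == 0) {
            close(fd);                          /* last user actually closes the file */
            return;
        }
    }
}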
> I never had the need for this.
Yeah, this would definitely be considered a corner case.
> I don't think the HAL is the right layer to implement this if the underlying driver does not
I suspect the underlying FS code does support this, but it may be dependent on using the same lfs_file_t or enforcing some sort of cache synchronization method.
> Meanwhile, you can already query, in your code, for an existing file descriptor via its path and, if that fails, open it.
If that is how this is implemented, LittleFS will no longer be an option for a number of projects I know of. If there were a way to disable or reduce the cache size, that may be a better option than throwing an error when you attempt to open the same file in multiple code paths.
> I suspect the underlying FS code does support this, but it may be dependent on using the same lfs_file_t or enforcing some sort of cache synchronization method.
No, this is not possible, since the open options can be different, the current position in the file will have to be different, and so on (think about ftruncate, for example).
> If that is how this is implemented, LittleFS will no longer be an option for a number of projects I know of.
I don't know about your project. I understand how much simpler it seems not to care about the file path when calling open. Yet there is a can of worms hiding in this simplicity, and I really doubt SPIFFS is acting correctly in that case (and for sure FatFS does not, even if it seems to work for a dumb test case).
In your example it seems to work, but it'll break as soon as the operations happen in different tasks, unless you have a big lock/mutex for all your FS operations; that's not how ESP-IDF is implemented, since it would really hurt performance.
This is because the underlying write to flash is not atomic once it touches multiple pages. So you might have a situation where Task X writes "A..B..C" and Task Y reads "A.." or "A..B" or "A..B..C" or the unexpected "A.. ..C", since writing the B page is not finished yet. What happens if one task seeks to B and the other task truncates to A?
Honestly, I don't know about your software, but if it relies on atomic operations on the FS, I don't think it'll work correctly anyway on ESP-IDF, whatever the FS.
These are good points. Alright, I think my action plan on this (for now) is to:
The issue of atomicity is not a problem in the code I'm using, and it works correctly on SPIFFS and FatFS (SD) as well as on other platforms (Linux mainly).
I'll take a look at PR 21; as long as it allows the file to be opened multiple times it should be fine. It can be on the consumer code to ensure atomicity is preserved across tasks if/when needed.
Hello, I left some comments in the littlefs thread (https://github.com/littlefs-project/littlefs/issues/483#issuecomment-727597870). TLDR: littlefs files are transactional in the sense that each open file gets an independent snapshot of the file's contents.
What is the use case for opening the same file multiple times? Is it primarily a usability thing to keep track of multiple file positions?
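In other words, using the raw littlefs API the behaviour would look roughly like this; an illustrative sketch only (error handling omitted, lfs is an already-mounted lfs_t, and file.bin is assumed to already contain at least one byte):

lfs_file_t writer, reader;
uint8_t out = 'a', in = 0;

lfs_file_open(&lfs, &writer, "file.bin", LFS_O_RDWR | LFS_O_CREAT);
lfs_file_open(&lfs, &reader, "file.bin", LFS_O_RDONLY);   /* this handle gets its own snapshot */

lfs_file_write(&lfs, &writer, &out, 1);
lfs_file_sync(&lfs, &writer);                              /* commits the write to storage */

lfs_file_read(&lfs, &reader, &in, 1);                      /* still reads the old snapshot, not 'a' */

lfs_file_close(&lfs, &reader);
lfs_file_open(&lfs, &reader, "file.bin", LFS_O_RDONLY);    /* only a fresh open sees the new data */
lfs_file_read(&lfs, &reader, &in, 1);                      /* now reads 'a' */
lfs_file_close(&lfs, &reader);
lfs_file_close(&lfs, &writer);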
@geky it's a small bit of a bug in a library I'm using but it works on all FS types except LittleFS from what I've tested so far. It would seem there should be a way to flush the caches even across multiple file handles so that one handle being written to can be read by another after a flush operation.
So it's hard to determine what the best path forward here is. There are multiple possible solutions, but they're all sort of hacky/weird.
It is an exciting case where existing practices don't fit the embedded world well.
It's interesting to note that SPIFFS uses an internal file descriptor table: https://github.com/pellepl/spiffs/blob/9d12e8f47b0e611f277a0de8d39dc9089971ce20/src/spiffs.h#L385-L386
Also interesting to note Chan's FATFS prohibits this: http://elm-chan.org/fsw/ff/doc/appnote.html#dup
Yeah, it is an interesting issue for sure. FatFS on esp32 works though when using an SD for storage as long as you flush after writes (something we are fixing in the library as well). But there will be a couple cases that will still open two handles to the same file for different purposes with access being from different threads.
~Oh, it looks like ESP is working around this in a similar manner to @BrianPugh's solution: https://github.com/espressif/esp-idf/blob/357a2776032299b8bc4044900a8f1d6950d7ce89/components/fatfs/vfs/vfs_fat.c#L30~
Wait no I misread that. I wonder if this is actually an undefined area for FatFS and you risk data corruption if the two file descriptor's caches overlap each other.
It's possible when using flash for storage, but a bit unlikely on SD storage.
Hello @BrianPugh , I got an error compiling your multi-fd branch:
esp_littlefs.c:204:9: error: unknown field 'pwrite_p' specified in initializer
Seems that pwrite_p is for some reason duplicated from write_p. Removing .pwrite_p and .pread_p, it compiles :)
What a pity that's not documented... In fact, I am hoping that this branch is already thread-safe for ESP32, especially when the same file is attempted to be opened multiple times.
Thanks
So I abandoned that effort; it was too big of a hack that didn't actually solve the fundamental problem.
This littlefs driver is threadsafe, but you cannot safely write and read from the same file using multiple file descriptors.
OK, so if I understand correctly, the master branch is threadsafe when you are accessing multiple different files, but it's not safe when you are accessing the same file from 2 different cores at the same time. Is that correct?
Hello @BrianPugh, I was reading the esp_littlefs code for several hours; I am looking for some solution to avoid crashes when writing and reading the same file simultaneously. Any idea how to do that? Some workaround, at least for now? Why did you delete the "multi-fd-sync" branch? Was something wrong with it?
Thanks
It's an upstream littlefs limitation. See this upstream PR draft:
https://github.com/littlefs-project/littlefs/pull/513
I should just delete that branch; it was an attempt at a workaround, but it ended up just making a mess and not really solving the underlying problem.
Meanwhile, in this specific usage case, you can also do this:
- Search efs->cache to look for your file's name hash, and increment the counter upon finding one.
- Search efs->cache again and close all matching files, then reopen them. You might want to remember the current position in the file (via ftell) and restore it once reopened.
This requires storing the file's path (so no CONFIG_LITTLEFS_USE_ONLY_HASH).
This scheme will obviously not work if you use ftruncate to reduce the size of the file smaller than any other reader's reading position.
@X-Ryl669 at first look your proposal sounds reasonable, keeping in mind that MCUs will never have too many files being read and written simultaneously. In my case it's because I have a "core 1" task saving values to "logs.log" and another "core 0" web-service task continuously reading that "logs.log" file. Without control of that overlapping, the FS crashes, and the only way to recover is formatting it. I think that is a super common use case :)
@johnnytolengo can you use a circular buffer in memory instead? Or possibly cycle through a few log files (log_0.txt, log_1.txt, etc.). For a lot of filesystems (I'm pretty sure) the action you are describing is undefined.
Append-only mode is where littlefs performs less efficiently. In your situation, I would either:
You could also use a single file descriptor (and still use a mutex): when a read is requested, ftell the current write position, fseek to the part to read, read, and seek back to the write position.
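A sketch of that pattern, using lseek(fd, 0, SEEK_CUR) in place of ftell since the thread is working with raw descriptors; log_fd, log_mutex, and log_read_at are assumed names, and the FreeRTOS mutex is just one possible way to guard the descriptor:

#include <sys/types.h>
#include <unistd.h>
#include "freertos/FreeRTOS.h"
#include "freertos/semphr.h"

extern int log_fd;                   /* the single descriptor shared by writer and reader */
extern SemaphoreHandle_t log_mutex;  /* created once with xSemaphoreCreateMutex() */

ssize_t log_read_at(off_t read_pos, void *buf, size_t len) {
    xSemaphoreTake(log_mutex, portMAX_DELAY);
    off_t write_pos = lseek(log_fd, 0, SEEK_CUR);  /* remember the writer's position */
    lseek(log_fd, read_pos, SEEK_SET);             /* jump to the part to read */
    ssize_t n = read(log_fd, buf, len);
    lseek(log_fd, write_pos, SEEK_SET);            /* restore the writer's position */
    xSemaphoreGive(log_mutex);
    return n;
}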
On ESP32, we have plenty of memory, you can spare 64kb for the live logs, and as Brian says, swap the live buffer once it's full by writing all at once.
What a pity that littlefs is not file-descriptor safe :)
I think a good alternative is to use some RAM buffer (in case that's available) to keep the logs in; otherwise, just use a mutex to protect the FS while writing.
Seems like a simple thing, but it's way more complicated than anything else...
Just to be clear, this isn't an issue with the filesystem being threadsafe. The issue is having multiple independent file descriptors open at the same time pointing at the same file where at least one of them is writing.
Well... that's your point of view. In fact, if you are running on different threads the filesystem is only partially safe; sadly, the chances of corrupting it are very high.
Could you be a bit more precise on how you think a corruption occurs? You can have multiple file descriptors in multiple threads, all doing writes, and everything will be fine so long as they aren't pointing to the same filename.
You are right, just with multiple file descriptors :) but we have to take care in case the same fd is used at the same time. It would be fine if the FS could store the paths for a while to check whether that file is already taken by another task, and at least return false or use lock() in order to wait until the file is safely available.
You can share the same file descriptor across threads and it will work. Once again, the issue is multiple file descriptors pointing to the same file where at least one of them is writing.
Mmm... I never thought about that possibility, but I have no clue how to use the same file descriptor in multiple threads. The only function I know that takes and returns the fd is e.g. File fdlog = fs.open("mylog.log"); how do I share fdlog, and what happens once fdlog.close() is called in another thread?
The code that's creating the tasks should open the descriptor and close it. So you don't care about sharing here, since it's not the tasks' responsibility to do that, and you'll be safe. You'll have something like:
main()
{
int fd = open("log.log", O_RDWR);
createTask(&task1Handle, task1, (void*)&fd);
createTask(&task2Handle, task2, (void*)&fd);
while(isRunning) {}
close(fd);
}
Or you can use a mutex for the file operations:
int fd = -1;
Mutex m;
main()
{
createTask(&task1Handle, task1, (void*)&fd);
createTask(&task2Handle, task2, (void*)&fd);
while(isRunning) {}
}
task1()
{
[...]
{
ScopeMutex scope(m);
if (fd == -1) fd = open("whatever", O_RDWR);
// Process fd
read(fd, ...) or write(fd, ...)
}
}
task2()
{
[...]
{
ScopeMutex scope(m);
if (fd == -1) fd = open("whatever", O_RDWR);
// Process fd
read(fd, ...) or write(fd, ...)
close(fd); fd = -1;
}
}
In that case, the mutex protects the fd variable, not what's pointed to by the descriptor (which has its own mutex/semaphore anyway).
I've encountered an odd case where the caching in LittleFS seems to be causing an issue that I'm not seeing on SD or SPIFFS.
Using a code pattern similar to the one below, the second file handle will not see the data from the first file handle's update:
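(The original snippet isn't reproduced in this extract; a minimal sketch of the kind of pattern described, with made-up path and buffer names, might be:)

#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

uint8_t out[4] = {1, 2, 3, 4};
uint8_t in[4] = {0};

int writer = open("/fs/data.bin", O_RDWR | O_CREAT, 0644);
int reader = open("/fs/data.bin", O_RDONLY);    /* second handle to the same file */

write(writer, out, sizeof(out));
fsync(writer);                                  /* flushed from the first handle... */

read(reader, in, sizeof(in));                   /* ...but on littlefs this handle still sees stale data */

close(reader);
close(writer);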
Using two handles for the same file doesn't make a ton of sense in most cases, especially within the same function, but this pattern works on other FS types.
Any ideas or suggestions on config tweaks to reduce the chances of the two file handles being out of sync?