Open HariTheCoder341 opened 4 months ago
Hi @HariTheCoder341, thanks for creating an issue.
This is going to be difficult to debug. Is it possible to reduce this down into a locally-reproducible example? Preferably in a littlefs test case (example).
We also noticed that lfs_dir_relocatingcommit is called with pdir = NULL in three points, line 2433, 2494, 2545.
Are you sure you're using v2.9? I don't think these line numbers quite line up.
The pdir argument here is actually a side-effect of lfs_dir_relocatingcommit. It contains the previous mdir in the dir's linked-list if the mdir needs to be dropped (mdir.count=0, LFS_OK_DROPPED). We need to find the pdir in lfs_dir_relocatingcommit to determine how to update the mlist, but we can't actually drop in lfs_dir_relocatingcommit without recursion. A lot of the mess in these functions is tip-toeing around a flattened recursive algorithm.
But not every call to lfs_dir_relocatingcommit can drop. This would not be solved by always providing pdir, because the layer above would not be able to handle the resulting LFS_OK_DROPPED state correctly.
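As a rough illustration of the pattern described above, here is a toy model in C. It is not littlefs's actual code: the toy_* names, the struct layout, and the error values are all made up for the sketch. The inner commit never unlinks an emptied dir itself; it reports a "dropped" state and hands the predecessor back through pdir, and only a caller that knows how to handle that state passes a non-NULL pdir.

```c
#include <stddef.h>

/* Hypothetical status codes for this sketch (not littlefs's real values). */
enum { TOY_OK = 0, TOY_OK_DROPPED = 1, TOY_ERR_INVAL = -1 };

/* Toy stand-in for a metadata dir in a singly linked list. */
struct toy_mdir {
    int count;              /* entries left in this mdir */
    struct toy_mdir *next;  /* next mdir in the list */
};

/* Walk the list from head to find the mdir that points at dir. */
static struct toy_mdir *toy_pred(struct toy_mdir *head, struct toy_mdir *dir) {
    for (struct toy_mdir *p = head; p != NULL; p = p->next) {
        if (p->next == dir) {
            return p;
        }
    }
    return NULL;
}

/* Inner commit: removes one entry from dir. If dir becomes empty it is NOT
 * unlinked here, since that would mean committing to the predecessor
 * (recursion). Instead the predecessor is returned through pdir together
 * with TOY_OK_DROPPED, and the caller finishes the job. */
static int toy_relocatingcommit(struct toy_mdir *head, struct toy_mdir *dir,
                                struct toy_mdir **pdir) {
    if (dir->count <= 0) {
        return TOY_ERR_INVAL;
    }
    dir->count -= 1;
    if (dir->count > 0 || dir == head) {
        return TOY_OK;  /* nothing to drop; the head is never dropped here */
    }
    if (pdir == NULL) {
        /* The caller did not expect a drop. In the thread, this is the kind
         * of situation where LFS_ASSERT(pdir) fires; the toy returns an
         * error instead. */
        return TOY_ERR_INVAL;
    }
    *pdir = toy_pred(head, dir);
    return (*pdir != NULL) ? TOY_OK_DROPPED : TOY_ERR_INVAL;
}

/* Outer layer: the only place in this sketch that handles a drop. */
static int toy_commit(struct toy_mdir *head, struct toy_mdir *dir) {
    struct toy_mdir *pdir = NULL;
    int state = toy_relocatingcommit(head, dir, &pdir);
    if (state == TOY_OK_DROPPED) {
        pdir->next = dir->next;  /* unlink the emptied mdir */
        dir->next = NULL;
    }
    return state;
}
```

In this model, passing a non-NULL pdir from a call site that ignores TOY_OK_DROPPED would only hide the problem: the emptied mdir would stay linked in the list.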
The Fixing move while relocating message (here) is followed immediately by an lfs_dir_relocatingcommit call with a delete tag, which is probably resulting in a drop, which littlefs doesn't expect.
Hi @geky and thank you for your quick reply.
I confirm we are using 2.9.
I'm trying to reproduce the problem with a specific test function (written in C), attached. With this function, the problem does not appear: after 1 day of running we reached 725000 iterations without any reset. Our real process involves attributes; I don't know if that is related, but I don't think so. I will now try to enrich the test function, including appending at the end of the process instead of removing, in order to mimic the real scenario.
We also tried replacing (in the real environment) the lfs_rename function with a custom function that performs a file copy. With this change, the problem does not occur.
I'll keep investigating and let you know...
Hi, in the past few days I tried to write a test to reproduce the problem, without success. The operations are the same as in the real application, but in a Linux environment they do not lead to the problem, which still persists on STM32. I attach the test code.
I confirm that after replacing lfs_rename with a copy function that serves the same purpose, we have had no reset in 10 consecutive days of extensive, continuous testing.
I also generated a log file with LFS_YES_TRACE active, but it exceeds 1.5 GB, so I don't think it can be attached here, or that all of it would be of any help. I'm only publishing the final part.
It takes a day and a half for the problem to occur. Not knowing how to proceed, we will comment out the lfs_rename in favor of the copy for now.
We are experiencing a hard fault from time to time in this scenario (a while(true) loop). We have 3 directories created: "Records", "ToSend", "Archive".
The hard fault happens in function "lfs_dir_relocatingcommit" at line 2226, triggered by LFS_ASSERT(pdir); because pdir is 0 (NULL). It happens here:
After reboot and during mount:
We also noticed that lfs_dir_relocatingcommit is called with pdir = NULL in three points, line 2433, 2494, 2545. Is this intentional?
Why could this problem happen and how can we avoid this?
But if we mount the fs at the beginning of each while cycle and unmount it at the end, it does not happen. EDIT: It still happens, just more slowly; it takes more iterations to reach the error.
Thank you