Open andriyndev opened 5 months ago
Oh wow, looks like you're right. Wild that this hasn't been caught until now.
I guess it helped that the fact that littlefs's directories are sorted hasn't been the best documented. Other filesystems may have wildly different orderings.
We're also lucky none of littlefs's internals currently require sorted order. Otherwise we'd probably be stuck with this weird ordering for backwards compatibility reasons...
Went ahead and created a PR here: https://github.com/littlefs-project/littlefs/pull/926
Thanks for reporting!
Are you sure it's still safe to change the sorting algorithm? From what I understand, on the existing filesystems files are already sorted on disk according to the old algorithm, and even after the fix they'll remain sorted according to it, and further files operations will lead to mixing when old files will remain sorted according to the old algorithm, while new and newly modified files will be sorted according to the new algorithm. Cannot it lead to some nasty bugs like inability to find an existing file because of wrong sorting (if it's taken into account during search). But if currently littlefs doesn't require sorted order at all, won't implementing such a feature lead to breaking the old filesystems where the incorrect ordering of at least some files persists.
As an option - apply the fix to the next major version, and fix the ordering of existing files through a migration process.
This is a valid concern. This change definitely deserves caution.
I was also too cavalier earlier. I thought fixing this would not be a problem since littlefs does not rely on ordering internally, ordering is more a convenience provided to users. But further investigation indicates this will be a problem a problem with the code as-is as we do rely on ordering for early loop termination[1].
It may still be possible to fix this on a disk-minor-version[2]:
Change directory lookup to not terminate early. File creation would then need to two passes, but runtime complexity would not change $O(n) \rightarrow O(n)
$.
This would allow littlefs to open files in directories of any ordering.
Bump the on-disk minor version lfs2.1 -> lfs2.2. This has happened before, but has caused migration pains for users. Maybe a second bump would be less of a pain due to learning from the first bump?
This would prevent breaking of old filesystems because old filesystems would reject the newer on-disk minor version. Though you can probably see how this can cause other headaches.
I realized after writing this that I think this is basically the "migration process" you mentioned.
I guess the question is, is this worth it? This may end up needing more user feedback.
[1] It's a bit indirect, ok it's very indirect, but:
lfs_dir_fetchmatch
keeps track of the smallest id that compares LFS_CMP_GT:
https://github.com/littlefs-project/littlefs/blob/3513ff1afc1d67adb2e6f492f0b9bc0d798fcb0d/lfs.c#L1275-L1279lfs_dir_fetchmatch
it returns LFS_ERR_NOENT if any LFS_CMP_GT ids were found:
https://github.com/littlefs-project/littlefs/blob/3513ff1afc1d67adb2e6f492f0b9bc0d798fcb0d/lfs.c#L1337-L1340lfs_dir_find
terminates early on LFS_ERR_NOENT, while lfs_dir_fetchmatch
updates both the mdir and the id. Since we terminate early, lfs_dir_find
may miss later entries if the ordering changes:
https://github.com/littlefs-project/littlefs/blob/3513ff1afc1d67adb2e6f492f0b9bc0d798fcb0d/lfs.c#L1523-L1525[2] littlefs has a few more release options than other libraries because, as you noticed, storage is unforgiving:
Disk-major-versions are very disruptive, and unless we have other major changes this fix would not be worth a disk-major-version on its own.
OK. I think the fix should be postponed to the next major disk version, and be applied together with them.
The function uses
lfs_bd_cmp()
which, from what I understand, compares data on disk with an arbitrary data passed as argument, and returns:LFS_CMP_LT
- the data on disk is less than the arbitrary dataLFS_CMP_GT
- the data on disk is greater than the arbitrary dataLFS_CMP_EQ
- the data on disk and the arbitrary data are equalIf the function returns
LFS_CMP_EQ
, we perform an additional comparison:This comparison looks suspicious because it returns
LFS_CMP_LT
is the arbitrary data (not the data on disk) is shorter than the data on disk. Maybe it was intended this way but just in case decided to report.