By "luck" the previous code somehow managed to not be broken, though it was possible to traverse the same file twice in lfs_fs_traverse/size (which is not an error).
The problem was an underlying assumption in lfs_dir_get that it would never be called when the requested id is pending removal because of a powerloss. The assumption was either:
lfs_dir_find would need to be called first to find the id, and it would correctly toss out pending-rms with LFS_ERR_NOENT.
lfs_fs_mkconsistent would be implicitly called before any filesystem traversals, cleaning up any pending-rms. This is at least true for allocator scans.
But, as noted by @andriyndev, both lfs_fs_traverse and lfs_fs_size can call lfs_fs_get with a pending-rm id if called in a readonly context.
By "luck" this somehow manages to not break anything:
If the pending-rm id is >0, the id is decremented by 1 in lfs_fs_get, returning the previous file entry during traversal. Worst case, this reports any blocks owned by the previous file entry twice.
Note this is not an error, lfs_fs_traverse/size may return the same block multiple times due to underlying copy-on-write structures.
More concerning: If the pending-rm id is 0, the id is decremented by 1 in lfs_fs_get and underflows. This underflow propagates into the type field of the tag we are searching for, decrementing it from 0x200 (LFS_TYPE_STRUCT) to 0x1ff (LFS_TYPE_INTERNAL(UNUSED)).
Fortunately, since this happens to underflow to the INTERNAL tag type, the type intended to never exist on disk, we should never find a matching tag during our lfs_fs_get search. The result? lfs_dir_get returns LFS_ERR_NOENT, which is actually what we want.
Also note that LFS_ERR_NOENT does not terminate the mdir traversal early. If it did we would have missed files instead of duplicating files, which is a slightly worse situation.
The fix is to add an explicit check for pending-rms in lfs_dir_get, just like in lfs_dir_find. This avoids relying on unintended underflow propagation, and should make the internal API behavior more consistent.
This is especially important for potential future gc extensions.
By "luck" the previous code somehow managed to not be broken, though it was possible to traverse the same file twice in
lfs_fs_traverse/size
(which is not an error).The problem was an underlying assumption in
lfs_dir_get
that it would never be called when the requested id is pending removal because of a powerloss. The assumption was either:lfs_dir_find
would need to be called first to find the id, and it would correctly toss out pending-rms with LFS_ERR_NOENT.lfs_fs_mkconsistent
would be implicitly called before any filesystem traversals, cleaning up any pending-rms. This is at least true for allocator scans.But, as noted by @andriyndev, both
lfs_fs_traverse
andlfs_fs_size
can calllfs_fs_get
with a pending-rm id if called in a readonly context.By "luck" this somehow manages to not break anything:
If the pending-rm id is >0, the id is decremented by 1 in
lfs_fs_get
, returning the previous file entry during traversal. Worst case, this reports any blocks owned by the previous file entry twice.Note this is not an error,
lfs_fs_traverse/size
may return the same block multiple times due to underlying copy-on-write structures.More concerning: If the pending-rm id is 0, the id is decremented by 1 in
lfs_fs_get
and underflows. This underflow propagates into the type field of the tag we are searching for, decrementing it from0x200
(LFS_TYPE_STRUCT) to0x1ff
(LFS_TYPE_INTERNAL(UNUSED)).Fortunately, since this happens to underflow to the INTERNAL tag type, the type intended to never exist on disk, we should never find a matching tag during our
lfs_fs_get search
. The result?lfs_dir_get
returns LFS_ERR_NOENT, which is actually what we want.Also note that LFS_ERR_NOENT does not terminate the mdir traversal early. If it did we would have missed files instead of duplicating files, which is a slightly worse situation.
The fix is to add an explicit check for pending-rms in
lfs_dir_get
, just like inlfs_dir_find
. This avoids relying on unintended underflow propagation, and should make the internal API behavior more consistent.This is especially important for potential future gc extensions.
Related issue: https://github.com/littlefs-project/littlefs/issues/920 Found by: @andriyndev