littlefs-project / littlefs

A little fail-safe filesystem designed for microcontrollers
BSD 3-Clause "New" or "Revised" License
4.9k stars 770 forks source link

Support get file path #976

Open crafcat7 opened 2 months ago

crafcat7 commented 2 months ago

I added two new interfaces to obtain the path of the file/folder.

int lfs_dir_path(lfs_t *lfs, lfs_dir_t *dir, char *path, lfs_size_t size);
int lfs_file_path(lfs_t *lfs, lfs_file_t *file, char *path, lfs_size_t size);

Mainly rely on a new internal interface, to a recursive search target

static lfs_ssize_t lfs_dir_path_(lfs_t *lfs,
         lfs_mdir_t *dir, uint16_t id, char *path, lfs_size_t size)
geky-bot commented 2 months ago
Tests passed ✓, Code: 17300 B (+1.4%), Stack: ∞ B (-100.0%), Structs: 812 B (+0.0%) | | Code | Stack | Structs | | Coverage | |:--|-----:|------:|--------:|:--|---------:| | Default | 17300 B (+1.4%) | ∞ B (-100.0%) | 812 B (+0.0%) | Lines | 2394/2616 lines (-1.5%) | | Readonly | 6194 B (+0.0%) | 448 B (+0.0%) | 812 B (+0.0%) | Branches | 1245/1602 branches (-0.9%) | | Threadsafe | 18216 B (+1.6%) | ∞ B (-100.0%) | 820 B (+0.0%) | | **Benchmarks** | | Multiversion | 17360 B (+1.4%) | ∞ B (-100.0%) | 816 B (+0.0%) | Readed | 29369693876 B (+0.0%) | | Migrate | 18996 B (+1.3%) | ∞ B (-100.0%) | 816 B (+0.0%) | Proged | 1482874766 B (+0.0%) | | Error-asserts | 18012 B (+1.5%) | ∞ B (-100.0%) | 812 B (+0.0%) | Erased | 1568888832 B (+0.0%) |
geky commented 1 month ago

Hi @crafcat7, thanks for creating a PR for this.

This is a clever implementation, unfortunately I don't think it makes sense to add this API to littlefs. I need to default to rejecting features without a strong motivation in order to combat feature-creep/code cost, so sorry if this comes across a bit harsh.

The first issue is this seems relatively easy to emulate outside of the littlefs core. When a user opens a file, they must know the file path, so if they want access to the file path later, they can store the file path next to the lfs_file_t handle.

The second issue is the use of lfs_fs_parent. lfs_fs_parent is actually a very expensive function, requiring a $O(n)$ scan where $n$ is the number of files in the filesystem. For this reason lfs_fs_parent is currently limited to cases where you may expect heavy IO (mdir compaction, dir creation/remove, etc).

There is also work in progress to move away from lfs_fs_parent because it does not work well. And I'm not sure there is a practical implementation of these functions without lfs_fs_parent (at least not without just keeping the path in RAM)...

I think when PC-scale OSs provide this feature, they are just returning the path stored in RAM inside the kernel.

yamt commented 1 month ago

I think when PC-scale OSs provide this feature, they are just returning the path stored in RAM inside the kernel.

some OSes including linux, yes.

some others implement it by performing readdir on ".." as it would be done for a naive getcwd() implementation. (such OSes usually don't support the operation for non-directory files.) i suspect such an implementation is impossible with littlefs though, because littlefs doesn't seem to have user-visible file-id to match the correct directory entry.

geky commented 1 month ago

some others implement it by performing readdir on ".." as it would be done for a naive getcwd() implementation. (such OSes usually don't support the operation for non-directory files.) i suspect such an implementation is impossible with littlefs though, because littlefs doesn't seem to have user-visible file-id to match the correct directory entry.

Oh interesting, that is clever.

It would also not work because littlefs doesn't really have ".." entries. They are just faked in the path parser.

I wonder if the lack of real ".." entries will become problematic when adding openat/*at functions...

yamt commented 1 month ago

I wonder if the lack of real ".." entries will become problematic when adding openat/*at functions...

yes. actually, it's problematic even for traditional chdir(). only way i can think of is something path-based, which is not simple to deal with renames.

btw, this get-path functionality "requirement" came from an attempt to implement F_SETLK style file lock. https://github.com/apache/nuttx/pull/11724 i guess what's really necessary there is some kind of file identifier, not necessarily a path. (actually a path is not so appropriate as it changes on a rename.) does littlefs have something usable for the purpose?

geky commented 1 month ago

i guess what's really necessary there is some kind of file identifier, not necessarily a path. (actually a path is not so appropriate as it changes on a rename.) does littlefs have something usable for the purpose?

Unfortunately, no. This is an area where littlefs diverges greatly from POSIX filesystems in that littlefs doesn't have inodes, so no unique ino, no hard-linking, etc...

This wasn't really an intentional design decision, just a side-effect of not having inodes as a requirement, unlike filesystem that start off in Linux, etc. It's in theory possible to extend littlefs to use inodes, but it would add a layer of indirection and complexity (and ino reclamation, bleh).

littlefs does map each file to an id internally, but this id can change if neighboring files are created/deleted. This is one reason we keep all open files in a linked-list, so we can keeps files up to date with any id changes. This id hasn't been exposed in lfs_stat because it could be confused for an ino without actually being useful...

btw, this get-path functionality "requirement" came from an attempt to implement F_SETLK style file lock. https://github.com/apache/nuttx/pull/11724

This is an interesting puzzle. I'm not really sure what the best solution is. Some thoughts:

You could keep the file path around, though as you mention handling renames would be tricky. You'd also need to worry about messy path comparisons (a/b/c == /a/b/./c?)...

You could use a custom attribute to assign each file a fake ino when needed. Though making sure each ino is unique would be difficult (ino reclamation, bleh). This solution would be a bit easier if we had a way to iterate over all files in the filesystem, which isn't an unreasonable API...

If you keep track of each locked file handle, and we add some sort of comparison API (either lfs_file_handlecmp-like or by exposing the metadata-id), you could compare against all locked files to find conflicting locks. This would be less efficient than a hashtable, but littlefs's opened-linked-list already scales $O(n)$ with the number of open files so... I think this is my favorite solution so far.

There is some planned work to make custom attributes with multiple open file handles more intuitive, by "broadcasting" custom attribute updates on lfs_file_sync. This almost sounds like what you need, except you probably don't want the locked flag to actually be written to disk... Maybe we should allow custom attributes to not be written to disk? Maybe this is the wrong tool for this...

yes. actually, it's problematic even for traditional chdir().

This is probably something more familiar to OS-level developers, but outside of shell emulators and backwards compatibility, does chdir() really make sense for applications? It seems like you'd want to prefer openat/*at relative to an open directory handle these days.

yamt commented 1 month ago

There is some planned work to make custom attributes with multiple open file handles more intuitive, by "broadcasting" custom attribute updates on lfs_file_sync. This almost sounds like what you need, except you probably don't want the locked flag to actually be written to disk... Maybe we should allow custom attributes to not be written to disk? Maybe this is the wrong tool for this...

how do you plan to find the broadcast destinations? it's what's exactly needed for F_SETLK. ie. find all open files (lfs_file_t) of a given on-disk file.

yes. actually, it's problematic even for traditional chdir().

This is probably something more familiar to OS-level developers, but outside of shell emulators and backwards compatibility, does chdir() really make sense for applications? It seems like you'd want to prefer openat/*at relative to an open directory handle these days.

in an ideal world, maybe. but in reality, as far as i know, chdir is ubiquitous and far commonly used than openat family.

xiaoxiang781216 commented 1 month ago

Hi @crafcat7, thanks for creating a PR for this.

This is a clever implementation, unfortunately I don't think it makes sense to add this API to littlefs. I need to default to rejecting features without a strong motivation in order to combat feature-creep/code cost, so sorry if this comes across a bit harsh.

This is a trend off, file name consume memory, saving file path for each open handle will increase the memory consumption if many files are opened.

geky commented 1 month ago

how do you plan to find the broadcast destinations? it's what's exactly needed for F_SETLK. ie. find all open files (lfs_file_t) of a given on-disk file.

There is an invasive linked-list going through all lfs_file_t and lfs_dir_t structs so we can keep these handles up-to-date.

I suppose we could expose this to the user, though the API would be complicated (we can't just say use the internals of lfs_file_t, switching to type-specific linked-lists was considered at one point for example).

Thinking about it, if you control the lfs_file_t structs, could you add an additional invasive linked-list outside of littlefs for the purpose of updating locks, etc?

in an ideal world, maybe. but in reality, as far as i know, chdir is ubiquitous and far commonly used than openat family.

Fair enough, guess adding openat will be a learning experience.

file name consume memory, saving file path for each open handle will increase the memory consumption if many files are opened.

Fair point. But if lfs_fs_parent is not an option, the only option is saving the file path. In this case it's preferable to save the file path outside of littlefs so not everyone needs to pay the cost.

Maybe an option is to 1. add an API to iterate over all files, and 2. add file handle comparison (metadata-id), so the lfs_fs_parent solution could be implemented outside littlefs. Though I would warn this is still going to be very expensive ($O(n)$ all files).

yamt commented 1 month ago

how do you plan to find the broadcast destinations? it's what's exactly needed for F_SETLK. ie. find all open files (lfs_file_t) of a given on-disk file.

There is an invasive linked-list going through all lfs_file_t and lfs_dir_t structs so we can keep these handles up-to-date.

are you talking about "lfs->mlist"? to "broadcasting" custom attribute updates, you need to find out which entries on the list to apply updates, don't you?

I suppose we could expose this to the user, though the API would be complicated (we can't just say use the internals of lfs_file_t, switching to type-specific linked-lists was considered at one point for example).

Thinking about it, if you control the lfs_file_t structs, could you add an additional invasive linked-list outside of littlefs for the purpose of updating locks, etc?

is there a way to determine if given two lfs_flie_t objects are referencing to the same on-disk file or not?

geky commented 1 month ago

are you talking about "lfs->mlist"? to "broadcasting" custom attribute updates, you need to find out which entries on the list to apply updates, don't you?

Yep.

The exact logic to update ids/pairs, though messy, is here: https://github.com/littlefs-project/littlefs/blob/d01280e64934a09ba16cac60cf9d3a37e228bb66/lfs.c#L2326

Extending this to also update custom attributes would not be too difficult. The only blocker is that WRONLY attributes are currently allowed to be const, which is a bit of a problem. So I've been waiting to address this until an eventual major version bump.

is there a way to determine if given two lfs_flie_t objects are referencing to the same on-disk file or not?

Not currently, but this isn't an unreasonable API to add. The file->m.pair + file->id fields uniquely determine each file (but may change on mutation).

Right now creating an API it is a bit tricky, since you need to both compare the block addresses with lfs_pair_cmp and compare the id. In some long-term ongoing work this is changing to a single uint32_t metadata-id which would be a bit easier to expose API-wise.

yamt commented 1 month ago

are you talking about "lfs->mlist"? to "broadcasting" custom attribute updates, you need to find out which entries on the list to apply updates, don't you?

Yep.

The exact logic to update ids/pairs, though messy, is here: https://github.com/littlefs-project/littlefs/blob/d01280e64934a09ba16cac60cf9d3a37e228bb66/lfs.c#L2326

Extending this to also update custom attributes would not be too difficult. The only blocker is that WRONLY attributes are currently allowed to be const, which is a bit of a problem. So I've been waiting to address this until an eventual major version bump.

is there a way to determine if given two lfs_flie_t objects are referencing to the same on-disk file or not?

Not currently, but this isn't an unreasonable API to add. The file->m.pair + file->id fields uniquely determine each file (but may change on mutation).

Right now creating an API it is a bit tricky, since you need to both compare the block addresses with lfs_pair_cmp and compare the id. In some long-term ongoing work this is changing to a single uint32_t metadata-id which would be a bit easier to expose API-wise.

ok. so, at least this api can be provided today, right? https://github.com/WebAssembly/wasi-filesystem/blob/e79b05803e9ffd3b0cfdc0a8af20ac743abbe36a/wit/types.wit#L576-L582

geky commented 1 month ago

ok. so, at least this api can be provided today, right? https://github.com/WebAssembly/wasi-filesystem/blob/e79b05803e9ffd3b0cfdc0a8af20ac743abbe36a/wit/types.wit#L576-L582

Yes, if this API is added to littlefs. But it's not currently exposed.

That's interesting that WASI did not just expose ino, I wonder what other filesystems motivated this.

yamt commented 1 month ago

That's interesting that WASI did not just expose ino, I wonder what other filesystems motivated this.

iirc it's motivated by some windows filesystems (ReFS?) which have 128-bit file ID.