trapexit / mergerfs

a featureful union filesystem
http://spawn.link

Phantom zero-sized files, delayed updates & other weird behavior #1281

Closed fedy-cz closed 7 months ago

fedy-cz commented 10 months ago

Describe the bug

Recently I have been experiencing some weird behavior from mergerfs:

  1. A phantom zero-sized file shows up in the directory listing, with a name based on that of an existing directory; after a while it usually disappears. Often the file is "shown twice". In the latest instance I have two directories with spaces in their names, "Long Name - description" and "Long Name (1234) further description", and I got two zero-sized files in the listing, "Long Name" and "Long Name " (notice the space at the end), in addition to the directories themselves. If I remember correctly, there were previously also instances where only the file was listed (and the directories went missing), which might be connected to 2.
  2. Mergerfs "freezes" and doesn't show changes made to the underlying file-system. After a while it recovers.

To Reproduce

I just tried to reproduce the behavior described in 1. and experienced 2. (the newly created directories are not shown at all).

I suspect the bug was introduced in 2.37, which introduced some concurrent behavior (readdir policies). I was testing them initially, but once I encountered these problems I switched to func.readdir=seq, which didn't help either.

System information:

Additional context

Same truncated filename is shown at the end of paths in the extended attributes user.mergerfs.relpath , user.mergerfs.fullpath & user.mergerfs.allpaths
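For anyone wanting to inspect these attributes programmatically, here is a minimal Python sketch. It assumes a Linux system (where `os.getxattr` is available) and uses only the three attribute names mentioned above; the helper name `read_mergerfs_xattrs` and the injectable `getxattr` parameter are illustrative, not part of mergerfs.

```python
import os

# Attribute names taken from the report above.
MERGERFS_XATTRS = (
    "user.mergerfs.relpath",
    "user.mergerfs.fullpath",
    "user.mergerfs.allpaths",
)


def read_mergerfs_xattrs(path, getxattr=None):
    """Return {attribute name: decoded value or None} for one path.

    `getxattr` is a hypothetical injection point so the helper can be tried
    without a mergerfs mount; by default it uses os.getxattr (Linux only).
    """
    if getxattr is None:
        getxattr = os.getxattr  # AttributeError on platforms without xattr support
    values = {}
    for name in MERGERFS_XATTRS:
        try:
            raw = getxattr(path, name)
        except OSError:
            # Attribute missing, or the path is not on a mergerfs mount.
            values[name] = None
            continue
        values[name] = raw.decode("utf-8", errors="replace")
    return values
```

Printing each value with `repr()` makes a trailing space in a truncated name visible, which a plain `print` would hide.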

I have multiple mergerfs mountpoints on the same system.

Could it have something to do with some internal filename parsing (truncated at the space, plus 2 versions with and without it)? The - or ( symbol might also be critical. Unable to test that theory right now (freeze).

fedy-cz commented 10 months ago

Additional info: Originally the file listing (ls -l) looked quite normal (file size 0). I tried calling stat on the phantom files (which failed), and afterwards the file listing looks like this:

ls -l
...
-??????????  ? ?     ?              ?             ? Long Name
-??????????  ? ?     ?              ?             ? Long Name
...
trapexit commented 10 months ago

You have readdir caching enabled. Are you making changes out of band?

trapexit commented 10 months ago

Please file different tickets for different concerns. Combining them makes it harder to discuss.

fedy-cz commented 10 months ago

> Please file different tickets for different concerns. Combining them makes it harder to discuss.

I highly suspect they are related. Haven't encountered 2. without the 1. so far.

fedy-cz commented 10 months ago

> You have readdir caching enabled. Are you making changes out of band?

Not sure what you mean by "out of band". If you mean accessing / writing to one of the "merged" underlying file-systems directly then sure. Is that combination discouraged? Don't see any warnings about that in the docs.

trapexit commented 10 months ago

I suppose I could ensure the warning is plastered on every cache option but I do mention it https://github.com/trapexit/mergerfs#entry--attribute-caching ("As with the page cache these should not be used if the underlying filesystems are being manipulated at the same time as it could lead to odd behavior or data corruption.") and this is just a general truth of almost every cache ever. If you change things out of band then you will inevitably run into issues.

Phantom behaviors working their way out over time (the cache timeout probably) and "freezing" match up exactly with what a stale cache may show.

I would suggest disabling readdir cache and seeing if it continues. Or stop the out of band changes to see if it continues.

fedy-cz commented 10 months ago

Thanks. Disabled the cache, hopefully it will be fine. I probably added the option during the mentioned update to 2.37.

If I understand it correctly, in this case it is pretty much a kernel issue, and there is nothing that can be done on your side to detect this usage pattern & warn the user. The kernel cache invalidation is completely in its hands, and until it notices some corruption or the cache is evicted for another reason, fuse isn't consulted at all.

Regarding the docs: Maybe there could be just small disclaimer in the man page on those options. Something like: WARNING: Read the CACHING section first.

One thing which might have confused me (not being familiar with the fuse API) is what side of fuse/mergerfs is this cache option affecting. I probably thought kernel was keeping separate readdir cache for fuse clients (which apparently doesn't make much sense).

I set up mergerfs years ago and read the whole docs back then. It has been pretty much the ideal "set it up and forget it" scenario since then, but when you update, the urge to look up what's new and tune it 'just a little bit better' surely gets you in trouble :smile:.

Anyway: Thanks for an awesome project and sorry about a bogus bug report.

trapexit commented 10 months ago

The cache isn't "completely in its hands" in that I can fire off invalidation notifications, but that really isn't practical. mergerfs would have to keep track of everything it knows about and check whether it changed by polling the system or by leveraging inotify. The whole point of the cache is that the kernel no longer needs to send requests to mergerfs, so those two options are all that are possible. inotify requires bespoke watches for every directory in question. If you had 10 branches, that's 10 watches for every directory. Run "find" on a path and that would be 10 * every path at the very least. It would add up pretty quickly.
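To put a rough number on "it would add up pretty quickly", the sketch below counts directories under each branch, on the assumption that one inotify watch is needed per directory per branch (inotify watches are not recursive). The function names and the branch paths in the usage note are hypothetical.

```python
import os


def count_directories(root):
    """Count `root` itself plus every directory below it."""
    # os.walk yields exactly one tuple per directory it visits.
    return sum(1 for _dirpath, _dirnames, _filenames in os.walk(root))


def estimate_inotify_watches(branches):
    """Estimate the watches needed: one per directory, in every branch."""
    return sum(count_directories(branch) for branch in branches)
```

For example, `estimate_inotify_watches(["/mnt/disk1", "/mnt/disk2"])` (placeholder paths) can be compared against the system's `fs.inotify.max_user_watches` limit to see how close a full-tree watch would get.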

> One thing which might have confused me (not being familiar with the fuse API) is what side of fuse/mergerfs is this cache option affecting. I probably thought kernel was keeping separate readdir cache for fuse clients (which apparently doesn't make much sense).

I suppose I don't say it explicitly but the docs do say readdir caching is available only if the kernel supports it which suggests it is a kernel side feature. And yes, that is exactly what it does. Client app requests a directory opened. Kernel sends a request to mergerfs asking it to open a dir. mergerfs does so. If the cache readdir option is set it tells the kernel it can cache the results. A readdir later happens and mergerfs sends back the data which the kernel then caches if appropriate. From that point on it is out of the hands of mergerfs till it sends an invalidation message (which it never does) or the kernel decides otherwise.
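The flow described above can be illustrated with a toy model. This is not the FUSE API or mergerfs code: `ToyKernelDirCache` is a made-up class that stands in for the kernel's cached listing, the `backend_readdir` callable plays the role of mergerfs, and real kernel eviction (timeouts, memory pressure) is deliberately left out.

```python
class ToyKernelDirCache:
    """Toy stand-in for a kernel-side cached directory listing."""

    def __init__(self, backend_readdir):
        self._backend_readdir = backend_readdir  # plays the role of mergerfs
        self._cached = None
        self.backend_calls = 0  # how often the "filesystem" was actually asked

    def readdir(self):
        if self._cached is None:
            # Cache miss: the request is forwarded to the backend.
            self.backend_calls += 1
            self._cached = list(self._backend_readdir())
        # Cache hit: answered without consulting the backend at all.
        return list(self._cached)

    def invalidate(self):
        """Model of an explicit invalidation message."""
        self._cached = None


branch = {"Long Name - description"}  # contents of the underlying directory
cache = ToyKernelDirCache(lambda: sorted(branch))

first = cache.readdir()                              # reaches the backend
branch.add("Long Name (1234) further description")   # out-of-band change
stale = cache.readdir()                              # still the old listing
cache.invalidate()
fresh = cache.readdir()                              # backend consulted again
```

Here `stale` equals `first` even though the branch changed, and `backend_calls` stays at 1 until `invalidate()` is called, which mirrors why an strace of mergerfs would show no readdir activity while the stale listing is being served.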

I'll pad the docs with more warnings about caching. But let's first confirm that is the issue. Unfortunately, it isn't easy for me to replicate given the current info. If you straced mergerfs while this was occurring then we could confirm it, since mergerfs would just not be seeing readdir requests when you ran ls or similar.
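As a rough complement to the strace (not a replacement for it), one could compare what the mountpoint lists against the union of what the branches list for the same directory. The sketch below assumes the branch paths are known and readable; the function names are hypothetical, and a non-empty result is only consistent with a stale cache, not proof of one.

```python
import os


def union_of_branches(branches, relpath=""):
    """Collect entry names found at `relpath` in any branch."""
    names = set()
    for branch in branches:
        try:
            names.update(os.listdir(os.path.join(branch, relpath)))
        except (FileNotFoundError, NotADirectoryError):
            continue  # this branch does not have the directory
    return names


def listing_divergence(mountpoint, branches, relpath=""):
    """Compare the merged listing with what the branches contain right now."""
    merged = set(os.listdir(os.path.join(mountpoint, relpath)))
    expected = union_of_branches(branches, relpath)
    return {
        "only_in_mount": sorted(merged - expected),       # phantom entries
        "missing_from_mount": sorted(expected - merged),  # changes not yet visible
    }
```

Running this right after an out-of-band change, and again once the listing has recovered, would show whether the phantom or missing names line up with the window in which the cache was serving old data.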

fedy-cz commented 10 months ago

> keep track of everything it knows about and check whether it changed by polling the system or by leveraging inotify

Even that would probably lead to a possible data race when the readdir call happens after the modification but before mergerfs instructs the kernel to invalidate.

I just opened the docs for fanotify(7) which could possibly work with single instance per branch and maybe even intercept the calls to branches to prevent the data race, but that is probably too crazy ... Another option could be to do it in kernel with something like eBPF, but again, crazy ...

Will let you know if the issue re-emerges and remember to use the strace if it does. Thanks.

trapexit commented 10 months ago

Yeah. I've considered those but depending on the particular problem attempting to solve it would be a bit of a lift... and not cross platform. Not that I've kept good compatibility with BSD recently. It looks like in 5.1 they expanded on fanotify which IIRC was some of the reason I didn't consider it years ago. Perhaps I'll need to revisit it and see if there are possible uses.

There have been discussions of eBPF w/ FUSE for some time that could help in some of these situations too but it's been years since the conversations started and still nothing in the kernel so... like the passthrough work... just kinda waiting to see.

> Will let you know if the issue re-emerges and remember to use the strace if it does.

Thanks