Open sgielen opened 3 years ago
The issue seems to be resolved when I sort files returned from a readdir
(via calls to fill
):
diff --git a/client.go b/client.go
index b7cab35..bc220ae 100644
--- a/client.go
+++ b/client.go
@@ -53,6 +53,7 @@ import (
"math/rand"
"os"
ospath "path"
+ "sort"
"strings"
"sync/atomic"
"time"
@@ -198,7 +199,15 @@ func (f *Mount) Readdir(path string, fill func(name string, stat *fuse.Stat_t, o
}
time.Sleep(time.Millisecond * time.Duration(rand.Intn(200)))
- for fn, f := range ret.Files {
+ var files []string
+ for fn, _ := range ret.Files {
+ files = append(files, fn)
+ }
+
+ sort.Strings(files)
+
+ for _, fn := range files {
+ f := ret.Files[fn]
attr := fuse.Stat_t{}
attr.Size = f.Size
if f.IsDirectory {
Normally, the issue was visible at least about once every 20-30 runs of ls
, depending on Readdir duration and load etc, but now it's run about 500 times without the issue showing... Also, the weird unexpected Stats of "Tsttst/\x98\x03"
and similar seem to have stopped with this patch.
Perhaps there's an incorrect assumption in cgofuse or macfuse that the readdir results are sorted, or at least in a consistent order? That didn't hold in my case, since a go Map sorting isn't consistent...
How are you using the ofst
argument when calling fill
? Do you pass 0
or alternatively do you maintain stable offsets to your directory entries?
How are you using the
ofst
argument when callingfill
? Do you pass0
or alternatively do you maintain stable offsets to your directory entries?
Hi @billziss-gh, thanks for getting back to me!
I always pass 0 for ofst when calling fill. You can find the code here: https://gist.github.com/sgielen/b229ded71f5946fc242bfb6e75e08da6#file-client-go-L211 (this is without the above patch applied, so still the ‘broken’ version.)
Do you ever get a non-0 offset in Readdir
? (You should not if you always pass 0
to fill
.)
BTW, one other thing I noticed is that you do not implement Opendir
. I am wondering if this somehow causes the problem.
Do you ever get a non-0 offset in
Readdir
? (You should not if you always pass0
tofill
.)
I logged this for a while, and saw no non-0 offsets given to Readdir, even as the issue appeared. I will try again, while also adding an implementation of Opendir and see if that makes a difference.
The non-0 offsets to Readdir
indicate that something is wrong. If you only pass 0 to fill
You should not see non-0 offsets in Readdir
. (This is true under libfuse (including the version in OSXFUSE) and WinFsp. I note that it may not be true in OpenBSD. See #49.)
I can confirm that even with Opendir implemented, the issue still occurs. Readdir always has ofst=0
. I tested with the following patch applied on top of my Gist:
diff --git a/client.go b/client.go
index b7cab35..eaab5d0 100644
--- a/client.go
+++ b/client.go
@@ -184,12 +184,21 @@ func (f *Mount) Getattr(path string, attr *fuse.Stat_t, fh uint64) int {
return 0
}
+func (f *Mount) Opendir(path string) (int, uint64) {
+ log.Printf("Opendir %q", path)
+ return 0, atomic.AddUint64(&f.lastHandle, 1)
+}
+
func (f *Mount) Readdir(path string, fill func(name string, stat *fuse.Stat_t, ofst int64) bool, ofst int64, fh uint64) int {
/*start := time.Now()
defer func() {
log.Printf("Readdir %q took %d ms", path, time.Since(start).Milliseconds())
}()*/
+ if ofst != 0 {
+ log.Fatalf("ofst in Readdir was not 0, but %d\n", ofst)
+ }
+
path = stripPath(path)
ret := vfsReaddir(path)
if ret == nil {
The log.Fatalf
was not triggered. I did see output such as:
21:45:04.596330 client.go:188: Opendir "/Tsttst/EEEEE EEEE/aaaaaa aaaaaa (0000)"
21:45:04.653208 client.go:188: Opendir "/Tsttst/KKKKKK KKKKKKKK.KKKKKKKKK"
21:45:04.655606 client.go:188: Opendir "/Tsttst/GGGGGG"
21:45:04.657734 client.go:172: Invalid stat "Tsttst/\x16" (2), returning ENOENT
21:45:04.687663 client.go:188: Opendir "/Tsttst/KKKKKK KKKKKKKK.KKKKKKKKK/ccccccccccc.ccccccccc"
21:45:04.687912 client.go:188: Opendir "/Tsttst/KKKKKK KKKKKKKK.KKKKKKKKK"
21:45:04.726393 client.go:188: Opendir "/Tsttst/UUU UUUUU UUUUUUUU"
confirming that the changes were applied. The issue is visible in the output of another terminal running while :; do ls test/Tsttst | wc -l; sleep 1; done
:
25
25
16 # <--
25
25
25
[...]
25
25
25
25
24 # <--
20 # <--
25
25
You said:
Perhaps there's an incorrect assumption in cgofuse or macfuse that the readdir results are sorted, or at least in a consistent order? That didn't hold in my case, since a go Map sorting isn't consistent...
Cgofuse is a thin layer over the local platform's FUSE library and therefore does not (should not) make assumptions about directory entry order. There may be a problem in macfuse although I have never observed it (but in general my file systems return directory listings in consistent order).
I note that in your file system every time a process issues opendir
and readdir*
it will get different results (the notation readdir*
is taken to mean issue readdir
0 or more times in a loop). Likewise if it issues opendir
, readdir*
, rewinddir
, readdir*
. Perhaps this confuses a particular component.
21:45:04.657734 client.go:172: Invalid stat "Tsttst/\x16" (2), returning ENOENT
Did you really access a file name with an \x16
character in it?
You said:
Perhaps there's an incorrect assumption in cgofuse or macfuse that the readdir results are sorted, or at least in a consistent order? That didn't hold in my case, since a go Map sorting isn't consistent...
Cgofuse is a thin layer over the local platform's FUSE library and therefore does not (should not) make assumptions about directory entry order. There may be a problem in macfuse although I have never observed it (but in general my file systems return directory listings in consistent order).
Perhaps then a good next step would be to try to convert this testcase to C to talk directly against the macfuse implementation and see if I can reproduce the same issue there. What do you think?
Did you really access a file name with an
\x16
character in it?
Definitely not, my testing code just runs ls
and du
over the filesystem, and if I sort the files in my Readdir
impl this doesn't occur, so it's really caused by this issue...
I note that in your file system every time a process issues
opendir
andreaddir*
it will get different results (the notationreaddir*
is taken to mean issuereaddir
0 or more times in a loop). Likewise if it issuesopendir
,readdir*
,rewinddir
,readdir*
. Perhaps this confuses a particular component.
How do you mean this? For the same directory, this testcase always returns the same files (except indeed in a different order). But, different files are visible in ls
sometimes, hence the bug.
Perhaps then a good next step would be to try to convert this testcase to C to talk directly against the macfuse implementation and see if I can reproduce the same issue there. What do you think?
Yes, this would be ideal. Although you would have to make sure to return the file name list in an inconsistent order.
Definitely not, my testing code just runs ls and du over the filesystem, and if I sort the files in my Readdir impl this doesn't occur, so it's really caused by this issue...
This suggests a real problem. Is this reproducible?
How do you mean this? For the same directory, this testcase always returns the same files (except indeed in a different order).
Yes, I meant that the directory listing will be in different order.
Hi @billziss-gh! I've written the testcase against macfuse in C++ and could reproduce the same issue. That's probably good news for you! ;-)
I've reported the issue further upstream at https://github.com/osxfuse/osxfuse/issues/819, along with the test case. Unfortunately, since macfuse is closed-source nowadays that means my debugging stops here. Hopefully the macfuse maintainer can reproduce and can find a fix :-)
Feel free to close this issue, unless you'd like to keep tracking it.
To answer your other question:
This suggests a real problem. Is this reproducible?
Yes, for me the code in the Gist linked above always reproduces the non-printable characters issue. It sometimes takes a couple minutes for the messages to appear.
@sgielen thanks for the update and for your investigation on this. Let's keep this issue open for now so we can track the progress on the upstream issue.
I encountered an issue on macOS where under load, a
ls
on a cgofuse-mounted filesystem would sometimes show files twice, or not show some files at all. My application is confirmed to callfill
exactly as many times with the same filenames when the issue shows as when it does not. To confirm this is an issue outside of my application, I created a simple test case: https://gist.github.com/sgielen/b229ded71f5946fc242bfb6e75e08da6The issue seems to worsen when the
Readdir
call takes longer, so my testcase Readdir implementation has a sleep between 0-200 ms in it. Furthermore, I've noticed that the lengths of filenames, and perhaps the hierarchy of the filesystem, must be reproduced in order to reproduce the bug.I can reproduce with golang 1.16.3, cgofuse 1.5.0, macfuse 4.1.2, macOS Big Sur 11.2.3. I can't reproduce on Windows or on Linux.
I'm willing to investigate further, but could use your advice for next steps.