Closed jkowalski closed 1 year ago
I tried debugging this deeper. I modified hello
example to return one image file and it worked.
Then I mounted a directory containing exactly one image file in Kopia and it failed.
I emitted, cleaned up and compared debug logs from both sessions
The only difference between hello
(working) and kopia
(not working) was:
hello
: OPEN n2 {O_RDONLY} => OK, {Fh 0 CACHE}
kopia
: OPEN n2 {O_RDONLY} => OK, {Fh 1 DIRECT}
And sure enough, disabling direct IO fixed the issue.
great to hear that you have solved the problem. Do you still need anything here?
great to hear that you have solved the problem. Do you still need anything here?
Perhaps some help understanding what could be wrong with direct io here? Would this be a macfuse issue?
it's hard to tell. You can probably tell from the getattr/open/read sequence what is going on. Normally, a program wouldn't just read 1 byte; is it really requesting 1 byte?
it's hard to tell. You can probably tell from the getattr/open/read sequence what is going on. Normally, a program wouldn't just read 1 byte; is it really requesting 1 byte?
This is the cleaned-up and matched sequence of operations as recorded by debug log:
INIT n0 {7.19 Ra 0x100000 VOL_RENAME,CASE_INSENSITIVE,EXPLICIT_INVAL_DATA,ATOMIC_O_TRUNC,XTIMES,0xc000000} => OK, {7.8 Ra 0x100000 0/0 Wr 0x10000 Tg 0x0}
LOOKUP n1 [".hidden"] 8b => 2=no such file or directory, {n0 g0 tE=30s tA=0s {M00 SZ=0 L=0 0:0 0 0:0 A 0.000000 M 0.000000 C 0.000000}}
LOOKUP n1 [".DS_Store"] 10b => 2=no such file or directory, {n0 g0 tE=30s tA=0s {M00 SZ=0 L=0 0:0 0 0:0 A 0.000000 M 0.000000 C 0.000000}}
LOOKUP n1 ["foo.jpg"] 8b => OK, {n2 g0 tE=30s tA=30s {M0100744 SZ=2431902 L=0 0:0 0 0:3 A 0.000000 M 0.000000 C 0.000000}}
OPEN n2 {O_RDONLY} => OK, {Fh 1 DIRECT}
FLUSH n2 {Fh 1} => OK
RELEASE n2 {Fh 1 L0} => OK
LOOKUP n1 ["foo.jpg"] 8b => OK, {n2 g0 tE=30s tA=30s {M0100744 SZ=2431902 L=0 0:0 0 0:3 A 0.000000 M 0.000000 C 0.000000}}
OPEN n2 {O_RDONLY} => OK, {Fh 1 DIRECT}
FLUSH n2 {Fh 1} => OK
RELEASE n2 {Fh 1 L0} => OK
LOOKUP n1 ["foo.jpg"] 8b => OK, {n2 g0 tE=30s tA=30s {M0100744 SZ=2431902 L=0 0:0 0 0:3 A 0.000000 M 0.000000 C 0.000000}}
OPEN n2 {O_RDONLY} => OK, {Fh 1 DIRECT}
FLUSH n2 {Fh 1} => OK
RELEASE n2 {Fh 1 L0} => OK
OPEN n2 {O_RDONLY} => OK, {Fh 1 DIRECT}
READ n2 {Fh 1 [0 +1) } => OK, 1b data "\xff"
FLUSH n2 {Fh 1} => OK
RELEASE n2 {Fh 1 L0} => OK
OPEN n2 {O_RDONLY} => OK, {Fh 1 DIRECT}
FLUSH n2 {Fh 1} => OK
RELEASE n2 {Fh 1 L0} => OK
DESTROY n1 => OK
weird; what does it look like without DIRECT ? On Linux, I'd use strace to see the system calls executed; it seems strange for a program to read 1 byte, but the whole trace looks suspect (why would it open a file to directly close it again?)
That's macOS - it's what it does :)
Also, here's the output from modified hello:
INIT n0 {7.19 Ra 0x100000 CASE_INSENSITIVE,EXPLICIT_INVAL_DATA,ATOMIC_O_TRUNC,XTIMES,VOL_RENAME,0xc000000} => OK, {7.8 Ra 0x100000 0/0 Wr 0x10000 Tg 0x0}
LOOKUP n1 [".hidden"] 8b => 2=no such file or directory, {n0 g0 tE=0s tA=0s {M00 SZ=0 L=0 0:0 0 0:0 A 0.000000 M 0.000000 C 0.000000}}
LOOKUP n1 [".DS_Store"] 10b => 2=no such file or directory, {n0 g0 tE=0s tA=0s {M00 SZ=0 L=0 0:0 0 0:0 A 0.000000 M 0.000000 C 0.000000}}
LOOKUP n1 ["foo.jpg"] 8b => OK, {n2 g0 tE=0s tA=0s {M0100644 SZ=1827552 L=0 0:0 0 0:2 A 0.000000 M 0.000000 C 0.000000}}
OPEN n2 {O_RDONLY} => OK, {Fh 0 CACHE}
FLUSH n2 {Fh 0} => OK
RELEASE n2 {Fh 0 L0} => OK
OPEN n2 {O_RDONLY} => OK, {Fh 0 CACHE}
FLUSH n2 {Fh 0} => OK
RELEASE n2 {Fh 0 L0} => OK
OPEN n2 {O_RDONLY} => OK, {Fh 0 CACHE}
FLUSH n2 {Fh 0} => OK
RELEASE n2 {Fh 0 L0} => OK
OPEN n2 {O_RDONLY} => OK, {Fh 0 CACHE}
READ n2 {Fh 0 [0 +4096) } => OK, 4096b data "\xff\xd8\xff\xe0\x00\x10JF"...
SETATTR n2 {atime 1613105922.268628721} => OK, {tA=0s {M0100644 SZ=1827552 L=0 0:0 0 0:0 A 0.000000 M 0.000000 C 0.000000}}
FLUSH n2 {Fh 0} => OK
READ n2 {Fh 0 [4096 +28672) } => OK, 28672b data " "...
RELEASE n2 {Fh 0 L0} => OK
LISTXATTR n2 {sz 0} => OK
(note it's slightly different JPEG, that's why SZ is different)
did this work with DIRECT in bazil.org FUSE ? Why do you need DIRECT ?
the only difference I see is that your user-supplied inode number doesn't match the number we send to the kernel (n2 vs 0:3). I don't know where this difference comes from. I think it shouldn't matter.
I used DIRECT by accident, I don't think I need it at all. Just curious. BTW I found 2 issues in osxfuse repo that talk about this behavior, but they refer to old versions of macOS. I have seen this on 11.1.
one nit: would be great if there was an option of reporting single-line logs (buffer rx in some kind of map and emit together with tx), otherwise grepping is hard. I had to write a tool to match requests and responses to produce these logs.
otherwise grepping is hard. I had to write a tool to match requests and responses to produce these logs.
A lot of bugs are due to timing issues, so it is really helpful for me to see some some kind of event ordering.
I've been trying to rewrite filesystem mounting layer in https://github.com/kopia/kopia from
bazil.org/fuse
togo-fuse
(which is excellent, especially v2 API)The port went mostly smoothly, but I've noticed some really weird behavior with mounted filesystems and macOS Preview app. I don't know which layer the problem is in (my code,
go-fuse
,macfuse
or macOS itself) to even investigate. Any suggestions are appreciated:The symptoms are really weird:
When I mount snapshots (using
kopia mount all /some/mountpoint
) everything appears fine - I can see the files, directories, they have the right file mode, attributes, owners and I can see their contents just fine using simple Unix tools (cat, wc, sha256sum, ...) and most GUI tools (like watching videos using VLC for example). I can also see photo thumbnails in Finder.However, when I try to open some image files from a FUSE mount using
Preview
it fails with:The file AAA.jpg cannot be opened. It may be damaged or use a file format that Preview doesn’t recognize.
When this happens,
go-fuse
invokes theRead()
of one byte (which succeeds and returns first byte of the JPEG which is 0xFF) and never reads anything again.cp
), I'm able to open the file just fine.Here's my pull request to kopia, seems straightforward, I think: https://github.com/kopia/kopia/pull/829/files
Am I missing something mac-specific I need to enable in go-fuse to get this to work? I don't remember
bazil.org/fuse
having this behavior, but it stopped working once I upgraded to macfuse 4.x.