hanwen / go-fuse

FUSE bindings for Go
Other
2.04k stars 327 forks source link

[v2 API] xfstests generic/007: panic: unknown node 3050993 #324

Closed rfjakob closed 4 years ago

rfjakob commented 5 years ago

Running fuse-xfstests against loopback, I sometimes get this:

generic/007 3s ... 23:09:41.507092 unknown node 3050993
panic: unknown node 3050993

goroutine 226 [running]:
log.Panicf(0x568f19, 0xf, 0xc000044dd0, 0x1, 0x1)
    /usr/local/go/src/log/log.go:345 +0xc0
github.com/hanwen/go-fuse/v2/fs.(*rawBridge).inode(0xc0000b6000, 0x2e8df1, 0x1, 0x0, 0x0)
    /home/jakob/go/src/github.com/hanwen/go-fuse/fs/bridge.go:231 +0x19f
github.com/hanwen/go-fuse/v2/fs.(*rawBridge).Flush(0xc0000b6000, 0xc000020120, 0xc0000ca198, 0x1)
    /home/jakob/go/src/github.com/hanwen/go-fuse/fs/bridge.go:742 +0x46
github.com/hanwen/go-fuse/v2/fuse.doFlush(0xc0000b8000, 0xc0000ca000)
    /home/jakob/go/src/github.com/hanwen/go-fuse/fuse/opcode.go:372 +0x4b
github.com/hanwen/go-fuse/v2/fuse.(*Server).handleRequest(0xc0000b8000, 0xc0000ca000, 0xc000000000)
    /home/jakob/go/src/github.com/hanwen/go-fuse/fuse/server.go:470 +0x3ab
github.com/hanwen/go-fuse/v2/fuse.(*Server).loop(0xc0000b8000, 0xc0003c6c01)
    /home/jakob/go/src/github.com/hanwen/go-fuse/fuse/server.go:437 +0x196
created by github.com/hanwen/go-fuse/v2/fuse.(*Server).readRequest
    /home/jakob/go/src/github.com/hanwen/go-fuse/fuse/server.go:304 +0x42b
- output mismatch (see /home/jakob.donotbackup/code/fuse-xfstests/results//generic/007.out.bad)
    --- tests/generic/007.out   2018-01-20 14:29:39.057451924 +0100
    +++ /home/jakob.donotbackup/code/fuse-xfstests/results//generic/007.out.bad 2019-10-20 23:09:41.531867509 +0200
    @@ -13,10 +13,543 @@
     .........................................................................
     .........................................................................
     .........................................................................
    -....................................................
    -creates:  18736 OK,  18802 EEXIST  ( 37538 total, 50% EEXIST)
    -removes:  18675 OK,  19927 ENOENT  ( 38602 total, 51% ENOENT)
    -lookups:  12000 OK,  11860 ENOENT  ( 23860 total, 49% ENOENT)
    ...
    (Run 'diff -u tests/generic/007.out /home/jakob.donotbackup/code/fuse-xfstests/results//generic/007.out.bad'  to see the entire diff)
hanwen commented 5 years ago

I tried compiling xfstest, but didn't manage (damn autoconf, something with LT_INIT).

can you summarize this test for me?

hanwen commented 5 years ago

or provide a recipe for compiling xfstest?

rfjakob commented 5 years ago

I have written down what it took to compile on Fedora here: https://github.com/rfjakob/fuse-xfstests#prerequisites

Test summary:

# drive the src/nametest program
# which does a heap of open(create)/unlink/stat
rfjakob commented 5 years ago

nametest.c has a better summary on top: https://github.com/rfjakob/fuse-xfstests/blob/gocryptfs-2018-08-18/src/nametest.c#L13

rfjakob commented 5 years ago

I'm attaching a tarball that has "make configure" already done, which means autoconf should not be needed to build and avoids the error you get: fuse-xfstests.git182841fea487419.tar.gz

hanwen commented 5 years ago

I have binaries now, but it's unclear how to actually run tests.

[hanwen@localhost fuse-xfstests]$ echo DIR $TEST_DIR $TEST_DEV DIR /tmp/y /tmp/y [hanwen@localhost fuse-xfstests]$ ./check generic/007 common/config: TEST_DEV (/dev/vda) is not a block device or a network filesystem

rfjakob commented 5 years ago

sudo ./check-loopback

This is all automated in fuse-xfstests

On Wed, Oct 23, 2019 at 10:08 AM Han-Wen Nienhuys notifications@github.com wrote:

I have binaries now, but it's unclear how to actually run tests.

[hanwen@localhost fuse-xfstests]$ echo DIR $TEST_DIR $TEST_DEV DIR /tmp/y /tmp/y [hanwen@localhost fuse-xfstests]$ ./check generic/007 common/config: TEST_DEV (/dev/vda) is not a block device or a network filesystem

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/hanwen/go-fuse/issues/324?email_source=notifications&email_token=AACGA7ZHWUAGK4XMY7YUBTLQQABBHA5CNFSM4JCWDOFKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOECAPS4Y#issuecomment-545323379, or unsubscribe https://github.com/notifications/unsubscribe-auth/AACGA73ZVGFHHZKXJVQ7NODQQABBHANCNFSM4JCWDOFA .

rfjakob commented 4 years ago

Caught one on camera:

17:09:07.651735 rx 84444: CREATE i13632136 {0100644 [WRONLY,TRUNC,CREAT,0x8000] (022)} ["XXXXXXXXXXXX.38"] 16b
17:09:07.651790 tx 84444:     OK, {i13632483 g1 {M0100644 SZ=0 L=1 0:0 B0*4096 i0:13632483 A 1572192547.650470 M 1572192547.650470 C 1572192547.650470} &{7 0 0}}
[...]
17:09:07.684439 rx 87988: CREATE i13632119 {0100644 [CREAT,TRUNC,WRONLY,0x8000] (022)} ["XXXXXXXXXXXX.37"] 16b
[...]
17:09:07.684472 rx 87992: FORGET i13632483 {Nlookup=1} 
17:09:07.684483 tx 87988:     OK, {i13632483 g1 {M0100644 SZ=0 L=1 0:0 B0*4096 i0:13632483 A 1572192547.683470 M 1572192547.683470 C 1572192547.683470} &{8 0 0}}
[...]
17:09:07.684557 rx 88006: FLUSH i13632483 {Fh 8} 
17:09:07.684560 unknown node 13632483
17:09:07.684564 tx 88002:     OK,  "XXXXXXXXXXXX.830"
panic: unknown node 13632483

Complete last ~3000 lines of the log: loopback.tail.log

rfjakob commented 4 years ago

Another one:

17:27:56.456797 rx 87972: MKDIR i13894900 {0755 (022)} ["XXXXXXXXXXXX.890"] 17b
[...]
17:27:56.457005 rx 87988: FORGET i14025239 {Nlookup=1} 
removeRef: delete i14025239
[...]
17:27:56.457485 tx 87972:     OK, {i14025239 g1 tE=1s tA=1s {M040755 SZ=4096 L=2 0:0 B8*4096 i0:14025239 A 1572193676.455471 M 1572193676.455471 C 1572193676.455471}}
removeRef: delete i13895478
[...]
17:27:56.725060 rx 108266: FORGET i14025239 {Nlookup=1} 
17:27:56.725066 unknown node 14025239
17:27:56.725083 tx 108272:     OK
panic: unknown node 14025239
kjschubert commented 4 years ago

Hi,

I think I have figured out what the problem is here. I had those „unknown node“ panics with a prototype I develop. What got me thinking was, that the debug logs always showed, as with the example from @rfjakob, that the kernel sent an Forget Request for node X, then I return the exact same Inode with the exact same Ino in a following Lookup Request and after that the panic happens.

After debugging for me it looks like:

As far as I understand we should remove the Inode's reference to rawBridge, while the bridge is still locked, so no other request can use ops.embed() between Unlocking and removing the reference, whereby we would need an intermediate pointer to the bridge. That's what I tried to do in my pull request #336.

As this is my first pull request I'm looking forward to your feedback.

rfjakob commented 4 years ago

Hi, FYI, there is one PR via Gerrit in progress that should fix the problem: https://review.gerrithub.io/c/hanwen/go-fuse/+/473813

hanwen commented 4 years ago

sorry for taking time. Last week was hectic with Real Work (tm)

rfjakob commented 4 years ago

No problem!