hanwen / go-fuse

FUSE bindings for Go
Other
1.97k stars 313 forks source link

Issues when unmounting on linux/ darwin #512

Closed marius-enlock closed 1 month ago

marius-enlock commented 2 months ago

Hi,

On linux I found the need to call fusermount -uz mountpoint, in the unmount command;

I get the following panic:

Exit
Line: 2024/04/25 07:16:07 unmount: executing ["/usr/bin/fusermount3" "-uz" "/home/marius/playground/test_mount"]
Line: 2024/04/25 07:16:07 received ENODEV (unmount request), thread exiting
Line: 2024/04/25 07:16:07 received ENODEV (unmount request), thread exiting
Line: 2024/04/25 07:16:07 received ENODEV (unmount request), thread exiting
Line: 2024/04/25 07:16:07 received ENODEV (unmount request), thread exiting
Line: 2024/04/25 07:16:07 received ENODEV (unmount request), thread exiting
Line: 2024/04/25 07:16:07 received ENODEV (unmount request), thread exiting
Line: 2024/04/25 07:16:07 received ENODEV (unmount request), thread exiting
Line: panic: sync: negative WaitGroup counter
Line: 
Line: goroutine 7278 [running]:
Line: sync.(*WaitGroup).Add(0x12f2780?, 0xc001158a58?)
Line:   /usr/local/go/src/sync/waitgroup.go:62 +0xd8
Line: sync.(*WaitGroup).Done(0xc000363e40?)
Line:   /usr/local/go/src/sync/waitgroup.go:87 +0x1a
Line: github.com/hanwen/go-fuse/v2/fuse.(*Server).loop(0xc0011589c0, 0xd0?)
Line:   /home/marius/go/pkg/mod/github.com/hanwen/go-fuse/v2@v2.5.1/fuse/server.go:503 +0x22a
Line: created by github.com/hanwen/go-fuse/v2/fuse.(*Server).readRequest in goroutine 7225
Line:   /home/marius/go/pkg/mod/github.com/hanwen/go-fuse/v2@v2.5.1/fuse/server.go:367 +0x549

These are my mount options:

    server, err := fs.Mount(d.pathToMount, root, &fs.Options{
        MountOptions: fuse.MountOptions{Debug: true, AllowOther: true, MaxReadAhead: FileBlockSize, FsName: "drive"},
        Logger: log.New(os.Stderr, "fuse", 0),
    })

I need the lazy unmount as I observed that without lazy, the mount points remain with ????? attribute and can't be removed.

marius-enlock commented 2 months ago

I also found a need on darwin to use /sbin/umount instead of libc_unmount_trampoline, as my filesystem doesn't get unmounted otherwise, would a PR that tries umount in the case the syscall unmount fails be welcomed for darwin?

I am aware that it should be two separate issues, but for the darwin question, I felt it's ok to ask here, as it might be related to why I have the need of lazy unmount on linux.

hanwen commented 2 months ago

Line: panic: sync: negative WaitGroup counter

this is a bug in go-fuse. I will have a look tomorrow.

hanwen commented 2 months ago

can you confirm that using the race detector to run your file system doesn´t turn up warnings?

marius-enlock commented 2 months ago

I was not aware of the race detector. Thank you for pointing out.

I have 4 race warnings in the beginning of my program that are not related to the filesystem. I will fix them and re-try the unmount to see if they affected it somehow.

(Edit: They have to do with context handling - cancelling - inside some unrelated parts of my program)

My program perform a few operations, and after the filesystem gets mounted without any race warnings until I perform the unmount and the above panic happens.

Edit: Running the file system with the race detector doesn't turn up any warnings.

hanwen commented 2 months ago

is it possible that you call Server.Serve() twice on the same server?

marius-enlock commented 2 months ago

No, I double checked, and I am confident I don't call it twice.

hanwen commented 2 months ago

Curious. is there a way I could get my hands on a reproducer? Also, the trace you posted is a smoking gun, but the complete goroutine stacks may be useful too.

hanwen commented 2 months ago

wait, this code

/usr/bin/fusermount3" "-uz"

does not exist in go-fuse. Can you reproduce this problem with the latest go-fuse (without local changes?)

marius-enlock commented 2 months ago

wait, this code

/usr/bin/fusermount3" "-uz"

does not exist in go-fuse. Can you reproduce this problem with the latest go-fuse (without local changes?)

Yes, the -uz does not exist in go-fuse, the command called from go-fuse without my change is:

"/usr/bin/fusermount3" "-u" 

This one works without any issue; go-fuse has no panics as long as the unmount is not attempted in lazy mode.

Curious. is there a way I could get my hands on a reproducer? Also, the trace you posted is a smoking gun, but the complete goroutine stacks may be useful too.

The trace I posted is all I got from my program.

I will try to make a minimum reproducible example.

But for example the following scenario also breaks:

I run the go-fuse filesystem without any change in the library; then on a different terminal I call:

fusermount3 -uz  mounted_folder

The exact same panic happens - this is the full stacktrace my program prints.

Line: panic: sync: negative WaitGroup counter
Line: 
Line: goroutine 7383 [running]:
Line: sync.(*WaitGroup).Add(0x12f2780?, 0xc0048c83d8?)
Line:   /usr/local/go/src/sync/waitgroup.go:62 +0xd8
Line: sync.(*WaitGroup).Done(0x0?)
Line:   /usr/local/go/src/sync/waitgroup.go:87 +0x1a
Line: github.com/hanwen/go-fuse/v2/fuse.(*Server).loop(0xc0048c8340, 0x0?)
Line:   /home/marius/go/pkg/mod/github.com/hanwen/go-fuse/v2@v2.5.1/fuse/server.go:503 +0x22a
Line: created by github.com/hanwen/go-fuse/v2/fuse.(*Server).readRequest in goroutine 7284
Line:   /home/marius/go/pkg/mod/github.com/hanwen/go-fuse/v2@v2.5.1/fuse/server.go:367 +0x549
marius-enlock commented 2 months ago

Edit: I edited the example.

Here is a full reproducible example (I had modified the example for the loopback): The modification is at the end of the program, in how the file system gets mounted

// Copyright 2016 the Go-FUSE Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.

// This is main program driver for the loopback filesystem from
// github.com/hanwen/go-fuse/fs/, a filesystem that shunts operations
// to an underlying file system.
package main

import (
    "flag"
    "fmt"
    "log"
    "os"
    "os/signal"
    "path"
    "runtime/pprof"
    "syscall"
    "time"

    "github.com/hanwen/go-fuse/v2/fs"
    "github.com/hanwen/go-fuse/v2/fuse"
)

func writeMemProfile(fn string, sigs <-chan os.Signal) {
    i := 0
    for range sigs {
        fn := fmt.Sprintf("%s-%d.memprof", fn, i)
        i++

        log.Printf("Writing mem profile to %s\n", fn)
        f, err := os.Create(fn)
        if err != nil {
            log.Printf("Create: %v", err)
            continue
        }
        pprof.WriteHeapProfile(f)
        if err := f.Close(); err != nil {
            log.Printf("close %v", err)
        }
    }
}

func main() {
    log.SetFlags(log.Lmicroseconds)
    // Scans the arg list and sets up flags
    debug := flag.Bool("debug", false, "print debugging messages.")
    other := flag.Bool("allow-other", false, "mount with -o allowother.")
    quiet := flag.Bool("q", false, "quiet")
    ro := flag.Bool("ro", false, "mount read-only")
    directmount := flag.Bool("directmount", false, "try to call the mount syscall instead of executing fusermount")
    directmountstrict := flag.Bool("directmountstrict", false, "like directmount, but don't fall back to fusermount")
    cpuprofile := flag.String("cpuprofile", "", "write cpu profile to this file")
    memprofile := flag.String("memprofile", "", "write memory profile to this file")
    flag.Parse()
    if flag.NArg() < 2 {
        fmt.Printf("usage: %s MOUNTPOINT ORIGINAL\n", path.Base(os.Args[0]))
        fmt.Printf("\noptions:\n")
        flag.PrintDefaults()
        os.Exit(2)
    }
    if *cpuprofile != "" {
        if !*quiet {
            fmt.Printf("Writing cpu profile to %s\n", *cpuprofile)
        }
        f, err := os.Create(*cpuprofile)
        if err != nil {
            fmt.Println(err)
            os.Exit(3)
        }
        pprof.StartCPUProfile(f)
        defer pprof.StopCPUProfile()
    }
    if *memprofile != "" {
        if !*quiet {
            log.Printf("send SIGUSR1 to %d to dump memory profile", os.Getpid())
        }
        profSig := make(chan os.Signal, 1)
        signal.Notify(profSig, syscall.SIGUSR1)
        go writeMemProfile(*memprofile, profSig)
    }
    if *cpuprofile != "" || *memprofile != "" {
        if !*quiet {
            fmt.Printf("Note: You must unmount gracefully, otherwise the profile file(s) will stay empty!\n")
        }
    }

    orig := flag.Arg(1)
    loopbackRoot, err := fs.NewLoopbackRoot(orig)
    if err != nil {
        log.Fatalf("NewLoopbackRoot(%s): %v\n", orig, err)
    }

    sec := time.Second
    opts := &fs.Options{
        // The timeout options are to be compatible with libfuse defaults,
        // making benchmarking easier.
        AttrTimeout:  &sec,
        EntryTimeout: &sec,

        NullPermissions: true, // Leave file permissions on "000" files as-is

        MountOptions: fuse.MountOptions{
            AllowOther:        *other,
            Debug:             *debug,
            DirectMount:       *directmount,
            DirectMountStrict: *directmountstrict,
            FsName:            orig,       // First column in "df -T": original dir
            Name:              "loopback", // Second column in "df -T" will be shown as "fuse." + Name
        },
    }
    if opts.AllowOther {
        // Make the kernel check file permissions for us
        opts.MountOptions.Options = append(opts.MountOptions.Options, "default_permissions")
    }
    if *ro {
        opts.MountOptions.Options = append(opts.MountOptions.Options, "ro")
    }
    // Enable diagnostics logging
    if !*quiet {
        opts.Logger = log.New(os.Stderr, "", 0)
    }
    server, err := fs.Mount(flag.Arg(0), loopbackRoot, opts)
    if err != nil {
        log.Fatalf("Mount fail: %v\n", err)
    }
    if !*quiet {
        fmt.Println("Mounted!")
    }

    c := make(chan os.Signal)
    d := make(chan bool)
    signal.Notify(c, os.Interrupt, syscall.SIGTERM)
    go func() {
        <-c
        close(d)
        server.Unmount()
    }()

    e := make(chan bool)

    go func() { e <- true }()
    for {
        select {
        case <-d:
            time.Sleep(10 * time.Second)
            break
        case <-e:
            server.Serve()
        }
    }
}

calling fusermount3 -uz mounted_folder (t1 in the example) results in:

marius@marsik:~/work/enlock/code/go-fuse/example/loopback$ go run main.go t1 t2
Mounted!
panic: sync: negative WaitGroup counter

goroutine 1 [running]:
sync.(*WaitGroup).Add(0x51f560?, 0xc00008a8b8?)
        /usr/local/go/src/sync/waitgroup.go:62 +0xd8
sync.(*WaitGroup).Done(0x0?)
        /usr/local/go/src/sync/waitgroup.go:87 +0x1a
github.com/hanwen/go-fuse/v2/fuse.(*Server).loop(0xc00008a820, 0x2?)
        /home/marius/go/pkg/mod/github.com/hanwen/go-fuse/v2@v2.5.1/fuse/server.go:503 +0x22a
github.com/hanwen/go-fuse/v2/fuse.(*Server).Serve(0xc00008a820)
        /home/marius/go/pkg/mod/github.com/hanwen/go-fuse/v2@v2.5.1/fuse/server.go:422 +0x35
main.main()
        /home/marius/work/enlock/code/go-fuse/example/loopback/main.go:149 +0xb6c
exit status 2
marius-enlock commented 2 months ago

In the provided example, the panic happens also without the lazy unmount flag (just ctrl+c from the terminal that launched it)

hanwen commented 2 months ago

This is quite odd. Somehow the infinite loop is causing this to happen. Trimmed down code:

package main

import (
    "flag"
    "fmt"
    "log"
    "os"
    "path"

    "github.com/hanwen/go-fuse/v2/fs"
    "github.com/hanwen/go-fuse/v2/fuse"
)

func main() {
    log.SetFlags(log.Lmicroseconds)
    // Scans the arg list and sets up flags
    debug := flag.Bool("debug", false, "print debugging messages.")
    quiet := flag.Bool("q", false, "quiet")
    flag.Parse()
    if flag.NArg() < 2 {
        fmt.Printf("usage: %s MOUNTPOINT ORIGINAL\n", path.Base(os.Args[0]))
        fmt.Printf("\noptions:\n")
        flag.PrintDefaults()
        os.Exit(2)
    }

    orig := flag.Arg(1)
    loopbackRoot, err := fs.NewLoopbackRoot(orig)
    if err != nil {
        log.Fatalf("NewLoopbackRoot(%s): %v\n", orig, err)
    }

    opts := &fs.Options{
        MountOptions: fuse.MountOptions{
            Debug: *debug,
        },
    }
    // Enable diagnostics logging
    opts.Logger = log.New(os.Stderr, "", 0)
    server, err := fs.Mount(flag.Arg(0), loopbackRoot, opts)
    if err != nil {
        log.Fatalf("Mount fail: %v\n", err)
    }
    if !*quiet {
        fmt.Println("Mounted!")
    }

    e := make(chan bool)
    go func() { e <- true }()
    for {
        select {
        case <-e:
            log.Println("serve")
            server.Serve()
            log.Println("serve done!")
            server = nil
        }
    }
}
rfjakob commented 2 months ago

It's not the loop. Adding a sleep at the bottom also shows the panic. My guess now is that it always panics on unmount, but most of the time the process exists before the panic is printed.

package main

import (
    "flag"
    "fmt"
    "log"
    "os"
    "path"
    "time"

    "github.com/hanwen/go-fuse/v2/fs"
    "github.com/hanwen/go-fuse/v2/fuse"
)

func main() {
    log.SetFlags(log.Lmicroseconds)
    flag.Parse()
    if flag.NArg() < 2 {
        fmt.Printf("usage: %s MOUNTPOINT ORIGINAL\n", path.Base(os.Args[0]))
        fmt.Printf("\noptions:\n")
        flag.PrintDefaults()
        os.Exit(2)
    }

    orig := flag.Arg(1)
    loopbackRoot, err := fs.NewLoopbackRoot(orig)
    if err != nil {
        log.Fatalf("NewLoopbackRoot(%s): %v\n", orig, err)
    }

    opts := &fs.Options{
        MountOptions: fuse.MountOptions{
            Debug: true,
        },
    }
    // Enable diagnostics logging
    opts.Logger = log.New(os.Stderr, "", 0)
    server, err := fs.Mount(flag.Arg(0), loopbackRoot, opts)
    if err != nil {
        log.Fatalf("Mount fail: %v\n", err)
    }

    fmt.Println("serve")
    server.Serve()
    fmt.Println("serve done!")
    server = nil

    time.Sleep(1 * time.Second)
}
rfjakob commented 2 months ago

This is calling Serve() twice. Once implicitely via fs.Mount(), once explicitely via server.Serve().

rfjakob commented 2 months ago

I suggest https://review.gerrithub.io/c/hanwen/go-fuse/+/1194109/2