hanwen / go-fuse

FUSE bindings for Go

regression: 05a3771 "fuse: Enable parallel lookup and readdir by default" causes hangs #281

Closed rfjakob closed 4 years ago

rfjakob commented 5 years ago

@navytux looks like there is a problem with https://github.com/hanwen/go-fuse/commit/05a3771cbea72f2f7d50be19daa19f3dff2d6163 .

I have a gocryptfs user running Ubuntu 16.04 (kernel 4.15.0) reporting that filesystems can no longer be unmounted because "target is busy".

I have reproduced the issue in a VM and have bisected it (bisect log) down to the commit mentioned above.

The issue is accompanied by a kernel hung task warning:

[14379.956556] INFO: task pool:3748 blocked for more than 120 seconds.

The task that calls itself pool here is actually /usr/lib/gvfs/gvfs-udisks2-volume-monitor, which is part of GNOME. It seems to be stuck in the lstat syscall, but as far as I can see from the kernel backtrace, it is deadlocked at fuse_lock_inode!? I'll try to decode the backtrace. Could this be a kernel bug?

hanwen commented 5 years ago

Looks like a race condition in the kernel? Options for one (go-fuse) mount should not leak into other (gvfs, libfuse) mounts.

navytux commented 5 years ago

@rfjakob, @hanwen, thanks for the report, and apologies for the delay in replying. I have some experience debugging such issues. In my case it turned out to be the following: the kernel FUSE module takes locks on pages when it does IO. For example, when a page is faulted in, the page is locked and the FUSE module sends a READ request. If the filesystem then sends a notification to the kernel that needs to take the lock on the same page, a deadlock happens. I feel that what happens here is very similar, though it is probably not exactly a deadlock between READ and NOTIFY (other operations take locks too). If you are interested, please see https://lab.nexedi.com/kirr/wendelin.core/blob/t/wcfs/notes.txt#L156.

@rfjakob, could you please provide a minimal reproducer? I think I should be able to look into what happens, especially since https://lab.nexedi.com/kirr/wendelin.core/blob/t/t/qemu-runlinux makes it possible to work on a kernel as if it were just a regular problem. I do not promise a fast reply, but I should hopefully be able to investigate eventually.

navytux commented 5 years ago

I could reproduce the problem under the same VM @rfjakob used in https://github.com/rfjakob/gocryptfs/issues/381. The filesystem got stuck not only on unmount but also, though not always, on e.g. a plain echo aaa >bbb. I did not debug it fully, but I suspect it is a kernel bug. The reason:

$ git log -p v4.15..v4.15.18 -- fs/fuse/
# empty

i.e. no fuse patches were applied to the stable v4.15.y series. However:

$ git log -p v4.15..v5.1-rc3 --grep="stable@" -- fs/fuse/
commit a2ebba824106dabe79937a9f29a875f837e1b6d4
Author: Miklos Szeredi <mszeredi@redhat.com>
Date:   Wed Jan 16 10:27:59 2019 +0100

    fuse: decrement NR_WRITEBACK_TEMP on the right page

    NR_WRITEBACK_TEMP is accounted on the temporary page in the request, not
    the page cache page.

    Fixes: 8b284dc47291 ("fuse: writepages: handle same page rewrites")
    Cc: <stable@vger.kernel.org> # v3.13
    Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index ffaffe18352a..a59c16bd90ac 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -1782,7 +1782,7 @@ static bool fuse_writepage_in_flight(struct fuse_req *new_req,
        spin_unlock(&fc->lock);

        dec_wb_stat(&bdi->wb, WB_WRITEBACK);
-       dec_node_page_state(page, NR_WRITEBACK_TEMP);
+       dec_node_page_state(new_req->pages[0], NR_WRITEBACK_TEMP);
        wb_writeout_inc(&bdi->wb);
        fuse_writepage_free(fc, new_req);
        fuse_request_free(new_req);

commit 9509941e9c534920ccc4771ae70bd6cbbe79df1c
Author: Jann Horn <jannh@google.com>
Date:   Sat Jan 12 02:39:05 2019 +0100

    fuse: call pipe_buf_release() under pipe lock

    Some of the pipe_buf_release() handlers seem to assume that the pipe is
    locked - in particular, anon_pipe_buf_release() accesses pipe->tmp_page
    without taking any extra locks. From a glance through the callers of
    pipe_buf_release(), it looks like FUSE is the only one that calls
    pipe_buf_release() without having the pipe locked.

    This bug should only lead to a memory leak, nothing terrible.

    Fixes: dd3bb14f44a6 ("fuse: support splice() writing to fuse device")
    Cc: stable@vger.kernel.org
    Signed-off-by: Jann Horn <jannh@google.com>
    Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index fc29264011a6..809c0f2f9942 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -2077,8 +2077,10 @@ static ssize_t fuse_dev_splice_write(struct pipe_inode_info *pipe,

    ret = fuse_dev_do_write(fud, &cs, len);

+   pipe_lock(pipe);
    for (idx = 0; idx < nbuf; idx++)
        pipe_buf_release(pipe, &bufs[idx]);
+   pipe_unlock(pipe);

 out:
    kvfree(bufs);

commit 8a3177db59cd644fde05ba9efee29392dfdec8aa
Author: Miklos Szeredi <mszeredi@redhat.com>
Date:   Wed Jan 16 10:27:59 2019 +0100

    cuse: fix ioctl

    cuse_process_init_reply() doesn't initialize fc->max_pages and thus all
    cuse bases ioctls fail with ENOMEM.

    Reported-by: Andreas Steinmetz <ast@domdv.de>
    Fixes: 5da784cce430 ("fuse: add max_pages to init_out")
    Cc: <stable@vger.kernel.org> # v4.20
    Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index 76baaa6be393..c2d4099429be 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -628,6 +628,7 @@ void fuse_conn_init(struct fuse_conn *fc, struct user_namespace *user_ns)
    get_random_bytes(&fc->scramble_key, sizeof(fc->scramble_key));
    fc->pid_ns = get_pid_ns(task_active_pid_ns(current));
    fc->user_ns = get_user_ns(user_ns);
+   fc->max_pages = FUSE_DEFAULT_MAX_PAGES_PER_REQ;
 }
 EXPORT_SYMBOL_GPL(fuse_conn_init);

@@ -1162,7 +1163,6 @@ static int fuse_fill_super(struct super_block *sb, void *data, int silent)
    fc->user_id = d.user_id;
    fc->group_id = d.group_id;
    fc->max_read = max_t(unsigned, 4096, d.max_read);
-   fc->max_pages = FUSE_DEFAULT_MAX_PAGES_PER_REQ;

    /* Used by get_root_inode() */
    sb->s_fs_info = fc;

commit 97e1532ef81acb31c30f9e75bf00306c33a77812
Author: Miklos Szeredi <mszeredi@redhat.com>
Date:   Wed Jan 16 10:27:59 2019 +0100

    fuse: handle zero sized retrieve correctly

    Dereferencing req->page_descs[0] will Oops if req->max_pages is zero.

    Reported-by: syzbot+c1e36d30ee3416289cc0@syzkaller.appspotmail.com
    Tested-by: syzbot+c1e36d30ee3416289cc0@syzkaller.appspotmail.com
    Fixes: b2430d7567a3 ("fuse: add per-page descriptor <offset, length> to fuse_req")
    Cc: <stable@vger.kernel.org> # v3.9
    Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index a5e516a40e7a..fc29264011a6 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -1742,7 +1742,6 @@ static int fuse_retrieve(struct fuse_conn *fc, struct inode *inode,
    req->in.h.nodeid = outarg->nodeid;
    req->in.numargs = 2;
    req->in.argpages = 1;
-   req->page_descs[0].offset = offset;
    req->end = fuse_retrieve_end;

    index = outarg->offset >> PAGE_SHIFT;
@@ -1757,6 +1756,7 @@ static int fuse_retrieve(struct fuse_conn *fc, struct inode *inode,

        this_num = min_t(unsigned, num, PAGE_SIZE - offset);
        req->pages[req->num_pages] = page;
+       req->page_descs[req->num_pages].offset = offset;
        req->page_descs[req->num_pages].length = this_num;
        req->num_pages++;

commit 2e64ff154ce6ce9a8dc0f9556463916efa6ff460
Author: Chad Austin <chadaustin@fb.com>
Date:   Mon Dec 10 10:54:52 2018 -0800

    fuse: continue to send FUSE_RELEASEDIR when FUSE_OPEN returns ENOSYS

    When FUSE_OPEN returns ENOSYS, the no_open bit is set on the connection.

    Because the FUSE_RELEASE and FUSE_RELEASEDIR paths share code, this
    incorrectly caused the FUSE_RELEASEDIR request to be dropped and never sent
    to userspace.

    Pass an isdir bool to distinguish between FUSE_RELEASE and FUSE_RELEASEDIR
    inside of fuse_file_put.

    Fixes: 7678ac50615d ("fuse: support clients that don't implement 'open'")
    Cc: <stable@vger.kernel.org> # v3.14
    Signed-off-by: Chad Austin <chadaustin@fb.com>
    Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
index dc4e83d8ace7..e909678afa2d 100644
--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -1243,7 +1243,7 @@ static int fuse_dir_open(struct inode *inode, struct file *file)

 static int fuse_dir_release(struct inode *inode, struct file *file)
 {
-   fuse_release_common(file, FUSE_RELEASEDIR);
+   fuse_release_common(file, true);

    return 0;
 }
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 677c51341e96..ffaffe18352a 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -89,12 +89,12 @@ static void fuse_release_end(struct fuse_conn *fc, struct fuse_req *req)
    iput(req->misc.release.inode);
 }

-static void fuse_file_put(struct fuse_file *ff, bool sync)
+static void fuse_file_put(struct fuse_file *ff, bool sync, bool isdir)
 {
    if (refcount_dec_and_test(&ff->count)) {
        struct fuse_req *req = ff->reserved_req;

-       if (ff->fc->no_open) {
+       if (ff->fc->no_open && !isdir) {
            /*
             * Drop the release request when client does not
             * implement 'open'
@@ -247,10 +247,11 @@ static void fuse_prepare_release(struct fuse_file *ff, int flags, int opcode)
    req->in.args[0].value = inarg;
 }

-void fuse_release_common(struct file *file, int opcode)
+void fuse_release_common(struct file *file, bool isdir)
 {
    struct fuse_file *ff = file->private_data;
    struct fuse_req *req = ff->reserved_req;
+   int opcode = isdir ? FUSE_RELEASEDIR : FUSE_RELEASE;

    fuse_prepare_release(ff, file->f_flags, opcode);

@@ -272,7 +273,7 @@ void fuse_release_common(struct file *file, int opcode)
     * synchronous RELEASE is allowed (and desirable) in this case
     * because the server can be trusted not to screw up.
     */
-   fuse_file_put(ff, ff->fc->destroy_req != NULL);
+   fuse_file_put(ff, ff->fc->destroy_req != NULL, isdir);
 }

 static int fuse_open(struct inode *inode, struct file *file)
@@ -288,7 +289,7 @@ static int fuse_release(struct inode *inode, struct file *file)
    if (fc->writeback_cache)
        write_inode_now(inode, 1);

-   fuse_release_common(file, FUSE_RELEASE);
+   fuse_release_common(file, false);

    /* return value is ignored by VFS */
    return 0;
@@ -302,7 +303,7 @@ void fuse_sync_release(struct fuse_file *ff, int flags)
     * iput(NULL) is a no-op and since the refcount is 1 and everything's
     * synchronous, we are fine with not doing igrab() here"
     */
-   fuse_file_put(ff, true);
+   fuse_file_put(ff, true, false);
 }
 EXPORT_SYMBOL_GPL(fuse_sync_release);

@@ -808,7 +809,7 @@ static void fuse_readpages_end(struct fuse_conn *fc, struct fuse_req *req)
        put_page(page);
    }
    if (req->ff)
-       fuse_file_put(req->ff, false);
+       fuse_file_put(req->ff, false, false);
 }

 static void fuse_send_readpages(struct fuse_req *req, struct file *file)
@@ -1461,7 +1462,7 @@ static void fuse_writepage_free(struct fuse_conn *fc, struct fuse_req *req)
        __free_page(req->pages[i]);

    if (req->ff)
-       fuse_file_put(req->ff, false);
+       fuse_file_put(req->ff, false, false);
 }

 static void fuse_writepage_finish(struct fuse_conn *fc, struct fuse_req *req)
@@ -1620,7 +1621,7 @@ int fuse_write_inode(struct inode *inode, struct writeback_control *wbc)
    ff = __fuse_write_file_get(fc, fi);
    err = fuse_flush_times(inode, ff);
    if (ff)
-       fuse_file_put(ff, 0);
+       fuse_file_put(ff, false, false);

    return err;
 }
@@ -1941,7 +1942,7 @@ static int fuse_writepages(struct address_space *mapping,
        err = 0;
    }
    if (data.ff)
-       fuse_file_put(data.ff, false);
+       fuse_file_put(data.ff, false, false);

    kfree(data.orig_pages);
 out:
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index afe1f231c758..2f2c92e6f8cb 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -822,7 +822,7 @@ void fuse_sync_release(struct fuse_file *ff, int flags);
 /**
  * Send RELEASE or RELEASEDIR request
  */
-void fuse_release_common(struct file *file, int opcode);
+void fuse_release_common(struct file *file, bool isdir);

 /**
  * Send FSYNC or FSYNCDIR request

commit ebacb81273599555a7a19f7754a1451206a5fc4f
Author: Lukas Czerner <lczerner@redhat.com>
Date:   Fri Nov 9 14:51:46 2018 +0100

    fuse: fix use-after-free in fuse_direct_IO()

    In async IO blocking case the additional reference to the io is taken for
    it to survive fuse_aio_complete(). In non blocking case this additional
    reference is not needed, however we still reference io to figure out
    whether to wait for completion or not. This is wrong and will lead to
    use-after-free. Fix it by storing blocking information in separate
    variable.

    This was spotted by KASAN when running generic/208 fstest.

    Signed-off-by: Lukas Czerner <lczerner@redhat.com>
    Reported-by: Zorro Lang <zlang@redhat.com>
    Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    Fixes: 744742d692e3 ("fuse: Add reference counting for fuse_io_priv")
    Cc: <stable@vger.kernel.org> # v4.6

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index cc2121b37bf5..b52f9baaa3e7 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -2924,10 +2924,12 @@ fuse_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
    }

    if (io->async) {
+       bool blocking = io->blocking;
+
        fuse_aio_complete(io, ret < 0 ? ret : 0, -1);

        /* we have a non-extending, async request, so return */
-       if (!io->blocking)
+       if (!blocking)
            return -EIOCBQUEUED;

        wait_for_completion(&wait);

commit 2d84a2d19b6150c6dbac1e6ebad9c82e4c123772
Author: Miklos Szeredi <mszeredi@redhat.com>
Date:   Fri Nov 9 15:52:16 2018 +0100

    fuse: fix possibly missed wake-up after abort

    In current fuse_drop_waiting() implementation it's possible that
    fuse_wait_aborted() will not be woken up in the unlikely case that
    fuse_abort_conn() + fuse_wait_aborted() runs in between checking
    fc->connected and calling atomic_dec(&fc->num_waiting).

    Do the atomic_dec_and_test() unconditionally, which also provides the
    necessary barrier against reordering with the fc->connected check.

    The explicit smp_mb() in fuse_wait_aborted() is not actually needed, since
    the spin_unlock() in fuse_abort_conn() provides the necessary RELEASE
    barrier after resetting fc->connected.  However, this is not a performance
    sensitive path, and adding the explicit barrier makes it easier to
    document.

    Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    Fixes: b8f95e5d13f5 ("fuse: umount should wait for all requests")
    Cc: <stable@vger.kernel.org> #v4.19

diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 6fe330cc9709..a5e516a40e7a 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -165,9 +165,13 @@ static bool fuse_block_alloc(struct fuse_conn *fc, bool for_background)

 static void fuse_drop_waiting(struct fuse_conn *fc)
 {
-   if (fc->connected) {
-       atomic_dec(&fc->num_waiting);
-   } else if (atomic_dec_and_test(&fc->num_waiting)) {
+   /*
+    * lockess check of fc->connected is okay, because atomic_dec_and_test()
+    * provides a memory barrier mached with the one in fuse_wait_aborted()
+    * to ensure no wake-up is missed.
+    */
+   if (atomic_dec_and_test(&fc->num_waiting) &&
+       !READ_ONCE(fc->connected)) {
        /* wake up aborters */
        wake_up_all(&fc->blocked_waitq);
    }
@@ -2221,6 +2225,8 @@ EXPORT_SYMBOL_GPL(fuse_abort_conn);

 void fuse_wait_aborted(struct fuse_conn *fc)
 {
+   /* matches implicit memory barrier in fuse_drop_waiting() */
+   smp_mb();
    wait_event(fc->blocked_waitq, atomic_read(&fc->num_waiting) == 0);
 }

commit 7fabaf303458fcabb694999d6fa772cc13d4e217
Author: Miklos Szeredi <mszeredi@redhat.com>
Date:   Fri Nov 9 15:52:16 2018 +0100

    fuse: fix leaked notify reply

    fuse_request_send_notify_reply() may fail if the connection was reset for
    some reason (e.g. fs was unmounted).  Don't leak request reference in this
    case.  Besides leaking memory, this resulted in fc->num_waiting not being
    decremented and hence fuse_wait_aborted() left in a hanging and unkillable
    state.

    Fixes: 2d45ba381a74 ("fuse: add retrieve request")
    Fixes: b8f95e5d13f5 ("fuse: umount should wait for all requests")
    Reported-and-tested-by: syzbot+6339eda9cb4ebbc4c37b@syzkaller.appspotmail.com
    Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    Cc: <stable@vger.kernel.org> #v2.6.36

diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index ae813e609932..6fe330cc9709 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -1768,8 +1768,10 @@ static int fuse_retrieve(struct fuse_conn *fc, struct inode *inode,
    req->in.args[1].size = total_len;

    err = fuse_request_send_notify_reply(fc, req, outarg->notify_unique);
-   if (err)
+   if (err) {
        fuse_retrieve_end(fc, req);
+       fuse_put_request(fc, req);
+   }

    return err;
 }

commit 908a572b80f6e9577b45e81b3dfe2e22111286b8
Author: Miklos Szeredi <mszeredi@redhat.com>
Date:   Fri Sep 28 16:43:22 2018 +0200

    fuse: fix blocked_waitq wakeup

    Using waitqueue_active() is racy.  Make sure we issue a wake_up()
    unconditionally after storing into fc->blocked.  After that it's okay to
    optimize with waitqueue_active() since the first wake up provides the
    necessary barrier for all waiters, not the just the woken one.

    Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    Fixes: 3c18ef8117f0 ("fuse: optimize wake_up")
    Cc: <stable@vger.kernel.org> # v3.10

diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 34976b42f3e1..51eb602a435b 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -391,12 +391,19 @@ static void request_end(struct fuse_conn *fc, struct fuse_req *req)
    if (test_bit(FR_BACKGROUND, &req->flags)) {
        spin_lock(&fc->lock);
        clear_bit(FR_BACKGROUND, &req->flags);
-       if (fc->num_background == fc->max_background)
+       if (fc->num_background == fc->max_background) {
            fc->blocked = 0;
-
-       /* Wake up next waiter, if any */
-       if (!fc->blocked && waitqueue_active(&fc->blocked_waitq))
            wake_up(&fc->blocked_waitq);
+       } else if (!fc->blocked) {
+           /*
+            * Wake up next waiter, if any.  It's okay to use
+            * waitqueue_active(), as we've already synced up
+            * fc->blocked with waiters with the wake_up() call
+            * above.
+            */
+           if (waitqueue_active(&fc->blocked_waitq))
+               wake_up(&fc->blocked_waitq);
+       }

        if (fc->num_background == fc->congestion_threshold && fc->sb) {
            clear_bdi_congested(fc->sb->s_bdi, BLK_RW_SYNC);

commit 4c316f2f3ff315cb48efb7435621e5bfb81df96d
Author: Miklos Szeredi <mszeredi@redhat.com>
Date:   Fri Sep 28 16:43:22 2018 +0200

    fuse: set FR_SENT while locked

    Otherwise fuse_dev_do_write() could come in and finish off the request, and
    the set_bit(FR_SENT, ...) could trigger the WARN_ON(test_bit(FR_SENT, ...))
    in request_end().

    Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    Reported-by: syzbot+ef054c4d3f64cd7f7cec@syzkaller.appspotmai
    Fixes: 46c34a348b0a ("fuse: no fc->lock for pqueue parts")
    Cc: <stable@vger.kernel.org> # v4.2

diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index c2af8042f176..34976b42f3e1 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -1312,8 +1312,8 @@ static ssize_t fuse_dev_do_read(struct fuse_dev *fud, struct file *file,
    }
    list_move_tail(&req->list, &fpq->processing);
    __fuse_get_request(req);
-   spin_unlock(&fpq->lock);
    set_bit(FR_SENT, &req->flags);
+   spin_unlock(&fpq->lock);
    /* matches barrier in request_wait_answer() */
    smp_mb__after_atomic();
    if (test_bit(FR_INTERRUPTED, &req->flags))

commit d2d2d4fb1f54eff0f3faa9762d84f6446a4bc5d0
Author: Kirill Tkhai <ktkhai@virtuozzo.com>
Date:   Tue Sep 25 12:52:42 2018 +0300

    fuse: Fix use-after-free in fuse_dev_do_write()

    After we found req in request_find() and released the lock,
    everything may happen with the req in parallel:

    cpu0                              cpu1
    fuse_dev_do_write()               fuse_dev_do_write()
      req = request_find(fpq, ...)    ...
      spin_unlock(&fpq->lock)         ...
      ...                             req = request_find(fpq, oh.unique)
      ...                             spin_unlock(&fpq->lock)
      queue_interrupt(&fc->iq, req);   ...
      ...                              ...
      ...                              ...
      request_end(fc, req);
        fuse_put_request(fc, req);
      ...                              queue_interrupt(&fc->iq, req);

    Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
    Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    Fixes: 46c34a348b0a ("fuse: no fc->lock for pqueue parts")
    Cc: <stable@vger.kernel.org> # v4.2

diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 675caed3e655..c2af8042f176 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -1877,16 +1877,20 @@ static ssize_t fuse_dev_do_write(struct fuse_dev *fud,

    /* Is it an interrupt reply? */
    if (req->intr_unique == oh.unique) {
+       __fuse_get_request(req);
        spin_unlock(&fpq->lock);

        err = -EINVAL;
-       if (nbytes != sizeof(struct fuse_out_header))
+       if (nbytes != sizeof(struct fuse_out_header)) {
+           fuse_put_request(fc, req);
            goto err_finish;
+       }

        if (oh.error == -ENOSYS)
            fc->no_interrupt = 1;
        else if (oh.error == -EAGAIN)
            queue_interrupt(&fc->iq, req);
+       fuse_put_request(fc, req);

        fuse_copy_finish(cs);
        return nbytes;

commit bc78abbd55dd28e2287ec6d6502b842321a17c87
Author: Kirill Tkhai <ktkhai@virtuozzo.com>
Date:   Tue Sep 25 12:28:55 2018 +0300

    fuse: Fix use-after-free in fuse_dev_do_read()

    We may pick freed req in this way:

    [cpu0]                                  [cpu1]
    fuse_dev_do_read()                      fuse_dev_do_write()
       list_move_tail(&req->list, ...);     ...
       spin_unlock(&fpq->lock);             ...
       ...                                  request_end(fc, req);
       ...                                    fuse_put_request(fc, req);
       if (test_bit(FR_INTERRUPTED, ...))
             queue_interrupt(fiq, req);

    Fix that by keeping req alive until we finish all manipulations.

    Reported-by: syzbot+4e975615ca01f2277bdd@syzkaller.appspotmail.com
    Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
    Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    Fixes: 46c34a348b0a ("fuse: no fc->lock for pqueue parts")
    Cc: <stable@vger.kernel.org> # v4.2

diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 11ea2c4a38ab..675caed3e655 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -1311,12 +1311,14 @@ static ssize_t fuse_dev_do_read(struct fuse_dev *fud, struct file *file,
        goto out_end;
    }
    list_move_tail(&req->list, &fpq->processing);
+   __fuse_get_request(req);
    spin_unlock(&fpq->lock);
    set_bit(FR_SENT, &req->flags);
    /* matches barrier in request_wait_answer() */
    smp_mb__after_atomic();
    if (test_bit(FR_INTERRUPTED, &req->flags))
        queue_interrupt(fiq, req);
+   fuse_put_request(fc, req);

    return reqsize;

commit a2477b0e67c52f4364a47c3ad70902bc2a61bd4c
Author: Andrey Ryabinin <aryabinin@virtuozzo.com>
Date:   Tue Jul 17 19:00:33 2018 +0300

    fuse: Don't access pipe->buffers without pipe_lock()

    fuse_dev_splice_write() reads pipe->buffers to determine the size of
    'bufs' array before taking the pipe_lock(). This is not safe as
    another thread might change the 'pipe->buffers' between the allocation
    and taking the pipe_lock(). So we end up with too small 'bufs' array.

    Move the bufs allocations inside pipe_lock()/pipe_unlock() to fix this.

    Fixes: dd3bb14f44a6 ("fuse: support splice() writing to fuse device")
    Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
    Cc: <stable@vger.kernel.org> # v2.6.35
    Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index ec83b107c1a0..4a9ace7280b9 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -1955,12 +1955,15 @@ static ssize_t fuse_dev_splice_write(struct pipe_inode_info *pipe,
    if (!fud)
        return -EPERM;

+   pipe_lock(pipe);
+
    bufs = kmalloc_array(pipe->buffers, sizeof(struct pipe_buffer),
                 GFP_KERNEL);
-   if (!bufs)
+   if (!bufs) {
+       pipe_unlock(pipe);
        return -ENOMEM;
+   }

-   pipe_lock(pipe);
    nbuf = 0;
    rem = 0;
    for (idx = 0; idx < pipe->nrbufs && rem < len; idx++)

commit 63576c13bd17848376c8ba4a98f5d5151140c4ac
Author: Miklos Szeredi <mszeredi@redhat.com>
Date:   Thu Jul 26 16:13:11 2018 +0200

    fuse: fix initial parallel dirops

    If parallel dirops are enabled in FUSE_INIT reply, then first operation may
    leave fi->mutex held.

    Reported-by: syzbot <syzbot+3f7b29af1baa9d0a55be@syzkaller.appspotmail.com>
    Fixes: 5c672ab3f0ee ("fuse: serialize dirops by default")
    Cc: <stable@vger.kernel.org> # v4.7
    Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
index 56231b31f806..606909ed5f21 100644
--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -355,11 +355,12 @@ static struct dentry *fuse_lookup(struct inode *dir, struct dentry *entry,
    struct inode *inode;
    struct dentry *newent;
    bool outarg_valid = true;
+   bool locked;

-   fuse_lock_inode(dir);
+   locked = fuse_lock_inode(dir);
    err = fuse_lookup_name(dir->i_sb, get_node_id(dir), &entry->d_name,
                   &outarg, &inode);
-   fuse_unlock_inode(dir);
+   fuse_unlock_inode(dir, locked);
    if (err == -ENOENT) {
        outarg_valid = false;
        err = 0;
@@ -1340,6 +1341,7 @@ static int fuse_readdir(struct file *file, struct dir_context *ctx)
    struct fuse_conn *fc = get_fuse_conn(inode);
    struct fuse_req *req;
    u64 attr_version = 0;
+   bool locked;

    if (is_bad_inode(inode))
        return -EIO;
@@ -1367,9 +1369,9 @@ static int fuse_readdir(struct file *file, struct dir_context *ctx)
        fuse_read_fill(req, file, ctx->pos, PAGE_SIZE,
                   FUSE_READDIR);
    }
-   fuse_lock_inode(inode);
+   locked = fuse_lock_inode(inode);
    fuse_request_send(fc, req);
-   fuse_unlock_inode(inode);
+   fuse_unlock_inode(inode, locked);
    nbytes = req->out.args[0].size;
    err = req->out.h.error;
    fuse_put_request(fc, req);
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index 582b1756a011..f78e9614bb5f 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -975,8 +975,8 @@ int fuse_do_setattr(struct dentry *dentry, struct iattr *attr,

 void fuse_set_initialized(struct fuse_conn *fc);

-void fuse_unlock_inode(struct inode *inode);
-void fuse_lock_inode(struct inode *inode);
+void fuse_unlock_inode(struct inode *inode, bool locked);
+bool fuse_lock_inode(struct inode *inode);

 int fuse_setxattr(struct inode *inode, const char *name, const void *value,
          size_t size, int flags);
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index 0115c2f0a428..2dbd487390a3 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -357,15 +357,21 @@ int fuse_reverse_inval_inode(struct super_block *sb, u64 nodeid,
    return 0;
 }

-void fuse_lock_inode(struct inode *inode)
+bool fuse_lock_inode(struct inode *inode)
 {
-   if (!get_fuse_conn(inode)->parallel_dirops)
+   bool locked = false;
+
+   if (!get_fuse_conn(inode)->parallel_dirops) {
        mutex_lock(&get_fuse_inode(inode)->mutex);
+       locked = true;
+   }
+
+   return locked;
 }

-void fuse_unlock_inode(struct inode *inode)
+void fuse_unlock_inode(struct inode *inode, bool locked)
 {
-   if (!get_fuse_conn(inode)->parallel_dirops)
+   if (locked)
        mutex_unlock(&get_fuse_inode(inode)->mutex);
 }
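
(Editorial illustration, not part of the git log output.) The commit above, "fuse: fix initial parallel dirops", changes fuse_lock_inode, the function named in the backtrace reported at the top of this thread. Whether it is the cause of the hang is not established here. The pattern it fixes can be sketched in user-space Go as follows. All names (`conn`, `inode`, `lockOld`, `lockNew`, `leftLocked`) are hypothetical stand-ins for the kernel structures, and `sync.Mutex.TryLock` requires Go 1.18 or later.

```go
package main

import (
	"fmt"
	"sync"
)

// conn stands in for fuse_conn; the flag is set when the INIT reply arrives.
type conn struct{ parallelDirops bool }

// inode stands in for fuse_inode with its per-inode mutex.
type inode struct {
	c  *conn
	mu sync.Mutex
}

// Old pattern: lock and unlock each re-read the flag independently.
func lockOld(i *inode) {
	if !i.c.parallelDirops {
		i.mu.Lock()
	}
}

func unlockOld(i *inode) {
	if !i.c.parallelDirops {
		i.mu.Unlock()
	}
}

// Fixed pattern: lock reports whether it locked; unlock trusts that value.
func lockNew(i *inode) bool {
	if !i.c.parallelDirops {
		i.mu.Lock()
		return true
	}
	return false
}

func unlockNew(i *inode, locked bool) {
	if locked {
		i.mu.Unlock()
	}
}

// leftLocked runs one directory operation during which the flag flips
// and reports whether the inode mutex is still held afterwards.
func leftLocked(useFix bool) bool {
	c := &conn{}
	i := &inode{c: c}
	if useFix {
		locked := lockNew(i)
		c.parallelDirops = true // INIT reply processed mid-operation
		unlockNew(i, locked)
	} else {
		lockOld(i)
		c.parallelDirops = true // INIT reply processed mid-operation
		unlockOld(i)            // skipped: the flag now says "no locking"
	}
	if i.mu.TryLock() {
		i.mu.Unlock()
		return false
	}
	return true
}

func main() {
	fmt.Println("old pattern leaves mutex held:", leftLocked(false))
	fmt.Println("fixed pattern leaves mutex held:", leftLocked(true))
}
```

With the old pattern the mutex stays held after the operation, so any later caller that takes the locking path would block forever. With the fixed pattern the unlock decision follows what was actually done at lock time.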

commit e8f3bd773d22f488724dffb886a1618da85c2966
Author: Miklos Szeredi <mszeredi@redhat.com>
Date:   Thu Jul 26 16:13:11 2018 +0200

    fuse: Fix oops at process_init_reply()

    syzbot is hitting NULL pointer dereference at process_init_reply().
    This is because deactivate_locked_super() is called before response for
    initial request is processed.

    Fix this by aborting and waiting for all requests (including FUSE_INIT)
    before resetting fc->sb.

    Original patch by Tetsuo Handa <penguin-kernel@I-love.SKAURA.ne.jp>.

    Reported-by: syzbot <syzbot+b62f08f4d5857755e3bc@syzkaller.appspotmail.com>
    Fixes: e27c9d3877a0 ("fuse: fuse: add time_gran to INIT_OUT")
    Cc: <stable@vger.kernel.org> # v3.19
    Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index eeab70e7904d..0115c2f0a428 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -391,11 +391,6 @@ static void fuse_put_super(struct super_block *sb)
 {
    struct fuse_conn *fc = get_fuse_conn_super(sb);

-   fuse_send_destroy(fc);
-
-   fuse_abort_conn(fc, false);
-   fuse_wait_aborted(fc);
-
    mutex_lock(&fuse_mutex);
    list_del(&fc->entry);
    fuse_ctl_remove_conn(fc);
@@ -1212,16 +1207,25 @@ static struct dentry *fuse_mount(struct file_system_type *fs_type,
    return mount_nodev(fs_type, flags, raw_data, fuse_fill_super);
 }

-static void fuse_kill_sb_anon(struct super_block *sb)
+static void fuse_sb_destroy(struct super_block *sb)
 {
    struct fuse_conn *fc = get_fuse_conn_super(sb);

    if (fc) {
+       fuse_send_destroy(fc);
+
+       fuse_abort_conn(fc, false);
+       fuse_wait_aborted(fc);
+
        down_write(&fc->killsb);
        fc->sb = NULL;
        up_write(&fc->killsb);
    }
+}

+static void fuse_kill_sb_anon(struct super_block *sb)
+{
+   fuse_sb_destroy(sb);
    kill_anon_super(sb);
 }

@@ -1244,14 +1248,7 @@ static struct dentry *fuse_mount_blk(struct file_system_type *fs_type,

 static void fuse_kill_sb_blk(struct super_block *sb)
 {
-   struct fuse_conn *fc = get_fuse_conn_super(sb);
-
-   if (fc) {
-       down_write(&fc->killsb);
-       fc->sb = NULL;
-       up_write(&fc->killsb);
-   }
-
+   fuse_sb_destroy(sb);
    kill_block_super(sb);
 }

commit b8f95e5d13f5f0191dcb4b9113113d241636e7cb
Author: Miklos Szeredi <mszeredi@redhat.com>
Date:   Thu Jul 26 16:13:11 2018 +0200

    fuse: umount should wait for all requests

    fuse_abort_conn() does not guarantee that all async requests have actually
    finished aborting (i.e. their ->end() function is called).  This could
    actually result in still used inodes after umount.

    Add a helper to wait until all requests are fully done.  This is done by
    looking at the "num_waiting" counter.  When this counter drops to zero, we
    can be sure that no more requests are outstanding.

    Fixes: 0d8e84b0432b ("fuse: simplify request abort")
    Cc: <stable@vger.kernel.org> # v4.2
    Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index c8b197e4af9a..ec83b107c1a0 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -127,6 +127,16 @@ static bool fuse_block_alloc(struct fuse_conn *fc, bool for_background)
    return !fc->initialized || (for_background && fc->blocked);
 }

+static void fuse_drop_waiting(struct fuse_conn *fc)
+{
+   if (fc->connected) {
+       atomic_dec(&fc->num_waiting);
+   } else if (atomic_dec_and_test(&fc->num_waiting)) {
+       /* wake up aborters */
+       wake_up_all(&fc->blocked_waitq);
+   }
+}
+
 static struct fuse_req *__fuse_get_req(struct fuse_conn *fc, unsigned npages,
                       bool for_background)
 {
@@ -175,7 +185,7 @@ static struct fuse_req *__fuse_get_req(struct fuse_conn *fc, unsigned npages,
    return req;

  out:
-   atomic_dec(&fc->num_waiting);
+   fuse_drop_waiting(fc);
    return ERR_PTR(err);
 }

@@ -285,7 +295,7 @@ void fuse_put_request(struct fuse_conn *fc, struct fuse_req *req)

        if (test_bit(FR_WAITING, &req->flags)) {
            __clear_bit(FR_WAITING, &req->flags);
-           atomic_dec(&fc->num_waiting);
+           fuse_drop_waiting(fc);
        }

        if (req->stolen_file)
@@ -371,7 +381,7 @@ static void request_end(struct fuse_conn *fc, struct fuse_req *req)
    struct fuse_iqueue *fiq = &fc->iq;

    if (test_and_set_bit(FR_FINISHED, &req->flags))
-       goto out_put_req;
+       goto put_request;

    spin_lock(&fiq->waitq.lock);
    list_del_init(&req->intr_entry);
@@ -400,7 +410,7 @@ static void request_end(struct fuse_conn *fc, struct fuse_req *req)
    wake_up(&req->waitq);
    if (req->end)
        req->end(fc, req);
-out_put_req:
+put_request:
    fuse_put_request(fc, req);
 }

@@ -2143,6 +2153,11 @@ void fuse_abort_conn(struct fuse_conn *fc, bool is_abort)
 }
 EXPORT_SYMBOL_GPL(fuse_abort_conn);

+void fuse_wait_aborted(struct fuse_conn *fc)
+{
+   wait_event(fc->blocked_waitq, atomic_read(&fc->num_waiting) == 0);
+}
+
 int fuse_dev_release(struct inode *inode, struct file *file)
 {
    struct fuse_dev *fud = fuse_get_dev(file);
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index 5256ad333b05..582b1756a011 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -862,6 +862,7 @@ void fuse_request_send_background_locked(struct fuse_conn *fc,

 /* Abort all requests */
 void fuse_abort_conn(struct fuse_conn *fc, bool is_abort);
+void fuse_wait_aborted(struct fuse_conn *fc);

 /**
  * Invalidate inode attributes
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index a24df8861b40..eeab70e7904d 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -394,6 +394,8 @@ static void fuse_put_super(struct super_block *sb)
    fuse_send_destroy(fc);

    fuse_abort_conn(fc, false);
+   fuse_wait_aborted(fc);
+
    mutex_lock(&fuse_mutex);
    list_del(&fc->entry);
    fuse_ctl_remove_conn(fc);

commit 45ff350bbd9d0f0977ff270a0d427c71520c0c37
Author: Miklos Szeredi <mszeredi@redhat.com>
Date:   Thu Jul 26 16:13:11 2018 +0200

    fuse: fix unlocked access to processing queue

    fuse_dev_release() assumes that it's the only one referencing the
    fpq->processing list, but that's not true, since fuse_abort_conn() can be
    doing the same without any serialization between the two.

    Fixes: c3696046beb3 ("fuse: separate pqueue for clones")
    Cc: <stable@vger.kernel.org> # v4.2
    Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 8564d91c7d41..c8b197e4af9a 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -2150,9 +2150,15 @@ int fuse_dev_release(struct inode *inode, struct file *file)
    if (fud) {
        struct fuse_conn *fc = fud->fc;
        struct fuse_pqueue *fpq = &fud->pq;
+       LIST_HEAD(to_end);

+       spin_lock(&fpq->lock);
        WARN_ON(!list_empty(&fpq->io));
-       end_requests(fc, &fpq->processing);
+       list_splice_init(&fpq->processing, &to_end);
+       spin_unlock(&fpq->lock);
+
+       end_requests(fc, &to_end);
+
        /* Are we the last open device? */
        if (atomic_dec_and_test(&fc->dev_count)) {
            WARN_ON(fc->iq.fasync != NULL);

commit 87114373ea507895a62afb10d2910bd9adac35a8
Author: Miklos Szeredi <mszeredi@redhat.com>
Date:   Thu Jul 26 16:13:11 2018 +0200

    fuse: fix double request_end()

    Refcounting of request is broken when fuse_abort_conn() is called and
    request is on the fpq->io list:

     - ref is taken too late
     - then it is not dropped

    Fixes: 0d8e84b0432b ("fuse: simplify request abort")
    Cc: <stable@vger.kernel.org> # v4.2
    Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index c6b88fa85e2e..8564d91c7d41 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -371,7 +371,7 @@ static void request_end(struct fuse_conn *fc, struct fuse_req *req)
    struct fuse_iqueue *fiq = &fc->iq;

    if (test_and_set_bit(FR_FINISHED, &req->flags))
-       return;
+       goto out_put_req;

    spin_lock(&fiq->waitq.lock);
    list_del_init(&req->intr_entry);
@@ -400,6 +400,7 @@ static void request_end(struct fuse_conn *fc, struct fuse_req *req)
    wake_up(&req->waitq);
    if (req->end)
        req->end(fc, req);
+out_put_req:
    fuse_put_request(fc, req);
 }

@@ -2105,6 +2106,7 @@ void fuse_abort_conn(struct fuse_conn *fc, bool is_abort)
                set_bit(FR_ABORTED, &req->flags);
                if (!test_bit(FR_LOCKED, &req->flags)) {
                    set_bit(FR_PRIVATE, &req->flags);
+                   __fuse_get_request(req);
                    list_move(&req->list, &to_end1);
                }
                spin_unlock(&req->waitq.lock);
@@ -2131,7 +2133,6 @@ void fuse_abort_conn(struct fuse_conn *fc, bool is_abort)

        while (!list_empty(&to_end1)) {
            req = list_first_entry(&to_end1, struct fuse_req, list);
-           __fuse_get_request(req);
            list_del_init(&req->list);
            request_end(fc, req);
        }

commit 543b8f8662fe6d21f19958b666ab0051af9db21a
Author: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Date:   Tue May 1 13:12:14 2018 +0900

    fuse: don't keep dead fuse_conn at fuse_fill_super().

    syzbot is reporting use-after-free at fuse_kill_sb_blk() [1].
    Since sb->s_fs_info field is not cleared after fc was released by
    fuse_conn_put() when initialization failed, fuse_kill_sb_blk() finds
    already released fc and tries to hold the lock. Fix this by clearing
    sb->s_fs_info field after calling fuse_conn_put().

    [1] https://syzkaller.appspot.com/bug?id=a07a680ed0a9290585ca424546860464dd9658db

    Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
    Reported-by: syzbot <syzbot+ec3986119086fe4eec97@syzkaller.appspotmail.com>
    Fixes: 3b463ae0c626 ("fuse: invalidation reverse calls")
    Cc: John Muir <john@jmuir.com>
    Cc: Csaba Henk <csaba@gluster.com>
    Cc: Anand Avati <avati@redhat.com>
    Cc: <stable@vger.kernel.org> # v2.6.31
    Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index 48baa26993f3..061500c72608 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -1193,6 +1193,7 @@ static int fuse_fill_super(struct super_block *sb, void *data, int silent)
    fuse_dev_free(fud);
  err_put_conn:
    fuse_conn_put(fc);
+   sb->s_fs_info = NULL;
  err_fput:
    fput(file);
  err:

commit 6becdb601bae2a043d7fb9762c4d48699528ea6e
Author: Miklos Szeredi <mszeredi@redhat.com>
Date:   Thu May 31 12:26:10 2018 +0200

    fuse: fix control dir setup and teardown

    syzbot is reporting NULL pointer dereference at fuse_ctl_remove_conn() [1].
    Since fc->ctl_ndents is incremented by fuse_ctl_add_conn() when new_inode()
    failed, fuse_ctl_remove_conn() reaches an inode-less dentry and tries to
    clear d_inode(dentry)->i_private field.

    Fix by only adding the dentry to the array after being fully set up.

    When tearing down the control directory, do d_invalidate() on it to get rid
    of any mounts that might have been added.

    [1] https://syzkaller.appspot.com/bug?id=f396d863067238959c91c0b7cfc10b163638cac6
    Reported-by: syzbot <syzbot+32c236387d66c4516827@syzkaller.appspotmail.com>
    Fixes: bafa96541b25 ("[PATCH] fuse: add control filesystem")
    Cc: <stable@vger.kernel.org> # v2.6.18
    Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

diff --git a/fs/fuse/control.c b/fs/fuse/control.c
index 78fb7a07c5ca..0b694655d988 100644
--- a/fs/fuse/control.c
+++ b/fs/fuse/control.c
@@ -211,10 +211,11 @@ static struct dentry *fuse_ctl_add_dentry(struct dentry *parent,
    if (!dentry)
        return NULL;

-   fc->ctl_dentry[fc->ctl_ndents++] = dentry;
    inode = new_inode(fuse_control_sb);
-   if (!inode)
+   if (!inode) {
+       dput(dentry);
        return NULL;
+   }

    inode->i_ino = get_next_ino();
    inode->i_mode = mode;
@@ -228,6 +229,9 @@ static struct dentry *fuse_ctl_add_dentry(struct dentry *parent,
    set_nlink(inode, nlink);
    inode->i_private = fc;
    d_add(dentry, inode);
+
+   fc->ctl_dentry[fc->ctl_ndents++] = dentry;
+
    return dentry;
 }

@@ -284,7 +288,10 @@ void fuse_ctl_remove_conn(struct fuse_conn *fc)
    for (i = fc->ctl_ndents - 1; i >= 0; i--) {
        struct dentry *dentry = fc->ctl_dentry[i];
        d_inode(dentry)->i_private = NULL;
-       d_drop(dentry);
+       if (!i) {
+           /* Get rid of submounts: */
+           d_invalidate(dentry);
+       }
        dput(dentry);
    }
    drop_nlink(d_inode(fuse_control_sb->s_root));

commit 8a301eb16d99983a4961f884690ec97b92e7dcfe
Author: Tejun Heo <tj@kernel.org>
Date:   Fri Feb 2 09:54:14 2018 -0800

    fuse: fix congested state leak on aborted connections

    If a connection gets aborted while congested, FUSE can leave
    nr_wb_congested[] stuck until reboot causing wait_iff_congested() to
    wait spuriously which can lead to severe performance degradation.

    The leak is caused by gating congestion state clearing with
    fc->connected test in request_end().  This was added way back in 2009
    by 26c3679101db ("fuse: destroy bdi on umount").  While the commit
    description doesn't explain why the test was added, it most likely was
    to avoid dereferencing bdi after it got destroyed.

    Since then, bdi lifetime rules have changed many times and now we're
    always guaranteed to have access to the bdi while the superblock is
    alive (fc->sb).

    Drop fc->connected conditional to avoid leaking congestion states.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Reported-by: Joshua Miller <joshmiller@fb.com>
    Cc: Johannes Weiner <hannes@cmpxchg.org>
    Cc: stable@vger.kernel.org # v2.6.29+
    Acked-by: Jan Kara <jack@suse.cz>
    Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 686631f12001..e03ca14f40e9 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -385,8 +385,7 @@ static void request_end(struct fuse_conn *fc, struct fuse_req *req)
        if (!fc->blocked && waitqueue_active(&fc->blocked_waitq))
            wake_up(&fc->blocked_waitq);

-       if (fc->num_background == fc->congestion_threshold &&
-           fc->connected && fc->sb) {
+       if (fc->num_background == fc->congestion_threshold && fc->sb) {
            clear_bdi_congested(fc->sb->s_bdi, BLK_RW_SYNC);
            clear_bdi_congested(fc->sb->s_bdi, BLK_RW_ASYNC);
        }

commit df0e91d488276086bc07da2e389986cae0048c37
Author: Miklos Szeredi <mszeredi@redhat.com>
Date:   Thu Feb 8 15:17:38 2018 +0100

    fuse: atomic_o_trunc should truncate pagecache

    Fuse has an "atomic_o_trunc" mode, where userspace filesystem uses the
    O_TRUNC flag in the OPEN request to truncate the file atomically with the
    open.

    In this mode there's no need to send a SETATTR request to userspace after
    the open, so fuse_do_setattr() checks this mode and returns.  But this
    misses the important step of truncating the pagecache.

    Add the missing parts of truncation to the ATTR_OPEN branch.

    Reported-by: Chad Austin <chadaustin@fb.com>
    Fixes: 6ff958edbf39 ("fuse: add atomic open+truncate support")
    Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    Cc: <stable@vger.kernel.org>

diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
index 24967382a7b1..7a980b4462d9 100644
--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -1629,8 +1629,19 @@ int fuse_do_setattr(struct dentry *dentry, struct iattr *attr,
        return err;

    if (attr->ia_valid & ATTR_OPEN) {
-       if (fc->atomic_o_trunc)
+       /* This is coming from open(..., ... | O_TRUNC); */
+       WARN_ON(!(attr->ia_valid & ATTR_SIZE));
+       WARN_ON(attr->ia_size != 0);
+       if (fc->atomic_o_trunc) {
+           /*
+            * No need to send request to userspace, since actual
+            * truncation has already been done by OPEN.  But still
+            * need to truncate page cache.
+            */
+           i_size_write(inode, 0);
+           truncate_pagecache(inode, 0);
            return 0;
+       }
        file = NULL;
    }

I.e. there were many fuse-related fixes; note in particular commit 63576c13bd17848376c8ba4a98f5d5151140c4ac (fuse: fix initial parallel dirops).
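
To make it concrete why the parallel dirops commit matters here, below is a tiny single-threaded C model of the kind of lock/unlock mismatch I understand it to fix. If both the lock and the unlock helper re-check a per-connection "parallel dirops" flag, and the flag flips in between (e.g. when the INIT reply is processed), the inode mutex is left held. This is only a sketch under that assumption, not the kernel code: `struct inode_model`, the `*_buggy` / `*_fixed` helpers and the `mutex_held` boolean standing in for the real mutex are all hypothetical names.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the per-inode and per-connection state. */
struct inode_model {
    bool parallel_dirops; /* connection flag; may flip once INIT completes */
    bool mutex_held;      /* models the per-inode mutex as a plain flag */
};

/* Buggy pairing: lock and unlock each re-read the flag independently. */
static void lock_inode_buggy(struct inode_model *m)
{
    if (!m->parallel_dirops)
        m->mutex_held = true;
}

static void unlock_inode_buggy(struct inode_model *m)
{
    if (!m->parallel_dirops)
        m->mutex_held = false;
}

/* Fixed pairing: lock reports what it did, unlock acts only on that. */
static bool lock_inode_fixed(struct inode_model *m)
{
    bool locked = false;

    if (!m->parallel_dirops) {
        m->mutex_held = true;
        locked = true;
    }
    return locked;
}

static void unlock_inode_fixed(struct inode_model *m, bool locked)
{
    if (locked)
        m->mutex_held = false;
}

/* One directory operation during which the flag flips from off to on.
 * Returns true if the modelled mutex is still held afterwards. */
static bool op_leaks_lock_buggy(void)
{
    struct inode_model m = { .parallel_dirops = false, .mutex_held = false };

    lock_inode_buggy(&m);
    m.parallel_dirops = true;   /* INIT reply arrives mid-operation */
    unlock_inode_buggy(&m);
    return m.mutex_held;
}

static bool op_leaks_lock_fixed(void)
{
    struct inode_model m = { .parallel_dirops = false, .mutex_held = false };
    bool locked = lock_inode_fixed(&m);

    m.parallel_dirops = true;   /* same flip as above */
    unlock_inode_fixed(&m, locked);
    return m.mutex_held;
}
```

In this model `op_leaks_lock_buggy()` leaves the mutex held while `op_leaks_lock_fixed()` does not. With the buggy pairing, the next operation on the same inode would block forever, which would look like a task stuck in `fuse_lock_inode`.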

4.15.18 was released in April 2018, and there have been no further stable/4.15.y releases since.


Checking the Ubuntu kernel for Xenial, I see that compared to v4.15.18 there are some stable patches for fs/fuse/, but nothing related to parallel dirops:

kirr@deco:~/src/linux/linux$ git log -1 xenial/hwe
commit 79519363579528ec7f369844c8e3d3e8b46f9fa9 (tag: Ubuntu-hwe-4.15.0-47.50_16.04.1, xenial/hwe)
Author: Wen-chien Jesse Sung <jesse.sung@canonical.com>
Date:   Thu Mar 14 22:23:44 2019 +0800

    UBUNTU: Ubuntu-hwe-4.15.0-47.50~16.04.1

    Signed-off-by: Wen-chien Jesse Sung <jesse.sung@canonical.com>
kirr@deco:~/src/linux/linux$ git log -p v4.15.18..xenial/hwe -- fs/fuse/
commit e992e3521885b8bca22e51e4036f6a2a1088d028
Author: Miklos Szeredi <mszeredi@redhat.com>
Date:   Thu May 31 12:26:10 2018 +0200

    fuse: fix control dir setup and teardown

    BugLink: http://bugs.launchpad.net/bugs/1807469

    commit 6becdb601bae2a043d7fb9762c4d48699528ea6e upstream.

    syzbot is reporting NULL pointer dereference at fuse_ctl_remove_conn() [1].
    Since fc->ctl_ndents is incremented by fuse_ctl_add_conn() when new_inode()
    failed, fuse_ctl_remove_conn() reaches an inode-less dentry and tries to
    clear d_inode(dentry)->i_private field.

    Fix by only adding the dentry to the array after being fully set up.

    When tearing down the control directory, do d_invalidate() on it to get rid
    of any mounts that might have been added.

    [1] https://syzkaller.appspot.com/bug?id=f396d863067238959c91c0b7cfc10b163638cac6
    Reported-by: syzbot <syzbot+32c236387d66c4516827@syzkaller.appspotmail.com>
    Fixes: bafa96541b25 ("[PATCH] fuse: add control filesystem")
    Cc: <stable@vger.kernel.org> # v2.6.18
    Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    Signed-off-by: Kamal Mostafa <kamal@canonical.com>
    Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>

diff --git a/fs/fuse/control.c b/fs/fuse/control.c
index b9ea99c5b5b3..5be0339dcceb 100644
--- a/fs/fuse/control.c
+++ b/fs/fuse/control.c
@@ -211,10 +211,11 @@ static struct dentry *fuse_ctl_add_dentry(struct dentry *parent,
    if (!dentry)
        return NULL;

-   fc->ctl_dentry[fc->ctl_ndents++] = dentry;
    inode = new_inode(fuse_control_sb);
-   if (!inode)
+   if (!inode) {
+       dput(dentry);
        return NULL;
+   }

    inode->i_ino = get_next_ino();
    inode->i_mode = mode;
@@ -228,6 +229,9 @@ static struct dentry *fuse_ctl_add_dentry(struct dentry *parent,
    set_nlink(inode, nlink);
    inode->i_private = fc;
    d_add(dentry, inode);
+
+   fc->ctl_dentry[fc->ctl_ndents++] = dentry;
+
    return dentry;
 }

@@ -284,7 +288,10 @@ void fuse_ctl_remove_conn(struct fuse_conn *fc)
    for (i = fc->ctl_ndents - 1; i >= 0; i--) {
        struct dentry *dentry = fc->ctl_dentry[i];
        d_inode(dentry)->i_private = NULL;
-       d_drop(dentry);
+       if (!i) {
+           /* Get rid of submounts: */
+           d_invalidate(dentry);
+       }
        dput(dentry);
    }
    drop_nlink(d_inode(fuse_control_sb->s_root));

commit f3a3e0537dcd667abf7e44744db3bb4b7462a3a4
Author: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Date:   Tue May 1 13:12:14 2018 +0900

    fuse: don't keep dead fuse_conn at fuse_fill_super().

    BugLink: http://bugs.launchpad.net/bugs/1807469

    commit 543b8f8662fe6d21f19958b666ab0051af9db21a upstream.

    syzbot is reporting use-after-free at fuse_kill_sb_blk() [1].
    Since sb->s_fs_info field is not cleared after fc was released by
    fuse_conn_put() when initialization failed, fuse_kill_sb_blk() finds
    already released fc and tries to hold the lock. Fix this by clearing
    sb->s_fs_info field after calling fuse_conn_put().

    [1] https://syzkaller.appspot.com/bug?id=a07a680ed0a9290585ca424546860464dd9658db

    Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
    Reported-by: syzbot <syzbot+ec3986119086fe4eec97@syzkaller.appspotmail.com>
    Fixes: 3b463ae0c626 ("fuse: invalidation reverse calls")
    Cc: John Muir <john@jmuir.com>
    Cc: Csaba Henk <csaba@gluster.com>
    Cc: Anand Avati <avati@redhat.com>
    Cc: <stable@vger.kernel.org> # v2.6.31
    Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    Signed-off-by: Kamal Mostafa <kamal@canonical.com>
    Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>

diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index ba63dea1b057..91e3655526ec 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -1183,6 +1183,7 @@ static int fuse_fill_super(struct super_block *sb, void *data, int silent)
    fuse_dev_free(fud);
  err_put_conn:
    fuse_conn_put(fc);
+   sb->s_fs_info = NULL;
  err_fput:
    fput(file);
  err:

commit 840c77082f93f9760b7d1f096abbd61586632790
Author: Miklos Szeredi <mszeredi@redhat.com>
Date:   Thu Feb 8 15:17:38 2018 +0100

    fuse: atomic_o_trunc should truncate pagecache

    BugLink: http://bugs.launchpad.net/bugs/1807469

    commit df0e91d488276086bc07da2e389986cae0048c37 upstream.

    Fuse has an "atomic_o_trunc" mode, where userspace filesystem uses the
    O_TRUNC flag in the OPEN request to truncate the file atomically with the
    open.

    In this mode there's no need to send a SETATTR request to userspace after
    the open, so fuse_do_setattr() checks this mode and returns.  But this
    misses the important step of truncating the pagecache.

    Add the missing parts of truncation to the ATTR_OPEN branch.

    Reported-by: Chad Austin <chadaustin@fb.com>
    Fixes: 6ff958edbf39 ("fuse: add atomic open+truncate support")
    Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    Signed-off-by: Kamal Mostafa <kamal@canonical.com>
    Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>

diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
index d41559a0aa6b..fa4009761a7a 100644
--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -1629,8 +1629,19 @@ int fuse_do_setattr(struct dentry *dentry, struct iattr *attr,
        return err;

    if (attr->ia_valid & ATTR_OPEN) {
-       if (fc->atomic_o_trunc)
+       /* This is coming from open(..., ... | O_TRUNC); */
+       WARN_ON(!(attr->ia_valid & ATTR_SIZE));
+       WARN_ON(attr->ia_size != 0);
+       if (fc->atomic_o_trunc) {
+           /*
+            * No need to send request to userspace, since actual
+            * truncation has already been done by OPEN.  But still
+            * need to truncate page cache.
+            */
+           i_size_write(inode, 0);
+           truncate_pagecache(inode, 0);
            return 0;
+       }
        file = NULL;
    }

commit c0e31b21449898319e1aea29d5c83c18214f8d67
Author: Tejun Heo <tj@kernel.org>
Date:   Fri Feb 2 09:54:14 2018 -0800

    fuse: fix congested state leak on aborted connections

    BugLink: http://bugs.launchpad.net/bugs/1807469

    commit 8a301eb16d99983a4961f884690ec97b92e7dcfe upstream.

    If a connection gets aborted while congested, FUSE can leave
    nr_wb_congested[] stuck until reboot causing wait_iff_congested() to
    wait spuriously which can lead to severe performance degradation.

    The leak is caused by gating congestion state clearing with
    fc->connected test in request_end().  This was added way back in 2009
    by 26c3679101db ("fuse: destroy bdi on umount").  While the commit
    description doesn't explain why the test was added, it most likely was
    to avoid dereferencing bdi after it got destroyed.

    Since then, bdi lifetime rules have changed many times and now we're
    always guaranteed to have access to the bdi while the superblock is
    alive (fc->sb).

    Drop fc->connected conditional to avoid leaking congestion states.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Reported-by: Joshua Miller <joshmiller@fb.com>
    Cc: Johannes Weiner <hannes@cmpxchg.org>
    Cc: stable@vger.kernel.org # v2.6.29+
    Acked-by: Jan Kara <jack@suse.cz>
    Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    Signed-off-by: Kamal Mostafa <kamal@canonical.com>
    Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>

diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index c5b67efc9758..c3912517338f 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -385,8 +385,7 @@ static void request_end(struct fuse_conn *fc, struct fuse_req *req)
        if (!fc->blocked && waitqueue_active(&fc->blocked_waitq))
            wake_up(&fc->blocked_waitq);

-       if (fc->num_background == fc->congestion_threshold &&
-           fc->connected && fc->sb) {
+       if (fc->num_background == fc->congestion_threshold && fc->sb) {
            clear_bdi_congested(fc->sb->s_bdi, BLK_RW_SYNC);
            clear_bdi_congested(fc->sb->s_bdi, BLK_RW_ASYNC);
        }

commit 45f23c59120f4a3c935ab3510c89e467da54bf17
Author: Seth Forshee <seth.forshee@canonical.com>
Date:   Thu Oct 2 15:51:41 2014 -0500

    UBUNTU: SAUCE: (namespace) fuse: Allow user namespace mounts

    Acked-by: Miklos Szeredi <mszeredi@redhat.com>
    Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index e018dc3999f4..ba63dea1b057 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -1212,7 +1212,7 @@ static void fuse_kill_sb_anon(struct super_block *sb)
 static struct file_system_type fuse_fs_type = {
    .owner      = THIS_MODULE,
    .name       = "fuse",
-   .fs_flags   = FS_HAS_SUBTYPE,
+   .fs_flags   = FS_HAS_SUBTYPE | FS_USERNS_MOUNT,
    .mount      = fuse_mount,
    .kill_sb    = fuse_kill_sb_anon,
 };
@@ -1244,7 +1244,7 @@ static struct file_system_type fuseblk_fs_type = {
    .name       = "fuseblk",
    .mount      = fuse_mount_blk,
    .kill_sb    = fuse_kill_sb_blk,
-   .fs_flags   = FS_REQUIRES_DEV | FS_HAS_SUBTYPE,
+   .fs_flags   = FS_REQUIRES_DEV | FS_HAS_SUBTYPE | FS_USERNS_MOUNT,
 };
 MODULE_ALIAS_FS("fuseblk");

commit 1223588451c6ae950fc254c7531a536a70f029b1
Author: Seth Forshee <seth.forshee@canonical.com>
Date:   Thu Oct 2 15:34:45 2014 -0500

    UBUNTU: SAUCE: (namespace) fuse: Restrict allow_other to the superblock's namespace or a descendant

    Unprivileged users are normally restricted from mounting with the
    allow_other option by system policy, but this could be bypassed
    for a mount done with user namespace root permissions. In such
    cases allow_other should not allow users outside the userns
    to access the mount as doing so would give the unprivileged user
    the ability to manipulate processes it would otherwise be unable
    to manipulate. Restrict allow_other to apply to users in the same
    userns used at mount or a descendant of that namespace. Also
    export current_in_userns() for use by fuse when built as a
    module.

    Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
    Acked-by: Miklos Szeredi <mszeredi@redhat.com>
    Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
index ad1cfac1942f..d41559a0aa6b 100644
--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -1030,7 +1030,7 @@ int fuse_allow_current_process(struct fuse_conn *fc)
    const struct cred *cred;

    if (fc->allow_other)
-       return 1;
+       return current_in_userns(fc->user_ns);

    cred = current_cred();
    if (uid_eq(cred->euid, fc->user_id) &&

commit b4d1889491a0b5ceaacc0b850333bbc9d627d43a
Author: Seth Forshee <seth.forshee@canonical.com>
Date:   Thu Jun 26 11:58:11 2014 -0500

    UBUNTU: SAUCE: (namespace) fuse: Support fuse filesystems outside of init_user_ns

    In order to support mounts from namespaces other than
    init_user_ns, fuse must translate uids and gids to/from the
    userns of the process servicing requests on /dev/fuse. This
    patch does that, with a couple of restrictions on the namespace:

     - The userns for the fuse connection is fixed to the namespace
       from which /dev/fuse is opened.

     - The namespace must be the same as s_user_ns.

    These restrictions simplify the implementation by avoiding the
    need to pass around userns references and by allowing fuse to
    rely on the checks in inode_change_ok for ownership changes.
    Either restriction could be relaxed in the future if needed.

    For cuse the namespace used for the connection is also simply
    current_user_ns() at the time /dev/cuse is opened.

    Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

diff --git a/fs/fuse/cuse.c b/fs/fuse/cuse.c
index e9e97803442a..b1b8325915d7 100644
--- a/fs/fuse/cuse.c
+++ b/fs/fuse/cuse.c
@@ -48,6 +48,7 @@
 #include <linux/stat.h>
 #include <linux/module.h>
 #include <linux/uio.h>
+#include <linux/user_namespace.h>

 #include "fuse_i.h"

@@ -498,7 +499,7 @@ static int cuse_channel_open(struct inode *inode, struct file *file)
    if (!cc)
        return -ENOMEM;

-   fuse_conn_init(&cc->fc);
+   fuse_conn_init(&cc->fc, current_user_ns());

    fud = fuse_dev_alloc(&cc->fc);
    if (!fud) {
diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 17f0d05bfd4c..c5b67efc9758 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -114,8 +114,8 @@ static void __fuse_put_request(struct fuse_req *req)

 static void fuse_req_init_context(struct fuse_conn *fc, struct fuse_req *req)
 {
-   req->in.h.uid = from_kuid_munged(&init_user_ns, current_fsuid());
-   req->in.h.gid = from_kgid_munged(&init_user_ns, current_fsgid());
+   req->in.h.uid = from_kuid(fc->user_ns, current_fsuid());
+   req->in.h.gid = from_kgid(fc->user_ns, current_fsgid());
    req->in.h.pid = pid_nr_ns(task_pid(current), fc->pid_ns);
 }

@@ -167,6 +167,10 @@ static struct fuse_req *__fuse_get_req(struct fuse_conn *fc, unsigned npages,
    __set_bit(FR_WAITING, &req->flags);
    if (for_background)
        __set_bit(FR_BACKGROUND, &req->flags);
+   if (req->in.h.uid == (uid_t)-1 || req->in.h.gid == (gid_t)-1) {
+       fuse_put_request(fc, req);
+       return ERR_PTR(-EOVERFLOW);
+   }

    return req;

@@ -1222,6 +1226,9 @@ static ssize_t fuse_dev_do_read(struct fuse_dev *fud, struct file *file,
    struct fuse_in *in;
    unsigned reqsize;

+   if (current_user_ns() != fc->user_ns)
+       return -EIO;
+
  restart:
    spin_lock(&fiq->waitq.lock);
    err = -EAGAIN;
@@ -1827,6 +1834,9 @@ static ssize_t fuse_dev_do_write(struct fuse_dev *fud,
    struct fuse_req *req;
    struct fuse_out_header oh;

+   if (current_user_ns() != fc->user_ns)
+       return -EIO;
+
    if (nbytes < sizeof(struct fuse_out_header))
        return -EINVAL;

diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
index 24967382a7b1..ad1cfac1942f 100644
--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -858,8 +858,8 @@ static void fuse_fillattr(struct inode *inode, struct fuse_attr *attr,
    stat->ino = attr->ino;
    stat->mode = (inode->i_mode & S_IFMT) | (attr->mode & 07777);
    stat->nlink = attr->nlink;
-   stat->uid = make_kuid(&init_user_ns, attr->uid);
-   stat->gid = make_kgid(&init_user_ns, attr->gid);
+   stat->uid = make_kuid(fc->user_ns, attr->uid);
+   stat->gid = make_kgid(fc->user_ns, attr->gid);
    stat->rdev = inode->i_rdev;
    stat->atime.tv_sec = attr->atime;
    stat->atime.tv_nsec = attr->atimensec;
@@ -1475,17 +1475,17 @@ static bool update_mtime(unsigned ivalid, bool trust_local_mtime)
    return true;
 }

-static void iattr_to_fattr(struct iattr *iattr, struct fuse_setattr_in *arg,
-              bool trust_local_cmtime)
+static void iattr_to_fattr(struct fuse_conn *fc, struct iattr *iattr,
+              struct fuse_setattr_in *arg, bool trust_local_cmtime)
 {
    unsigned ivalid = iattr->ia_valid;

    if (ivalid & ATTR_MODE)
        arg->valid |= FATTR_MODE,   arg->mode = iattr->ia_mode;
    if (ivalid & ATTR_UID)
-       arg->valid |= FATTR_UID,    arg->uid = from_kuid(&init_user_ns, iattr->ia_uid);
+       arg->valid |= FATTR_UID,    arg->uid = from_kuid(fc->user_ns, iattr->ia_uid);
    if (ivalid & ATTR_GID)
-       arg->valid |= FATTR_GID,    arg->gid = from_kgid(&init_user_ns, iattr->ia_gid);
+       arg->valid |= FATTR_GID,    arg->gid = from_kgid(fc->user_ns, iattr->ia_gid);
    if (ivalid & ATTR_SIZE)
        arg->valid |= FATTR_SIZE,   arg->size = iattr->ia_size;
    if (ivalid & ATTR_ATIME) {
@@ -1646,7 +1646,7 @@ int fuse_do_setattr(struct dentry *dentry, struct iattr *attr,

    memset(&inarg, 0, sizeof(inarg));
    memset(&outarg, 0, sizeof(outarg));
-   iattr_to_fattr(attr, &inarg, trust_local_cmtime);
+   iattr_to_fattr(fc, attr, &inarg, trust_local_cmtime);
    if (file) {
        struct fuse_file *ff = file->private_data;
        inarg.valid |= FATTR_FH;
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index d5773ca67ad2..364e65c8ed36 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -26,6 +26,7 @@
 #include <linux/xattr.h>
 #include <linux/pid_namespace.h>
 #include <linux/refcount.h>
+#include <linux/user_namespace.h>

 /** Max number of pages that can be used in a single read request */
 #define FUSE_MAX_PAGES_PER_REQ 32
@@ -466,6 +467,9 @@ struct fuse_conn {
    /** The pid namespace for this mount */
    struct pid_namespace *pid_ns;

+   /** The user namespace for this mount */
+   struct user_namespace *user_ns;
+
    /** Maximum read size */
    unsigned max_read;

@@ -870,7 +874,7 @@ struct fuse_conn *fuse_conn_get(struct fuse_conn *fc);
 /**
  * Initialize fuse_conn
  */
-void fuse_conn_init(struct fuse_conn *fc);
+void fuse_conn_init(struct fuse_conn *fc, struct user_namespace *user_ns);

 /**
  * Release reference to fuse_conn
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index 624f18bbfd2b..e018dc3999f4 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -171,8 +171,8 @@ void fuse_change_attributes_common(struct inode *inode, struct fuse_attr *attr,
    inode->i_ino     = fuse_squash_ino(attr->ino);
    inode->i_mode    = (inode->i_mode & S_IFMT) | (attr->mode & 07777);
    set_nlink(inode, attr->nlink);
-   inode->i_uid     = make_kuid(&init_user_ns, attr->uid);
-   inode->i_gid     = make_kgid(&init_user_ns, attr->gid);
+   inode->i_uid     = make_kuid(fc->user_ns, attr->uid);
+   inode->i_gid     = make_kgid(fc->user_ns, attr->gid);
    inode->i_blocks  = attr->blocks;
    inode->i_atime.tv_sec   = attr->atime;
    inode->i_atime.tv_nsec  = attr->atimensec;
@@ -477,7 +477,8 @@ static int fuse_match_uint(substring_t *s, unsigned int *res)
    return err;
 }

-static int parse_fuse_opt(char *opt, struct fuse_mount_data *d, int is_bdev)
+static int parse_fuse_opt(char *opt, struct fuse_mount_data *d, int is_bdev,
+             struct user_namespace *user_ns)
 {
    char *p;
    memset(d, 0, sizeof(struct fuse_mount_data));
@@ -513,7 +514,7 @@ static int parse_fuse_opt(char *opt, struct fuse_mount_data *d, int is_bdev)
        case OPT_USER_ID:
            if (fuse_match_uint(&args[0], &uv))
                return 0;
-           d->user_id = make_kuid(current_user_ns(), uv);
+           d->user_id = make_kuid(user_ns, uv);
            if (!uid_valid(d->user_id))
                return 0;
            d->user_id_present = 1;
@@ -522,7 +523,7 @@ static int parse_fuse_opt(char *opt, struct fuse_mount_data *d, int is_bdev)
        case OPT_GROUP_ID:
            if (fuse_match_uint(&args[0], &uv))
                return 0;
-           d->group_id = make_kgid(current_user_ns(), uv);
+           d->group_id = make_kgid(user_ns, uv);
            if (!gid_valid(d->group_id))
                return 0;
            d->group_id_present = 1;
@@ -565,8 +566,8 @@ static int fuse_show_options(struct seq_file *m, struct dentry *root)
    struct super_block *sb = root->d_sb;
    struct fuse_conn *fc = get_fuse_conn_super(sb);

-   seq_printf(m, ",user_id=%u", from_kuid_munged(&init_user_ns, fc->user_id));
-   seq_printf(m, ",group_id=%u", from_kgid_munged(&init_user_ns, fc->group_id));
+   seq_printf(m, ",user_id=%u", from_kuid_munged(fc->user_ns, fc->user_id));
+   seq_printf(m, ",group_id=%u", from_kgid_munged(fc->user_ns, fc->group_id));
    if (fc->default_permissions)
        seq_puts(m, ",default_permissions");
    if (fc->allow_other)
@@ -597,7 +598,7 @@ static void fuse_pqueue_init(struct fuse_pqueue *fpq)
    fpq->connected = 1;
 }

-void fuse_conn_init(struct fuse_conn *fc)
+void fuse_conn_init(struct fuse_conn *fc, struct user_namespace *user_ns)
 {
    memset(fc, 0, sizeof(*fc));
    spin_lock_init(&fc->lock);
@@ -621,6 +622,7 @@ void fuse_conn_init(struct fuse_conn *fc)
    fc->attr_version = 1;
    get_random_bytes(&fc->scramble_key, sizeof(fc->scramble_key));
    fc->pid_ns = get_pid_ns(task_active_pid_ns(current));
+   fc->user_ns = get_user_ns(user_ns);
 }
 EXPORT_SYMBOL_GPL(fuse_conn_init);

@@ -630,6 +632,7 @@ void fuse_conn_put(struct fuse_conn *fc)
        if (fc->destroy_req)
            fuse_request_free(fc->destroy_req);
        put_pid_ns(fc->pid_ns);
+       put_user_ns(fc->user_ns);
        fc->release(fc);
    }
 }
@@ -1061,7 +1064,7 @@ static int fuse_fill_super(struct super_block *sb, void *data, int silent)

    sb->s_flags &= ~(SB_NOSEC | SB_I_VERSION);

-   if (!parse_fuse_opt(data, &d, is_bdev))
+   if (!parse_fuse_opt(data, &d, is_bdev, sb->s_user_ns))
        goto err;

    if (is_bdev) {
@@ -1086,8 +1089,12 @@ static int fuse_fill_super(struct super_block *sb, void *data, int silent)
    if (!file)
        goto err;

-   if ((file->f_op != &fuse_dev_operations) ||
-       (file->f_cred->user_ns != &init_user_ns))
+   /*
+    * Require mount to happen from the same user namespace which
+    * opened /dev/fuse to prevent potential attacks.
+    */
+   if (file->f_op != &fuse_dev_operations ||
+       file->f_cred->user_ns != sb->s_user_ns)
        goto err_fput;

    fc = kmalloc(sizeof(*fc), GFP_KERNEL);
@@ -1095,7 +1102,7 @@ static int fuse_fill_super(struct super_block *sb, void *data, int silent)
    if (!fc)
        goto err_fput;

-   fuse_conn_init(fc);
+   fuse_conn_init(fc, sb->s_user_ns);
    fc->release = fuse_free_conn;

    fud = fuse_dev_alloc(fc);

navytux commented 5 years ago

/cc @Canonical-kernel

navytux commented 5 years ago

By the way: libfuse enables parallel dirops whenever the kernel declares support for it:

https://github.com/libfuse/libfuse/blob/fuse-3.4.2-4-g4ebf27a/lib/fuse_lowlevel.c#L1902-L1903

So go-fuse should not be the only project facing this problem. I wonder why, to my knowledge, there is no similar libfuse issue (/cc @Nikratio).

Or perhaps it is not a kernel bug after all, but a cycle in the lock graph caused by, e.g., locking inside gocryptfs...
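To illustrate what "enables parallel dirops whenever the kernel declares support" means, here is a minimal sketch of INIT flag negotiation. It is not the libfuse or go-fuse code: `negotiate` and `capParallelDirops` are hypothetical names, and only the bit value `1 << 18` (FUSE_PARALLEL_DIROPS in the kernel's `fuse.h`) is taken from the real protocol.

```go
package main

import "fmt"

// capParallelDirops mirrors FUSE_PARALLEL_DIROPS (1 << 18) from the kernel's fuse.h.
// The Go name is hypothetical and chosen for this sketch only.
const capParallelDirops uint32 = 1 << 18

// negotiate returns the flags a server would send back in its INIT reply:
// only the bits that the server wants AND that the kernel offered.
func negotiate(kernelFlags, wanted uint32) uint32 {
	return kernelFlags & wanted
}

func main() {
	// Assume the kernel offers parallel dirops plus one unrelated capability bit.
	kernel := capParallelDirops | 1<<0
	reply := negotiate(kernel, capParallelDirops)
	fmt.Println(reply&capParallelDirops != 0)
}
```

With this scheme, a server that always asks for the bit gets it on every kernel that advertises it, including kernels where the feature is advertised but buggy.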

hanwen commented 5 years ago

should we disable parallel dirops for now?

rfjakob commented 5 years ago

Or maybe gate by kernel version?
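A rough sketch of what gating by kernel version could look like, assuming the release string (as printed by `uname -r`) is already available. The helpers `parseRelease` and `atLeast` are hypothetical, and the 4.16 cutoff in `main` is an illustrative assumption, not a value taken from go-fuse.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseRelease extracts major and minor from a release string
// such as "4.15.0-49-generic".
func parseRelease(release string) (major, minor int, err error) {
	parts := strings.SplitN(release, ".", 3)
	if len(parts) < 2 {
		return 0, 0, fmt.Errorf("unexpected kernel release %q", release)
	}
	if major, err = strconv.Atoi(parts[0]); err != nil {
		return 0, 0, err
	}
	// The minor part may carry a suffix (e.g. "15-rc1"); keep leading digits only.
	digits := parts[1]
	for i, r := range digits {
		if r < '0' || r > '9' {
			digits = digits[:i]
			break
		}
	}
	if minor, err = strconv.Atoi(digits); err != nil {
		return 0, 0, err
	}
	return major, minor, nil
}

// atLeast reports whether major.minor >= wantMajor.wantMinor.
func atLeast(major, minor, wantMajor, wantMinor int) bool {
	return major > wantMajor || (major == wantMajor && minor >= wantMinor)
}

func main() {
	major, minor, err := parseRelease("4.15.0-49-generic")
	if err != nil {
		panic(err)
	}
	// Hypothetical cutoff: request parallel dirops only on kernels >= 4.16.
	fmt.Println(atLeast(major, minor, 4, 16))
}
```

Distribution kernels may backport fixes without changing the major.minor pair, so a check like this is a heuristic at best.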

rfjakob commented 5 years ago

Yes, https://git.kernel.org/linus/63576c13bd17848376c8ba4a98f5d5151140c4ac really looks like the culprit. I'll try to apply it to the Ubuntu kernel.

navytux commented 5 years ago

I suggest we indeed consider disabling parallel dirops, but first check whether applying https://git.kernel.org/linus/63576c13bd helps. @rfjakob, since you will be trying it: if it does make gocryptfs work, it would be very useful to have a small self-contained unit test that exercises the problem. Then we could tell whether parallel dirops work on a particular kernel just by running the go-fuse tests. If the issue can indeed be fixed for Xenial, it would be good to include the patch (and preferably the other FUSE patches too) in the Ubuntu kernel. /cc @Canonical-kernel

rfjakob commented 5 years ago

I compiled and tested inside the Ubuntu 16.04 osboxes VM:

And I'll try to come up with a small test case.

Nikratio commented 5 years ago

It's quite possible that there is a similar issue in libfuse, but no one has run into it yet.

navytux commented 5 years ago

@rfjakob, @Nikratio, thanks for the feedback. Looking forward to the test case.

rfjakob commented 5 years ago

(last comment deleted, wrong thread, sorry)

Strace of non-hanging gvfs-udisks2-volume-monitor: https://gist.github.com/rfjakob/902cf036095cebfa3bd48e20ad140530

rfjakob commented 5 years ago

Got it. Testcase is in the PR https://github.com/hanwen/go-fuse/pull/288 .

navytux commented 5 years ago

@rfjakob, thanks for the test. I confirm it sometimes hangs for me under qemu-runlinux with the 4.15.18 kernel. It hangs only if I run the whole nodefs test suite, and does not hang if I run only TestParallelDiropsHang. There is probably something to improve in the test case, but it is a good start. Thanks for preparing it.

navytux commented 5 years ago

Reported a bug to Ubuntu about their Bionic and Xenial/HWE kernels missing the needed FUSE patches:

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1823972

navytux commented 5 years ago

Update: stable fuse patches are applied to bionic/master-next. Xenial/hwe is not (yet ?) updated.

http://patchwork.ozlabs.org/patch/1084833/#2157577 https://lists.ubuntu.com/archives/kernel-team/2019-April/100347.html

navytux commented 5 years ago

Update: fuse patches propagated to xenial/hwe kernel as well. Both bionic and xenial/hwe updates are not yet released as packages.

navytux commented 5 years ago

Update: Ubuntu released a kernel package in -proposed and they are asking us to confirm whether the bug is gone:

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1823972/comments/22

@rfjakob, could you please test the proposed kernel? They say that if we don't test within 5 working days, they will drop the fuse fixes from the kernel intended for release (if I understood correctly).

Thanks beforehand, Kirill

rfjakob commented 5 years ago

Tested. I left

while sleep 0.1 ; do go test -run TestParallelDiropsHang ; done

for a while. This would usually hang after <5 iterations, we are at 100 now. Looks good.

$ apt-cache policy linux-generic-hwe-16.04
linux-generic-hwe-16.04:
  Installed: 4.15.0.49.70
  Candidate: 4.15.0.49.70
  Version table:
 *** 4.15.0.49.70 500
        500 http://archive.ubuntu.com/ubuntu xenial-proposed/main amd64 Packages
        100 /var/lib/dpkg/status
     4.15.0.48.69 500
        500 http://us.archive.ubuntu.com/ubuntu xenial-updates/main amd64 Packages

$ uname -a
Linux osboxes 4.15.0-49-generic #52~16.04.1-Ubuntu SMP Thu Apr 25 18:54:26 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

navytux commented 5 years ago

@rfjakob, thanks for verifying and for providing feedback to upstream. Let's hope that the fixed kernel will now propagate to both bionic and xenial/hwe.

navytux commented 5 years ago

Upstream says that the fixed kernel package has been released: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1823972/comments/24

@rfjakob, could you please confirm we are done now?

navytux commented 5 years ago

@rfjakob, could you please confirm we are done now?

ping, @rfjakob.

rfjakob commented 5 years ago

Hmm, things are somewhat messed up here:

osboxes@osboxes:~$ sudo apt install linux-generic-hwe-16.04
Reading package lists... Done
Building dependency tree       
Reading state information... Done
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:

The following packages have unmet dependencies:
 linux-generic-hwe-16.04 : Depends: linux-headers-generic-hwe-16.04 (= 4.15.0.50.71) but 4.15.0.51.72 is to be installed
E: Unable to correct problems, you have held broken packages.

osboxes@osboxes:~$ sudo apt-cache policy linux-generic-hwe-16.04
linux-generic-hwe-16.04:
  Installed: (none)
  Candidate: 4.15.0.50.71
  Version table:
     4.15.0.50.71 500
        500 http://us.archive.ubuntu.com/ubuntu xenial-updates/main amd64 Packages

osboxes@osboxes:~$ apt-cache policy linux-headers-generic-hwe-16.04
linux-headers-generic-hwe-16.04:
  Installed: 4.15.0.51.72
  Candidate: 4.15.0.51.72
  Version table:
 *** 4.15.0.51.72 100
        100 /var/lib/dpkg/status
     4.15.0.50.71 500
        500 http://us.archive.ubuntu.com/ubuntu xenial-updates/main amd64 Packages

rfjakob commented 5 years ago

Ok, uninstalling linux-headers-generic-hwe-16.04 fixed that problem, because the installed version was from xenial-proposed. I'm at 4.15.0-51-generic now and

while sleep 0.1 ; do go test -run TestParallelDiropsHang ; done

runs fine. Closing the ticket.

navytux commented 5 years ago

@rfjakob, thanks for confirming.

rfjakob commented 4 years ago

Oh dear, this thing is back, this time on 4.15.0-1028-gcp that Travis CI seems to be running: https://travis-ci.org/hanwen/go-fuse/jobs/639412963#L13

navytux commented 4 years ago

@rfjakob, thanks for the heads-up. If I read it correctly, 4.15.0-1028-gcp was released a year ago, in Feb 2019 (https://packages.ubuntu.com/xenial/linux-image-4.15.0-1028-gcp -> changelog), while the Ubuntu kernel team picked the patch into 4.4 only in July, and the patch was included into 4.15.0-1031.33~16.04.1 (http://changelogs.ubuntu.com/changelogs/pool/main/l/linux-gcp/linux-gcp_4.15.0-1052.56/changelog ; search for "fuse: fix initial parallel dirops" there) only in April.

I also see that the linux-gcp package in Xenial is marked to receive security updates, and its latest version depends on 4.15.0.1052.66. From this point of view, isn't the image used on Travis CI simply too outdated, given that it is missing so many security fixes as well? In other words, the kernel there is just too old, and updating it to the latest LTS minor version should make the bug go away.

Kirill

rfjakob commented 4 years ago

What do you think of https://github.com/rfjakob/go-fuse/commit/3a025b27d3b78387a9de75a862e8f76b3febbaa5 and leaving all this 4.15 misery behind?

navytux commented 4 years ago

I think it is ok. 4.15 is not maintained as LTS by the upstream kernel team, and imho Ubuntu just cannot keep up with cherry-picking everything. I don't know why they based Xenial hwe on 4.15 instead of 4.14, which is maintained as LTS by the upstream kernel team.

rfjakob commented 4 years ago

Pushed to https://review.gerrithub.io/c/hanwen/go-fuse/+/483578 for review

navytux commented 4 years ago

I think we should close this now again.

navytux commented 4 years ago

I think we should close this now again.

ping @rfjakob, @hanwen.

navytux commented 4 years ago

Thanks