osxfuse / osxfuse

FUSE extends macOS by adding support for user space file systems
https://osxfuse.github.io/

FUSE filesystem hangs irrevocably on typo in mount options, blocks shell #793

Closed DanielSmedegaardBuus closed 6 months ago

DanielSmedegaardBuus commented 3 years ago

Hi :)

I was playing around with squashfs-fuse, and created a test squashfs, pictures.squashfs, then mounted it successfully under ~/mnt, and had a look around.

I then wanted to find out how to make everything r/w (I'll be using UnionFS on top of squashfs, and I want everything to be writable to everyone). Muddled around on the interwebs, then tried mounting with this typo:

squashfuse -o dmesk=0000,fmask=0111 pictures.squashfs mnt

Which resulted in fuse: unknown option 'dmesk=0000'. And then zsh hung. Like, I can't type anything. Weird. Opened a new shell. The instant I start zsh, it hangs (I assume due to oh-my-zsh doing some autocomplete stuff as soon as I start pressing keys). Using bash, I can type things — and commands — but anything that "touches" the mountpoint hangs irrevocably. ls, umount, sudo umount -f, diskutil unmount force, you name it. Everything hangs.

I've tried killing processes that relate to fuse or mount, but to no avail. They just go into zombie mode, and everything still hangs. The main culprit being, daniel 20158 0.0 0.0 0 0 s003 ?E 1:39PM 0:00.00 (mount_macfuse).

How on earth do I recover from this? Pardon me if this sounds abrasive, it isn't meant to, but isn't the main point of FUSE to avoid stuff like this? That is, keep away from kernel-space so you cannot cause lockups like this?

Do I really have to reboot? Because I'm halfway through a 4-terabyte file operation that I would then have to start all over, losing about 20 hours of computational work :/ That's okay for now, but if this is expected behavior going forward, that's pretty depressing news, because I have big plans for squashfs and unionfs on this rig :D

Here's hoping this is a fixable bug :)

MacFUSE 4.0.5, squashfuse 0.1.103.

Cheers :)

dlaxar commented 3 years ago

Hi,

it's probably not very helpful but I've had a remarkably similar issue with sshfs + osxfuse (both via Homebrew today). In fact even successful mounts seem to "ruin" mountpoints/folders forever. ls, mv, ... all of them hang. Next, I can't open any windows on my machine, then the browser tabs hang (although top shows plenty of free CPU, mem, ...). In the end the whole file system in Finder is unusable. Only a hard reboot "fixes" it.

Let me know if there's anything I can do to help Daniel

ahuber21 commented 3 years ago

I experience the same issues and I noticed that in these cases the macfuse process is waiting in uninterruptible sleep. I'd be happy to help if there's anything I can do.

invokermoon commented 3 years ago

I experienced the same issue on a 16-inch MacBook Pro (macOS 11.2.3 Big Sur). After installing macFUSE and sshfs, when I run sshfs with a long list of parameters, most applications on my Mac block and the Internet connection drops.

kavehv commented 3 years ago

I run into the same issues with sshfs + macfuse (4.1.0) on 11.2.3. I'm also confused, as I thought the whole point of non-kernel-space drivers was to prevent a failure like this from slowly killing the system.

ashleyvansp commented 3 years ago

I have the same experience with SSHFS 2.5 / macFUSE 4.1.0 / Mac OS Catalina 10.15.7. When using sshfs Terminal hangs and my Mac blocks most other applications & internet access until I do a hard reboot.

NelsonVides commented 3 years ago

I've had the same issues on osxfuse, macfuse, macOS Catalina, and Big Sur, and many combinations thereof. The whole OS filesystem hangs and the computer won't even shut down; all I can do is a hard restart. Not even hours of waiting fix the issue.

epaterlini commented 3 years ago

Same problem here. If I try to mount an sshfs folder and make a typo in the password, the system also hangs. The system hangs if I don't nicely unmount sshfs folder. Reboot is the only solution....very hard to work in this scenario...

Please help!!!

riga commented 3 years ago

Have to pile up on this one ...

I tested with a fresh MacOS 11.4 (BigSur) on a M1 MBP, a non-M1 MBP and a M1 Mini, all with latest XCode. sshfs --version tells me

SSHFS version 2.5 (OSXFUSE SSHFS 2.5.0)
FUSE library version: 2.9.9
fuse: no mount point

and I checked both the currently recommended 4.1.2 release and the 4.2.0 pre-release. As soon as I start to mount the normal way via

sshfs HOST:REMOTE_DIR LOCAL_DIR -o reconnect,cache=yes,follow_symlinks

and the password prompt appears, my system already blocks, i.e. mails will not load, I cannot open new browser tabs, etc. After entering my password everything behaves normal again, but if I decide to cancel the process (ctrl+c) instead of typing my password, the blocking behavior remains in place. There is no sshfs process running after cancelling (as expected) but I see a macfuse zombie process which is not reacting to any kill signal,

> ps ax | grep fuse
 6883 s001  ?E     0:00.00 (mount_macfuse)

It's directly assigned to launchd so there is in fact no other option than rebooting the machine, which is not a very pleasant workflow.

I tend to think that it's not just me, since none of my colleagues managed to get it working properly either (all with the same minimal setup). Also, I've been using sshfs / {osx,mac}fuse for many years and I've always been very happy with it :)

@bfleischer Do you have any idea of what might be going on? As my entire research workflow depends heavily on sshfs, I'm happy to help investigating / testing.

aknoerig commented 3 years ago

One more here. In my case, the cause turned out to be a duplicate entry in my .ssh/config file. Once I cleaned that up, everything worked fine again.

Also looking at other reported issues, it seems that it's crucial to not cause any error (password, vpn connection, ssh config,...) during the connection process.

invokermoon commented 3 years ago

Need help...

mforbes commented 3 years ago

This is still a major issue. For the record, I am using Mojave 10.14.6 (18G9323)

$ sshfs --version
SSHFS version 2.5 (OSXFUSE SSHFS 2.5.0)
FUSE library version: 2.9.9

schthms commented 2 years ago

any news on this one?

msilin commented 2 years ago

Any workaround for this?

johnsalmon commented 2 years ago

This is not a problem with sshfs or squashfuse. Even the most trivial FUSE filesystems, compiled from hello.c, null.c or hello_ll.c in the examples/ directory of the fuse library can cause system-wide trouble.

Lacking the source code to both MacOS and MacFUSE, it's difficult to be sure what's going on. After compiling and running some code based on hello_ll.c with some extra logging thrown in, my best guess is that when fuse_mount() is called, MacOS internal data structures are updated to record the existence of the new filesystem. System calls, e.g., mount(2), statfs(2), probably others, expect all mounted filesystems to be "working", and if one of them fails to respond, the system call may hang. It's only a matter of time before critical system utilities, e.g.,

/System/Library/Frameworks/CoreServices.framework/Frameworks/Metadata.framework/Support/mds
/usr/libexec/diskarbitrationd
/System/Library/CoreServices/coreservicesd

hang on a call to mount(2) or statfs(2), at which point the entire OS becomes unstable. It can't even be shut down cleanly because the shutdown process calls mount(2). The only recourse is a hard power-cycle.
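
If you need to check whether a path is affected without sacrificing your shell, here is a minimal sketch, assuming nothing beyond POSIX. `probe_path` is a hypothetical helper (not part of macFUSE or libfuse): it forks a child that calls statvfs(3) on the path, and the parent only polls with `WNOHANG`, so the parent itself never blocks on the suspect filesystem.

```c
#include <signal.h>
#include <sys/statvfs.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper: probe `path` from a child process.
 * Returns  0 if statvfs() answered successfully within the timeout,
 *          1 if statvfs() answered with an error (e.g. path does not exist),
 *         -1 if there was no answer within timeout_ms (the path may be hung),
 *         -2 if fork() or waitpid() itself failed.
 */
int probe_path(const char *path, int timeout_ms)
{
    pid_t child = fork();
    if (child < 0)
        return -2;

    if (child == 0) {
        /* Child: this call is the one that may hang on a broken mount. */
        struct statvfs sv;
        _exit(statvfs(path, &sv) == 0 ? 0 : 1);
    }

    /* Parent: poll every 10 ms and never wait in a blocking fashion. */
    const struct timespec step = { 0, 10 * 1000000L };
    for (int waited = 0; waited <= timeout_ms; waited += 10) {
        int status;
        pid_t r = waitpid(child, &status, WNOHANG);
        if (r == child)
            return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : 1;
        if (r < 0)
            return -2;
        nanosleep(&step, NULL);
    }

    /* A child stuck inside the kernel may ignore this until the call returns. */
    kill(child, SIGKILL);
    return -1;
}
```

A call such as `probe_path("/Users/me/mnt", 2000)` (the path is just an example) returning -1 suggests that touching the mountpoint from your shell would hang too. The stuck child stays behind, as the zombie reports in this thread show, so this only diagnoses the problem and does not cure it.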

I have a hack that papers over the problem for filesystems that use the lowlevel API. The diff below against example/hello_ll.c captures the essence of it. But it only prevents the most egregious issue (a non-privileged user locking up the entire OS simply by misspelling a command-line option). And only for filesystems that use the lowlevel API (not sshfs). Worse, it doesn't really solve the problem. There are other ways to easily and accidentally lock up the entire OS, e.g., if the init, getattr or statfs callbacks hang or fail to return (which can happen easily with a remote filesystem).

If you compile hello_ll.c with the patch below, then you can safely run it with mis-spelled or unknown arguments. E.g.,

hello_ll -oWILL_HANG_MACHINE_IF_NOT_PATCHED mountpoint

But if you try that without the patch, you're likely to find yourself with a machine that can only be shut down with a hard power cycle.

It's hard to even suggest a fix. One way might be to impose a timeout on the kernel-driver side. I.e., if a request remains outstanding for too long (a few minutes?) then the driver deems the user-space layer "non-responsive" and replies to the kernel with an appropriate error (EIO, ENOTSUP, ENOTCONN). The goal would be to prevent mount(2), statfs(2) and others from hanging indefinitely, even if the user-space daemon has gone AWOL.

aleksandrs-ledovskis commented 2 years ago

@johnsalmon Thank you for bringing this detailed technical insight to light. You write about some special patch of "hello_ll.c" but no file/diff is to be found in your post. Could you share that too?

johnsalmon commented 2 years ago

Sorry, this should have been appended to my last post.

--- a/hello_ll.c    2021-10-17 11:51:45.000000000 -0400
+++ b/hello_ll.c    2021-10-17 11:47:23.000000000 -0400
@@ -167,15 +167,49 @@
        if (se != NULL) {
            if (fuse_set_signal_handlers(se) != -1) {
                fuse_session_add_chan(se, ch);
                err = fuse_session_loop(se);
                fuse_remove_signal_handlers(se);
                fuse_session_remove_chan(ch);
            }
            fuse_session_destroy(se);
+       }else{
+           // On MacOS, once we've called fuse_mount, we *MUST*
+           // commit to actually mounting a filesystem.  That
+           // means going through *ALL* the steps of
+           //  fuse_lowlevel_new
+           //  fuse_session_add_chan
+           //  fuse_session_loop
+           //  fuse_session_remove_chan
+           //  fuse_session_destroy
+           //  fuse_unmount.
+           //
+           // Furthermore, we MUST allow the filesystem to "run"
+           // for long enough that various daemons (finder,
+           // CoreLibraryServices, etc., etc.) can talk to
+           // it.  They don't have to learn anything from it!
+           // It's OK for them to get ENOTSUP.  But they
+           // MUST GET SOMETHING.  So this looks like about
+           // the least we can do.
+           char* av[]={argv[0], "-d", NULL}; // the "-d" is unnecessary, but informative
+           struct fuse_args null_args = FUSE_ARGS_INIT(2, av);
+           struct fuse_lowlevel_ops null_ll_oper = {};
+           se = fuse_lowlevel_new(&null_args, &null_ll_oper, sizeof(null_ll_oper), NULL);
+           fuse_session_add_chan(se, ch);
+           // Starting the loop is "easy".  But how do we get
+           // out?  Forking a process to call unmount after a
+           // couple of seconds seems to do the trick...
+           if(fork() == 0){
+               sleep(2);
+               unmount(mountpoint, 0);
+               exit(0);
+           }
+           err = fuse_session_loop(se);
+           fuse_session_remove_chan(ch);
+           fuse_session_destroy(se);
        }
        fuse_unmount(mountpoint, ch);
    }
    fuse_opt_free_args(&args);

    return err ? 1 : 0;
 }

mikecuoco commented 2 years ago

I'm having the same issue with macFuse 4.4.0 on Monterey Version 12.3.1. I'm using the following sshfs version.

$ sshfs --version
SSHFS version 2.5 (OSXFUSE SSHFS 2.5.0)
FUSE library version: 2.9.9

Has this been addressed?

ksherlock commented 2 years ago

An alternative approach, which works for both low level and high level, is to use fuse_opt_parse to whitelist known good options. Any unspecified options will call the callback function with FUSE_OPT_KEY_OPT so you can detect it early. The downside is there seem to be around 150 valid options (but then again, chances are nobody knows or cares about 95% of them).

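#include <err.h>        // warnx
#include <stdlib.h>     // exit
#include <sysexits.h>   // EX_USAGE
#include <fuse_opt.h>   // fuse_opt_parse, FUSE_OPT_KEY, FUSE_ARGS_INIT

void help(void);        // your own usage printer, defined elsewhere

static const struct fuse_opt my_options[] = {
    FUSE_OPT_KEY("rdonly",       FUSE_OPT_KEY_KEEP),
    FUSE_OPT_KEY("rw",           FUSE_OPT_KEY_KEEP),
    FUSE_OPT_KEY("ro",           FUSE_OPT_KEY_KEEP),
    FUSE_OPT_KEY("-f",           FUSE_OPT_KEY_KEEP),
    FUSE_OPT_KEY("-s",           FUSE_OPT_KEY_KEEP),
    FUSE_OPT_KEY("-d",           FUSE_OPT_KEY_KEEP),
    FUSE_OPT_KEY("debug",        FUSE_OPT_KEY_KEEP),
    // ... (remaining known-good options elided)

    // ... other options you'll process yourself

    FUSE_OPT_END
};

// Called for each argument that is not simply kept by the table above.
// Options matching no template arrive with key == FUSE_OPT_KEY_OPT.
static int my_options_proc(void *data, const char *arg, int key, struct fuse_args *outargs) {

    if (key == FUSE_OPT_KEY_OPT) {
        warnx("unknown option '%s'", arg);
        return -1;  // error: stop parsing before anything gets mounted
    }
    // normal option processing ...
    return 1;       // keep the argument
}

int main(int argc, char **argv) {
    struct fuse_args args = FUSE_ARGS_INIT(argc, argv);

    // Reject unknown options here, before fuse_mount() is ever called.
    if (fuse_opt_parse(&args, NULL, my_options, my_options_proc) < 0) {
        help();
        exit(EX_USAGE);
    }
    // ....
}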
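
The same "reject early" idea can be sketched without libfuse at all, which may be handy in a wrapper that runs before the real mount helper. This is a rough illustration only: `first_unknown_option` and the contents of `known_options` are hypothetical, and a real allow-list would have to cover every option your file system accepts.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical allow-list; extend it with the options you actually support. */
static const char *const known_options[] = {
    "ro", "rw", "debug", "dmask", "fmask", "allow_other", NULL
};

/*
 * Scan a comma-separated "-o" string such as "dmask=0000,fmask=0111".
 * Returns a pointer to the first token whose name is not in `known` and
 * stores that token's length in *len, or returns NULL if every name is known.
 */
const char *first_unknown_option(const char *opts,
                                 const char *const *known, size_t *len)
{
    const char *tok = opts;
    while (*tok != '\0') {
        size_t tok_len  = strcspn(tok, ",");   /* whole "name=value" token */
        size_t name_len = strcspn(tok, ",=");  /* just the option name     */
        int found = 0;

        for (size_t i = 0; known[i] != NULL; i++) {
            if (strlen(known[i]) == name_len &&
                strncmp(known[i], tok, name_len) == 0) {
                found = 1;
                break;
            }
        }
        if (!found && tok_len > 0) {
            *len = tok_len;
            return tok;
        }

        tok += tok_len;      /* skip the token ...            */
        if (*tok == ',')
            tok++;           /* ... and the separator, if any */
    }
    return NULL;
}
```

With the original typo from this issue, `first_unknown_option("dmesk=0000,fmask=0111", known_options, &n)` points at `dmesk=0000`, so the wrapper can print an error and exit without ever touching the mountpoint.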

augustebaum commented 1 year ago

Hello, any news on this?

samskiter commented 1 year ago

Think I'm hitting this issue RN. basic commands like lsof are hanging completely - guess I need to shut down...

ghost commented 1 year ago

This is a rather serious issue, it still exists on macFUSE 4.5.0 and macOS Ventura/Sonoma. I keep hitting it with my NTFS external HDD and cloud storage using rclone. In my case the issue is not a typo in the mount command, but rather the remote becoming unresponsive.

@bfleischer Any chance of taking a look at this, perhaps implementing a timeout in the kernel driver as @johnsalmon suggested? I'd take a stab at it myself (although I have no experience in macOS kernel extension development) but unfortunately macFUSE is no longer open source. This is one of the disadvantages of going closed source: if you don't have the time, issues simply go unfixed for years instead of someone else being able to pick up the slack for you. I could even consider making a small donation if you could try to fix this issue, but nothing huge because I'm still a student with not much income.

coolcoder613eb commented 7 months ago

Sorry for not adding anything to the discussion, but please fix this!

bfleischer commented 6 months ago

The issue will be addressed by the next macFUSE release. I've added a 10-second timeout for the FUSE_INIT handshake. This should give file systems plenty of time to respond to the FUSE_INIT message.

When fuse_mount() is called, the file system is mounted and a FUSE_INIT message is sent to the daemon running in user space. When a new volume is mounted, macOS immediately calls VFSOP_GETATTR. The attributes we return from this call are cached by macOS and cannot be updated after the handshake has been completed. That's why we need to block VFSOP_GETATTR and most other file system operations until the FUSE_INIT handshake has been completed. Earlier macFUSE releases waited indefinitely for the handshake to complete. This caused the hangs. Now, in case the handshake has not been completed after 10 seconds, the file system is marked as dead. Dead file systems can be accessed and unmounted without any hangs.
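
For readers who want a mental model of the behavior just described, here is a toy state machine in C. It is an illustration only, not the kernel extension's code: the type and function names, and the ENXIO/EAGAIN return values, are assumptions made for this sketch. Only the 10-second limit comes from the description above.

```c
#include <errno.h>

#define INIT_TIMEOUT_SECS 10  /* handshake limit described above */

/* Hypothetical model of a freshly mounted volume. */
enum mount_state { MOUNT_INIT_PENDING, MOUNT_READY, MOUNT_DEAD };

struct mount_model {
    enum mount_state state;
    long             mounted_at;  /* seconds, any monotonic clock */
};

/* The daemon answered FUSE_INIT: the volume becomes usable. */
void mount_model_init_reply(struct mount_model *m)
{
    if (m->state == MOUNT_INIT_PENDING)
        m->state = MOUNT_READY;
}

/* Periodic check: a handshake still pending after the limit marks the volume dead. */
void mount_model_tick(struct mount_model *m, long now)
{
    if (m->state == MOUNT_INIT_PENDING &&
        now - m->mounted_at >= INIT_TIMEOUT_SECS)
        m->state = MOUNT_DEAD;
}

/*
 * What a getattr-style operation would see in this model:
 *   0      - handshake done, operation may proceed
 *   EAGAIN - handshake pending, caller must keep waiting
 *   ENXIO  - volume is dead, fail immediately instead of hanging
 */
int mount_model_getattr(const struct mount_model *m)
{
    switch (m->state) {
    case MOUNT_READY: return 0;
    case MOUNT_DEAD:  return ENXIO;
    default:          return EAGAIN;
    }
}
```

The key property is that the pending state always ends, either in ready or in dead, so no caller waits indefinitely.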

Just for clarification, by default, there is a 60-second timeout for messages that the kernel extension sends to the daemon running in user space (see https://github.com/macfuse/macfuse/wiki/Mount-Options#daemon_timeout). FUSE_INIT is exempt from this timeout.

bfleischer commented 6 months ago

The new 4.6.2 release addresses the hang. From the release notes:

Perform FUSE_INIT handshake synchronously during mount(2) operation. When performing the handshake synchronously, we avoid hangs and we are able to error out of the mount process in case the handshake fails.

We used to perform the handshake asynchronously. This has drawbacks and introduced several challenges. Most importantly, there is no guarantee the handshake will ever be completed by the file system daemon. This could result in lingering mount points.

Performing the handshake synchronously makes initialization more robust.

Performing the handshake synchronously means the volume will not even be mounted in case of typos in the mount options. This is a better solution than just adding a timeout for the handshake.

Please let me know in case you still run into mount issues with the new release.