rpwoodbu / mosh-chrome

Mosh for Chrome
GNU General Public License v3.0
372 stars 59 forks source link

Abort with mosh 1.3.0 (v0.5.1) on an ARM Chromebook #151

Open geofft opened 7 years ago

geofft commented 7 years ago

Sorry that I don't have anything more informative here, but I was scrolling and saw a message saying that the mosh client had aborted. I figured I might as well report it in case anyone else sees it.

Are there crash logs or anything on my device that I can get you? Or can I run in a mode that prints stack traces?

The client is app version 0.5.1, mosh version 1.3.0, on CrOS 57.0.2987.146 on a non-jailbroken Asus C100 (veyron_minnie). The server is mosh 1.2.4a (Debian 1.2.4a-1+b2) on Debian Jessie on x86_64 Linux, which was running screen -dr, and alpine inside that screen.

keithw commented 7 years ago

(Sorry, got confused what repo this was in. I don't have much useful to contribute, except to mention that upstream mosh has historically been pretty anti-core-dump for security/paranoia reasons, e.g. https://github.com/mobile-shell/mosh/blob/master/src/crypto/crypto.cc#L291-L308 . But I think we're open to changing this, especially if there's demand/a good reason.)

rpwoodbu commented 7 years ago

Thanks for the report.

What do you mean when you say you were "scrolling"? How reproducible is this? ARM is finicky; this might go away with just a rebuild, sadly.

The only place there might be extra debugging information is in the JavaScript Console for the "mosh_window.html", which you can find via chrome://extensions. But most anything interesting would probably spew into the terminal as well.

geofft commented 7 years ago

Oh, that was unclear, by "scrolling" I mean that I was scrolling through a mail message within alpine, using the up and down keys and letting the application scroll its buffer. (I think it does the thing where curses defines a sub-region of the window to scroll).

This happened after a couple of hours of use without previous incident. I have not been able to reproduce this, either by looking at the same email or doing anything else.

Unfortunately I already closed the terminal window that had aborted and printed the "Mosh NaCl has crashed" text, so I'm not sure if the console still has the right info. Here's the stuff that's in it currentl, if it's useful:

hterm_all.js:186 init: hterm
mosh_window.js:248 Uncaught TypeError: this.moshNaCl_.postMessage is not a function
    at mosh.CommandInstance.onTerminalResize_ (mosh_window.js:248)
    at hterm.Terminal.IO.onTerminalResize_ (hterm_all.js:13671)
    at hterm.Terminal.realizeSize_ (hterm_all.js:11309)
    at hterm.Terminal.onResize_ (hterm_all.js:13466)
    at notifyList (hterm_all.js:8044)
mosh_window.js:248 Uncaught TypeError: this.moshNaCl_.postMessage is not a function
    at mosh.CommandInstance.onTerminalResize_ (mosh_window.js:248)
    at hterm.Terminal.IO.onTerminalResize_ (hterm_all.js:13671)
    at hterm.Terminal.realizeSize_ (hterm_all.js:11309)
    at hterm.Terminal.onResize_ (hterm_all.js:13466)
    at notifyList (hterm_all.js:8044)
mosh_window.html:1 Error in response to storage.get: TypeError: Cannot read property 'length' of undefined
    at Object.lib.encodeUTF8 (chrome-extension://hmgggebkhjjkiimkjlknpdgapncghehh/hterm_all.js:4412:26)
    at hterm.Terminal.IO.print.hterm.Terminal.IO.writeUTF16 (chrome-extension://hmgggebkhjjkiimkjlknpdgapncghehh/hterm_all.js:13717:22)
    at Object.callback (chrome-extension://hmgggebkhjjkiimkjlknpdgapncghehh/mosh_window.js:112:30)
    at mosh.CommandInstance.run (chrome-extension://hmgggebkhjjkiimkjlknpdgapncghehh/mosh_window.js:110:23)
    at hterm.Terminal.runCommandClass (chrome-extension://hmgggebkhjjkiimkjlknpdgapncghehh/hterm_all.js:11053:16)
    at hterm.Terminal.terminal.onTerminalReady (chrome-extension://hmgggebkhjjkiimkjlknpdgapncghehh/mosh_window.js:34:14)
    at hterm.Terminal.<anonymous> (chrome-extension://hmgggebkhjjkiimkjlknpdgapncghehh/hterm_all.js:10524:37)
    at hterm.Terminal.<anonymous> (chrome-extension://hmgggebkhjjkiimkjlknpdgapncghehh/hterm_all.js:10915:7)
12mosh_window.js:181 getpid()
mosh_window.js:181 Mosh(): Calling mosh_main

(followed by a bunch of normal logging output of calls to sigaction and unimplemented DEC private modes)

geofft commented 7 years ago

@keithw I'm more interested in a backtrace / some details of the exception than a full coredump. (Also it's possible that there were exception details, but they were printed wherever my cursor was and I didn't see them... it would be nice to log these to the JS console or somewhere a little more persistent.)

rpwoodbu commented 7 years ago

As you probably guessed, all those errors are "normal" (yeah, sorry for the noise... the JS could stand some cleanup).

If it does happen to you again, definitely check the JS console. If the underlying mosh code spits out anything to STDERR, it'll end up there. (It also puts it in the terminal, but as you suggest, that could end up not being noticed or being unreadable).

geofft commented 7 years ago

I got an abort again. I don't know if it's the same one, but here's the error:

assertion "!islast" failed: file "external/mosh/src/network/network.cc", line 452, function: string Network::Connection::recv()

** abort() called

Mosh NaCl crashed.
Press "x" to close the window.

The JS console has the same messages.

I was actually not using mosh at all during this time, but watching video in the browser.

This assertion is making sure that we don't get EAGAIN on recv() on ... the most recent port? Or something? I don't totally understand what's happening in mobile-shell/mosh@c0092a6e, and also I would not be surprised at all if NaCl and/or newlib is just giving us spurious EAGAINs for fun. @keithw, what's it checking for / is it reasonable to just remove the assert for the Chrome build?

keithw commented 7 years ago

This code is part of the Mosh client's use of multiple datagram sockets, because it wants to start hopping local ports if the connection dies (e.g. in the case of NAT timeout), while still listening on the old ports for a little bit in case the server sent a datagram there.

When select tells the client that one of the sockets is readable, eventually we end up in Connection::recv, where it loops through all of the sockets and calls recvmsg on all of them. For all but the last one, it does a nonblocking read (MSG_DONTWAIT); for the very last socket it does a blocking read because... well, honestly, I think just to match the previous behavior before we added this feature. (It doesn't really matter that it's a blocking read because select told us that one of these sockets was readable, although I have since read that at least on some platforms it is possible to get a spurious wakeup.)

Anyway, the assertion is complaining that it got an EAGAIN or EWOULDBLOCK in the case of a blocking recvmsg, and this is not permissible (http://pubs.opengroup.org/onlinepubs/9699919799/functions/recvmsg.html).

keithw commented 7 years ago

I don't think simply removing that assertion is going to fix the problem, because then you're going to end up at https://github.com/mobile-shell/mosh/blob/mosh-1.3.2/src/network/network.cc#L463 assert( false ).

But maybe it should just say if ( islast ) { throw; } else { continue; } and let the UI display the exception. That would at least make it nonfatal. We could be Moshy and make the exception taunt the user about their non-POSIX platform. :-)

cgull commented 7 years ago

Looking at this again, I want to mention that WinSock only supports non-blocking sockets, and it's entirely plausible that a POSIX-like implementation on Windows might occasionally leak an EAGAIN/EWOULDBLOCK.

I did end up making a change that throws for the never-got-anything case, but I didn't do anything for the original issue here. My thinking is that we should change it too-- I think we should be more robust/resilient in our error handling, and reserve assert() for pure logic errors internal to Mosh. Or just change the logic to always use non-blocking reads, that will probably simplify it anyway.

rpwoodbu commented 7 years ago

I haven't reviewed this issue carefully, but to @cgull's comment, I would add that the Pepper API (which is what I must use from NaCl) also only supports non-blocking sockets. The primary contribution in Mosh for Chrome is a POSIX "adapter" which provides blocking sockets on top of the more JS-esque Pepper API. So it is entirely likely there's a bug in there where I'm spitting out EAGAIN or EWOULDBLOCK by accident.

geofft commented 7 years ago

Aha! Well, if I'm reading this code right, Selector::Select will return if any fd registered with the selector has activity, whether or not it's an argument to the current call to select, no?

https://github.com/rpwoodbu/mosh-chrome/blob/master/mosh_nacl/pepper_posix_selector.cc#L88

That means the EAGAIN can be reproduced by trying to do a blocking read on a network socket while I type something on stdin, or maybe cause a SIGWINCH or something.

Would it help to try running mosh-chrome with a custom binary inside instead of mosh-client?

cgull commented 7 years ago

I committed a change to Mosh that simply removes the use of a blocking recvmsg(). I didn't see much point in doing the same thing two different ways, if one of them causes pain. I think that'll make everybody happy.