Closed: zombiezen closed this 8 years ago
Does this happen every time? Does this happen only when connecting to specific servers?
I don't have another server I can check against at the moment, but it does happen every time I connect to this particular server. It was working fine before the update.
This doesn't happen for me, and I've been running this code for many weeks (it was in the "dev" release). I believe it is something about your server config, but I need to narrow down what that might be before I can realistically fix it. There were a lot of changes I made to the C++ code, so I'm not totally surprised there's some breakage.
Alternatively, if I could have shell access to the machine, I might be able to debug it. I know that's a huge ask. We can talk privately to get it set up, and I can send you my public SSH key to put in ~/.ssh/authorized_keys.
Note to self: See if I can get a stack trace on crash in the JS log.
Actually, it looks like the crash happens before you enter your password... is that correct? If so, maybe all I need is the hostname of the machine to reproduce the problem. You probably don't want to put that in the bug either, so we can discuss privately.
I use public key auth only, so I don't type in a password. I'll contact you off-thread with details.
FTR, here's what we know from our off-thread discussion:
I am able to connect to the VM you set up, and I cannot replicate the issue with either version. Furthermore, it is quite peculiar that the main release would behave differently from the dev release, as they are both currently built from the same source (and the build differences are purely cosmetic).
All this leads me to believe that the problem is with your installation of the app. I would like you to try a few things:
I'll try 1&2 when I get home. Number 3 still causes crash.
1) Crash. 2) No crash (Linux x86-64, version 43.0.2357.37). 3) Crash.
The Chrome installation having this problem is ChromeOS version 43.0.2357.32 (ARM). I'm wondering if that is the issue.
Interesting suggestion about ARM. I have an ARM Chromebook I keep around just for testing such things, but I hadn't tested this build. I just tested it on 43.0.2357.32 beta, but no crash when connecting to your VM.
I want to double-check: If you log in with a different account on the same Chromebook and install the app, it still crashes?
One last thing: Try logging into the account you created for me to ensure that there's nothing special about your account.
Correct, on my same Chromebook and installing the app on a different account, it still crashes.
I logged into your account on the GCE instance and it crashes.
OK, well, I'm running out of ideas, other than to blame your Chromebook. Maybe Chrome caches compiles, even across profiles, and even after reinstalling the app? Gosh, that's hard to believe. But I'm not left with much else. I'd be very interested to know if anyone else is experiencing this problem.
I will be updating Mosh for Chrome soon to incorporate an outstanding pull request. It'll be interesting to see if pushing an update fixes your problem. In the interim, I suggest just using the dev release; it is always built from head, and is not for experiments, so it should be relatively stable.
I'm having the same issue on my chromebook. Version 41.0.2272.102 Platform 6680.78.0 (Official Build) stable-channel daisy_spring Firmware Google_Spring.3824.129.0
I tried in another profile on this same chromebook, which appears to have a different version, 0.2.8.30. That version/profile works for me.
My regular profile with 0.2.9.17 fails in the same manner as the reporter. I am not using keys in this case.
I do have other machines I could attempt against if it will help.
@jsimpsoncd Thanks for the report... that's very helpful. Interesting that it is also an ARM machine (just like the one I have). Can you log into the other profile with 0.2.8.30, and give it some time to update to the new version, then see if the problem appears there? Also, can you install the dev version to see if that breaks?
@rpwoodbu so after letting it update, fails on my other profile as well. And from the logs on the server I do actually see what looks like a successful login before it bails on the client side. I'll add the results of using the dev version once I've completed that.
I'm a bit unclear on how to utilize the dev release on a chromebook. I'm prodding around to figure out how, and that's fine, but if you fill in some details I'll probably get there sooner.
@jsimpsoncd Do you mean where to get the dev version? Search the Chrome Web Store for "Mosh (dev)".
Sorry, didn't realize it was that simple. That said, dev fails with "Resolution failed: -110"
That sounds like a separate problem. Can you ssh to the machine with Secure Shell or the ssh CLI in crosh?
I tried from the other profile on this chromebook. That failed with the same NaCl crash as before when using dev mosh.
I can ssh using the "secure shell" chrome extension to the box in question.
Check your spelling of the hostname when connecting with the dev version. "Resolution failed: -110" usually means you spelled it wrong (or that there's no A record at that name, perhaps just AAAA).
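For anyone who wants to test the A-versus-AAAA theory from another machine, here is a minimal Node.js sketch. It assumes Node's built-in dns module; classifyRecords and checkHost are hypothetical helper names, not part of Mosh for Chrome, and the link between -110 and a missing A record is only the rule of thumb given above.

```javascript
const dns = require('dns').promises;

// Pure helper: describe what the presence or absence of records suggests.
function classifyRecords(a, aaaa) {
  if (a.length > 0) return 'has-a';
  if (aaaa.length > 0) return 'aaaa-only';
  return 'no-address';
}

// Look up both record types; a failed lookup is treated as "no records".
async function checkHost(hostname) {
  const a = await dns.resolve4(hostname).catch(() => []);
  const aaaa = await dns.resolve6(hostname).catch(() => []);
  return classifyRecords(a, aaaa);
}
```

Calling checkHost with the hostname you typed into Mosh and logging the result should tell you whether the name has an A record, only AAAA records, or does not resolve at all.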
My HP Chromebook 11 gets the same NaCl crash (regardless of server). Mosh (dev) works fine.
Thanks, @scottweston, for that report! It is really useful to have more folks indicating this is a wider problem. The thing is, though, this is just getting weirder and weirder.
I just downloaded both .crx files from the Chrome Web Store (through the developer's console), and compared them. They are nearly identical, except for the expected changes in the manifest.json (i.e., version number and name), and a few innocuous timestamps in the hterm files. Importantly, the .pexe binaries are exactly the same.
And now I'm getting a report that Chrome 41 on ARM is having this problem as well as Chrome 43. @jsimpsoncd can you double-check that? All other reports have been Chrome 43 (beta). @scottweston, what version are you running? If we really are seeing this on different versions of Chrome, then that might absolve Chrome.
What would be left, then, is some externality. I did make a change which added a new field for the mosh-server command. Maybe the crash is happening in there. Maybe I'm not converting the old field info properly, and I'm sending something to NaCl that it can't swallow. So I need to know these things:
NB: The app being installed (perhaps due to Chrome Sync) is distinct from the app having been run. The app won't leave a trace until it has been run.
In the meantime, I'll stare at the code and see if I can find a potential crash bug. And again, thanks to all for your reporting and diligence.
Extra credit: Open the JS console on the Mosh background page (get to it from chrome://extensions) and run this command:
chrome.storage.local.get(["field_server-command", "field_remote-command", "field_command"], function(o) { console.log(o); });
...and paste the output. You may need to expand to see all the output. You may not get any output, or you may only get a line or two.
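If the raw object is hard to read, a small helper can summarize it. This is only a sketch: the three keys are the ones from the command above, summarizeFields is a hypothetical name, and nothing is assumed here about what each field means inside Mosh for Chrome.

```javascript
const FIELD_KEYS = ['field_server-command', 'field_remote-command', 'field_command'];

// Report, for each key, whether it is missing, empty, or set.
function summarizeFields(stored) {
  const summary = {};
  for (const key of FIELD_KEYS) {
    const value = stored[key];
    if (value === undefined) summary[key] = 'missing';
    else if (value === '') summary[key] = 'empty';
    else summary[key] = 'set';
  }
  return summary;
}

// In the background page console (chrome.storage exists only inside Chrome):
if (typeof chrome !== 'undefined' && chrome.storage) {
  chrome.storage.local.get(FIELD_KEYS, (o) => console.log(summarizeFields(o)));
}
```

An empty result from storage would show all three keys as "missing", which is the case where the app has never been run in that profile.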
I get the same crash in 0.2.9.16 mosh (dev) under my main profile. No remote commands set up.
I've now had the same issue on both chrome 41 and 42.
@rpwoodbu Correct. When I tested on Linux, I had never run Mosh for Chrome on that machine.
I did not override the remote command at that time, but I had overridden it in the past. Strange that it would keep that across installs, since I had fully uninstalled/reinstalled Mosh.
Output from extra credit:
Object {}
(I confirmed that this is the output for an empty Object in Chrome.)
@jsimpsoncd This is the first I'm hearing that the dev version crashes. Is anyone else experiencing this? It would actually make a lot more sense if it crashed just the same as the stable version.
Thanks for the details about the remote commands. I was really hoping that was it, but it doesn't seem like it.
I'm going to spend some time this weekend to see if I can teach PNaCl to output a stack trace on crashes. That will help a lot. More to come. Thanks for hanging in there.
I have managed to reproduce the issue with my HP Chromebook 11 (ARM running Chrome OS 43 beta), connecting to my own server. I even did it with the dev release. So I'm in a much better position to run down this problem.
I spent the better part of today arguing with NaCl, trying to get any kind of debugging support (I've gotten by with printf-style debugging so far). It seems to border on the impossible to get a stack trace from the stupid thing without connecting with gdb. After a lot of effort, I finally managed to get gdb working, but it required me to update to a newer NaCl SDK, and I haven't been able to reproduce the crash with that build. I'll try more tomorrow, but maybe the solution will simply be to release a build with the newer SDK. I've had crash bugs on ARM before that didn't happen on x86_64 (although never when using PNaCl: Portable Native Client, one binary to rule them all), so I'm not terribly surprised that this has only been seen on ARM.
More to come tomorrow.
I have released 0.2.9.62 to "Mosh (dev)". It changes these things:
I can't make it crash, but then again, I can't seem to get the stable version of Mosh to crash, whereas you folks can. So please do your best to crash this one. What happened to me with the old dev version is that it worked, and then suddenly it started crashing, and never stopped until it loaded the update. So keep banging on it! If we can't make it crash in a few days, then I'll do a release to stable.
I did note one useful tidbit while observing the old dev version crash: It gets as far as starting mosh-server remotely, but then crashes without ever sending a single packet to it. That makes me wonder if the crash is happening at thread creation time. This is the area where I had seen ARM instability in the past. This gives me hope that building against a newer Pepper/SDK will solve the problem.
Anyone still seeing crashes? I'm still planning on doing a stable release tomorrow unless I hear that this doesn't help.
0.2.9.62 in Mosh dev does not crash for me.
Sadly I haven't had time to do more debugging, but I did manage to install the latest dev version on my Chromebook - it's been stable, no crashes for me.
OK, this is rolling out now to the stable track. Thanks for all your help. I hope this doesn't come up again, but if it does, please mention it on this bug, and I'll reopen it.
I'm not sure this is fixed. I just got a comment on the web store saying that this is crashing on ARM still.
I hadn't switched back to stable from using dev. I just updated stable to the latest one and can confirm it's crashing on my Chromebook 11. Looks to be the same error as before:
init: hterm
Error in response to storage.get: TypeError: Cannot read property 'length' of undefined
at Object.lib.encodeUTF8 (chrome-extension://ooiklbnjmhbcgemelgfhaeaocllobloj/hterm/hterm_deps.js:4388:26)
at hterm.Terminal.IO.print.hterm.Terminal.IO.writeUTF16 (chrome-extension://ooiklbnjmhbcgemelgfhaeaocllobloj/hterm/hterm.js:7787:22)
at Object.callback (chrome-extension://ooiklbnjmhbcgemelgfhaeaocllobloj/mosh_window.js:117:30)
at mosh.CommandInstance.run (chrome-extension://ooiklbnjmhbcgemelgfhaeaocllobloj/mosh_window.js:115:23)
at hterm.Terminal.runCommandClass (chrome-extension://ooiklbnjmhbcgemelgfhaeaocllobloj/hterm/hterm.js:5256:16)
at terminal.onTerminalReady (chrome-extension://ooiklbnjmhbcgemelgfhaeaocllobloj/mosh_window.js:40:14)
at null.<anonymous> (chrome-extension://ooiklbnjmhbcgemelgfhaeaocllobloj/hterm/hterm.js:4762:37)
at null.<anonymous> (chrome-extension://ooiklbnjmhbcgemelgfhaeaocllobloj/hterm/hterm.js:5135:7)
getpid()
Mosh(): Calling mosh_main
NativeClient: NaCl module crashed
Mosh NaCl crashed.
Object
  __proto__: Object
    __defineGetter__: __defineGetter__() { [native code] }
    __defineSetter__: __defineSetter__() { [native code] }
    __lookupGetter__: __lookupGetter__() { [native code] }
    __lookupSetter__: __lookupSetter__() { [native code] }
    constructor: Object() { [native code] }
    hasOwnProperty: hasOwnProperty() { [native code] }
    isPrototypeOf: isPrototypeOf() { [native code] }
    propertyIsEnumerable: propertyIsEnumerable() { [native code] }
    toLocaleString: toLocaleString() { [native code] }
    toString: toString() { [native code] }
    valueOf: valueOf() { [native code] }
      arguments: null
      caller: null
      length: 0
      name: "valueOf"
      __proto__: Empty() {}
      <function scope>
    get __proto__: __proto__() { [native code] }
      arguments: null
      caller: null
      length: 0
      name: "__proto__"
      __proto__: Empty() {}
      <function scope>
    set __proto__: __proto__() { [native code] }
      arguments: null
      caller: null
      length: 1
      name: "__proto__"
      __proto__: Empty() {}
      <function scope>
(captured via chrome://inspect)
Perhaps another useful data point I just noticed: the ssh part is working. I can see it successfully authing on the server, but interestingly there is no mosh-server spawned like I would expect to see. I'll try turning up my sshd logging and see if I can catch what it does on the server side before it crashes.
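The TypeError at the top of that log is the kind of failure you get when a value read from storage is undefined and is then handed to a function that expects a string. Here is a minimal sketch of that pattern and a guard. encodeLength is a hypothetical stand-in for the encoder, not hterm's actual code, and 'some-key' is a made-up key; whether this error has anything to do with the NaCl crash is not established.

```javascript
// Hypothetical stand-in for a string encoder: it reads str.length first.
function encodeLength(str) {
  return str.length;
}

// Guarded variant: substitute an empty string when the stored value is absent.
function safeEncodeLength(value) {
  return encodeLength(typeof value === 'string' ? value : '');
}

const stored = {}; // what a storage lookup yields when nothing was saved
let threw = false;
try {
  encodeLength(stored['some-key']); // reading .length of undefined throws
} catch (e) {
  threw = e instanceof TypeError;
}
```

The guard only hides the symptom in the JS log; it would not by itself explain or prevent a crash inside the NaCl module.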
If anyone is feeling adventurous, the new version of both stable and dev tracks has a binary with debug symbols in it, so if you want to try to get a stack trace with GDB, that'd be wonderful. I'll also try to repro on my HP11 when I get a chance.
Here are very short instructions (probably missing a step, but hopefully they will get you going):
Then, assuming you got Mosh from Chrome and built it, you'd start the debugger like so (sitting in the src/ directory):
$ ../build/nacl_sdk/pepper_43/toolchain/linux_x86_newlib/bin/x86_64-nacl-gdb -x script.gdb
Then type "continue", and when it crashes, type "backtrace". I worry that it'll be a big pile of uselessness due to problems with NaCl debugging, but worth a shot.
@scottweston I noticed the same thing, and mentioned it a few days ago in this bug. It gets as far as actually starting mosh-server, and you can see from the JS log you posted that it is, in fact, trying to execute the mosh client code just before it crashes.
One thing I haven't tried, but is worth a shot: What happens if you use another ssh client to start mosh-server, then start Mosh for Chrome in manual mode, copy/pasting the key and port material? Does it crash in that case? I suspect it will, and if so, will absolve all the ssh code.
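For the manual-mode test, the port and key come from the "MOSH CONNECT" line that mosh-server prints when it starts. A minimal sketch of pulling them out of captured output follows; parseMoshConnect is a hypothetical helper name, not something in Mosh for Chrome, and the pattern assumes the usual 22-character base64 key.

```javascript
// Matches the line mosh-server prints on startup: "MOSH CONNECT <port> <key>".
const CONNECT_RE = /^MOSH CONNECT (\d+) ([A-Za-z0-9\/+]{22})\s*$/m;

// Return { port, key } from mosh-server output, or null if no such line exists.
function parseMoshConnect(output) {
  const match = CONNECT_RE.exec(output);
  if (!match) return null;
  return { port: Number(match[1]), key: match[2] };
}
```

Paste the text your ssh client shows after running mosh-server into parseMoshConnect, then copy the resulting port and key into the manual-mode fields.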
@rpwoodbu you are correct :) And it crashes regardless of whether the MOSH_KEY is correct or not.
I managed to make a new build crash on my HP11. I rebuilt it (no changes), reloaded it, and no more crash. This has got to be a Chrome bug. Of course, the debugger was no help. And of course, when I built it with no optimizations, etc., it didn't crash. But maybe if I kept trying that, it would. Ugh.
I'm asking around among the NaCl experts to see if I can get any guidance.
Today I tried to get on again and could not from either of my profiles with either mosh or mosh dev. I reset the chromebook back to factory, and after doing so mosh fails but mosh dev does work.
This is an HP CB2
Thanks for that update, @jsimpsoncd. Unfortunately, I'm not going to have any time to spend on this for a number of days, maybe a week. If anyone else wants to try to delve into it, please "check my math" on the use of pthreads, particularly as it relates to thread creation and any special considerations in NaCl.
Chrome Info:
Version 43.0.2357.65 beta
Platform 6946.44.0 (Official Build) beta-channel daisy_freon
Firmware Google_Snow.2695.132.0
chronos@localhost / $ uname -a
Linux localhost 3.8.11 #1 SMP Tue May 12 21:41:03 PDT 2015 armv7l SAMSUNG EXYNOS5 (Flattened Device Tree) GNU/Linux
Linux SSH server info:
OpenSSH_6.2p2, OpenSSL 1.0.1e 11 Feb 2013
goliath:~ # uname -a
Linux goliath 3.11.6-4-desktop #1 SMP PREEMPT Wed Oct 30 18:04:56 UTC 2013 (e6d4a27) x86_64 x86_64 x86_64 GNU/Linux
Mosh Info:
v0.2.9.17
v0.2.11.72
Both exhibit this crash using password credentials. Different accounts tried. v0.2.9.62 works fine.
Hope this helps with the debugging.
Experiencing the same error for mosh and mosh dev. I get the same chrome://inspect stacktrace @scottweston reported.
CHROME VERSION: 42.0.2311.153
GOOGLE_RELEASE: 6812.88.0
hw_platform: SAMSUNG EXYNOS5 (Flattened Device Tree)
mosh v0.2.11.52, mosh v0.2.11.72, and mosh-dev v0.2.9.62 all have the problem.
I don't think it matters but I'm using keys not passwords.
(I'll add that mosh was working for me before the update.)
Been broken for me for a couple weeks now.
As indicated in my recent post, I haven't had time to deal with this for the last week or so. I may be able to start taking a look at it again this weekend. Bear in mind, though, that this looks like a Chrome bug, so the best I can hope to do is work around it, but it isn't easy. I don't use ARM Chromebooks (because they're terrible), but I know a lot of folks do; I want to help ARM users, but I'm not feeling the same pain. This is an open source project, and help on this is greatly appreciated.
I'm pushing out v0.2.11.90 to the dev release right now. Please test and report so I can get this out to stable! I have a good idea what's going on now.
It would seem that the PNaCl translation step, which takes a portable bitcode file and compiles it for the target architecture, is to blame. I have been distributing the bitcode file, allowing the browser to do this translation itself (this is why it sometimes takes a while to start after an update). But I have shown that the translator does not always build correct binaries.
I can (and do now) do this translation step a priori, and include binaries for each architecture in one package. But unfortunately, the pnacl-translate tool that comes with the SDK is just as broken as that which comes with the browser. But at least now I can test what I build and make sure that it doesn't crash, and just rebuild if it does, instead of depending on the whims of the translator on everyone's machines.
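The "test what I build, and just rebuild if it crashes" idea boils down to a retry loop. Here is a rough sketch of that loop, not the project's actual build script: translate and smokeTest are hypothetical callbacks standing in for running the translator and launching the result on the target hardware.

```javascript
// translate(attempt): hypothetical callback that produces a native binary from the bitcode.
// smokeTest(binary): hypothetical callback that returns true if the binary runs without crashing.
function buildUntilStable(translate, smokeTest, maxAttempts) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const binary = translate(attempt);
    if (smokeTest(binary)) {
      return { binary, attempts: attempt };
    }
  }
  throw new Error('no working binary after ' + maxAttempts + ' attempts');
}
```

Capping the number of attempts keeps a consistently bad translator from looping forever; the binary that passes is the one that would go into the package.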
I now have compiled binaries, built from the same bitcode, that reliably crash or do not crash, so I'll bring it up with the NaCl folks.
Thanks for all your patience. I'm sorry that I haven't had much time to work on this, but as you can see, this is a weird one... and in the words of Han Solo, "It's not my fault!".
@rpwoodbu Thanks so much for spending the time to track this down! This sounds like it will benefit more Chrome apps than just Mosh.
You rock @rpwoodbu. Thanks for making such an awesome tool and being so on it.
I confirm it's fixed in dev.
It works for me as well, running on an ARM-based Samsung Chromebook. The error first appeared after a ChromeOS update, so I assume that providing a correctly linked binary resolved the problem.
Thanks, @rpwoodbu. As already mentioned: you rock! :-)
The update seems to have crashed my Mosh client.
Log from mosh_window.html:
Output displayed in terminal window: