microsoft / WSL

Issues found on WSL
https://docs.microsoft.com/windows/wsl
MIT License

Chrome fails to launch #648

Closed Manouchehri closed 8 years ago

Manouchehri commented 8 years ago

Chrome fails to launch.

10.0.14385

/usr/bin/google-chrome

https://gist.githubusercontent.com/Manouchehri/da19ce4b9874c0a893210fa509fc1a33/raw/5fea50f4e81233e1cbe65ccdd92a295352ae9197/strace-chrome-wsl-fail.txt

wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub | sudo apt-key add -
sudo apt-get update
sudo apt-get install google-chrome-unstable

imkow commented 8 years ago

As BashOnWindows is intended for development purposes, rather than normal usage scenarios with GUIs, I think this isn't a problem. Except it's not about GUIs.

Manouchehri commented 8 years ago

@imkow GUIs definitely work. See #637.

[Screenshot: Hopper]

Gabrielcarvfer commented 8 years ago

The problem is related to socketpair creation. As Chromium relies on sockets for IPC on Linux/BSD/macOS, you can expect other Chromium-based stuff to fail (Chrome/Opera/Spotify/...).

aseering commented 8 years ago

For what it's worth, there are a number of existing bugs filed in this bug tracker against WSL sockets. (The WSL folks have been working on fixing them.) If you have time to look through them, it would be interesting to know if this is a known or new underlying issue.

bitcrazed commented 8 years ago

@Manouchehri While we greatly appreciate your taking the time to stretch Bash/WSL and investigate this issue, we are being quite explicit in that we do not support GUI/desktop apps; we are only supporting command-line applications at this time.

That several GUI apps do run (to a greater or lesser degree) is purely a side-effect of the team improving our syscall implementation over time.

We'll be sure to (very) publicly announce if this position changes, but expect that we'll only be supporting command-line scenarios and tools for the foreseeable future.

Manouchehri commented 8 years ago

Well, I was trying to pinpoint the issue with sockets rather than the GUI..

I'll write an example that doesn't use a GUI since apparently that's too distracting.

Gabrielcarvfer commented 8 years ago

Tried some stuff using socketpair, but it worked.

The only thing for sure is that the error happens in a check at line 40 from https://github.com/scheib/chromium/blob/master/content/browser/renderer_host/render_sandbox_host_linux.cc

Gabrielcarvfer commented 8 years ago

Seems related to #546 and #514.

therealkenc commented 8 years ago

There are lots of missing pieces before Chrome will run on WSL. This issue as raised by @Manouchehri is not well formed and I'm glad this has been marked bydesign.

Anyway, I hit the following for starters:

  1. dir_reader_linux.h: WSL doesn't like something about casting to linux_dirent* (I think; I hit this before I started taking notes). #1769. Can work around by using DirReaderFallback.
  2. render_sandbox_host_linux.cc: Half-close on socketpair(AF_UNIX, SOCK_SEQPACKET, ...). #546. This can be worked around easily, as it happens, by just not calling shutdown().
  3. thread_helpers.cc (44): Check failed: 3UL <= task_stat.st_nlink (3 vs. 1). #1267. This module queries /proc/self/task/ and doesn't like what it sees. Code in question looks like:

       // At least "..", "." and the current thread should be present.
       CHECK_LE(3UL, task_stat.st_nlink);    // <--- this fails on WSL, commented out check.
       // Counting threads via /proc/self/task could be racy. For the purpose of
       // determining if the current process is monothreaded it works: if at any
       // time it becomes monothreaded, it'll stay so.
       return task_stat.st_nlink == 3;       // <--- wrong return value on WSL (hacked to <= 3)

  4. stack_trace_posix.cc (584): Failed to parse the contents of /proc/self/maps. I haven't looked into why. #708
  5. file_path_watcher_linux.cc: general unhappiness due to lack of inotify in WSL #216. Replace with file_path_watcher_stub.cc
  6. The workaround in #637 doesn't take because chrome wants the unix socket for dbus, period. #376 and #1326 etc. Can probably be worked around but I didn't bother, because...
  7. address_tracker_linux.cc(172): Could not bind NETLINK socket: Invalid argument. #468
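
The st_nlink check in item 3 can be observed without building Chromium. The following is a minimal Python sketch, not taken from the thread; the helper name task_dir_summary is made up for illustration. It reports the link count and the number of entries of /proc/self/task. On a stock Linux kernel I would expect the link count to be at least 3 (".", ".." and one thread), which is the condition the CHECK_LE above encodes.

```python
import os

def task_dir_summary(path="/proc/self/task"):
    """Return (st_nlink, entry_count) for the task directory, or None if absent.

    Hypothetical helper for illustration; it mirrors what the Chromium
    check looks at but is not Chromium code.
    """
    try:
        nlink = os.stat(path).st_nlink
        entries = os.listdir(path)
    except OSError:
        return None  # no procfs at this path (for example, not Linux)
    return nlink, len(entries)

if __name__ == "__main__":
    summary = task_dir_summary()
    if summary is None:
        print("/proc/self/task is not available on this system")
    else:
        nlink, threads = summary
        # The Chromium check expects ".", ".." and at least one thread.
        print(f"st_nlink={nlink} threads={threads} check_passes={nlink >= 3}")
```

A link count of 1, as in the "3 vs. 1" failure message above, would print check_passes=False.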

I quit at the NETLINK issue. There are some others I haven't listed; notably chrome has its own hand-rolled version of strace built into the debug build that doesn't work for reasons.
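
A standalone probe for the NETLINK bind might look like the sketch below. It is an illustration only: the function name is hypothetical, and the group mask is my assumption about what an address tracker would subscribe to (link and address change notifications), not something stated in this thread. The constants are the RTMGRP_* values from linux/rtnetlink.h.

```python
import socket

# Multicast group bits from <linux/rtnetlink.h>.
RTMGRP_LINK = 0x1
RTMGRP_NOTIFY = 0x2
RTMGRP_IPV4_IFADDR = 0x10
RTMGRP_IPV6_IFADDR = 0x100

def try_bind_route_netlink(groups):
    """Return None on success, an errno on failure, or "unsupported".

    Hypothetical helper: opens a NETLINK_ROUTE socket and binds it to the
    given multicast groups, which is the step reported as failing above.
    """
    family = getattr(socket, "AF_NETLINK", None)
    proto = getattr(socket, "NETLINK_ROUTE", None)
    if family is None or proto is None:
        return "unsupported"  # platform has no netlink at all
    try:
        with socket.socket(family, socket.SOCK_RAW, proto) as sock:
            sock.bind((0, groups))  # pid 0 lets the kernel pick the port id
    except OSError as exc:
        return exc.errno
    return None

if __name__ == "__main__":
    mask = RTMGRP_LINK | RTMGRP_NOTIFY | RTMGRP_IPV4_IFADDR | RTMGRP_IPV6_IFADDR
    print("bind result:", try_bind_route_netlink(mask))
```

A result of 22 (EINVAL) would correspond to the "Invalid argument" message in item 7.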

therealkenc commented 8 years ago

I should add that the Mozilla people deserve a big kudos for writing sane portable posix code #481. Chrome has no business firing up a file watcher, having a whole IP address tracking subsystem, or poking around /proc. It's a browser, Google.

Manouchehri commented 8 years ago

I disagree that it's a poor question; opening this issue got you to clearly identify ~7 issues for WSL.

therealkenc commented 8 years ago

No, it didn't. Not even a little bit. Four of the seven issues are known to the WSL team. That Google dicks with Linux kernel internals in (1), (3), and (4) is on Google not Microsoft. Seeing what it would take to get Chrome working was an exercise of no value to the WSL team. If any of those three were even marginally inside WSL's scope (like #546) I would have opened an actionable and reproducible test case.

Gabrielcarvfer commented 8 years ago

Google guys don't agree with you. It's a whole OS inside a browser. xD

therealkenc commented 8 years ago

And to be clear, sane is defined as: a test case that demonstrates a failing syscall or /proc /dev /sys interface that blocks use of "common Linux tools (sed, awk, etc.) and many Linux-first tools (Ruby, Python, etc.) to use their toolchain on Windows".

Maybe if the WSL team added a <blink> tag to the FAQ that would help? If you don't know how to post a test case then continue using Ubuntu on a VM with samba shares like everyone else stuck in cross-platform hell.

Manouchehri commented 8 years ago

@therealkenc I am literally the person who added some of the strace guidelines to CONTRIBUTING.md (https://github.com/Microsoft/BashOnWindows/pull/642).. No need for the snide comments.

Anyway, I don't get why asking about the world's most popular browser is so upsetting. It was going to be asked sooner or later.

aseering commented 8 years ago

Folks, this is getting a smidgeon heated...

First -- thanks @Manouchehri for reporting this! It is a thing that WSL cannot currently do. While it's not on the short-term road map, I for one do hope that WSL eventually reaches a level of completeness where it's able to run apps like Chrome.

It's not something that is expected to work in the Anniversary Release. But as has been pointed out on other tickets, there are WSL users who (rightly or wrongly) expect browsers to work now. This ticket documents what their actual status is.

@therealkenc -- your definition of "sane" matches what's expected to work for Windows Anniversary Edition. But this bug tracker has a broader scope than that; it is currently described as encompassing future and long-term work as well. If you would like to narrow its definition to match your description, please post a pull request modifying README.md.

benhillis commented 8 years ago

This is getting out of hand, there's no reason for personal attacks. I'm going to lock this thread and let people calm down.

russalex commented 8 years ago

Thanks @aseering and @benhillis.

We appreciate people pushing the limits to what WSL can do. This tracker has become one of the best places where people can show off and ask questions about areas that are not officially supported (examples #637, #611, and #481). I know I get a kick out of reading these and do not want us to do anything to stifle the creativity.

For the issue, @Manouchehri, thank you for reporting it. If nothing else people who try and run Chrome can see other people's experience. We all know that Chrome is not a priority, but it is still good information to have. Also, for the record, thank you for the help with CONTRIBUTING.md.

The Windows Anniversary Update is pretty well locked down at this point. We have a blog post talking a bit about what comes next. Naturally our priority list will grow for the next release. No spoilers on our conversations but I can promise that we're looking at this forum and the User Voice. More information on that when we have something fun to share.

Krzysztof-Cieslak commented 8 years ago

Hey,

Sorry to revive an already closed issue, but I didn't want to create another issue for basically the same problem.

I'd like to just point out that this issue is connected not only with Chrome, but also with any other application using the Chromium engine, including any Electron application.

This last group includes popular text editors such as Atom... and an amazing MSFT product, VS Code. It would be really nice if you'd reconsider fixing this issue to support development tools created by other teams in Microsoft. :)

Best regards, Chris.

iz0eyj commented 8 years ago

A lot of programs fail on shutdown; please fix it (or, while we wait for the fix, simulate a correct return value).

aseering commented 8 years ago

@iz0eyj -- thanks for your comment! It's very general, though... Are you referring specifically to programs that use Chrome or the Chromium engine?

If so, could you please post more details? If not, or if you're not sure, could you please open a new issue and fill out the issue template?

iz0eyj commented 8 years ago

@aseering in my case the problem is with Vivaldi Browser, which, exactly like Chrome, uses the Blink engine (derived from WebKit); you will probably get the same error with Opera too.

This is the end of its strace output (shutdown is on the second line):

socketpair(PF_LOCAL, SOCK_SEQPACKET, 0, [6, 7]) = 0
shutdown(6, SHUT_RD) = -1 EINVAL (Invalid argument)
rt_sigprocmask(SIG_UNBLOCK, [ABRT], NULL, 8) = 0
tgkill(2512, 2512, SIGABRT) = 0
--- SIGABRT {si_signo=SIGABRT, si_code=SI_TKILL, si_pid=2512, si_uid=1000} ---
+++ killed by SIGABRT (core dumped) +++
Aborted (core dumped)
federico@BESTIA:~/Scaricati$
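
The first two syscalls in that trace can be reproduced without a browser. Below is a minimal Python sketch (the function name half_close_seqpacket is made up for illustration) that creates an AF_UNIX SOCK_SEQPACKET socketpair and half-closes one end for reading. It returns 0 on success and the errno otherwise, so the EINVAL seen above would show up as 22 on Linux.

```python
import socket

def half_close_seqpacket():
    """Return 0 if shutdown(SHUT_RD) succeeds, else the errno that was raised.

    Hypothetical repro helper: it performs the same socketpair() and
    shutdown() sequence that appears in the strace output above.
    """
    try:
        a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_SEQPACKET)
    except OSError as exc:
        return exc.errno  # platform lacks AF_UNIX SOCK_SEQPACKET pairs
    with a, b:
        try:
            a.shutdown(socket.SHUT_RD)  # the half-close that failed on WSL
        except OSError as exc:
            return exc.errno
    return 0

if __name__ == "__main__":
    print("shutdown(SHUT_RD) result:", half_close_seqpacket())
```

This is the kind of GUI-free test case the earlier comments ask for: it isolates the socket behaviour from everything else Chrome does at startup.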

therealkenc commented 8 years ago

Ben says #546 is better in 14915.

iz0eyj commented 8 years ago

@therealkenc I confirm that close seems OK with 14915, but I got a new issue (not socket related) starting Vivaldi Browser. I opened the thread.

therealkenc commented 8 years ago

I've got a partial port of Chrome 49.0.2623.75 limping on 14915 (barely). It is too unstable to be usable at the moment, but if I make further progress I'll post patches and build instructions.

[Screenshot: chrome-wsl]

mgiuffrida commented 8 years ago

@therealkenc Please do post what you've managed to get so far! I'd be interested to see how far this could get and if there are any underlying bugs/assumptions in Chromium that would be feasible to fix, opinions about reading /proc files aside :-)

therealkenc commented 8 years ago

I'll be sure to post progress if I make any. Right now I am blocked on:

futex(0x7fffda346fbc, FUTEX_CMP_REQUEUE_PRIVATE, 1, 2147483647, 0x7fffda346f90, 2) = -1 
   ENOSYS (Function not implemented)

...with the code in question being pthread_cond_broadcast() here. This new problem is not quite the same as the futex problem in #1006, which is easy to work around.

Which is not all that noteworthy in and of itself; it is just where I happened to stop. I've been posting issues if/when I make the effort to do test cases the team can use.

therealkenc commented 7 years ago

I looked at this again last weekend and made a little more progress. My current patch against 52.0.2743.82 is here. The patch is enough to get chrome running without --single-process (ie with multiprocess), by working around #514 and #706. It still has too many problems to be usable, but it is just usable enough to send this post.

The FUTEX_CMP_REQUEUE_PRIVATE #1006 problem I mentioned above was a false positive of sorts. It's still a missing WSL feature, but glibc pthreads seems to fall back awkwardly (I'd hate to say gracefully) to FUTEX_WAKE_PRIVATE, which is supported.

This patch is also enough to get Electron 1.3.8 to launch, but (sadly) not enough for VS Code to fire up yet.

[Screenshot: wsl-electron]

aseering commented 7 years ago

Ooh, nice progress!

mgiuffrida commented 7 years ago

FWIW the /proc/sys/kernel/shmmax read was removed from chrome back in June.

therealkenc commented 7 years ago

I noticed that, which is part of the reason I didn't submit a bug. Unfortunately the change only landed in Chrome 53 (stable) in September, and VS Code and Atom are still on 52.

Silliest magic number in the history of computing. 18446744073692774399 (or 0xFFFFFFFFFEFFFFFF). That's right: eighteen quintillion, four hundred forty-six quadrillion, seven hundred forty-four trillion, seventy-three billion, six hundred ninety-two million, seven hundred seventy-four thousand, three hundred ninety-nine. Twenty significant digits.
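
The arithmetic behind that number can be checked in a few lines of Python. As far as I know it matches the Linux default for SHMMAX, defined as ULONG_MAX - (1UL << 24) on 64-bit systems; treat that attribution as my assumption rather than something stated in this thread.

```python
# 64-bit ULONG_MAX minus 2**24, assumed here to be the Linux SHMMAX default.
ULONG_MAX = 2**64 - 1
shmmax = ULONG_MAX - (1 << 24)

# Both spellings quoted above refer to the same value.
assert shmmax == 18446744073692774399 == 0xFFFFFFFFFEFFFFFF

print(shmmax, hex(shmmax), len(str(shmmax)))
# prints: 18446744073692774399 0xfffffffffeffffff 20
```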

Five min fix if the team wants to pick it up. Has to be done eventually anyway.

tobireif commented 7 years ago

Could I install Chrome on Windows (independent of Bash-On-Windows), and then start that Chrome from the Bash shell? (In order to open e.g. .html files from within the shell.)

Also, I have this in my .vimrc, and would hope it will work when I switch to Windows / Bash-On-Windows:

" Go to URL ("Go to Link"):
if has("macunix")
  :map gl :!open -a "Google Chrome" '<cWORD>'<CR>
elseif has("win32") || has("win64")
  :map gl :!chrome "<C-R><C-A>"<CR>
else
  :map gl :!chrome "<cWORD>" &<CR>
endif

tobireif commented 7 years ago

P.S. I want to start GUI apps as Windows apps (installed on Windows), not as Linux apps.

therealkenc commented 7 years ago

Sure, the following from the command prompt will get you right back here. You need to be on an insider build 14951 or better which has the Win32 interop stuff.

"/mnt/c/Program Files (x86)/Google/Chrome/Application/chrome.exe" \
    --profile-directory="Default" https://github.com/Microsoft/BashOnWindows/issues/648

tobireif commented 7 years ago

@therealkenc Thanks!

therealkenc commented 7 years ago

Got a bit further this weekend. Chrome 53.0.2785.143 now runs somewhat stably on 15042 with this small patch. Stable enough to play youtube videos anyway, which was a non-starter in November. It still does an 'aw, snap' if you try hard enough. As you can see from the patch, the main blocker for chrome running out of the box remains udev.

The patch is a lot smaller because #1326 was squashed (thanks Sunil). I can't explain precisely what caused the stability improvement. The patch doesn't work around anything new, and the previous one worked around #1326 already. Could be any number of random WSL fixes since November. Could be changes in chromium since 52.0.2743.82.

VSCode, unfortunately, remains obstinate. This is frustrating, because "it should be working now". Sigh. I'll bang on it some more.

ob-screencap:

[Screenshot: chrome-wsl-feb27]

therealkenc commented 7 years ago

I have written a stub library for libudev.so.1 that enables stock Chrome for Ubuntu to run on WSL 15046. I had mentioned doing this back in September #1121 (message), but Chrome was blocking on bigger problems at the time and can be compiled without udev support (notably for Android). There are still two blocking bugs outstanding, but it turns out they only manifest in debug builds (DCHECK fails). You can head on over to the github page to download and build if you feel lucky.

therealkenc commented 7 years ago

A binary package is also now ready. Low-tech steps for chrome on WSL:

sudo apt-get install ubuntu-desktop wget libxss1 libappindicator1 libindicator7
wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
sudo dpkg -i google-chrome-stable_current_amd64.deb
wget https://github.com/therealkenc/libudev-stub/releases/download/v0.9.0/libudev-stub-0.9.0-WSL.deb
sudo dpkg -i libudev-stub-0.9.0-WSL.deb
google-chrome --no-sandbox

There is (at least) one annoyance, being that chrome does not exit properly when closed. If this happens, hit ctrl-c a couple of times and then run killall chrome nacl_helper to clean up the process leftovers.

So, "workaround-available". Or something.

mgiuffrida commented 7 years ago

@therealkenc surprised you're building with nacl. I'd assume it would be easier to pare the build down.

What about use_udev = false, would that help? There's more stuff like this in build/args/headless.gn; see also gn args <outdir> --list

iz0eyj commented 7 years ago

@therealkenc strange error because X-Server is running:

google-chrome --no-sandbox
[20419:20419:0305/141529.840803:ERROR:browser_main_loop.cc(272)] Gtk: cannot open display:

I also needed "sudo apt-get install -f" for dependencies.

aseering commented 7 years ago

@therealkenc -- ooh, really cool stuff! Thanks for sharing!

therealkenc commented 7 years ago

@iz0eyj I also got the dependency problem and hoped libappindicator1 libindicator7 in the dependencies above would help. I don't think that's a "WSL thing"; the problem pops up for Real Linux if you search google. If you happened to catch what apt-get install -f fixed, please post it and maybe we can get a definitive no-error install recipe.

That is a strange error if X is running. You'd expect that error if the display variable was not set (export DISPLAY=localhost:0). If you just do an xlogo immediately after the fail does it pop up? Or firefox, which gets installed with ubuntu-desktop and also uses Gtk? The error here is from Gtk not Chrome proper.

If firefox runs but chrome does not, try adding --disable-gpu to the chrome commandline. That's a hail mary guess though, because it isn't getting far enough for GPU stuff to choke.

therealkenc commented 7 years ago

@mgiuffrida I am not building it at all now. The libudev-stub is so people can run Google branded chrome out of the box. Yes 'use_udev=0' is the way to go if you want to build chromium yourself, but you'll need this patch I posted earlier, which works around 'use_udev=0' build failures on Desktop Linux. Also note building chromium on WSL is no fun right now because of #1726, and gn has issues, I think because of #1353. GYP builds still work for the time being.

therealkenc commented 7 years ago

@mgiuffrida - I just realized you are the Google guy who posted earlier (welcome!). The 'udev=0' build error with 'OS=="linux"' is with removable_storage_provider_linux.cc and serial_service_impl.cc. It would be nice if those were squashed upstream, as well as for other platforms without udev like (apparently) flatpak. I posted a comment in crbug 284898.

iz0eyj commented 7 years ago

@therealkenc strange... Firefox starts with "export DISPLAY=0.0" but not Chrome. This is the result:

federico@Bestia:~$ google-chrome --no-sandbox
[180:199:0307/132642.951848:ERROR:bus.cc(427)] Failed to connect to the bus: Failed to connect to socket /var/run/dbus/system_bus_socket: No such file or directory
Assertion 'pthread_mutex_unlock(&m->mutex) == 0' failed at pulsecore/mutex-posix.c:108, function pa_mutex_unlock(). Aborting.
Aborted (core dumped)

Sorry, but I don't remember which libraries -f fixed because I forgot to write them down... I know, I'm stupid! I can try to uninstall WSL and restart from scratch. Would it be possible to do it without installing Ubuntu Desktop? I use Xfce4 (multi window) mixed with the Win 10 Desktop.

Let me know if you need the test from scratch.

iz0eyj commented 7 years ago

P.S.: Great work @therealkenc!!!!

therealkenc commented 7 years ago

Right; a few tips (I'll put up some docs in the repo when I get a chance). If you have problems with pulsecore #486 (message) just do an apt-get purge pulseaudio so chrome doesn't try to use it. [edit] PulseAudio 8 on Xenial 16.04 seems to be an issue, so you'll need a patched version. See instructions below.

Not only is it possible to run without Ubuntu desktop, I haven't even tried using it with Unity and Compiz. I tried (and failed) to explain this over in #637 (message and message). Installing the ubuntu-desktop package is just to get the libraries chrome expects on your system. Your mixed Xfce4 setup should fly.

Your display variable should properly be localhost:0 (ie 127.0.0.1:0), not :0. The latter might work, but technically means X11 over AF_UNIX on /tmp/.X11-unix/X0, not AF_INET over localhost:6000.
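
To make the distinction concrete, here is a small Python sketch of how a DISPLAY string maps to a transport under the conventional X11 rules described above. The function name x11_endpoint is hypothetical, and the parsing is deliberately simplified (no IPv6 literals, no protocol prefixes other than "unix").

```python
def x11_endpoint(display):
    """Map a DISPLAY value to ("unix", socket_path) or ("tcp", host, port).

    Simplified illustration: an empty host means the AF_UNIX socket under
    /tmp/.X11-unix, any other host means TCP on port 6000 + display number.
    """
    host, sep, rest = display.rpartition(":")
    if not sep:
        raise ValueError(f"DISPLAY has no ':' separator: {display!r}")
    number = int(rest.split(".")[0])  # drop the optional ".screen" suffix
    if host in ("", "unix"):
        return ("unix", f"/tmp/.X11-unix/X{number}")
    return ("tcp", host, 6000 + number)

if __name__ == "__main__":
    for value in (":0", "localhost:0", "127.0.0.1:0.0"):
        print(value, "->", x11_endpoint(value))
```

With these rules ":0" resolves to /tmp/.X11-unix/X0, while "localhost:0" resolves to TCP port 6000 on localhost.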

Finally, you'll have a better (ie more stable) experience with the latest mesa llvmpipe. I built my own but there's a ppa here. You can see what you are running with glxinfo -B.

No need to track down the install -f further. It appears to be a common enough problem on Real Ubuntu and that's the standard fix. Thanks for trying it out!

therealkenc commented 7 years ago

Oh one more important tip I forgot to mention. Anecdotally, apt-get scripts seem prone to blowing away libudev-stub's libudev.so.1 symbolic link when installing or updating anything that depends on udev. It's just the one link, so you can fix it up manually, or re-install the libudev-stub .deb. You'll know when it happens because chrome just segfaults.
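
A quick way to see whether the link has been replaced is to inspect it directly. The Python sketch below is only an illustration: the helper name describe_link is made up, and the library path in the usage example is my assumption about a typical Ubuntu amd64 layout, not a path given in this thread.

```python
import os

def describe_link(path):
    """Report whether path is a symlink (and its target), a plain file, or missing."""
    if os.path.islink(path):
        return f"symlink -> {os.readlink(path)}"
    if os.path.exists(path):
        return "regular file (not a symlink)"
    return "missing"

if __name__ == "__main__":
    # Assumed location; adjust to wherever libudev.so.1 lives on your system.
    candidate = "/lib/x86_64-linux-gnu/libudev.so.1"
    print(candidate, "is:", describe_link(candidate))
```

If the target changes after an apt-get run, that is a hint the stub's link was overwritten and the .deb needs re-installing.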

iz0eyj commented 7 years ago

federico@Bestia:~$ sudo apt-get purge pulseaudio
[sudo] password for federico:
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following packages were automatically installed and are no longer required:
  libfftw3-single3 libpulsedsp libwebrtc-audio-processing-0 pulseaudio-utils rtkit
Use "sudo apt autoremove" to remove them.
The following packages will be REMOVED:
  libcanberra-pulse* pulseaudio* pulseaudio-module-x11*
0 upgraded, 0 newly installed, 3 to remove and 0 not upgraded.
After this operation, 4,136 kB of disk space will be freed.
Do you want to continue? [Y/n] y
(Reading database ... 145904 files and directories currently installed.)
Removing libcanberra-pulse:amd64 (0.30-2.1ubuntu1) ...
Removing pulseaudio-module-x11 (1:8.0-0ubuntu3.2) ...
Removing pulseaudio (1:8.0-0ubuntu3.2) ...
Purging configuration files for pulseaudio (1:8.0-0ubuntu3.2) ...
Processing triggers for man-db (2.7.5-1) ...
federico@Bestia:~$
federico@Bestia:~$ google-chrome --no-sandbox
[20854:20854:0308/142535.026220:ERROR:browser_main_loop.cc(272)] Gtk: cannot open display:
federico@Bestia:~$ export DISPLAY=localhost:0
federico@Bestia:~$ google-chrome --no-sandbox
[20866:20877:0308/142710.449012:ERROR:bus.cc(427)] Failed to connect to the bus: Failed to connect to socket /var/run/dbus/system_bus_socket: No such file or directory
Assertion 'pthread_mutex_unlock(&m->mutex) == 0' failed at pulsecore/mutex-posix.c:108, function pa_mutex_unlock(). Aborting.
Aborted (core dumped)
federico@Bestia:~$

@therealkenc: Interesting thing, X was running for Xfce but Chrome needs "export DISPLAY=localhost:0" to find it.

Yesterday I uninstalled Ubuntu Desktop, so it is possible that the problem lies there. Tomorrow I will try to install it again and repeat the test, but I think the best thing is to start from scratch with a blank WSL and document each step. The browser matters a lot: it is one of the first things users install. The ideal, though, would be to make it work with Xfce plus perhaps a few additional libraries, because installing more than 1 GB of stuff that is never used, just to run the two environments mixed, could be a problem for some people.

I'm sorry for my bad English.

therealkenc commented 7 years ago

Starting from scratch doesn't hurt but takes effort. Note I did not have to uninstall pulseaudio, but for some reason the mutex bug doesn't manifest here. So I am a bit stumped.

Possibly a way forward is just to compile pulse yourself, with the one-line fix for HAVE_PTHREAD_PRIO_INHERIT here. There are reasonable build instructions here. You don't need to actually run an audio server. We're just trying to make sure #1006 isn't tripped.

But to be clear I am not doing this either. I haven't actually run audio since middle of last year. I am going to be really sad if after all this, it is silly audio that is blocking. Maybe I should just bundle up libpulse.so in the libudev-stub installer. The whole point was not having to compile stuff.

No worries on your English. Il tuo inglese è migliore del mio italiano. ("Your English is better than my Italian.")