microsoft / WSL

Issues found on WSL
https://docs.microsoft.com/windows/wsl
MIT License

Envoy pre-compiled binaries not working on Windows Subsystem for Linux #4538

Closed ravgupms closed 4 years ago

ravgupms commented 5 years ago

When I try to run Envoy on WSL, it crashes whenever I send it a request. When I talked to the Envoy team, they mentioned that WSL 1 doesn't implement the getsockname system call correctly.

Repro steps:

1) On a Windows machine, enable Windows Subsystem for Linux: link
2) Install Envoy using the following steps: https://www.getenvoy.io/platforms/envoy/ubuntu/
3) Download the sample Google example config.
4) Run Envoy: envoy -c google_com_proxy.v2.yaml
5) Hit it with curl: curl http://localhost:10000/

[image]

ravgupms commented 5 years ago

Please find strace logs below: stracelogs.zip

ravgupms commented 5 years ago

I looked further into the Envoy code, and it looks like it's failing at the setsockopt system call.

Failed assertion: [image] https://github.com/envoyproxy/envoy/blob/v1.11.1/source/common/network/connection_impl.cc#L239

setsockopt call before the assertion: [image]

I can see it being called in the strace logs as well: [image]

therealkenc commented 5 years ago

I would take a look at the calls on fd 36 leading up to the EINVAL in @ravgupms' screencap (I haven't).

Mostly going to be academic in this case. That #ifdef __APPLE__ is telling here. The WSL1 TCP stack is really a Windows TCP stack, and it behaves subtly differently in some edge cases versus Real Linux. This is one such case, apparently, although I can't find a dupe in a quick search. The realistic solutions here are going to be: (a) comment out that RELEASE_ASSERT() and recompile. (b) Use WSL2, which has its own Real Linux TCP stack. Thanks for the dive @ravgupms and @Biswa96.
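For readers who want to see what "ignore the setsockopt failure" looks like in practice, here is a minimal Python sketch of the pattern. It is not Envoy's code (Envoy is C++); the function name set_nodelay_tolerant is hypothetical, and the choice to swallow only EINVAL is an assumption modelled on the EINVAL in the screencap.

```python
import errno
import socket


def set_nodelay_tolerant(sock, enable=True):
    """Try to set TCP_NODELAY on sock.

    Returns True if the option was set, False if the kernel answered
    EINVAL and the failure was deliberately ignored. Any other error
    is re-raised, since it probably signals a real problem.
    """
    try:
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1 if enable else 0)
        return True
    except OSError as exc:
        if exc.errno == errno.EINVAL:
            # Assumption: treat EINVAL as non-fatal instead of hard-asserting.
            return False
        raise
```

The trade-off is that a tolerated EINVAL may leave Nagle's algorithm enabled on that connection, so latency-sensitive traffic could behave differently from what the caller asked for.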

dceravigupta commented 5 years ago

@therealkenc given that WSL1 is still supported, is there any plan to fix such issues? Or is moving to WSL2, which is not officially released yet, the only option?

therealkenc commented 5 years ago

> is the only option?

There were two options presented.

I have no special insights on future plans for open WSL1 issues. They are all unique. For this one, some deductive reasoning might be applied given a combination of:

(1) A hypothetical fix could not appear in 19H2 since that window closed.
(2) WSL2 is out in the next official release cycle 20H1.
(3) There isn't yet a root cause fix identified here because no one has dug far enough to find out why a bare minimal test case passes but that TCP_NODELAY in the screencap doesn't.
(4) The hard-death behavior is at best iffy since there is an Apple work-around to ignore the setsockopt fail outright.
(5) This isn't the only WSL1 networking gap.
(6) This (whatever 'this' is exactly) wasn't reported until 2019, meaning it can't be a widespread hard blocker on a large number of apps.
(7) The likelihood of a mad rush of thumbs up on the OP or a flood of +1 Me2 is probably...small.
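Point (3) refers to a bare minimal test case for TCP_NODELAY. The thread does not include that test, so the following is only a guess at what one could look like: a Python sketch that accepts a loopback connection and sets TCP_NODELAY on the accepted socket. The function name probe_nodelay and the choice of 127.0.0.1 with an ephemeral port are assumptions made for illustration.

```python
import socket


def probe_nodelay(host="127.0.0.1"):
    """Set TCP_NODELAY on an accepted loopback connection.

    Returns (error, value): error is None and value is the option as
    read back with getsockopt on success; otherwise error is the
    OSError that setsockopt/getsockopt raised and value is None.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind((host, 0))  # port 0: let the OS pick a free port
        listener.listen(1)
        port = listener.getsockname()[1]
        with socket.create_connection((host, port)):
            server_side, _peer = listener.accept()
            with server_side:
                try:
                    server_side.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
                    value = server_side.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY)
                    return None, value
                except OSError as exc:
                    return exc, None


if __name__ == "__main__":
    error, value = probe_nodelay()
    if error is None:
        print("TCP_NODELAY set, getsockopt reports", value)
    else:
        print("setsockopt failed with errno", error.errno)
```

Running the same script under WSL1 and under a native Linux kernel (or WSL2) would show whether this simple sequence differs at all. If it passes on both, as the comment suggests a minimal case does, the next step would be to replay the exact sequence of calls on the failing fd from the strace logs.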

Noting I say "might be applied" because stuff does get fixed all the time. 11,000 likes on the Envoy hub. Fails in a massively popular project like that certainly have a better chance than some niche use case. Bonne chance.