microsoft / WSL

WSL2 localhost access is intermittent with stuck connections #4340

Closed benc-uk closed 4 years ago

benc-uk commented 5 years ago

Opening a URL served by a Node.js app on localhost from a Windows browser results in the page never loading. The browser spins indefinitely instead of reporting that the site or page cannot be found.

Hitting stop and then refresh in the browser makes the page load OK. The page continues to load OK as long as you send requests to it quickly; if you wait a few seconds, the problem returns and the page never loads.

I have verified the following:

This seems to be a TCP socket issue with the way WSL 2 handles the new localhost bridging out to Windows

It is trivial to reproduce. Install Node.js and run the following simple server

const http = require('http')
const requestHandler = (request, response) => {
  console.log(`### Request for ${request.url}`)
  response.end('<h1>Hello Node.js Server!</h1>')
}
const server = http.createServer(requestHandler)
// listen() reports failures through the 'error' event, not a callback argument
server.on('error', (err) => console.log('### Something bad happened', err))
server.listen(3000, '0.0.0.0', () => {
  console.log(`### Server is listening on 3000`)
})
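
A small client-side probe can make the symptom measurable instead of relying on a spinning browser tab. The following is only a sketch and not part of the original report: `probe` and its `timeoutMs` parameter are hypothetical names. It sends one GET over a keep-alive agent and treats a request that exceeds the timeout as a hang.

```javascript
const http = require('http')

// One shared keep-alive agent, so repeated probes can reuse the TCP connection
// the way a browser would.
const agent = new http.Agent({ keepAlive: true })

// Hypothetical helper: resolves with { ok: true, status, ms } on a response,
// or { ok: false, error, ms } on a timeout or socket error.
function probe(url, timeoutMs = 5000) {
  return new Promise((resolve) => {
    const started = Date.now()
    const req = http.get(url, { agent, timeout: timeoutMs }, (res) => {
      res.resume() // drain the body so the socket can return to the pool
      res.on('end', () =>
        resolve({ ok: true, status: res.statusCode, ms: Date.now() - started })
      )
    })
    // 'timeout' only signals inactivity; the request has to be destroyed explicitly.
    req.on('timeout', () => req.destroy(new Error('timed out')))
    req.on('error', (err) =>
      resolve({ ok: false, error: err.message, ms: Date.now() - started })
    )
  })
}
```

Calling `probe('http://localhost:3000/')` from the Windows side every few seconds, with the server above running inside WSL 2, should show which attempts stall.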

benc-uk commented 5 years ago

Here is a short screen capture illustrating what I am seeing

benc-uk commented 5 years ago

Confirmation that this is a keep-alive issue. Setting the following

server.keepAliveTimeout = 0

resolves the issue.

The keepAliveTimeout setting is described in more detail here. It controls how long the server keeps an idle keep-alive connection open before closing it; setting it to zero disables that timeout.

But this change shouldn't be necessary and has other implications

mdragosv commented 5 years ago

Same here on 10.0.18945.1001 (I am using WSL 2 mode).

benc-uk commented 5 years ago

Phew thought I was going crazy! It's such a specific narrow problem...

I'm sure it's WSL 2 at fault here, because the problem goes away when accessing the server over the WSL eth0 IP, so it's somehow related to whatever network trickery they use to get localhost working

stefankummer commented 5 years ago

Facing the same issue with apache.

Windows build : 10.0.18945.1001 with Ubuntu on WSL2 fresh install.

Accessing a website on localhost works only one time in two, as @benc-uk sees with Node.js.

Turning off KeepAlive in the Apache configuration resolves the issue.

benc-uk commented 5 years ago

Thanks
I can also reproduce this, got the same problem with Apache/httpd. Good to know it's not just Node.js at fault here

Seeing the same "one request in every two" weird behavior

benhillis commented 5 years ago

Thanks for posting. I will dig into this.

ghost commented 5 years ago

I'm facing a similar issue. Using Node.js (v12.4.0), it only works once; after that I get ERR_CONNECTION_REFUSED. The same happens in my Elixir (1.9.1) Phoenix project. A direct connection using the IP works just fine.

benc-uk commented 5 years ago

Just confirmed it happens with Java Tomcat too

karthikv commented 5 years ago

I've been experiencing this issue with node.js servers as well. Thanks @benc-uk for mentioning the stopgap solution with keepAliveTimeout.

For those using webpack-dev-server who may find this issue, you can set the keepAliveTimeout in your webpack.config.js using the devServer.before hook, as shown below. Then your assets should be served without issue.

devServer: {
  before: function(app, webpackServer) {
    // We override the listen() function to set keepAliveTimeout.
    // See: https://github.com/microsoft/WSL/issues/4340
    // Original listen(): https://github.com/webpack/webpack-dev-server/blob/f80e2ae101e25985f0d7e3e9af36c307bfc163d2/lib/Server.js#L744
    const { listen } = webpackServer
    webpackServer.listen = function(...args) {
      const server = listen.call(this, ...args)
      server.keepAliveTimeout = 0
      return server
    }
  }
}

ghost commented 5 years ago

I'm not sure, but my problem has been mitigated by running the following as root:

sudo -i
echo 200000 > /proc/sys/net/ipv4/tcp_keepalive_time
echo 100 > /proc/sys/net/ipv4/tcp_keepalive_intvl
echo 20 > /proc/sys/net/ipv4/tcp_keepalive_probes

In Elixir everything works until react-apollo crashes the connection somehow.
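
To check which TCP keepalive values are currently in effect, the same three files can be read back. This is a minimal sketch; `readTcpKeepalive` and its `dir` parameter are hypothetical names introduced here. On a stock Linux kernel the defaults are typically 7200, 75, and 9.

```javascript
const fs = require('fs')
const path = require('path')

// The three kernel settings discussed in this comment.
const KEEPALIVE_KEYS = ['tcp_keepalive_time', 'tcp_keepalive_intvl', 'tcp_keepalive_probes']

// Hypothetical helper: returns an object mapping each key to a number, or to
// null when the file cannot be read. `dir` is a parameter only so the function
// can be pointed at another directory for testing.
function readTcpKeepalive(dir = '/proc/sys/net/ipv4') {
  const out = {}
  for (const key of KEEPALIVE_KEYS) {
    try {
      out[key] = Number(fs.readFileSync(path.join(dir, key), 'utf8').trim())
    } catch {
      out[key] = null // not Linux, or the file is unreadable
    }
  }
  return out
}
```

Running `console.log(readTcpKeepalive())` inside the WSL distribution before and after the writes shows whether they took effect.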

benc-uk commented 5 years ago

Thanks @karthikv I've used a similar workaround when serving directly in Express

var server = require('http').createServer(app);
server.keepAliveTimeout = 0; // This is a workaround for WSL v2 issues
server.listen(port);

Which I'm using rather than the more normal app.listen(port);

I'd rather not have to work around this in all my code, and I'm sure it has performance implications
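
One way to keep the workaround out of production code paths is to apply it only when the process appears to be running under WSL. The sketch below rests on an assumption that should be verified with `uname -r`: that WSL kernels report a release string containing "microsoft". `isWsl` and `createServer` are hypothetical helpers, and the check does not distinguish WSL 1 from WSL 2.

```javascript
const http = require('http')
const os = require('os')

// Assumption: WSL kernels report a release string containing "microsoft",
// for example "4.19.59-microsoft-standard".
function isWsl(release = os.release()) {
  return /microsoft/i.test(release)
}

// Hypothetical helper: builds the server and applies the workaround only under WSL.
// `release` is a parameter so the behavior can be checked with a fake value.
function createServer(app, release = os.release()) {
  const server = http.createServer(app)
  if (isWsl(release)) {
    server.keepAliveTimeout = 0 // workaround for microsoft/WSL issue 4340
  }
  return server
}
```

With Express this would be used as `createServer(app).listen(port)` in place of `app.listen(port)`.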

cvkmohan commented 5 years ago

I am facing the same reloading problem with a Rails application. keepAliveTimeout is Node-specific. Is there anything that can be done for a Rails app? I tried persistent_timeout 0 in the Puma configuration, but it is not working.

tomfakes commented 5 years ago

For Rails with Puma, I'm using the eth0 IP address without problem. Localhost works for a few requests and then stops. The IP address works pretty well so far.

Edit: Actually, the IP address almost works great. I have a subdomain I need to access in my app, and 'foo.localhost:3000' works nicely in Chrome (except that localhost is now completely broken). For now I'm manually changing the Windows hosts file (\Windows\System32\drivers\etc\hosts for reference) to make this work

ghost commented 5 years ago

@tomfakes, try this: https://github.com/shayne/wsl2-hacks This one installs a service that updates the hosts file automatically: https://github.com/shayne/go-wsl2-host
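
For anyone who prefers to script the hosts file change by hand, the core of it is replacing one entry in the file's text. The sketch below is not how the linked projects work; `upsertHostsEntry` is a hypothetical helper that only transforms a string, and writing the result back to the Windows hosts file still requires administrator rights.

```javascript
// Hypothetical helper: returns `hostsText` with exactly one entry for `hostname`,
// pointing at `ip`. Comment lines and entries for other hosts are kept as-is.
// Note: a line that lists `hostname` next to other names is dropped entirely.
function upsertHostsEntry(hostsText, ip, hostname) {
  const lines = hostsText.split(/\r?\n/).filter((line) => {
    // Ignore trailing comments, then look at the host names after the address.
    const fields = line.replace(/#.*/, '').trim().split(/\s+/)
    return !fields.slice(1).includes(hostname)
  })
  // Trim trailing blank lines before appending the fresh entry.
  while (lines.length && lines[lines.length - 1].trim() === '') lines.pop()
  lines.push(`${ip} ${hostname}`)
  return lines.join('\r\n') + '\r\n'
}
```

Because the WSL 2 address can change between restarts, the function would need to be run again with the new address each time.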

villasv commented 5 years ago

The suggestion from @benc-uk and @karthikv did not work for me with restify. I was able to get things done by setting that on Firefox itself...

tomfakes commented 5 years ago

I just upgraded to 18956 and my 'localhost' access from Windows to Ubuntu seems to be much more reliable so far

xylinq commented 5 years ago

I also updated to 18956 but didn't see any change; the problem still exists.

koopmac commented 5 years ago

Confirming the same issue using flask and docker-compose. Sporadic connection from localhost and interestingly no connection at all when I try 127.0.0.1. Both domains work fine on docker desktop.

tomfakes commented 5 years ago

I've mostly switched to using named hosts in the Windows hosts file. This week I'm working on multiple cross-domain apps, so having a broken localhost actually works to my advantage.

But for short tests, localhost is more reliable this version than it was in 18950.

montymole commented 5 years ago

still breaks in 18956

kabascolby commented 5 years ago

Same issue with phpMyAdmin; it behaves weirdly when you are manipulating data.

coffenbacher commented 5 years ago

This defeats the purpose of the localhost feature for me; in fact, it's even worse than not having it, since I wasted a lot of time reloading and then experimenting before finding this issue. So hopefully this gets a high priority, fixing it is more important than adding the initial feature was, at least for my use-case!

That said, localhost access is crucial overall, so I really hope this gets fixed rather than removed. Thanks to the WSL team for noting the need in the first place 😄

dbroadhurst commented 5 years ago

This is a critical issue. I can't even connect Studio3T to my Docker MongoDB on localhost. Frustratingly, my app ran for a while and then stopped. Using the VM IP works, so I guess I could update my hosts file, but the IP can change, so that's going to leave me needing to keep updating the hosts file.

Got to say the overall developer experience when using VSCode is awesome, really can't wait until everything is working.

jordaofranca commented 5 years ago

It is happening to me with both webpack-dev-server and apache, after I updated to build 18956.

benhillis commented 5 years ago

I believe I have a fix for this. Does somebody have a test app that I could use to validate this? I've tried running nodejs servers and clicking refresh and that seems to be working great with my fix.

benc-uk commented 5 years ago

@benhillis Venerable old Apache httpd exhibits the behavior and is easy to test in Docker

docker run --rm -d -p 8080:80 -v "$PWD":/usr/local/apache2/htdocs/ httpd

(Note: don't add -it to the run command; it makes httpd fall over for some reason.)

benhillis commented 5 years ago

Thanks @benc-uk, my change seems to be working very well with httpd. I'm hopeful this will resolve any issues... ETA for Insider build a few weeks.

coffenbacher commented 5 years ago

Awesome news @benhillis ! Thanks for working on this one. Beggars can't be choosers over here, but if there's any option for an expedited release, you can say at least one user is clamoring for it 😄 This is super annoying for my team on a daily basis. A few weeks is infinitely better than languishing forever though, so great work already.

leofabri commented 5 years ago

I'm facing the same problem. I didn't have much time to dig into it but always having to use the local machine IP is really annoying.

I'm trying to serve Angular 8 apps through WSL2.

$ ng serve

The app is successfully compiled and I get this:

** Angular Live Development Server is listening on localhost:4200, open your browser on http://localhost:4200/ **

Unfortunately, if I visit localhost:4200 I just get a message telling me that the page is unreachable. I also tried to compile with a different --host (127.0.0.1, etc.) but the only one that's working is the address of eth0.

I don't really understand why this happens with ng client. When I serve Docker containers or Go applications, localhost works great. Is there any known solution?

kangzj commented 5 years ago

Count me in for clamoring :-D

andresclari commented 5 years ago

On 18963, the connection using direct vm IP on app.test address seems worse and slower. Testing without that hack still doesn't work either.

dbroadhurst commented 5 years ago

@benhillis is it possible to send a message with the version number when the fix is released?

leofabri commented 5 years ago

Is my problem with Angular a common issue? Or did you guys get localhost up and running with ng-cli?

benc-uk commented 5 years ago

@leofabri It's a common issue with many different services that serve HTTP requests; you basically won't get it working (well) with localhost until it's fixed. Using the eth0 IP is a workaround, e.g.

alias wsl-ip="ip addr show eth0 | grep \"inet 1\" | awk '{print \$2}' | cut -d/ -f1"
ng serve --host $(wsl-ip)

Hopefully in the next couple of Insiders builds this will be fixed and closed
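
For servers written in Node.js, the same lookup can be done in-process rather than through a shell alias. This is a minimal sketch assuming the interface is named eth0 as above; `interfaceIPv4` is a hypothetical helper.

```javascript
const os = require('os')

// Hypothetical helper: first IPv4 address of the named interface, or null.
// `interfaces` is a parameter so the lookup can be exercised with fake data.
function interfaceIPv4(name = 'eth0', interfaces = os.networkInterfaces()) {
  const entry = (interfaces[name] || []).find(
    // Node reports `family` as the string 'IPv4' or, in some versions, the number 4.
    (addr) => addr.family === 'IPv4' || addr.family === 4
  )
  return entry ? entry.address : null
}
```

The result can be printed at startup so that the URL to open from Windows is visible, for example `console.log('http://' + interfaceIPv4() + ':4200/')`.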

benc-uk commented 5 years ago

Just updated to 18970 hoping it would be fixed. I spotted the WSL kernel version number got a bump to 4.19.59 so I was optimistic, however the problem is still happening 😢

benhillis commented 5 years ago

@benc-uk - A fix is on the way.

dbroadhurst commented 5 years ago

I get mixed results with most services but connecting to a MongoDB with localhost never works. Tried docker and installing on the host. VM IP works

mongodb://localhost:27017/my-db

dsmaher commented 5 years ago

I was having the same issue with kubectl port-forward: sometimes it works, other times it doesn't. It seems to work until the port forward times out, and then the next run of kubectl port-forward doesn't work. I found a workaround that will keep me going until the fix makes its way to an Insider build. You can run:

wsl.exe --shutdown

and that will kill off the wsl kernel (and all shell instances attached to it). When you restart your shell, everything works again. It's annoying if you have several shell instances running and Docker Desktop WSL2 mode doesn't recover well, but it's workable until a fix lands.

mmoskal commented 5 years ago

Same thing here with 18970 and a node server. I set the keep alive timeout and also the /proc writes mentioned above. It works for a couple minutes, and then it stops. wsl.exe --shutdown helps for another couple minutes. This is with https://github.com/microsoft/pxt-arcade. @benhillis if you need someone internal to test let me know (mimoskal@ms).

benhillis commented 5 years ago

Build 18975 should improve this experience greatly.

benc-uk commented 5 years ago

Just tested on 18975 and my Node app is back to normal, removed my workaround code and all is good. Tried a Java Spring Boot app (Tomcat) and also no problems. Also repeated my Apache 2 test

All good! 😍

Thanks for fixing!

tuananh commented 5 years ago

I tested with Node and ruby. All works now. Thanks

thomasaull commented 5 years ago

I just installed 18980 today and am experiencing the same issues… I am starting up a simple PHP development server with php -S 0.0.0.0:8001. My page loads, but the spinner in the browser never stops. Also, every request in the PHP log uses another port (not sure if this is the default behaviour though):

(screenshot)

tuananh commented 5 years ago

yep, issue seems to come back with 18980 build

thomasaull commented 5 years ago

Any chance I can downgrade to 18975? 😅 @benc-uk Please consider reopening this issue

coffenbacher commented 5 years ago

Wow - thanks for the heads up for the rest of us at least - anyone that hasn't updated already, the 7-day update pause seems like a good choice here.

onomatopellan commented 5 years ago

No problem for me on 18980 but maybe because I'm just using the simple python3 -m http.server for testing.

thomasaull commented 5 years ago

@coffenbacher you could also switch to the slow ring and go back to fast as soon as this issue is resolved again I guess

coffenbacher commented 5 years ago

Slow ring wouldn't downgrade my Windows? WSL seems really stable on 18975 for my use-case so that might be a good option actually.