microsoft / WSL

X apps running in WSL randomly stop displaying on X Server in Windows host #5339

Closed gitbls closed 9 months ago

gitbls commented 4 years ago

Environment

Windows build number: Version 10.0.19041.264
Your Distribution version: Debian from the Windows Store with apt update/upgrade done
Whether the issue is on WSL 2 and/or WSL 1: WSL 2

Steps to reproduce

1) Start an Xserver on the Windows system. I've reproduced this with both Xming and the Cygwin Xserver and with 3 or 4 different X apps (xterm, xclock, xcolors, for instance)

2) Start the Debian WSL app using the icon created when Debian installed

3) ping hostip (192.168.xx.32 in my case) to confirm. It works.

4) DISPLAY=192.168.xx.32:0 xterm &

Expected behavior

The X app displays on the host through the Xserver. X app continues to run until I kill the Xserver or the app. Happiness throughout the kingdom.

Actual behavior

At some random point later, the xterm window simply disappears, with no action on my part. This sometimes seems to take a few hours.

A ps aux in Debian still shows the xterm process as active.

I have tried doing some logging to capture the exact timing, but since the xterm and bash processes are still running, it's difficult to determine exactly when the app stopped displaying.

gitbls commented 4 years ago

I should add that other xterms (running on other non-WSL systems) displaying on the same Xserver continue to work correctly and do not disconnect when the Debian xterm does. And, same behavior exhibited on two different systems.

therealkenc commented 4 years ago

image

gitbls commented 4 years ago

Thanks for the response. So, it's something I'm doing, but I can't imagine what.

I set up a new desktop system since I wrote this issue 2 days ago, and the new system has the same behavior! Here are some screen shots that I hope help demonstrate it.

I'd definitely appreciate assistance or suggestions to chase this down!

image

image

image

therealkenc commented 4 years ago

So, it's something I'm doing, but I can't imagine what.

Not necessarily. I just don't have a repro (which is, counterintuitively, unfortunate).

Okay so if I am following, the two xterms (PID 88 and 122) were alive at 3:30, but one disappeared by 4:47. But the xterms are still running. That is good context, thank you.

Do a fav and re-run the same test I did: start an xterm & from a command prompt and go to bed. Whatever a 'doit' is, don't. Before and after the bedtime interval, run netstat on both sides to see what those TCP connections are up to, like this:

image

My run was with VcXsrv. You can try that too, although I don't seriously think that's the variable.

If it survives the night maybe we can start identifying the variable. If it doesn't survive, and you didn't take any action (for real) on either the Windows side or the WSL side before looking at the screen to see if it was still alive, this gets more forlorn to track down.

gitbls commented 4 years ago

Thx! I was going to script something in doit, but didn't, so doit didn't do it ;)

I installed the latest vcxsrv and fired off a test with 2 xterms and a starting netstat. Will see how it does in the morning and update.

gitbls commented 4 years ago

As I mentioned, I started 2 xterms. After 12 hours, one of the xterms had disappeared from the screen, and the other one was still alive and working.

bls@MS-scout~> date
Sat 06 Jun 2020 06:36:32 PM PDT
bls@MS-scout~> netstat > netstat-start.log
bls@MS-scout~> date
Sun 07 Jun 2020 07:37:51 AM PDT
bls@MS-scout~> netstat > netstat-end.log
bls@MS-scout~> jobs
[1]-  Running                 xterm &
[2]+  Running                 xterm &
bls@MS-scout~> cat netstat-start.log
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 172.21.137.111:50756    scout.starwhite.net:x11 ESTABLISHED
tcp        0      0 172.21.137.111:50754    scout.starwhite.net:x11 ESTABLISHED
Active UNIX domain sockets (w/o servers)
Proto RefCnt Flags       Type       State         I-Node   Path
bls@MS-scout~> cat netstat-end.log
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 172.21.137.111:50756    scout.starwhite.net:x11 ESTABLISHED
tcp        0      0 172.21.137.111:50754    scout.starwhite.net:x11 ESTABLISHED
Active UNIX domain sockets (w/o servers)
Proto RefCnt Flags       Type       State         I-Node   Path
bls@MS-scout~>

gitbls commented 4 years ago

Re-opening since I hit the wrong button. Sigh!

therealkenc commented 4 years ago

netstat.exe /an | grep 6000

gitbls commented 4 years ago

From the Windows side:

c:\bls> netstat /an | grep 6000
  TCP    0.0.0.0:6000           0.0.0.0:0              LISTENING
  TCP    127.0.0.1:6000         127.0.0.1:64630        ESTABLISHED
  TCP    127.0.0.1:6000         127.0.0.1:64631        ESTABLISHED
  TCP    127.0.0.1:6000         127.0.0.1:64632        ESTABLISHED
  TCP    127.0.0.1:64630        127.0.0.1:6000         ESTABLISHED
  TCP    127.0.0.1:64631        127.0.0.1:6000         ESTABLISHED
  TCP    127.0.0.1:64632        127.0.0.1:6000         ESTABLISHED
  TCP    [::]:6000              [::]:0                 LISTENING

c:\bls>

therealkenc commented 4 years ago

Thanks. First, what's the third thing you launched (not that it matters much). And what else do you have running. pstree -p.

With VcXsrv at least (probably the other servers too) you get two stacked taskbar icons. Do you have two (or three) or did the icons go down with the missing xterm. You can also maybe look at the VcXsrv logs and see if they show anything interesting. Looks like this:

image

I'll do a run with more than one X11 client myself tonight (if I remember). Don't have high hopes, I forget to close X clients all the time (OTOH I probably wouldn't notice if one disappeared). But at least we'll be in sync.

gitbls commented 4 years ago

The only other thing running is another Debian console window (created from the Debian item in the menu). Here's pstree -A -p

bls@MS-scout/bls> pstree -A -p
init(1)-+-init(8)---init(9)-+-bash(10)
        |                   `-ssh-agent(62)
        |-init(163)---init(164)---bash(165)---pstree(243)
        `-{init}(7)
bls@MS-scout/bls>

There were 2 stacked taskbar icons for the xterms when I started them, but I didn't notice if there were one or two this morning. Guessing there was only one, since the window had disappeared.

The VcXsrv log doesn't have anything useful, just the usual X garp.

One interesting thing: the 2nd xterm window went away this morning while I was out of the office, and there were two actual 'xterm: fatal IO error 110 (connection timed out)' errors in the Debian console window. As expected, the jobs no longer show up with a 'jobs' command.

I'll do a Windows reboot and restart the test later this morning. Let me know if there's anything else you'd like me to do for this one.

therealkenc commented 4 years ago

Well you've gotta have the two xterms at least (although I guess they are long gone).

We're going to need to start over. I only just clued in that your netstat.exe has no connections to WSL, only three spurious 127.0.0.1. Your starting position should look like this:

image

That's two sockets on the WSL side, and matching two on the Windows side. The point is to get a before and after, and see if the Windows side is still established or went away.

therealkenc commented 4 years ago

xterm: fatal IO error 110 (connection timed out) error in the Debian console

That's useful, but also complicates things. That isn't the same situation as yesterday; the xterms were both still running and did not error out yesterday. Normally with that error above I'd shrug and say "dunno ask the people who maintain your X server of choice". But that's not it. Frustratingly, it is also hard to finger-point WSL networking (which is the usual suspect and this thread would be shorter). If your whole network went down (because the box hibernated is popular) it would take down both clients not just one. And they wouldn't be running.

gitbls commented 4 years ago

I'm setting up to do a full end-to-end clean run test with plenty of netstats along the way. Will report back with details. I realize that this isn't super-easy to solve, and really appreciate your time on this.

A couple of questions: I'm using DISPLAY set to the Windows LAN IP address (192.168.92.7 in my case). Is that the case for your test as well? Is 172.26.16.1 the LAN IP address of the Windows host? Also, it appears that you have VcXsrv listening on 0.0.0.0, while mine is on 127.0.0.1. What are you using as the Listen address for VcXsrv.

Finally, I have not added any port proxies in the Hyper-V network infrastructure. I assume you don't have any either?

As far as the xterm connection timed out error, I've seen that happen a couple of other times, not at the same time as the xterm window disappears. Very strange!

therealkenc commented 4 years ago

Is that the case for your test as well? Is 172.26.16.1 the LAN IP

Yes that is the only way to do it. I would have added the DISPLAY= line in my screencap yesterday (almost stepped back and did) but I already knew you made it that far.

Also, it appears that you have VcXsrv listening on 0.0.0.0, while mine is on 127.0.0.1.

No yours is listening on zeros also, unless you pasted incorrectly. I am not even sure if there is a way to get VcXsrv to do otherwise without extraordinary steps (additional commandline flags maybe) but there is no way you are doing that.

image

Finally, I have not added any port proxies in the Hyper-V network infrastructure. I assume you don't have any either?

No nothing special (which is where you want to be too).

gitbls commented 4 years ago

Yes, you're correct, of course, that I'm listening on 0.0.0.0 as well, and I've not added any VcXsrv command-line switches.

I ran another test last night. netstats follow. I started the test just after 8pm, and both xterms had disappeared from the screen when I re-checked at 9:47pm, but the xterms were still running as previously. By the following morning, the xterms had actually exited with fatal error 110 (connection timed out). At each step, in the Debian WSL I captured the output of:

netstat -an | grep 6000
netstat.exe /an | grep 6000 | grep -v "127.0.0.1"

Stage starting - VcXsrv started, but no X apps started
Sun 07 Jun 2020 08:04:01 PM PDT
WSL
(no connections)
Windows
TCP 0.0.0.0:6000 0.0.0.0:0 LISTENING
TCP [::]:6000 [::]:0 LISTENING

Stage xt1-started - After starting the first xterm
Sun 07 Jun 2020 08:04:50 PM PDT
WSL
tcp 0 0 172.22.242.165:48242 192.168.92.7:6000 ESTABLISHED
Windows
TCP 0.0.0.0:6000 0.0.0.0:0 LISTENING
TCP 192.168.92.7:6000 192.168.92.7:60852 ESTABLISHED
TCP [::]:6000 [::]:0 LISTENING

Stage xt2-started - After starting the second xterm
Sun 07 Jun 2020 08:05:39 PM PDT
WSL
tcp 0 0 172.22.242.165:48242 192.168.92.7:6000 ESTABLISHED
tcp 0 0 172.22.242.165:48244 192.168.92.7:6000 ESTABLISHED
Windows
TCP 0.0.0.0:6000 0.0.0.0:0 LISTENING
TCP 192.168.92.7:6000 192.168.92.7:60852 ESTABLISHED
TCP 192.168.92.7:6000 192.168.92.7:60854 ESTABLISHED
TCP [::]:6000 [::]:0 LISTENING

Stage xt-disconnected - Both xterms had disappeared from the screen
Sun 07 Jun 2020 09:47:00 PM PDT
WSL
tcp 0 0 172.22.242.165:48242 192.168.92.7:6000 ESTABLISHED
tcp 0 0 172.22.242.165:48244 192.168.92.7:6000 ESTABLISHED
Windows
TCP 0.0.0.0:6000 0.0.0.0:0 LISTENING
TCP [::]:6000 [::]:0 LISTENING

Stage xt-fataled-out - Both xterms had exited with fatal error 110
Mon 08 Jun 2020 07:07:57 AM PDT
WSL
(no connections)
Windows
TCP 0.0.0.0:6000 0.0.0.0:0 LISTENING
TCP [::]:6000 [::]:0 LISTENING
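The staged captures above could be scripted roughly as follows. This is only a sketch, not what was actually run for this test: the function names count_x11_established and snapshot_stage are made up for illustration, and it assumes the X server is on display :0 (TCP port 6000) and that netstat.exe is reachable from the WSL shell, as in the commands above.

```shell
#!/bin/sh
# Count ESTABLISHED TCP connections on port 6000 in netstat output read from
# stdin, ignoring loopback entries (same filters as the commands in the thread).
count_x11_established() {
    awk '/ESTABLISHED/ && !/127\.0\.0\.1/ && /:6000[ \t]/ { n++ } END { print n + 0 }'
}

# Print one labeled snapshot comparing the WSL side with the Windows side.
# $1 is a free-form stage label (hypothetical, e.g. "xt1-started").
snapshot_stage() {
    wsl_count=$(netstat -an | count_x11_established)
    win_count=$(netstat.exe /an | count_x11_established)
    echo "Stage $1 - $(date)"
    echo "WSL established: $wsl_count, Windows established: $win_count"
    if [ "$wsl_count" -ne "$win_count" ]; then
        echo "Mismatch: one side dropped the connection without the other noticing"
    fi
}

# Example use, run by hand at each stage:
#   snapshot_stage xt1-started >> x11-stages.log
```

A mismatch between the two counts corresponds to the xt-disconnected stage above, where WSL still shows two ESTABLISHED sockets and Windows shows none.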

therealkenc commented 4 years ago

Thank-you for the detailed follow up. My two xterms have been up for ~24 hours now, but we both pretty much knew that would be the case going in.

Your results are what I suspected (but needed to see). It is about the worst-case scenario for tracking down your variable. The Windows end of the TCP connection went away, but didn't do a close handshake (that the WSL side received anyway). Analogy here is if you had the X server on one machine, the client xterm on another machine, and unplugged the ethernet cable. That's why I wanted to pursue this. It isn't (scare quote) "WSL" in the Linux kernel sense. It isn't your X server. It sure ain't xterm.

The question is how your rig got in such a state, and needless to say I can't guess and neither can you. End-game here is to confirm, with absolute metaphysical certainty, you aren't running anything third-party like AV software or VPN software or firewall software, or anything third-party that would install a Windows kernel driver. Then submit network logs following (9) in contributing.md. The "Recreate your problem in the 'Additional Details' section" phase, which for you seems to take a couple of hours, is doing the above again. That's tricky (and unusual) because more commonly a network problem manifests hard and fast. But collecting network logs is about the last chance saloon here.

You could in principle try to collect wireshark logs on port 6000 on both sides, but frankly I don't think that would demonstrate anything we don't already know by deduction.

Your best hope, really, is to get some me2s that help identify the variable. I'll drop the need-repro tag since you've done all you can, short of the network logs via feedback hub.

gitbls commented 4 years ago

Thanks! Will sort that out and get it done in the next couple of days, and will close this once I finish.

WSLUser commented 4 years ago

The first thing I'd check for any WSL2 related issue is memory consumption. Even network tools require some memory to start with, and WSL2 will by default use everything at its disposal (which is everything Windows has). Try limiting your RAM by half using .wslconfig. Then see if it dies overnight or if it takes less time before being consumed.

gitbls commented 4 years ago

@WSLUser Thanks for your perspective on this. The systems that are exhibiting this have plenty of physical memory. One has 16GB, one has 32GB, and one has 64GB. I don't see any indication that it's a memory problem. Task Manager still shows plenty of free memory.

WSLUser commented 4 years ago

Glad to hear it. That just helps narrow the culprit ever so slightly.

gitbls commented 4 years ago

Feedback submitted: Link to Feedback details @therealkenc should this issue be closed now?

therealkenc commented 4 years ago

Open or closed is pretty academic in this instance. Close it if you want it off your books (i.e. you are moving on). Leaving it open is fine too. If nothing comes of this (i.e. no me2s arrive that are plausibly the same problem as yours), it will get closed at some indeterminate time in the future either way. Your pleasure.

alexvorobiev commented 4 years ago

I have a similar issue, but only when Windows goes to sleep. I then found this FAQ for x410 (see the very end of the page): https://x410.dev/cookbook/wsl/using-x410-with-wsl2

therealkenc commented 4 years ago

The LZ for sleep/standby is #5021.

therealkenc commented 4 years ago

Linking #4675 as possibly (?) related.

aspen commented 4 years ago

I seem to be having a possibly similar issue on 19042.487 and WSL2 (Ubuntu) with VcXSrv.

Some windows (an xfce4-terminal and IntelliJ) seem to occasionally disappear. The first time this happened there was a system sleep involved, but just now it happened maybe only 10 minutes into using the applications. No notable errors in VcXSrv or WSL but I too will see what I can find.

gkamendje commented 4 years ago

I have the same problem. I am running Win 10 2004 Build 19041.508 with WSL2 Ubuntu 18.04. My X11 apps crash with Fatal IO error 110 (Connection timed out) on X server. I noticed that it happens most of the time when I recover from hibernation. At the beginning I thought it was an issue with the X server (Mobaxterm) that I was using. I switched to VcXSrv and I had the same issue. Although it is quite rare, I noticed that the problem sometimes happens even when I do not put the machine in hibernation. I had never seen the issue while I was running with WSL1.

alexvorobiev commented 4 years ago

The current workaround seems to be to use X2Go. I haven't tried that yet.

nbdd0121 commented 4 years ago

This and #4675 both seem to be Windows dropping non-loopback TCP connections on wake-up from hibernation or when the network changes. Because WSL2 networking goes through the WSL bridge, it is not loopback, so Windows will reset it. AFAIK there is no easy way to change this behaviour.

The obvious workaround is using xpra or x2go, which will reconnect when the connection drops. However, in my experience they have much higher latency and are not as stable as a raw X11 server, and I frequently run into issues and need to restart/reconnect.

I ended up writing my own workaround which uses AF_VSOCK https://github.com/nbdd0121/x11-over-vsock.

gitbls commented 4 years ago

This and #4675 both seem to be Windows dropping non-loopback TCP connections on wake-up from hibernation or when the network changes. Because WSL2 networking goes through the WSL bridge, it is not loopback, so Windows will reset it. AFAIK there is no easy way to change this behaviour.

The obvious workaround is using xpra or x2go, which will reconnect when the connection drops. However, in my experience they have much higher latency and are not as stable as a raw X11 server, and I frequently run into issues and need to restart/reconnect.

I ended up writing my own workaround which uses AF_VSOCK https://github.com/nbdd0121/x11-over-vsock.

While this may be your problem, in my case (the base note) there is no hibernation or network change involved at all.

nbdd0121 commented 4 years ago

While this may be your problem, in my case (the base note) there is no hibernation or network change involved at all.

Based on the discussion above I thought your issue is also network related? Sometimes a network change might be difficult to notice (e.g. WiFi reassociation, IP change, etc). I previously also had seemingly random X connection drops (my PC is connected via WiFi), and with my workaround I haven't observed any drop so far.

gitbls commented 4 years ago

While this may be your problem, in my case (the base note) there is no hibernation or network change involved at all.

Based on the discussion above I thought your issue is also network related? Sometimes a network change might be difficult to notice (e.g. WiFi reassociation, IP change, etc). I previously also had seemingly random X connection drops (my PC is connected via WiFi), and with my workaround I haven't observed any drop so far.

It obviously IS network-related, but nothing to do with a network change since all the systems are hardwired on a reliable LAN. I have seen it on 3 different systems here, at different times. My not-very-educated guess is that it's something to do with the Hyper-V switch. Hopefully @therealkenc and the rest of the team will reveal the solution to this mystery someday. Keep those cards and letters coming!

JBPlinn commented 4 years ago

Same issue: terminal closes unexpectedly after a certain amount of time...

vkopanja commented 3 years ago

Happens to me as well, using WSL2 on Ubuntu. I run my IntelliJ IDEA, and sometimes it just randomly closes and I have no way to reopen it.

kportertx commented 3 years ago

Also same issue on WSL2 & ubuntu20, ~but I didn't start noticing it until I needed to connect to a VPN (OpenVPN) for work~.

~I also installed Docker-desktop around the same time I setup the VPN. Today the issue continued without the VPN software, so I uninstalled Docker-desktop. My Linux windows were closing about every 2 hours - it has now been about 4 hours since uninstalling docker and my windows are all still running.~ Eventually happened again.

kportertx commented 3 years ago

~Solved (I hope). Issue seems to be caused by X11 timeouts. I've added ForwardX11Timeout 14d to ~/.ssh/config and my windows have survived for 7 hours (so far).~

~Note, setting ForwardX11Timeout to 0 will disable the timeout functionality, which I'll likely change to later.~

~My ~/.ssh/config:~

cat ~/.ssh/config
ForwardAgent yes
ForwardX11Timeout 14d

https://github.com/microsoft/WSL/issues/5339#issuecomment-740924145

gitbls commented 3 years ago

Solved (I hope). Issue seems to be caused by X11 timeouts. I've added ForwardX11Timeout 14d to ~/.ssh/config and my windows have survived for 7 hours (so far).

Note, setting ForwardX11Timeout to 0 will disable the timeout functionality, which I'll likely change to later.

My ~/.ssh/config:

cat ~/.ssh/config
ForwardAgent yes
ForwardX11Timeout 14d

Great to hear this solves your problem, but my issue had no ssh involved. That said, I haven't tried to repro my problem in a while, so giving it another go. Will report back in a few days after I see how it does.

kportertx commented 3 years ago

~Are you sure you aren't using ssh. I have little experience with remote x sessions, but according to this semi-reliable source you have to do additional configuration to allow non-ssh connections.~

~Btw, just checked my machine after being afk for 2.5 hrs, still running.~

https://github.com/microsoft/WSL/issues/5339#issuecomment-740924145

gitbls commented 3 years ago

Are you sure you aren't using ssh. I have little experience with remote x sessions, but according to this semi-reliable source you have to do additional configuration to allow non-ssh connections.

Btw, just checked my machine after being afk for 2.5 hrs, still running.

Yes, 100% sure. I open a Debian window and from that command line I start an xterm that displays on my xming Xserver running on the Windows box. The xterm comes up fine. There is no ssh running anywhere in this process.

Unfortunately, after about 2 hours, the xterm windows disappeared. The xterm processes still show up in a 'ps ux' done in the Debian command window, though, just like they did before. So, in spite of my fervent hope that this magically got corrected, no joy.

kportertx commented 3 years ago

Yes, 100% sure. I open a Debian window and from that command line I start an xterm that displays on my xming Xserver running on the Windows box. The xterm comes up fine. There is no ssh running anywhere in this process.

You are right, I'm basically doing the same as you describe and ssh isn't involved (confirmed being clear text with wireshark). So the reason for this issue disappearing for me is a bit of a mystery.

kportertx commented 3 years ago

Possibly related.

I noticed when I disconnect from WiFi that my X11 windows would close. This was because I was using the IP address assigned to this interface for the DISPLAY variable. Switching to the IP address assigned to the virtual interface labeled "WSL" resolves this particular issue.
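One way to pick up that virtual-interface address from inside WSL2 is sketched below. It rests on an assumption the thread does not confirm: that /etc/resolv.conf is auto-generated by WSL2 and its first nameserver entry is the Windows host's address on the "WSL" virtual interface. The function name wsl_host_ip is made up for illustration. The X server also has to accept connections on that interface, and the Windows firewall has to allow them.

```shell
#!/bin/sh
# Print the first nameserver address from a resolv.conf-style file.
# $1 is an optional path (defaults to /etc/resolv.conf).
# Assumption: under WSL2 this address is the Windows host on the WSL vEthernet.
wsl_host_ip() {
    awk '$1 == "nameserver" { print $2; exit }' "${1:-/etc/resolv.conf}"
}

# Example use (display number 0 assumed):
#   export DISPLAY="$(wsl_host_ip):0"
#   xterm &
```

If the resolv.conf file is managed by hand, this will return whatever DNS server is configured there, which is not necessarily the Windows host.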

gitbls commented 3 years ago

Possibly related.

I noticed when I disconnect from WiFi that my X11 windows would close. This was because I was using the IP address assigned to this interface for the DISPLAY variable. Switching to the IP address assigned to the virtual interface labeled "WSL" resolves this particular issue.

Not sure what you're saying here. My Windows host has 3 IP addresses:

Ethernet adapter: 192.168.92.8 #This is on my local LAN
vEthernet (default switch): 172.19.160.1
vEthernet (WSL): 172.31.48.1

My Debian WSL2 instance has an IP address of 172.31.50.148. In that console session, I tried

DISPLAY=ddd.ddd.ddd.ddd:0 xterm &

For each of the IP addresses visible in the HOST (the first 3 above). The only one that started an xterm was when I connected to the Ethernet adapter (192.168.92.8).

So, I was unable to replicate your scenario and use the vEthernet (WSL) IP address. It didn't error out immediately, just sat there doing nothing for a while, and then failed out with "Can't open display". No surprise, since my X Server is listening on 192.168.92.8

Where is your X Server listening, and how did you get it to listen on the vEthernet(WSL) IP Address?

kportertx commented 3 years ago

I'm now using my local equivalent to vEthernet (WSL) 172.31.48.1. Apparently my xserver is listening on that interface as well.

kportertx commented 3 years ago

I'm not sure where the interface the xserver is listening on is configured. I'm using GWSL which uses VCXSRV.

vinodatpu commented 3 years ago

Me2. I have Windows 10 Pro 20H2 (OS build 19042.746) with WSL2 and Ubuntu 20.04 guest. Windows host has a static IP=192.168.1.10 and WSL2-Ubuntu IP and subnet keeps changing every few days. I am running MobaXterm v20.6 and its builtin Xserver on Windows host.

X apps (xterm, xfig, emacs) on WSL2-ubuntu having DISPLAY=192.168.1.10:0.0 randomly stop displaying on X server within a few hours. Ubuntu ps shows the client apps are still running. The same apps running on a physical machine CentOS 7 that is on the same subnet as Windows 10, connecting to the same X server, do not experience this problem. This is certainly related to WSL2.

vkopanja commented 3 years ago

I can confirm this happens to me 100% of the time, when my IP changes (i.e. network changes from ethernet to WiFi and vice versa).

simoncrook commented 3 years ago

This is what works for me:

1) create a bash script with following (call it sysctl.sh and place in $HOME):

#!/bin/bash
sysctl -w net.ipv4.tcp_keepalive_intvl=60 net.ipv4.tcp_keepalive_probes=5 net.ipv4.tcp_keepalive_time=300

2) Change permissions and ownership:

sudo chown root:root sysctl.sh
sudo chmod 755 sysctl.sh

3) Modify the sudoers file to allow users to run this (use sudo visudo). Ensure it has the following line, where <username> translates to your username:

<username> ALL=(root) NOPASSWD:/home/<username>/sysctl.sh

4) Finally, update your ~/.bashrc file to include:

sudo ./sysctl.sh

Without these changes my bash login shell and terminal (i.e. terminator) will close after, say, 30 minutes of inactivity. With these changes my bash login shell and terminal stay open until I close them. Currently they've been open for days.

Note: My DISPLAY variable is set to

host.docker.internal:0.0

and my vcxsrv runs using:

vcxsrv.exe :0 -ac -lesspointer -multimonitors -multiwindow -clipboard -nowgl -dpi auto

Hope this works for your wsl2 vm and windows vcxsrv.
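For a sense of what these three values mean in practice, here is a small arithmetic sketch. It assumes the X client's TCP socket has keepalive enabled (SO_KEEPALIVE), which nothing in this thread confirms. The function name keepalive_detect_seconds is made up for illustration. With keepalive on, the kernel sends its first probe after tcp_keepalive_time seconds of idleness, then retries every tcp_keepalive_intvl seconds, and gives up after tcp_keepalive_probes unanswered probes.

```shell
#!/bin/sh
# Seconds of silence before the kernel declares an idle peer dead:
#   tcp_keepalive_time + tcp_keepalive_intvl * tcp_keepalive_probes
# $1 = tcp_keepalive_time, $2 = tcp_keepalive_intvl, $3 = tcp_keepalive_probes
keepalive_detect_seconds() {
    echo $(( $1 + $2 * $3 ))
}

keepalive_detect_seconds 300 60 5     # values from the script above: prints 600
keepalive_detect_seconds 7200 75 9    # stock Linux defaults: prints 7875
```

The more relevant effect here is probably the first number: with these settings an otherwise idle connection carries a probe every 5 minutes instead of sitting silent for 2 hours, so anything on the path that forgets idle connections keeps seeing traffic.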

gitbls commented 3 years ago

This is what works for me:

  1. create a bash script with following (call it sysctl.sh and place in $HOME):

#!/bin/bash
sysctl -w net.ipv4.tcp_keepalive_intvl=60 net.ipv4.tcp_keepalive_probes=5 net.ipv4.tcp_keepalive_time=300

  2. Change permissions and ownership:

sudo chown root:root sysctl.sh
sudo chmod 755 sysctl.sh

  3. Modify the sudoers file to allow users to run this (use sudo visudo). Ensure it has the following line, where <username> translates to your username:

<username> ALL=(root) NOPASSWD:/home/<username>/sysctl.sh

  4. Finally, update your ~/.bashrc file to include:

sudo ./sysctl.sh

Without these changes my bash login shell and terminal (i.e. terminator) will close after, say, 30 minutes of inactivity. With these changes my bash login shell and terminal stay open until I close them. Currently they've been open for days.

Note: My DISPLAY variable is set to

host.docker.internal:0.0

and my vcxsrv runs using:

vcxsrv.exe :0 -ac -lesspointer -multimonitors -multiwindow -clipboard -nowgl -dpi auto

Hope this works for your wsl2 vm and windows vcxsrv.

WOW!!! Thank you for this! After making this change, it ran overnight, which it has NEVER done before. I'll be cautiously optimistic for a bit longer, but I think this works.

I did make a slight change to your guide, though. I created /etc/sysctl.d/fixX.conf with the contents

net.ipv4.tcp_keepalive_intvl=60
net.ipv4.tcp_keepalive_probes=5
net.ipv4.tcp_keepalive_time=300

This is loaded automatically by sysctl when WSL restarts (need to do a wsl --shutdown if it's already up and running). No need to make any additional scripts or modify ~/.bashrc.

EDIT: Well, darn, per #4232 sysctl values aren't processed. Apologies for the diversion on this. Sorting out how to get the sysctl values loaded seamlessly doesn't detract from the correctness of the fix. Will report back in a few days after more run time.

Thanks again! This is spectacular.
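Since the EDIT above turns on whether a file such as /etc/sysctl.d/fixX.conf is really applied, a quick check after a WSL restart can settle it. This is a sketch only: the function name check_sysctl_conf is made up for illustration, and it assumes plain key=value lines with no spaces around the equals sign, like the three lines above.

```shell
#!/bin/sh
# Compare each key=value line of a sysctl conf file with the live kernel value.
# $1 = conf file, $2 = optional sysctl root directory (defaults to /proc/sys).
# Returns 0 if every key matches, 1 otherwise.
check_sysctl_conf() {
    conf=$1
    root=${2:-/proc/sys}
    rc=0
    while IFS='=' read -r key want; do
        # Skip blank lines and comments.
        case $key in ''|'#'*) continue ;; esac
        # net.ipv4.tcp_keepalive_time -> /proc/sys/net/ipv4/tcp_keepalive_time
        file="$root/$(printf '%s' "$key" | tr . /)"
        have=$(cat "$file" 2>/dev/null)
        if [ "$have" != "$want" ]; then
            echo "$key: want $want, have ${have:-unset}"
            rc=1
        fi
    done < "$conf"
    return $rc
}

# Example use:
#   check_sysctl_conf /etc/sysctl.d/fixX.conf && echo "all values in effect"
```

If the check reports mismatches, running sudo sysctl -p /etc/sysctl.d/fixX.conf by hand loads the file for the current session.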

simoncrook commented 3 years ago

Glad to help :-)