gitbls closed this issue 9 months ago.
I should add that other xterms (running on other non-WSL systems) displaying on the same Xserver continue to work correctly and do not disconnect when the Debian xterm does. The same behavior occurs on two different systems.
Thanks for the response. So, it's something I'm doing, but I can't imagine what.
I set up a new desktop system since I wrote this issue 2 days ago, and the new system has the same behavior! Here are some screen shots that I hope help demonstrate it.
I'd definitely appreciate assistance or suggestions to chase this down!
So, it's something I'm doing, but I can't imagine what.
Not necessarily. I just don't have a repro (which is, counterintuitively, unfortunate).
Okay so if I am following, the two xterms (PID 88 and 122) were alive at 3:30, but one disappeared by 4:47. But the xterms are still running. That is good context, thank you.
Do me a favor and re-run the same test I did: run xterm & from a command prompt and go to bed. Whatever a 'doit' is, don't run it. Before and after the bedtime interval, run netstat on both sides to see what those TCP connections are up to, like this:
My run was with VcXsrv. You can try that too, although I don't seriously think that's the variable.
If it survives the night maybe we can start identifying the variable. If it doesn't survive, and you didn't take any action (for real) on either the Windows side or the WSL side before looking at the screen to see if it was still alive, this gets more forlorn to track down.
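A minimal sketch of that before/after capture as Bash functions. It assumes it runs inside the WSL distro with Windows interop enabled (so netstat.exe is on the PATH), net-tools installed for the Linux netstat, and the X server on display :0 (TCP port 6000). The function names and the log layout are hypothetical, not something either side of this thread actually ran.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for the overnight before/after X11 connection test.

# x11_lines: filter netstat output on stdin down to lines that mention the
# X11 port, numerically (:6000) or by service name (:x11). Exits 0 even when
# nothing matches, so it is safe to use in pipelines.
x11_lines() {
    grep -E ':(6000|x11)([^0-9]|$)' || true
}

# snapshot LABEL LOGFILE: append a timestamped record of the X11 TCP
# connections as seen from the WSL side and from the Windows side.
snapshot() {
    local label=$1 log=$2
    {
        echo "== $label $(date)"
        echo "-- WSL"
        netstat -an | x11_lines
        echo "-- Windows"
        # netstat.exe is the Windows binary, reached through WSL interop;
        # strip the CRs from its CRLF line endings before filtering.
        netstat.exe -an | tr -d '\r' | x11_lines
    } >> "$log"
}
```

For example, run snapshot before ~/x11-test.log right after starting the xterms, then snapshot after ~/x11-test.log in the morning, before touching anything else on either side.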
Thx! I was going to script something in doit, but didn't, so doit didn't do it ;)
I installed the latest vcxsrv and fired off a test with 2 xterms and a starting netstat. Will see how it does in the morning and update.
As I mentioned, I started 2 xterms. After 12 hours, one of the xterms had disappeared from the screen, and the other one was still alive and working.
bls@MS-scout~> date
Sat 06 Jun 2020 06:36:32 PM PDT
bls@MS-scout~> netstat > netstat-start.log
bls@MS-scout~> date
Sun 07 Jun 2020 07:37:51 AM PDT
bls@MS-scout~> netstat > netstat-end.log
bls@MS-scout~> jobs
[1]- Running xterm &
[2]+ Running xterm &
bls@MS-scout~> cat netstat-start.log
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 172.21.137.111:50756 scout.starwhite.net:x11 ESTABLISHED
tcp 0 0 172.21.137.111:50754 scout.starwhite.net:x11 ESTABLISHED
Active UNIX domain sockets (w/o servers)
Proto RefCnt Flags Type State I-Node Path
bls@MS-scout~> cat netstat-end.log
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 172.21.137.111:50756 scout.starwhite.net:x11 ESTABLISHED
tcp 0 0 172.21.137.111:50754 scout.starwhite.net:x11 ESTABLISHED
Active UNIX domain sockets (w/o servers)
Proto RefCnt Flags Type State I-Node Path
bls@MS-scout~>
Re-opening since I hit the wrong button. Sigh!
netstat.exe /an | grep 6000
From the Windows side:
c:\bls> netstat /an | grep 6000
TCP 0.0.0.0:6000 0.0.0.0:0 LISTENING
TCP 127.0.0.1:6000 127.0.0.1:64630 ESTABLISHED
TCP 127.0.0.1:6000 127.0.0.1:64631 ESTABLISHED
TCP 127.0.0.1:6000 127.0.0.1:64632 ESTABLISHED
TCP 127.0.0.1:64630 127.0.0.1:6000 ESTABLISHED
TCP 127.0.0.1:64631 127.0.0.1:6000 ESTABLISHED
TCP 127.0.0.1:64632 127.0.0.1:6000 ESTABLISHED
TCP [::]:6000 [::]:0 LISTENING
c:\bls>
Thanks. First, what's the third thing you launched (not that it matters much)? And what else do you have running? pstree -p
With VcXsrv at least (probably the other servers too) you get two stacked taskbar icons. Do you have two (or three), or did the icons go down with the missing xterm? You could also look at the VcXsrv logs and see if they show anything interesting. Looks like this:
I'll do a run with more than one X11 client myself tonight (if I remember). I don't have high hopes; I forget to close X clients all the time (OTOH, I probably wouldn't notice if one disappeared). But at least we'll be in sync.
The only other thing running is another Debian console window (created from the Debian item in the menu). Here's pstree -A -p:
bls@MS-scout/bls> pstree -A -p
init(1)-+-init(8)---init(9)-+-bash(10)
        |                   `-ssh-agent(62)
        |-init(163)---init(164)---bash(165)---pstree(243)
        `-{init}(7)
bls@MS-scout/bls>
There were 2 stacked taskbar icons for the xterms when I started them, but I didn't notice if there were one or two this morning. Guessing there was only one, since the window had disappeared.
The VcXsrv log doesn't have anything useful, just the usual X garp.
One interesting thing: the 2nd xterm window went away this morning while I was out of the office, and there were two actual 'xterm: fatal IO error 110 (connection timed out)' errors in the Debian console window. As expected, the jobs no longer show up in the output of a 'jobs' command.
I'll do a Windows reboot and restart the test later this morning... let me know if there's anything else you'd like me to do for this one.
Well you've gotta have the two xterms at least (although I guess they are long gone).
We're going to need to start over. I only just clued in that your netstat.exe output has no connections to WSL, only three spurious 127.0.0.1 connections. Your starting position should look like this:
That's two sockets on the WSL side, and matching two on the Windows side. The point is to get a before and after, and see if the Windows side is still established or went away.
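That before/after comparison can be sketched as a small Bash function. It assumes two saved snapshots, one per side, already filtered to port 6000 with the Windows loopback lines removed. It compares counts of ESTABLISHED connections rather than port numbers, because the ports seen on each side need not match once WSL traffic is translated on its way to the Windows host. The function names and output labels are made up for illustration.

```bash
# count_established: print how many lines on stdin are ESTABLISHED sockets.
# grep -c prints 0 (and exits 1) when nothing matches; "|| true" keeps the 0.
count_established() {
    grep -c 'ESTABLISHED' || true
}

# classify_x11 WSL_FILE WIN_FILE: compare the two sides of the X11 connections.
#   none          - no connections on either side
#   ok N/M        - both sides see the same number of connections
#   half-open N/M - WSL still has connections Windows no longer has
#   extra-win N/M - Windows sees more (e.g. other, non-WSL X clients)
classify_x11() {
    local wsl win
    wsl=$(count_established < "$1")
    win=$(count_established < "$2")
    if [ "$wsl" -eq "$win" ]; then
        if [ "$wsl" -eq 0 ]; then echo "none"; else echo "ok $wsl/$win"; fi
    elif [ "$wsl" -gt "$win" ]; then
        echo "half-open $wsl/$win"
    else
        echo "extra-win $wsl/$win"
    fi
}
```

For the xt-disconnected stage posted later in this thread (two ESTABLISHED lines on WSL, only LISTENING lines on Windows), this would print half-open 2/0.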
xterm: fatal IO error 110 (connection timed out) error in the Debian console
That's useful, but it also complicates things. That isn't the same situation as yesterday; the xterms were both still running and did not error out yesterday. Normally with that error above I'd shrug and say "dunno, ask the people who maintain your X server of choice". But that's not it. Frustratingly, it is also hard to finger-point WSL networking (which is the usual suspect, and this thread would be shorter). If your whole network went down (the box hibernating is a popular cause), it would take down both clients, not just one. And they wouldn't still be running.
I'm setting up to do a full end-to-end clean run test with plenty of netstats along the way. Will report back with details. I realize that this isn't super-easy to solve, and really appreciate your time on this.
A couple of questions: I'm using DISPLAY set to the Windows LAN IP address (192.168.92.7 in my case). Is that the case for your test as well? Is 172.26.16.1 the LAN IP address of the Windows host? Also, it appears that you have VcXsrv listening on 0.0.0.0, while mine is on 127.0.0.1. What are you using as the listen address for VcXsrv?
Finally, I have not added any port proxies in the Hyper-V network infrastructure. I assume you don't have any either?
As far as the xterm connection timed out error, I've seen that happen a couple of other times, not at the same time as the xterm window disappears. Very strange!
Is that the case for your test as well? Is 172.26.16.1 the LAN IP
Yes, that is the only way to do it. I would have added the DISPLAY= line in my screencap yesterday (almost stepped back and did), but I already knew you made it that far.
Also, it appears that you have VcXsrv listening on 0.0.0.0, while mine is on 127.0.0.1.
No yours is listening on zeros also, unless you pasted incorrectly. I am not even sure if there is a way to get VcXsrv to do otherwise without extraordinary steps (additional commandline flags maybe) but there is no way you are doing that.
Finally, I have not added any port proxies in the Hyper-V network infrastructure. I assume you don't have any either?
No nothing special (which is where you want to be too).
Yes, you're correct, of course, that I'm listening on 0.0.0.0 as well, and I've not added any VcXsrv command-line switches.
I ran another test last night. netstats follow. I started the test just after 8pm, and both xterms had disappeared from the screen when I re-checked at 9:47pm, but the xterms were still running as previously. By the following morning, the xterms had actually exited with fatal error 110 (connection timed out). At each step, in the Debian WSL I captured the output of:
netstat -an | grep 6000
netstat.exe /an | grep 6000 | grep -v "127.0.0.1"
Stage starting - VcXsrv started, but no X apps started
Sun 07 Jun 2020 08:04:01 PM PDT
WSL
(no connections)
Windows
TCP 0.0.0.0:6000 0.0.0.0:0 LISTENING
TCP [::]:6000 [::]:0 LISTENING
Stage xt1-started - After starting the first xterm
Sun 07 Jun 2020 08:04:50 PM PDT
WSL
tcp 0 0 172.22.242.165:48242 192.168.92.7:6000 ESTABLISHED
Windows
TCP 0.0.0.0:6000 0.0.0.0:0 LISTENING
TCP 192.168.92.7:6000 192.168.92.7:60852 ESTABLISHED
TCP [::]:6000 [::]:0 LISTENING
Stage xt2-started - After starting the second xterm
Sun 07 Jun 2020 08:05:39 PM PDT
WSL
tcp 0 0 172.22.242.165:48242 192.168.92.7:6000 ESTABLISHED
tcp 0 0 172.22.242.165:48244 192.168.92.7:6000 ESTABLISHED
Windows
TCP 0.0.0.0:6000 0.0.0.0:0 LISTENING
TCP 192.168.92.7:6000 192.168.92.7:60852 ESTABLISHED
TCP 192.168.92.7:6000 192.168.92.7:60854 ESTABLISHED
TCP [::]:6000 [::]:0 LISTENING
Stage xt-disconnected - Both xterms had disappeared from the screen
Sun 07 Jun 2020 09:47:00 PM PDT
WSL
tcp 0 0 172.22.242.165:48242 192.168.92.7:6000 ESTABLISHED
tcp 0 0 172.22.242.165:48244 192.168.92.7:6000 ESTABLISHED
Windows
TCP 0.0.0.0:6000 0.0.0.0:0 LISTENING
TCP [::]:6000 [::]:0 LISTENING
Stage xt-fataled-out - Both xterms had exited with fatal error 110
Mon 08 Jun 2020 07:07:57 AM PDT
WSL
(no connections)
Windows
TCP 0.0.0.0:6000 0.0.0.0:0 LISTENING
TCP [::]:6000 [::]:0 LISTENING
Thank you for the detailed follow-up. My two xterms have been up for ~24 hours now, but we both pretty much knew that would be the case going in.
Your results are what I suspected (but needed to see). It is about the worst-case scenario for tracking down your variable. The Windows end of the TCP connection went away, but didn't do a close handshake (at least not one the WSL side received). The analogy here is if you had the X server on one machine, the client xterm on another machine, and unplugged the Ethernet cable. That's why I wanted to pursue this. It isn't (scare quote) "WSL" in the Linux kernel sense. It isn't your X server. It sure ain't xterm.
The question is how your rig got into such a state, and needless to say I can't guess and neither can you. The end game here is to confirm, with absolute metaphysical certainty, that you aren't running anything third-party like AV software, VPN software, or firewall software, or anything else third-party that would install a Windows kernel driver. Then submit network logs following (9) in contributing.md. The "Recreate your problem in the 'Additional Details' section" phase, which for you seems to take a couple of hours, is doing the above again. That's tricky (and unusual) because more commonly a network problem manifests hard and fast. But collecting network logs is about the last-chance saloon here.
You could in principle try to collect wireshark logs on port 6000 on both sides, but frankly I don't think that would demonstrate anything we don't already know by deduction.
Your best hope, really, is to get some me2s that help identify the variable. I'll drop the need-repro tag since you've done all you can, short of the network logs via Feedback Hub.
Thanks! Will sort that out and get it done in the next couple of days, and will close this once I finish.
The first thing I'd check for any WSL2-related issue is memory consumption. Even network tools require some memory to start, and WSL2 will by default use everything at its disposal (which is everything Windows has). Try limiting WSL2's RAM to half using .wslconfig. Then see if it still dies overnight, or if it takes less time before memory is consumed.
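A minimal sketch of that .wslconfig change, written as a Bash helper that prints the file body for a given amount of Windows RAM. The [wsl2] section and memory key are the documented .wslconfig settings; the helper name and the halving rule are just this suggestion turned into code. The file belongs in the Windows user profile folder (%UserProfile%\.wslconfig), and WSL must be restarted with wsl --shutdown before it takes effect.

```bash
# wslconfig_half_memory TOTAL_MB: print a .wslconfig body that caps the
# WSL 2 VM at half of TOTAL_MB megabytes of Windows RAM.
wslconfig_half_memory() {
    local total_mb=$1
    printf '[wsl2]\nmemory=%dMB\n' $(( total_mb / 2 ))
}
```

For a 16 GB host, wslconfig_half_memory 16384 > /mnt/c/Users/<you>/.wslconfig (where <you> is your Windows user name, a placeholder) writes memory=8192MB; then run wsl --shutdown from Windows.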
@WSLUser Thanks for your perspective on this. The systems that are exhibiting this have plenty of physical memory. One has 16GB, one has 32GB, and one has 64GB. I don't see any indication that it's a memory problem. Task Manager still shows plenty of free memory.
Glad to hear it. That just helps narrow the culprit ever so slightly.
Feedback submitted: Link to Feedback details. @therealkenc, should this issue be closed now?
Open or closed is pretty academic in this instance. Close it if you want it off your books (i.e. you are moving on). Leaving it open is fine too. If nothing comes of this (i.e. no me2s arrive that are plausibly the same problem as yours), it will get closed at some indeterminate time in the future either way. Your pleasure.
I have a similar issue, but only when Windows goes to sleep. I then found this FAQ for X410 (see the very end): https://x410.dev/cookbook/wsl/using-x410-with-wsl2
The LZ for sleep/standby is #5021.
Linking #4675 as possibly (?) related.
I seem to be having a possibly similar issue on 19042.487 and WSL2 (Ubuntu) with VcXSrv.
Some windows (an xfce4-terminal and IntelliJ) seem to occasionally disappear. The first time this happened there was a system sleep involved, but just now it happened maybe only 10 minutes into using the applications. No notable errors in VcXSrv or WSL but I too will see what I can find.
I have the same problem. I am running Win 10 2004 Build 19041.508 with WSL2 Ubuntu 18.04. My X11 apps crash with Fatal IO error 110 (Connection timed out) on X server.
I noticed that it happens most of the time when I recover from hibernation. At the beginning I thought it was an issue with the X server (MobaXterm) that I was using. I switched to VcXsrv and had the same issue.
Although it's quite rare, I noticed that the problem sometimes happens when I do not put the machine in hibernation.
I had never seen the issue while I was running with WSL1.
The current workaround seems to be to use X2Go. I haven't tried that yet.
This and #4675 both seem to be Windows dropping non-loopback TCP connections on wake-up from hibernation or when the network changes. Because WSL2 networking goes through the WSL bridge, it is not loopback, so Windows will reset it. AFAIK there is no easy way to change this behaviour.
The obvious workaround is using xpra or x2go which will reconnect when connection goes out. However in my experience they have much higher latency, are not as stable as raw X11 server and I frequently run into issues and need to restart/reconnect.
I ended up writing my own workaround which uses AF_VSOCK https://github.com/nbdd0121/x11-over-vsock.
While this may be your problem, in my case (the base note) there is no hibernation or network change involved at all.
Based on the discussion above, I thought your issue was also network related? Sometimes a network change might be difficult to notice (e.g. WiFi reassociation, IP change, etc.). I previously also had seemingly random X connection drops (my PC is connected via WiFi), and with my workaround I haven't observed any drops so far.
It obviously IS network-related, but nothing to do with a network change since all the systems are hardwired on a reliable LAN. I have seen it on 3 different systems here, at different times. My not-very-educated guess is that it's something to do with the Hyper-V switch. Hopefully @therealkenc and the rest of the team will reveal the solution to this mystery someday. Keep those cards and letters coming!
Same issue: terminal closes unexpectedly after a certain amount of time...
Happens to me as well, using WSL2 on Ubuntu. I run my IntelliJ IDEA, and sometimes it just randomly closes and I have no way to reopen it.
Also same issue on WSL2 & ubuntu20, ~but I didn't start noticing it until I needed to connect to a VPN (OpenVPN) for work~.
~I also installed Docker-desktop around the same time I setup the VPN. Today the issue continued without the VPN software, so I uninstalled Docker-desktop. My Linux windows were closing about every 2 hours - it has now been about 4 hours since uninstalling docker and my windows are all still running.~ Eventually happened again.
~Solved (I hope). Issue seems to be caused by X11 timeouts. I've added ForwardX11Timeout 14d to ~/.ssh/config and my windows have survived for 7 hours (so far).~
~Note, setting ForwardX11Timeout to 0 will disable the timeout functionality, which I'll likely switch to later.~
~My ~/.ssh/config:~
cat ~/.ssh/config
ForwardAgent yes
ForwardX11Timeout 14d
https://github.com/microsoft/WSL/issues/5339#issuecomment-740924145
Great to hear this solves your problem, but my issue had no ssh involved. That said, I haven't tried to repro my problem in a while, so giving it another go. Will report back in a few days after I see how it does.
~Are you sure you aren't using ssh? I have little experience with remote X sessions, but according to this semi-reliable source you have to do additional configuration to allow non-ssh connections.~
~Btw, I just checked my machine after being AFK for 2.5 hrs; it's still running.~
Yes, 100% sure. I open a Debian window and from that command line I start an xterm that displays on my xming Xserver running on the Windows box. The xterm comes up fine. There is no ssh running anywhere in this process.
Unfortunately, after about 2 hours, the xterm windows disappeared. The xterm processes still show up in a 'ps ux' done in the Debian command window, though, just like they did before. So, in spite of my fervent hope that this magically got corrected, no joy.
You are right, I'm basically doing the same as you describe and ssh isn't involved (confirmed being clear text with wireshark). So the reason for this issue disappearing for me is a bit of a mystery.
Possibly related.
I noticed when I disconnect from WiFi that my X11 windows would close. This was because I was using the IP address assigned to this interface for the DISPLAY variable. Switching to the IP address assigned to the virtual interface labeled "WSL" resolves this particular issue.
Not sure what you're saying here. My Windows host has 3 IP addresses:
Ethernet adapter: 192.168.92.8 # This is on my local LAN
vEthernet (default switch): 172.19.160.1
vEthernet (WSL): 172.31.48.1
My Debian WSL2 instance has an IP address of 172.31.50.148. In that console session, I tried
DISPLAY=ddd.ddd.ddd.ddd:0 xterm &
I did this for each of the host's IP addresses (the 3 above). The only one that started an xterm was the Ethernet adapter address (192.168.92.8).
So, I was unable to replicate your scenario and use the vEthernet (WSL) IP address. It didn't error out immediately, just sat there doing nothing for a while, and then failed out with "Can't open display". No surprise, since my X Server is listening on 192.168.92.8
Where is your X Server listening, and how did you get it to listen on the vEthernet(WSL) IP Address?
I'm now using my local equivalent to vEthernet (WSL) 172.31.48.1. Apparently my X server is listening on that interface as well.
I'm not sure where the interface the X server listens on is configured. I'm using GWSL, which uses VcXsrv.
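One way to follow that approach without hard-coding the address, sketched in Bash: inside a WSL 2 distro the default gateway is normally the Windows host's vEthernet (WSL) address, so DISPLAY can be derived from ip route. This assumes the X server actually accepts connections on that interface (as described above, that was not the case on every setup in this thread), and display_from_route is a made-up name.

```bash
# display_from_route [DISPLAY_NUM]: read `ip route` output on stdin and print
# a DISPLAY value pointing at the default gateway, e.g. 172.31.48.1:0.
# Prints nothing if there is no "default via ..." line.
display_from_route() {
    local num=${1:-0}
    awk -v n="$num" '$1 == "default" && $2 == "via" { print $3 ":" n; exit }'
}
```

Usage, e.g. in ~/.bashrc: export DISPLAY=$(ip route show default | display_from_route 0)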
Me2. I have Windows 10 Pro 20H2 (OS build 19042.746) with WSL2 and Ubuntu 20.04 guest. Windows host has a static IP=192.168.1.10 and WSL2-Ubuntu IP and subnet keeps changing every few days. I am running MobaXterm v20.6 and its builtin Xserver on Windows host.
X apps (xterm, xfig, emacs) on WSL2-ubuntu having DISPLAY=192.168.1.10:0.0 randomly stop displaying on X server within a few hours. Ubuntu ps shows the client apps are still running. The same apps running on a physical machine CentOS 7 that is on the same subnet as Windows 10, connecting to the same X server, do not experience this problem. This is certainly related to WSL2.
I can confirm this happens to me 100% of the time, when my IP changes (i.e. network changes from ethernet to WiFi and vice versa).
This is what works for me:
1) create a bash script with following (call it sysctl.sh and place in $HOME):
#!/bin/bash
sysctl -w net.ipv4.tcp_keepalive_intvl=60 net.ipv4.tcp_keepalive_probes=5 net.ipv4.tcp_keepalive_time=300
2) Change permissions and ownership:
sudo chown root:root sysctl.sh
sudo chmod 755 sysctl.sh
3) Modify the sudoers file to allow users to run this (use sudo visudo). Ensure it has the following line, where <username> is your username:
<username> ALL=(root) NOPASSWD:/home/<username>/sysctl.sh
4) Finally, update your ~/.bashrc file to include:
sudo ~/sysctl.sh
Without these changes my bash login shell and terminal (i.e. terminator) will close after, say, 30 minutes of inactivity. With these changes my bash login shell and terminal stay open until I close them. Currently it's been open for days.
Note: My DISPLAY variable is set to
host.docker.internal:0.0
and my vcxsrv runs using:
vcxsrv.exe :0 -ac -lesspointer -multimonitors -multiwindow -clipboard -nowgl -dpi auto
Hope this works for your wsl2 vm and windows vcxsrv.
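Some arithmetic on why those three values matter, as a Bash sketch. For a socket with SO_KEEPALIVE enabled (the fix working suggests the X connection is one, but that is an inference), Linux sends the first probe after tcp_keepalive_time seconds of idle, then up to tcp_keepalive_probes probes tcp_keepalive_intvl seconds apart, and drops the connection if none is answered. The kernel defaults are 7200, 75 and 9, so an idle connection is first probed only after 2 hours and a dead peer is detected after 7875 s; the values above probe after 5 minutes and give up after 600 s. A keepalive failure is reported to the application as ETIMEDOUT (error 110), which may be where the "fatal IO error 110" earlier in this thread comes from. The helper names are hypothetical.

```bash
# keepalive_detect_secs TIME INTVL PROBES: roughly how long after the last
# traffic the kernel gives up on an unresponsive peer: TIME seconds of idle
# before the first probe, then PROBES unanswered probes INTVL seconds apart.
keepalive_detect_secs() {
    echo $(( $1 + $2 * $3 ))
}

# current_keepalive [DIR]: show the live settings and the resulting detection
# time. DIR defaults to /proc/sys/net/ipv4; pass another directory to test.
current_keepalive() {
    local dir=${1:-/proc/sys/net/ipv4}
    local t i p
    t=$(<"$dir/tcp_keepalive_time")
    i=$(<"$dir/tcp_keepalive_intvl")
    p=$(<"$dir/tcp_keepalive_probes")
    echo "time=$t intvl=$i probes=$p detect=$(keepalive_detect_secs "$t" "$i" "$p")s"
}
```

Running current_keepalive in a fresh WSL shell is a quick way to confirm whether the sysctl values actually took effect.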
WOW!!! Thank you for this! After making this change, it ran overnight, which it has NEVER done before. I'll be cautiously optimistic for a bit longer, but I think this works.
I did make a slight change to your guide, though. I created /etc/sysctl.d/fixX.conf with the contents
net.ipv4.tcp_keepalive_intvl=60
net.ipv4.tcp_keepalive_probes=5
net.ipv4.tcp_keepalive_time=300
This is loaded automatically by sysctl when WSL restarts (need to do a wsl --shutdown if it's already up and running). No need to make any additional scripts or modify ~/.bashrc.
EDIT: Well, darn, per #4232 the sysctl.d values aren't processed. Apologies for the diversion. Sorting out how to get the sysctl values loaded seamlessly doesn't detract from the correctness of the fix. Will report back in a few days after more run time.
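Until sysctl.d is processed at startup, a middle ground between the two approaches is to keep fixX.conf as the single source of truth and have ~/.bashrc reload it only when the live values differ. A Bash sketch, assuming a sudoers NOPASSWD entry that allows running sysctl -p on that file (analogous to the sysctl.sh entry above) and a hypothetical helper name. Newer WSL releases also support a [boot] command= setting in /etc/wsl.conf, which may be a cleaner place for the same sysctl -p call.

```bash
# keepalive_needs_reload CONF [DIR]: succeed (exit 0) if any key=value line
# in the sysctl-style file CONF differs from the live value under DIR
# (default /proc/sys); fail (exit 1) if everything already matches.
keepalive_needs_reload() {
    local conf=$1 dir=${2:-/proc/sys}
    local key val path live
    while IFS='=' read -r key val || [[ -n $key ]]; do
        # Drop all whitespace so "a = b" and "a=b" are treated alike.
        key=${key//[[:space:]]/}
        val=${val//[[:space:]]/}
        # Skip blank lines and comments.
        [[ -z $key || $key == '#'* || $key == ';'* ]] && continue
        # net.ipv4.tcp_keepalive_time -> net/ipv4/tcp_keepalive_time
        path=$(printf '%s' "$key" | tr . /)
        live=$(<"$dir/$path")
        if [[ $live != "$val" ]]; then
            return 0
        fi
    done < "$conf"
    return 1
}
```

In ~/.bashrc: if keepalive_needs_reload /etc/sysctl.d/fixX.conf; then sudo sysctl -p /etc/sysctl.d/fixX.conf; fi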
Thanks again! This is spectacular.
Glad to help :-)
Environment
Steps to reproduce
1) Start an Xserver on the Windows system. I've reproduced this with both Xming and the Cygwin Xserver and with 3 or 4 different X apps (xterm, xclock, xcolors, for instance)
2) Start the Debian WSL app using the icon created when Debian installed
3) ping hostip (192.168.xx.32 in my case) to confirm. It works.
4) DISPLAY=192.168.xx.32:0 xterm &
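Since ping only proves that the host answers ICMP, a more direct check for step 3 is whether the X server's TCP port accepts a connection. A Bash sketch, assuming DISPLAY has the host:display[.screen] form and the usual X11 convention that display N listens on TCP port 6000 + N; it uses Bash's /dev/tcp redirection and coreutils timeout, and the function names are hypothetical.

```bash
# display_host DISPLAY: print the host part, e.g. 192.168.1.10:0.0 -> 192.168.1.10
display_host() {
    echo "${1%:*}"
}

# display_port DISPLAY: print the X11 TCP port, 6000 + display number,
# e.g. 192.168.1.10:0.0 -> 6000
display_port() {
    local d=${1##*:}   # drop the host part: "0.0"
    d=${d%%.*}         # drop the screen number: "0"
    echo $(( 6000 + d ))
}

# probe_display [DISPLAY]: succeed if something accepts a TCP connection on
# the display's port within 3 seconds; defaults to the current $DISPLAY.
probe_display() {
    local disp=${1:-$DISPLAY}
    local host port
    host=$(display_host "$disp")
    port=$(display_port "$disp")
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}
```

Usage: probe_display "$DISPLAY" && echo "X server port is reachable"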
Expected behavior
The X app displays on the host through the Xserver. X app continues to run until I kill the Xserver or the app. Happiness throughout the kingdom.
Actual behavior
At some random point later, the xterm window simply disappears, with no action on my part. This sometimes takes a few hours.
A ps aux in Debian still shows the xterm process as active. I have tried doing some logging to capture the exact timing, but since the xterm and bash processes are still running, it's difficult to determine exactly when the app stopped displaying.
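A sketch of that kind of logging in Bash, aimed at pinning down the timing. The staged captures in this thread show the WSL side still ESTABLISHED after the window vanished, so this records the Windows-side count (via netstat.exe, loopback excluded) alongside the WSL-side count from ss. The first line where win drops below wsl marks roughly when the display went away. It assumes WSL interop and iproute2's ss are available, and the function names are hypothetical.

```bash
# count_win_x11: read Windows `netstat -an` output on stdin and count
# ESTABLISHED connections whose local address is port 6000, ignoring loopback.
count_win_x11() {
    tr -d '\r' | awk '$1 == "TCP" && $2 ~ /:6000$/ && $2 !~ /^127\./ && $4 == "ESTABLISHED" { n++ } END { print n + 0 }'
}

# log_x11_state LOGFILE: append "timestamp wsl=N win=M" to LOGFILE.
log_x11_state() {
    local wsl win
    wsl=$(ss -Htn state established '( dport = :6000 )' | wc -l)
    win=$(netstat.exe -an | count_win_x11)
    printf '%s wsl=%s win=%s\n' "$(date '+%F %T')" "$wsl" "$win" >> "$1"
}

# watch_x11 LOGFILE [SECONDS]: log the state every SECONDS (default 300).
watch_x11() {
    local log=$1 every=${2:-300}
    while :; do
        log_x11_state "$log"
        sleep "$every"
    done
}
```

Usage: start the xterms, then run watch_x11 ~/x11-watch.log 300 & from the same Debian window.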