microsoft / WSL

Issues found on WSL
https://docs.microsoft.com/windows/wsl

WSL is non-responsive after waking from hibernate #8696

Open roja45 opened 2 years ago

roja45 commented 2 years ago

Version

Windows 11 Pro 21H2, build 22000.832

WSL Version

Kernel Version

5.10.60.1

Distro Version

Ubuntu 20.04

Other Software

Docker desktop windows

Repro Steps

  1. Hibernate the machine
  2. Start up Windows
  3. Open a new terminal

Expected Behavior

Shouldn't hang

Actual Behavior

No response; the terminal hangs. wsl --shutdown from a command prompt also hangs. The only solution is to restart the machine.

Diagnostic Logs

No response

lagz0ne commented 1 year ago

This is my workaround for the issue, without having to restart: taskkill -IM "wsl.exe" /F

Then just need to open wsl as usual. Somehow, wsl starts much faster

lagz0ne commented 1 year ago

This is my workaround for the issue, without having to restart: taskkill -IM "wsl.exe" /F

Then just need to open wsl as usual. Somehow, wsl starts much faster

Never mind, it didn't work on some other occasions.

lagz0ne commented 1 year ago

Can confirm this works.

You can put this in a .bat file and run it as administrator: taskkill -IM "vmwp.exe" /F. It revives WSL.

I am having the same issue on a Surface Pro 9 5G. I have found that killing vmwp (the Virtual Machine Worker Process) lets me resolve the problem without restarting.

wsl --version
WSL version: 1.0.1.0
Kernel version: 5.15.74.2
WSLg version: 1.0.47
MSRDC version: 1.2.3575
Direct3D version: 1.606.4
DXCore version: 10.0.25131.1002-220531-1700.rs-onecore-base2-hyp
Windows version: 10.0.22623.1020

lagz0ne commented 1 year ago

Much faster to restart with taskkill -IM vmwp.exe -IM wsl.exe /F. The command must be executed in a privileged terminal

bdavj commented 1 year ago

Seeing exactly the same behaviour here on a Dell notebook (XPS 9550) with 21H2 (22H2 upgrade won't happen...) - very slow / won't kill wsl.exe, hangs using 100% CPU after deep sleep.

xax commented 1 year ago

Much faster to restart with taskkill -IM vmwp.exe -IM wsl.exe /F. The command must be executed in a privileged terminal

Resulting in a "green screen of death" ;( here. [Win10 22H2 Build 19045.2311]

bdavj commented 1 year ago

Same here - taskkill -IM vmwp.exe -IM wsl.exe /F causes SYSTEM_SERVICE_EXCEPTION sad face.

himanshu-sagar commented 1 year ago

A quick trick just worked for me: close the laptop lid and reopen it (no shutdown or restart needed).

suiluj commented 1 year ago

Similar problem for me. After closing my notebook for the night, WSL is very slow the next day, and devcontainers in VS Code, which run in WSL, do not even start anymore. My solution so far is a complete restart.

vasekboch commented 1 year ago

Same issue here. After waking from sleep, WSL is non-responsive. I cannot run wsl.exe --shutdown or any other command. A full PC restart or taskkill -IM "vmwp.exe" /F from an elevated prompt resolves the issue. But it's a pain to start everything again.

WSL version: 1.0.3.0
Kernel version: 5.15.79.1
WSLg version: 1.0.47
MSRDC version: 1.2.3575
Direct3D version: 1.606.4
DXCore version: 10.0.25131.1002-220531-1700.rs-onecore-base2-hyp
Windows version: 10.0.22621.963

suiluj commented 1 year ago

@vasekboch Easy solution against the pain of having to start everything again: You and your computer should not go to sleep. This is not a bug but a feature to keep you awake and working! 😁

femans commented 1 year ago

Is there any progress on resolving this?

dnlbauer commented 1 year ago

Still happening on 22h2 for me :(

longbiao7498 commented 1 year ago

Is there any progress on resolving this? please solve this !!!!!!

sig9 commented 1 year ago

So much pain and suffering. Lets gooooo MSFT. you can do it!

lagz0ne commented 1 year ago

Seems fine on version 25267 (Dev Insider). Not sure what has changed.

alex-reach commented 1 year ago

As a workaround, I just created a .cmd file with the line "net stop lxssmanager && net start lxssmanager". Run it as admin; for me this seems to be the fastest way to restart WSL after hibernation.

wmmc88 commented 1 year ago

Seems fine on version 25267 (Dev Insider). Not sure what has changed.

The behaviour I'm seeing now (after moving from beta ring 22623.1037 to dev ring 221209-1557) is that any open WSL terminals still crash, and any existing apps using WSL crash too (e.g. VS Code with WSL or JetBrains with WSL), but WSL no longer hangs and can be restarted.

edit: NVM. It is happening again.

zawasp commented 1 year ago

Seems fine on version 25267 (Dev Insider). Not sure what has changed.

For me it didn't solve anything.

WSL version: 1.0.3.0
Kernel version: 5.15.79.1
WSLg version: 1.0.47
MSRDC version: 1.2.3575
Direct3D version: 1.606.4
DXCore version: 10.0.25131.1002-220531-1700.rs-onecore-base2-hyp
Windows version: 10.0.25267.1000

dbrand666 commented 1 year ago

I just had to kill a few more things

taskkill -IM vmwp.exe -IM wsl.exe -IM wslhost.exe -IM wslservice.exe /F

bagong commented 1 year ago

I think this and the "no ssh-ing into wsl"-issue are both genuine show-stoppers and should be addressed with highest priority. Maybe a handful of people from the Edge Datamining team could help? (Sorry for the sarcasm, but it's not fun to either have to disable suspend altogether, or restart your computer every day and interrupt ongoing processes..., and you can't even control it via ssh...)

joelzamboni commented 1 year ago

I can confirm the same issue here on Windows 11 22H2 on ARM64:

WSL version: 1.0.3.0
Kernel version: 5.15.79.1
WSLg version: 1.0.47
MSRDC version: 1.2.3575
Direct3D version: 1.606.4
DXCore version: 10.0.25131.1002-220531-1700.rs-onecore-base2-hyp
Windows version: 10.0.22621.963

SmaGMan commented 1 year ago

I have reverted to the update 07.05.2021 (Kernel version: 5.4.72) and this resolved the issue. Just removed the last update that was installed on 21.12.2022. And kept the old one (screenshot).

Screenshot_20230102_055605

CarlosJoseChaconChavarria commented 1 year ago

The command above is worse than restarting :| taskkill -IM vmwp.exe -IM wsl.exe /F. Don't use it. The way I found to avoid restarting my laptop was to kill the WSL task in Task Manager.

dbrand666 commented 1 year ago

How is it worse? I had to kill 2 more things but after that I could use WSL again.

taskkill -IM vmwp.exe -IM wsl.exe -IM wslhost.exe -IM wslservice.exe /F

Killing the WSL tasks in task manager didn't work for me.

CarlosJoseChaconChavarria commented 1 year ago

How is it worse? I had to kill 2 more things but after that I could use WSL again.

taskkill -IM vmwp.exe -IM wsl.exe -IM wslhost.exe -IM wslservice.exe /F

Killing the WSL tasks in task manager didn't work for me.

Several people posted that this command gives you a blue screen, and Task Manager helped me. I have not tested this long version: taskkill -IM vmwp.exe -IM wsl.exe -IM wslhost.exe -IM wslservice.exe /F

zawasp commented 1 year ago

I have reverted to the update 07.05.2021 (Kernel version: 5.4.72) and this resolved the issue. Just removed the last update that was installed on 21.12.2022. And kept the old one (screenshot).

Screenshot_20230102_055605

How did you revert?

Cosss7 commented 1 year ago

I also find that sometimes, after waking from sleep, WSLg is very slow.

I can reproduce it, but I don't know which dump or log I should provide.

I really want to resolve the issue.

Catlike14 commented 1 year ago

If it won't close, open task manager then kill the process from there.

Access denied. I use it at work and I'm forced to reboot the system, so I have to close all my open software. It's very annoying.

I would try to revert as @SmaGMan did, but it could break some dependency, so I will wait for some feedback. It would be great if this were fixed.

I fixed it by closing all processes related to WSL (thanks to @gyaaniguy), but there are too many of them and some of them restart automatically, so this is annoying too.

Any response from official devs?

jonashilmersson commented 1 year ago

I have reverted to the update 07.05.2021 (Kernel version: 5.4.72) and this resolved the issue. Just removed the last update that was installed on 21.12.2022. And kept the old one (screenshot).

Screenshot_20230102_055605

How did you revert?

Interesting! Are you running Win10 or Win11?

SmaGMan commented 1 year ago

I have reverted to the update 07.05.2021 (Kernel version: 5.4.72) and this resolved the issue. Just removed the last update that was installed on 21.12.2022. And kept the old one (screenshot).

Screenshot_20230102_055605

How did you revert?

Interesting! Are you running Win10 or Win11?

Hi @jonashilmersson, @zawasp. First, on 2021-05-07 I installed WSL using the old installer (wsl_update_x64.msi). I updated Windows several times after that but kept using that old WSL version. On 2022-12-21 I ran wsl --update and the hibernation issue began. On 2023-01-02 I ran the old installer (wsl_update_x64.msi) from my archive, but this changed nothing. Then I removed the newer "Windows Subsystem for Linux" entry via the "Add and remove programs" control panel (I had two entries at that point: the new one from 2022-12-21 and the old one from 2021-05-07), and this resolved the issue for me. I don't know how to obtain the old version of the installer. Possibly I can share a copy from my archive.

burk3 commented 1 year ago

I couldn't find a package with the 5.4.72 kernel anywhere, so I built it from source myself with some difficulty. I started with the WSL2-Linux-Kernel build instructions, and had to google when issues came up due to only having a newer compiler and binutils. The build goes much more smoothly with Docker and an older release of Ubuntu, so that's what I'll show here.

Setup

First, have an x86 Linux system (should work in WSL) where you have Docker.

Build

Then this little bash script should just work for building the kernel:

# Download and unpack the 5.4.72 WSL2 kernel source
mkdir wsl-kernel-build
cd wsl-kernel-build
wget https://github.com/microsoft/WSL2-Linux-Kernel/archive/refs/tags/linux-msft-5.4.72.tar.gz
tar xzf linux-msft-5.4.72.tar.gz
# Build inside an Ubuntu 20.04 (focal) container; nproc is evaluated in the container
docker run --rm -v "$(pwd)/WSL2-Linux-Kernel-linux-msft-5.4.72:/src" ubuntu:focal bash -c "apt-get update && apt-get install -y build-essential flex bison libssl-dev libelf-dev bc && cd /src && make -j \$(nproc) KCONFIG_CONFIG=Microsoft/config-wsl"

After that finishes, your new old kernel can be found at ./WSL2-Linux-Kernel-linux-msft-5.4.72/arch/x86/boot/bzImage.

"Install"

Get the new old kernel onto your Windows box. Mine resides in ~\.wsl-kernels\vmlinuz-msft-5.4.72.

Then follow the .wslconfig docs to configure WSL2 to use your kernel. Here's what mine looks like in a powershell:

~
❯ cat ~\.wslconfig
[wsl2]
kernel=C:\\Users\\burke\\.wsl-kernels\\vmlinuz-msft-5.4.72
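
A minimal sketch of generating that file's contents from a shell, assuming the kernel sits at a Windows path of your choosing. make_wslconfig is a hypothetical helper name and the path in the example is a placeholder; the doubled backslashes follow the .wslconfig shown above.

```shell
#!/usr/bin/env bash
# make_wslconfig (hypothetical helper, not part of WSL): print a [wsl2]
# section that points WSL2 at a custom kernel image.
make_wslconfig() {
  local win_path="$1"
  local escaped
  # Double every backslash in the Windows path, as in the example above.
  escaped="$(printf '%s' "$win_path" | sed 's/\\/\\\\/g')"
  printf '[wsl2]\nkernel=%s\n' "$escaped"
}

# Example with a placeholder path:
make_wslconfig 'C:\Users\me\.wsl-kernels\vmlinuz-msft-5.4.72'
```

Redirect the output into .wslconfig in your Windows user profile, then run wsl --shutdown before starting WSL again.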

Test

Here's how I tested via powershell. I run uname in WSL to verify the correct kernel is in use.

~  (13s)
❯ wsl --shutdown

~
❯ wsl
❯ uname -a
Linux freddie-kane 5.4.72-microsoft-standard-WSL2 #1 SMP Thu Jan 5 09:39:15 PST 2023 x86_64 GNU/Linux
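
The same check can be scripted. This is a small sketch, assuming the release string reported by uname -r starts with the kernel version, as in the output above; kernel_matches is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# kernel_matches (hypothetical helper): succeed when a kernel release string
# is exactly the expected version or the expected version followed by "-...".
kernel_matches() {
  local expected="$1" release="$2"
  case "$release" in
    "$expected"|"$expected"-*) return 0 ;;
    *) return 1 ;;
  esac
}

# Inside WSL, compare the expected version against the running kernel.
if kernel_matches "5.4.72" "$(uname -r)"; then
  echo "custom kernel is active"
else
  echo "running $(uname -r); check .wslconfig and run wsl --shutdown"
fi
```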

The Aforementioned Build Issues

I built my kernel on a linux host with packages way newer than when this kernel was built. As a result, I had to make a few changes. I don't recommend anyone else do this; the docker way is much easier to get older versions of stuff.

First, my system had gcc-12 which was too new to compile w/o errors. I installed gcc-11 through my package manager and passed HOSTCC=gcc-11 CC=gcc-11 to make and got past the first hurdle.

Then, my version of objtool (found in the binutils package, I think) was too new as well. Builds would whine about objtool and some missing "thunk"s. Some googling led me to this issue and this patch which I manually applied since it was meant for a different linux version.

Followup

I may try and manually bisect kernel versions to find where the issue actually starts (in case it's not just >=5.5). I suspect that would help MSFT narrow down the problem (assuming someone over there is paying attention to this issue)
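
Such a manual bisection could be organized roughly as follows. This is only a sketch: first_bad and ask_survived are hypothetical names, the version list is whatever set of kernel releases you are able to build, and it assumes the first listed version is known good and the last is known bad.

```shell
#!/usr/bin/env bash
# first_bad CHECK VERSION... : binary search over versions sorted oldest to
# newest. CHECK is a command that succeeds when the given version is good.
# Assumes the first version is good and the last one is bad.
first_bad() {
  local check="$1"; shift
  local versions=("$@")
  local lo=0
  local hi=$(( ${#versions[@]} - 1 ))
  local mid
  # Invariant: versions[lo] is good, versions[hi] is bad.
  while (( hi - lo > 1 )); do
    mid=$(( (lo + hi) / 2 ))
    if "$check" "${versions[mid]}"; then
      lo=$mid
    else
      hi=$mid
    fi
  done
  printf '%s\n' "${versions[hi]}"
}

# Hypothetical interactive check: boot the given kernel, hibernate, wake,
# then answer whether WSL still responded.
ask_survived() {
  local answer
  read -r -p "Did WSL survive hibernation on kernel $1? [y/n] " answer
  [ "$answer" = "y" ]
}

# Usage (fill in the intermediate versions you built):
#   first_bad ask_survived 5.4.72 <intermediate versions...> 5.15.79.1
```

Each round halves the remaining range, so even a long list of kernel builds needs only a handful of hibernate-and-wake tests.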

arindam-riotinto commented 1 year ago

Same issue here on -

WSL version: 1.0.3.0
Kernel version: 5.15.79.1
WSLg version: 1.0.47
MSRDC version: 1.2.3575
Direct3D version: 1.606.4
DXCore version: 10.0.25131.1002-220531-1700.rs-onecore-base2-hyp
Windows version: 10.0.19044.2364

Hoping for a resolution soon.

onereal7 commented 1 year ago

I was having the same issue, but it seems to have stopped happening (still good after a few deep hibernations already). Not sure what exactly might have helped, but I did a few things:

  1. Added a line to .wslconfig's [wsl2] section: guiApplications=false
  2. Disabled all virtual network adapters except Hyper-V Virtual Ethernet Adapter (had a few for virtualbox)
  3. Restarted LxssManager service to restart wsl and reload changes from .wslconfig

EDIT: unfortunately, it hung again after some more hibernations. Trying the old kernel solution.

carambas32 commented 1 year ago

1. Added a line to .wslconfig's [wsl2] section:
   `guiApplications=false`

Worked for me... Just once.

cujo commented 1 year ago

I have had guiApplications=false from the beginning and am still affected by the issue. But switching to the 5.4.72 kernel worked for me too, thanks!

isaacsu commented 1 year ago

Switching to kernel 5.4.72 by following these instructions worked for me too. I've been testing for a few days now by intentionally putting my laptop to hibernate, no crashing or hanging so far.

$ uname -a
Linux PC 5.4.72-microsoft-standard-WSL2 #1 SMP Mon Jan 9 20:35:11 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
PS > wsl --version
WSL version: 1.0.3.0
Kernel version: 5.15.79.1
WSLg version: 1.0.47
MSRDC version: 1.2.3575
Direct3D version: 1.606.4
DXCore version: 10.0.25131.1002-220531-1700.rs-onecore-base2-hyp
Windows version: 10.0.19045.2364

Thanks for the workaround @burk3. Hope MSFT team get on to a permanent fix for this.

ayon06 commented 1 year ago

happening to me as well - running the arm64 version:

WSL version: 1.0.3.0
Kernel version: 5.15.79.1
WSLg version: 1.0.47
MSRDC version: 1.2.3575
Direct3D version: 1.606.4
DXCore version: 10.0.25131.1002-220531-1700.rs-onecore-base2-hyp
Windows version: 10.0.22623.1095

afilp commented 1 year ago

Happening to me also:

WSL version: 1.0.3.0
Kernel version: 5.15.79.1
WSLg version: 1.0.47
MSRDC version: 1.2.3575
Direct3D version: 1.606.4
DXCore version: 10.0.25131.1002-220531-1700.rs-onecore-base2-hyp
Windows version: 10.0.19045.2486

rouke-broersma commented 1 year ago

My WSL has no problem restarting, however I usually keep my terminal and vscode open and they both lose the connection to WSL after hibernating. When I reopen the terminal and vscode they reconnect to WSL. I never used to have this problem. My terminal has the error code [process exited with code 1 (0x00000001)].

maxbortone commented 1 year ago

I'm having the same issue as @rouke-broersma just described. Changing kernel to an older version as described by @burk3 didn't solve the problem for me.

bogdan-calapod commented 1 year ago

I'm in the same boat with @rouke-broersma and @maxbortone. This also happens if I just lock my laptop, not hibernating or sleeping.

I somehow feel that there are two issues here, both related to the WSL VM being suspended in one way or another either due to the computer being locked or going to sleep.

The annoying thing with this bug is that you can't leave the laptop to get a coffee or something because WSL will lock up when you get back. Downgrading the kernel as @burk3 suggested didn't work for me.

bagong commented 1 year ago

I'm in the same boat with @rouke-broersma and @maxbortone. This also happens if I just lock my laptop, not hibernating or sleeping.

I somehow feel that there are two issues here, both related to the WSL VM being suspended in one way or another either due to the computer being locked or going to sleep.

The annoying thing with this bug is that you can't leave the laptop to get a coffee or something because WSL will lock up when you get back. Downgrading the kernel as @burk3 suggested didn't work for me.

I've been thinking a bit about this and am tempted to think that, apart from fixing the (likely very complex) underlying problem, the right answer is to prevent the computer from going to sleep while WSL is running. I am surprised you already get the problem on screensaver or lock screen; I used to get it only when the computer went into genuine suspend or hibernation. There might in fact be multiple different issues conflated here. But I think one conclusion is necessary, considering processes that might be running inside WSL or the need to log in to WSL via ssh: there should be a straightforward way to keep the computer out of any of the dangerous states that might prevent remote login or leave WSL unable to recover. As a quick workaround for the time being, the "Awake" PowerToy might be helpful. macOS offers a way to turn off "Power Nap" for individual processes. Maybe something similar exists in Windows?

rouke-broersma commented 1 year ago

But I want my laptop to hibernate after my work day. And I expect my laptop to then start up to a working state, as it has for years. This wsl issue is recent.

abellmann commented 1 year ago

Could this be related to WSLg functionality (i.e. reconnecting to the X server after shutdown)? I had this issue before WSLg was introduced, with the x401 X Window server. There I was able to solve it by using wsld, but that does not seem to resolve the issue now; wsld is still running.

alex-reach commented 1 year ago

Was anyone in this issue able to reproduce this problem on a clean Win 11 installation? I think this might be a problem on older, maybe even migrated WSL installations?

kiuka commented 1 year ago

Was anyone in this issue able to reproduce this problem on a clean Win 11 installation? I think this might be a problem on older, maybe even migrated WSL installations?

yes, I just moved to a new laptop, fresh win 11 and it hangs after coming back from hibernation :\

filipegl commented 1 year ago

Was anyone in this issue able to reproduce this problem on a clean Win 11 installation? I think this might be a problem on older, maybe even migrated WSL installations?

My Windows 11 is the factory install. I bought my notebook last month and I got this issue.

femans commented 1 year ago

I switched to another OS because of this.

vinicentus commented 1 year ago

I switched to another OS because of this.

Lol I'm planning on doing the same...

The small problems like this that get no attention really add up.