microsoft / WSL

Issues found on WSL
https://docs.microsoft.com/windows/wsl
MIT License

WSL2 distros fail on start with Error 0xffffffff. (Exit code 4294967295 if launched from Windows Terminal) #4929

Open trbenning opened 4 years ago

trbenning commented 4 years ago

I recently got a new laptop. Before configuring WSL or Docker, I upgraded to insiders version 2004 (full version above). I then enabled WSL, installed Ubuntu, & set the version to 2. Later I installed Docker for Windows, and configured that to use WSL2 as well.

Everything worked great for a while, until my group policy forced me to enable Bitlocker. After that, I started seeing the errors mentioned above. While searching for a solution, I came across this reply on another issue. After executing those 2 commands and rebooting, I was able to install Ubuntu again & upgrade it to WSL2. I then reinstalled Docker, and configured it for WSL2 again as well. This worked great for almost 2 days, until my laptop Green Screened and rebooted (I have the minidump if you'd like me to send it). After the reboot, my WSL2 instances no longer work, and I'm unable to get them working again. I even executed the fsutil commands again just in case they got reset, but to no avail.

Is this behavior caused by Bitlocker, or is the timing merely coincidental?
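
For anyone comparing the two forms of the error in the issue title: 0xffffffff and 4294967295 are the same 32-bit value, and read as a signed 32-bit integer it is -1. The following is only a minimal Python sketch of that arithmetic; the function names are made up for illustration and are not part of WSL or Windows Terminal.

```python
def to_unsigned32(value: int) -> int:
    """Reinterpret an integer as an unsigned 32-bit value."""
    return value & 0xFFFFFFFF


def to_signed32(value: int) -> int:
    """Reinterpret the low 32 bits of an integer as a signed 32-bit value."""
    value &= 0xFFFFFFFF
    # If the top bit is set, the value is negative in two's complement.
    return value - 0x100000000 if value & 0x80000000 else value


exit_code = 4294967295               # decimal form from the issue title
print(hex(to_unsigned32(exit_code)))  # 0xffffffff
print(to_signed32(exit_code))         # -1
```

So both messages appear to describe one generic failure code rather than two different errors.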

privacyguy123 commented 4 years ago

I'm currently experiencing the same issue, minus Docker & Bitlocker. My Debian was working perfectly in WSL 2 up until today. Now it hangs, which causes Windows Terminal to hang as well. I've tried removing and re-adding features, etc., to no avail.

privacyguy123 commented 4 years ago

Update: today when I booted into Windows, WSL said it had no distro installed despite having Debian installed up until last night.

slecorvaisier commented 4 years ago

Same issue here with my WSL2 distro (ubuntu 18.04)

Biswa96 commented 4 years ago

Do you have latest Windows 10 insider build?

privacyguy123 commented 4 years ago

> Do you have latest Windows 10 insider build?

Who would put themselves through that? I'm on Slow ring atm.

trbenning commented 4 years ago

I'm on the slow ring as well. Interestingly enough, today I was able to create a new instance of Alpine, and upgrade it to WSL2 (I couldn't do this yesterday). Nothing about my system has changed that I'm aware of, aside from an additional reboot.

That gave me the confidence to enable WSL2 on my Docker install, which worked as well.

I just got my Ubuntu dev environment setup again, so I'm hesitant to upgrade that one, but I'll update this thread if my Alpine or Docker instances start crashing again.

privacyguy123 commented 4 years ago

I think the most puzzling part of all this is that there are no steps to reproduce in my case - it just happened by itself overnight.

trbenning commented 4 years ago

My experience was more or less the same. I work primarily out of the WSL Ubuntu instance, and everything was going great until suddenly it wasn't. No idea what caused the green screen, but after rebooting I couldn't get anything to work with WSL2. I rebooted a couple of times, and reinstalled the WSL distros but was unable to upgrade them to v2. Then today, on a whim, I tried again and everything just worked. 🤷‍♂️

privacyguy123 commented 4 years ago

This post fixed it for me!

https://www.surfacetablethelp.com/2018/10/fix-hyper-v-not-working-after-windows-10-v1809-upgraded.html

Still couldn't tell you how those options got set in the first place.

slecorvaisier commented 4 years ago

The post shared by @privacyguy123 solved my issue too, it just requires an additional reboot.

pnunn commented 4 years ago

I'm really bummed out by this. I needed to use traceroute and couldn't under version 1, so I followed the bouncing ball to upgrade to V2, and my Ubuntu machine took ages to migrate.

Now nothing works.

I tried the post shared by @privacyguy123 as well, but when I try to change the settings for vmcompute I get an error:

Unexpected error. Sorry, we ran into a problem. Please try again.

But, try as I might... I can't get anything working.

ver returns Version 10.0.19041.113

wsl -l -v returns Ubuntu-18.04 Stopped 2

Any ideas please.. I'm stuck without wsl working.

I just managed to install Alpine (which is running as version 1) so at least I can still do stuff.. BUT... what the??

Biswa96 commented 4 years ago

Try to install Ubuntu in WSL2 from scratch without migrating from v1. wsl.exe --set-default-version 2 will set WSL2 as default. Also make a backup of existing WSL user home folder and installed packages.

pnunn commented 4 years ago

Thanks Biswa96, how do I backup the home and installed packages? Sorry, new to wsl.

pnunn commented 4 years ago

I tried installing openSuse-Leap-15-1 after setting the default version but it won't work either.

wsl --distribution openSuse-Leap-15-1 --user root
The attempted operation is not supported for the type of object referenced.

Running it as a normal user, it started, set up the new user, then crashed with a screen full of the same messages.

trbenning commented 4 years ago

It happened again today. None of my WSL2 instances will start due to the same error. This time, I believe it was caused by me messing with Hyper-V. I needed to reserve some ports that Hyper-V was hogging, so I performed the following steps:

  1. Opened Windows Features dialog, and disabled Hyper-V
  2. Restarted
  3. Opened PowerShell, and ran the following:
    netsh int ipv4 add excludedportrange tcp 8000 100 persistent
    netsh int ipv4 add excludedportrange tcp 8800 100 persistent
  4. Opened Windows Features dialog, and enabled Hyper-V
  5. Restarted
  6. After booting, all of my WSL2 instances crash with the above-mentioned error.
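
To make the two netsh commands above concrete: assuming the positional arguments are the start port followed by the number of ports (which is how the excludedportrange syntax is usually documented), each command reserves a block of 100 consecutive TCP ports. A small Python sketch of that arithmetic, with a hypothetical helper name:

```python
from typing import Tuple


def excluded_range(start_port: int, number_of_ports: int) -> Tuple[int, int]:
    """Return the inclusive (first, last) ports covered by a start port and a count."""
    if not 1 <= start_port <= 65535:
        raise ValueError("start_port must be between 1 and 65535")
    if number_of_ports < 1:
        raise ValueError("number_of_ports must be at least 1")
    last_port = start_port + number_of_ports - 1
    if last_port > 65535:
        raise ValueError("range extends past port 65535")
    return start_port, last_port


# The two reservations from the steps above.
print(excluded_range(8000, 100))  # (8000, 8099)
print(excluded_range(8800, 100))  # (8800, 8899)
```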

pnunn commented 4 years ago

Is there a fix for this or do I blow windows away and install a proper OS? This is crazy.

OK, I've converted my main machine back to v1 so I can at least do some work.

privacyguy123 commented 4 years ago

Still no official response for this?

Are you guys on Slow or Fast ring, can we confirm the problem exists in both?

pnunn commented 4 years ago

I'm on slow.

trbenning commented 4 years ago

> Is there a fix for this or do I blow windows away and install a proper OS? This is crazy.
>
> OK, I've converted my main machine back to v1 so I can at least do some work.

@pnunn If this is your attitude, maybe you shouldn't be on the insider builds at all. The point of us using these builds is to find issues like these so they can fix them before the GA release.

Incidentally, I'm able to get WSL2 instances working again, after deleting all of them and rebooting a couple of times. I have no idea if either of those was required though. This time I'll keep a WSL2 instance installed to see if it ever comes back without having to delete & reinstall it.

craigloewen-msft commented 4 years ago

The error 0xffffffff indicates that the virtual network could not be created. We've identified a possible bug fix for this that is on Windows version 19555 and higher.

For an immediate fix for this error, please consider moving to the fast ring (understanding that this means you'll get updates quicker but could experience more technical issues). Otherwise, a build that includes this fix will eventually be released to the slow ring!

As well, please let us know if the update does fix your error. We want to make sure that the fix targets this specific use case, since the fix was lower in the stack.

For more context check out this GitHub issue where we are tracking the progress: https://github.com/microsoft/WSL/issues/4364

Thanks for helping report this!
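
As a rough way to compare a machine against the build number mentioned above, the build is the third field of the string printed by ver (for example 10.0.19041.113 earlier in this thread). The Python sketch below only does that comparison. The helper names are hypothetical, and it assumes a plain build-number comparison is meaningful, which may not hold if the fix is delivered some other way.

```python
import re

FIX_BUILD = 19555  # build number quoted in the comment above


def build_number(version: str) -> int:
    """Extract the build (third numeric field) from a string such as '10.0.19041.113'."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"no version number found in {version!r}")
    return int(match.group(3))


def at_or_above_fix_build(version: str, fix_build: int = FIX_BUILD) -> bool:
    """Return True if the build in the version string is at least fix_build."""
    return build_number(version) >= fix_build


# Version string reported earlier in this thread.
print(build_number("Version 10.0.19041.113"))           # 19041
print(at_or_above_fix_build("Version 10.0.19041.113"))  # False
```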

pnunn commented 4 years ago

Thanks @craigloewen-msft, at least we know what it is now... I don't think I can risk the fast ring breaking more than is already broken. Will we know when this fix is pushed down to the slow ring? Can you let us know here perhaps so we can try wsl2 again?

Ta Peter.

craigloewen-msft commented 4 years ago

Yup! I will ping here or on the linked thread when this fix becomes available on the slow ring.

pnunn commented 4 years ago

Any news on this one @craigloewen-msft? Still chugging away on WSL1 at this point.

craigloewen-msft commented 4 years ago

It's not in slow ring yet!

daviholandas commented 4 years ago

For me it was solved like this:

  1. Uninstall Docker
  2. Uninstall / install Hyper-V
  3. Uninstall / install the subsystem

started working again!

pnunn commented 4 years ago

That didn't work for me. I tried installing a new Debian machine and it won't accept the username that it asks for during startup.

The attempted operation is not supported for the type of object referenced.

is the error.

pnunn commented 4 years ago

Why are Microsoft putting out blog posts like this https://devblogs.microsoft.com/commandline/wsl2-will-be-generally-available-in-windows-10-version-2004/ when wsl2 is not working?

pnunn commented 4 years ago

Hello.. is any one home?

LFBernardo commented 4 years ago

The latest version of Docker desktop decimates WSL. The only way to fix it is to remove any WSL apps and remove the entire Windows subsystem for linux. Reboot and re-install everything. I am on the latest version of everything.

pnunn commented 4 years ago

Unfortunately I've never installed Docker on here because I want to use Virtual Box. Is that a problem perhaps? Hmm...

pnunn commented 4 years ago

Well.. that went well. Just restarted for the latest update and crashed the entire laptop. No disk found.. thank goodness they rolled back. Updates now turned off for as long as possible...

What the hell Microsoft??

pnunn commented 4 years ago

OK, back on the substantive issue. Just removed virtual box and rebooted, wsl2 still crashes horribly.

pnunn commented 4 years ago

Is this ever going to get fixed? I see the media reports seem to suggest WSL2 is ready to go.. I beg to differ.

trbenning commented 4 years ago

For what it's worth, I haven't had any issues since the last time I posted.

mchubby commented 4 years ago

I did as suggested:

  1. uninstalled docker, hyper-v
  2. uninstalled WSL2 MSI kernel update
  3. uninstalled windows capabilities WSL and virtual machine platform
  4. rebooted
  5. installed WSL and virtual machine platform
  6. rebooted
  7. installed WSL2 MSI kernel update

At this point, distros and WSL2 worked fine. No svchost listening on UDP 53.

  1. installed hyper-v and sub-features
  2. rebooted

WSL2 returns 0xffffffff, svchost does listen on UDP 53.

So clearly for me, there's something going on with Hyper-V on Win10 Pro

prom3theu5 commented 4 years ago

Definitely still an issue.

Atm, I quit Docker for Desktop when it fails to start the WSL2 backend, and run the following in an Admin PowerShell.

# Collect the IDs of the processes that own a UDP endpoint on port 53 (duplicates removed).
$processes = Get-Process -Id (Get-NetUDPEndpoint -LocalPort 53).OwningProcess | Select-Object -ExpandProperty Id -Unique

# Force-stop each of those processes.
ForEach ($process in $processes) {
    Stop-Process -Id $process -Force
}

There always seem to be two processes: DockerD, and an unknown service hosted by svchost. I have no idea what the ramifications of simply killing the tasks like this are, but Docker for Windows starts up fine afterwards, starts the WSL2 backend successfully, and my system is stable.

windowsair commented 4 years ago

  1. issues still occur frequently...

onomatopellan commented 4 years ago

@prom3theu5 Once you get the PID with Get-Process you can open Task Manager -> Services tab and sort by PID. In my case the service SharedAccess (Internet Connection Sharing) uses port 53.

roy-bentley commented 4 years ago

deleted, no longer relevant

Eric2XU commented 4 years ago

It's port 53 for me due to Cisco AnyConnect VPN, which installed its Umbrella client. Anyone understand why port 53 has to be open for WSL/Docker/Hyper-V? After temporarily disabling the Cisco AnyConnect services and restarting the Hyper-V Compute Service, my WSL2 and Docker fired right up. I then turned the Cisco services back on. I have a feeling I will have to do this after each reboot :(

Eric2XU commented 4 years ago

Just as a final post-reboot thought... setting Cisco to manual helps, as it won't take over. It does mean I have to turn it back on manually after rebooting, but that is preferable.

Also to those not facing the cisco issue, I continued to have other issues but in general simply restarting that Hyper-V Compute Service then trying to fire up WSL2 worked most times.

rwasef1830 commented 4 years ago

@roy-bentley's workaround didn't work for me. In addition, I only have the Docker Desktop service (I don't have the other Docker service).

Btw, shouldn't the dependency be the other way around? ICS is the one causing the problem, so Lxss needs to start before it, not vice versa....

Reversing it didn't solve it for me either. The only working workaround is to remove Hyper-V which is a huge bummer. It's kind of shocking nobody tested WSL2 with Hyper-V at the same time!!

From more testing it seems the primary reason for the failure is that Hyper-V causes SharedAccess to listen on 0.0.0.0:53, which breaks WSL when it tries to set up SharedAccess to do the same thing...

EDIT: The conflict actually is with dnscrypt-proxy. Even changing it to listen on 127.0.1.1:53 doesn't avoid the issue. It has to be stopped completely, followed by a reboot (if WSL/Hyper-V didn't manage to claim port 53 during startup, it doesn't work until the next reboot).
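
A quick way to see whether something already holds a UDP port is to try binding it yourself. The Python sketch below is only an illustration of that idea under stated assumptions: the function name is hypothetical, it treats only an "address in use" error as evidence, and it says nothing about which process owns the port or whether WSL itself uses the same check.

```python
import errno
import socket


def udp_port_in_use(port: int, host: str = "0.0.0.0") -> bool:
    """Return True if binding a UDP socket to (host, port) fails with 'address in use'."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.bind((host, port))
    except OSError as exc:
        if exc.errno == errno.EADDRINUSE:
            return True
        # Other errors (for example, insufficient privileges) are not
        # evidence that the port is taken, so let the caller see them.
        raise
    finally:
        sock.close()
    return False
```

Calling udp_port_in_use(53) from an elevated prompt before starting WSL2 would then give a hint about whether a local DNS proxy or VPN client has already claimed the port. Depending on how the other program opened its socket, the operating system may report a different error than "address in use", in which case this sketch raises instead of returning True.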

mrpond commented 4 years ago

I noticed that running wsl gave Error: 0xffffffff because it failed to create a WSL network adapter, as @craigloewen-msft said. When I first upgraded to the new 2004 build with WSL2, there was no WSL adapter (circled in red in the screenshot), and my WSL2 could not ping the local (host OS) IP. If your WSL2 can't either, I think it will trigger this problem. I fixed it by doing the following (my Windows version was shown in a screenshot):

  1. Go to Device Manager and remove all Hyper-V network adapters. When I did this, there were only Hyper-V adapters 1-3, no 4 (WSL).
  2. Then reboot. If it works, you will see Hyper-V with a new switch (I did not have it when I first upgraded). Try starting WSL and pinging the local (host OS) IP. Before this, I couldn't ping the host OS IP but could access the internet from WSL2.

P.S. English is not my native language; I hope you can fix yours too. I'm also using Docker, but this problem isn't Docker-related.

muyufan commented 4 years ago

For me, simply uninstalling Hyper-V and restarting the computer made everything work.

nckdhl commented 4 years ago

Uninstalling Hyper-V according to these instructions: answers.microsoft.com/en-us/windows/forum/windows_8-windows_install/how-do-i-uninstall-hyper-v and then restarting also solved the problem for me.

jtsalten commented 4 years ago

It happened to me too. I've been working with WSL2 since late May... it had been running for some days untouched. Today, when I tried to enter WSL again I got that error too... no way to "restart" this... it just says:

PS C:..\system32> wsl
Error: 0xffffffff

It's unbelievable... such instability... a pity, because the idea is great. But if you cannot trust it, it's useless... let's hope I didn't lose everything when I manage to get it back to work.

jtsalten commented 4 years ago

Uninstalling Hyper-V and restarting DIDN'T work.... aaagggghhhh.... what the hell!??!? With these kinds of errors you really don't know what to do!!!

el-schneider commented 4 years ago

deactivating Acrylic DNS Proxy fixed it for me

pit-hub commented 4 years ago

> deactivating Acrylic DNS Proxy fixed it for me

I'm using dnscrypt-proxy. So in my case I stopped dnscrypt-proxy and tried the wsl command twice. My default WSL instance booted, Docker started, and then I started dnscrypt-proxy again. Everything is running! It seems that there are DNS resolution issues when running a local DNS resolver! Some transparency about how it works would definitely help!

somanythings commented 4 years ago

> It's port 53 for me due to Cisco AnyConnect VPN, which installed its Umbrella client. Anyone understand why port 53 has to be open for WSL/Docker/Hyper-V? After temporarily disabling the Cisco AnyConnect services and restarting the Hyper-V Compute Service, my WSL2 and Docker fired right up. I then turned the Cisco services back on. I have a feeling I will have to do this after each reboot :(

Thanks Eric, same problem here, but with NordVPN. It only needs to be disconnected, and both WSL2 and the WSL2 docker-desktop will start.