microsoft / WSL

Issues found on WSL
https://docs.microsoft.com/windows/wsl
MIT License

WslRegisterDistribution failed with error: 0xffffffff #4364

Closed alexey-gusarov closed 8 months ago

alexey-gusarov commented 5 years ago

Please fill out the below information:

PS C:\WINDOWS\system32> wsl --list
Windows Subsystem for Linux Distributions:
Ubuntu (Default)

PS C:\WINDOWS\system32> wsl --list --verbose
NAME STATE VERSION

PS C:\WINDOWS\system32> Get-WindowsOptionalFeature -Online -FeatureName Microsoft-Windows-Subsystem-Linux

FeatureName      : Microsoft-Windows-Subsystem-Linux
DisplayName      : Windows Subsystem for Linux
Description      : Provides services and environments for running native user-mode Linux shells and tools on Windows.
RestartRequired  : Possible
State            : Enabled
CustomProperties :
                   ServerComponent\Description : Provides services and environments for running native user-mode Linux shells and tools on Windows.
                   ServerComponent\DisplayName : Windows Subsystem for Linux
                   ServerComponent\Id : 1033
                   ServerComponent\Type : Feature
                   ServerComponent\UniqueName : Microsoft-Windows-Subsystem-Linux
                   ServerComponent\Deploys\Update\Name : Microsoft-Windows-Subsystem-Linux

PS C:\WINDOWS\system32> wsl --unregister ubuntu
Unregistering...

PS C:\WINDOWS\system32> ubuntu
Installing, this may take a few minutes...
WslRegisterDistribution failed with error: 0xffffffff
Error: 0xffffffff (null)
Press any key to continue...

See our contributing instructions for assistance.

wsl.zip

devbeard commented 4 years ago

@benhillis Do you have any details of the fix? Different port, dynamic port, or something else?

I have the same issue with my work laptop, which runs Umbrella Roaming Client (from Cisco) and its subprocess "dnscrypt-proxy.exe", which listens on port 53. The fun part is that since this is security software, I am unable to kill the process and get WSL running again. I think all these clashes on the same port are going to be a common issue. At least consider a proper error message in the output from wsl.exe that gives the actual error to the end user.

benhillis commented 4 years ago

The fix was fairly deep in the TCP stack. I have given feedback to the team that owns the API that they should improve their failure return codes.

Stanzilla commented 4 years ago

I just hit this on 19041.21, I hope 2004 is not going to ship with this bug?

benhillis commented 4 years ago

We are hoping to put this fix into 2004. This fix has been out for nearly two weeks and we've yet to hear definitively if people are no longer hitting this issue. If somebody that was previously hitting this could chime in on this thread to say that 19541 resolved this issue, that would go a long way towards getting this fixed in 2004.

alpe89 commented 4 years ago

To fix this issue I just disabled Docker Desktop at system startup. Simply put, Docker tries to establish a connection to WSL before it is ready and crashes.

pxlrbt commented 4 years ago

I also have this issue on Microsoft Windows [Version 10.0.19041.84]. Already uninstalled docker. No changes. Images are working with WSL1 but fail when switching to WSL2.

EDIT: Well, Acrylic DNS was the issue here, too. I overlooked this in @tdaniely's comment.

mchubby commented 4 years ago

10.0.19041.172, I also see the WSL vswitch briefly appear when trying to register a distribution.

Docker service was set to disabled + computer restart

Get-Process -Id (Get-NetUDPEndpoint -LocalPort 53).OwningProcess yields two processes listening on UDP 53

Both services can be stopped, however Internet Connection Sharing (ICS) (SharedAccess) auto-restarts on its own. Even when you disable it through MMC, the service goes back to manual (trigger start) and starts.

Ergo 0.0.0.0:53 stays squatted, so anything WSL2 won't work.

Edit: will try the workarounds in the pingback issue #4929 as suggested by daviholandas and LFBernardo: uninstall and reinstall every related item.

jasonhancn commented 4 years ago

Same as @alexk111, unfortunately though, that's the port that I use for dnscrypt (as Windows 10 still doesn't have support for DoH).

Sucks that there's no way to use both at the same time..?

I found a way to run dnscrypt with WSL2. Listen on a loopback IP other than 127.0.0.1, such as 127.0.1.1, then configure your network to use the DNS service on 127.0.1.1. That means 127.0.0.1:53 is not being used, so you can run a WSL2 instance without this problem.

mchubby commented 4 years ago

I have done as you've said, rebinding on another address 127.0.1.1 but it doesn't solve the problem.

It is possible to work around this by

However, the Hyper-V VMs will lose connectivity even after re-enabling the default vSwitch. I will try again after resetting Winsock, but I don't think it will solve anything.

Edit: resetting Winsock + reboot had no effect. Going back to Hyper-V connectivity is possible: wsl --shutdown, delete the WSL vSwitch in MMC, stop SharedAccess 2x, enable the default vSwitch, restart hns (Host Networking Service).

jasonhancn commented 4 years ago

@mchubby I have tried a new way to solve it. It's ridiculous but usable. I created a virtual switch for internal networking named "dnscrypt switcher" and set its adapter IP to 10.0.123.2/8. Then I created a Hyper-V VM with dynamic memory between 128 and 384 MB, attached both Default Switch and dnscrypt switcher to it, and installed OpenWRT. Configure eth0 (Default Switch) as wan (DHCP client) and eth1 as lan, remove the bridge in br-lan to separate the interfaces, set the lan IP to 10.0.123.1/8, and install and configure dnscrypt-proxy2 as its GitHub page says. Set the VM to start automatically with the system and set DNS to 10.0.123.1; that works. I used to use Simple DNSCrypt, whose memory usage was about 70 MB; now the 'VM dnscrypt' uses about 250 MB, not too bad.

SHJordan commented 4 years ago

The -1 error indicates an issue creating the virtual network. Those errors tend to be transient.

I can confirm that I'm seeing the "WSL" switch created and removed in the Hyper-V-VmSwitch logs.

I also see the same behaviour with the Hyper-V Default vSwitch, where I can actually see it appearing and disappearing in Network Connections.

Some searching suggests that this is due to Group Policy blocking Internet Connection Sharing. However, I've enabled it manually to see if it is indeed the issue, and the adapter device is still being removed.

Is there anywhere where I can find a specific list of configuration requirements for getting the WSL v-switch to work?

Edit: I've manually enabled ICS and tested that it works between my LAN and WiFi manually. However the v-switch is still being removed. So scratch that.

v-switch logs: https://pastebin.com/3ReMcxtc

Last edit: After manually going over basically everything on my laptop, I've eventually found that it was the Acrylic DNS Proxy service I'm using. The service was grabbing port 53 and preventing the ICS service from binding to it.

This solved it for me! Ty.

fdavidg commented 4 years ago

I am on version 2004, 10.0.19041.208 and I am still having the error on trying to convert to wsl2.

Emmanuel35 commented 4 years ago

I have the same problem with another service. I used TCPView to identify which service was listening on port 53. After stopping it, WSL 2 worked fine.
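For a quick yes/no check before reaching for TCPView, one option is simply to try binding the port. The following is a minimal sketch, not part of any WSL tooling: the function name `udp_port_is_free` and the default address are illustrative assumptions, and a failed bind can also mean missing privileges rather than a conflict.

```python
import socket


def udp_port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a UDP socket can be bound to (host, port) right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            # Bind failed: the port is taken, or we lack permission to use it.
            return False
    # The probe socket is closed on leaving the with-block, releasing the port.
    return True
```

Calling `udp_port_is_free(53)` gives a hint only. A successful bind on one address is not proof that the port is free on every address, and the probe does not say which process holds the port, so a tool like TCPView is still needed to name the owner.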

sozercan commented 4 years ago

Same issue started right after I enabled windows containers on 19041.264

Eric2XU commented 4 years ago

For what it is worth, I was having the original 0xffffffff error.

How anyone figured out it was a port 53 binding issue I don't know, but I would like to. Based on those comments in the thread I went looking with TCPView and came up empty. Thankfully, being an IT guy, I knew that port 53 was in fact in use by Cisco Umbrella Client, which is a DNS proxy agent used by Cisco AnyConnect VPN on many systems.

I had to stop the following services: image

After that it still didn't work when pressing Launch to set up the environment; however, uninstalling and reinstalling Ubuntu from the store and then launching again did work!

So what gives with port 53? Why is that important? How did any one figure out that was the problem?

yshahin commented 4 years ago

I am having this issue with nothing running on port 53. The problem happened to me when I upgraded Windows 10 to 2004. Any ETA on this issue being resolved?

AndriiZ commented 4 years ago

Build 19041.264, same error. What I tried:

  1. Changed dnscrypt-proxy from 127.0.0.1 to 127.0.2.1
  2. Changed startup type from Automatic to Automatic (Delayed)

AndriiZ commented 4 years ago

Same as @alexk111, unfortunately though, that's the port that I use for dnscrypt (as Windows 10 still doesn't have support for DoH). Sucks that there's no way to use both at the same time..?

I found a way to run dnscrypt with WSL2. Listen on a loopback IP other than 127.0.0.1, such as 127.0.1.1, then configure your network to use the DNS service on 127.0.1.1. That means 127.0.0.1:53 is not being used, so you can run a WSL2 instance without this problem.

Does not help

joseph-kaainoa commented 4 years ago

I was getting the same error (0xffffffff) when trying to upgrade my distros and am running Win10 Version 10.0.19041.264 and Docker 2.3.0.2 with the WSL2 option selected on install. I looked at the install guide at WSL Install Guide for Win10 and it had a step to enable the 'Virtual Machine Platform' component (which it said was an optional component).

dism.exe /online /enable-feature /featurename:VirtualMachinePlatform /all /norestart

After running the above, it wanted me to restart my machine. After restarting, Docker was running and completed the distro upgrade successfully. The 'Virtual Machine Platform' component was key for me.

Result image

manuelojeda commented 4 years ago

I had the same issue; it was fixed when I found out Acrylic UI was using the port needed for WSL 2. Just in case some of you have this issue and are using that app.

kelderek commented 4 years ago

Same issue for me running Cisco (OpenDNS) Umbrella client. I had to stop the service, uninstall the distro, then get the distro from the store again before it would work. Curiously, after the first linux distro worked it let me restart the Umbrella service and add other distros with it running, no problem...

Edit: after a reboot the WSL 2 distros won't start with the same 0xffffffff error, even if I stop the Umbrella service first.

paulhennell commented 4 years ago

This came up for me on the official release when trying to convert a new Ubuntu install to WSL2.

Reading the thread it seems whatever is on port 53 is the problem (often Docker it seems, but I don't have that installed). For anyone else with the same issue, you can find the conflicting program by running the following in PowerShell:

netstat -nao | Select-String :53

Then find the item that ends in :53 and note the PID (last column). Put that PID where the zeros are here:

tasklist /fi "pid eq 00000"

You now get the program name for what's blocking your WSL 👍

Stop that service and WSL2 will work again.
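The "find the item that ends in :53" step can be done mechanically. Below is a rough sketch that scans saved `netstat -nao` text and keeps only rows whose local address ends in the wanted port, so foreign-address matches and ports like 5353 are skipped. The function name `pids_on_port` and the `SAMPLE` text (including its PIDs) are made up for illustration, and the column layout is assumed to match the usual Windows netstat output.

```python
from typing import Dict, Set

# Illustrative netstat -nao excerpt; the addresses and PIDs are invented.
SAMPLE = """
Active Connections

  Proto  Local Address          Foreign Address        State           PID
  TCP    0.0.0.0:135            0.0.0.0:0              LISTENING       1048
  TCP    192.168.1.5:50123      8.8.8.8:53             ESTABLISHED     2200
  UDP    0.0.0.0:53             *:*                                    3972
  UDP    0.0.0.0:5353           *:*                                    4100
  UDP    [::]:53                *:*                                    16432
"""


def pids_on_port(netstat_text: str, port: int = 53) -> Dict[int, Set[str]]:
    """Map each PID to the local endpoints it holds on the given port."""
    owners: Dict[int, Set[str]] = {}
    for line in netstat_text.splitlines():
        fields = line.split()
        # Data rows start with the protocol; skip headers and blank lines.
        if len(fields) < 4 or fields[0] not in ("TCP", "UDP"):
            continue
        proto, local, pid = fields[0], fields[1], fields[-1]
        if not pid.isdigit():
            continue
        # rsplit also handles IPv6 endpoints such as [::]:53.
        if local.rsplit(":", 1)[-1] != str(port):
            continue
        owners.setdefault(int(pid), set()).add(f"{proto} {local}")
    return owners
```

With the sample text, `pids_on_port(SAMPLE)` keeps only PIDs 3972 and 16432, which can then be passed to tasklist as described above. The sketch does not filter TCP rows by state, so an established connection that happens to use local port 53 would also be reported.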

ToGo101 commented 4 years ago

what helped for me was to disable Hyper-V in the windows feature section.

dseeley commented 4 years ago

what helped for me was to disable Hyper-V in the windows feature section.

This worked for me on my work laptop (with Cisco Umbrella and other security stuff on it), but was not necessary on my unencumbered personal machines.

kelderek commented 4 years ago

I was going to test out disabling hyper-v and today for whatever reason after another reboot, it all seems to be working happily together - Umbrella, Hyper-V, and WSL 2. I even did a wsl -l -v to confirm it was really running v2 and not v1. Nothing changed on my end, no services stopped - could WSL2 be reaching out to MS and they changed some remotely given instruction or is it just an intermittent problem? So weird...

Eric2XU commented 4 years ago

This just gets stranger and stranger. Docker Desktop and Docker Engine now try to take 53 as well, so I had to change the Docker services to manual to allow WSL enough time to claim 53. I can't express enough to the WSL team: you must stop requiring 53 to start. I get why you do it, but it's not your port to have.

kelderek commented 4 years ago

@Eric2XU maybe I am missing it in the discussion above, but I don't understand why they need to claim port 53, even briefly, and I 100% agree with you that it is not theirs to claim. Would you mind providing a brief explanation or link explaining why they use it? Thanks!

Eric2XU commented 4 years ago

I "think" it has something to do with being able to redirect http://localhost: requests to Docker containers or something like that, where in dev testing you can provide a URL and WSL/Docker intercepts it and provides a Hyper-V networking IP instead, perhaps. I'll be honest, I am not so sure either.

petrsnd commented 4 years ago

I resolved this issue by uninstalling Docker Desktop, Hyper-V, and WSL completely. Rebooted to come up with nothing on. Then, I put WSL 2 back on alone, following the installation instructions here: https://docs.microsoft.com/en-us/windows/wsl/install-win10. Everything was fine and my Ubuntu distro was working. Docker Desktop was working. I used it for a few days and across several reboots.

Then, I decided I needed to run a few test VMs. I re-enabled the Hyper-V feature. After rebooting I'm back to this error again for WSL 2, and I can't access my Ubuntu distro. Hyper-V is working great, but now I can't do Linux dev the way I want to and Docker is busted.

This is definitely related to dnscrypt-proxy. Running:

Get-Process -Id (Get-NetUDPEndpoint -LocalPort 53).OwningProcess

Reveals that much. In my case, Cisco Umbrella Client is running dnscrypt-proxy under the covers.

Related: bitbeans/SimpleDnsCrypt#500

I added context to that issue as well.

MovGP0 commented 4 years ago

Docker Desktop was using the VM, so I needed to stop the owning processes. The tip from @petrsnd solved it for me:

Open a new PowerShell console and check the WSL version:

wsl -l -v
  NAME            STATE           VERSION
* Ubuntu-18.04    Running         1

Find the blocking processes:

Get-Process -Id (Get-NetUDPEndpoint -LocalPort 53).OwningProcess
 NPM(K)    PM(M)      WS(M)     CPU(s)      Id  SI ProcessName
 ------    -----      -----     ------      --  -- -----------
     18    26.42      30.45       5.31   16432   0 dockerd
     15    11.11      16.40       0.36    3972   0 svchost

Kill the owning processes:

# replace the IDs with the proper IDs from the last command
Stop-Process -Id 16432 -Force
Stop-Process -Id 3972 -Force

Try again to migrate:

wsl --set-version 'ubuntu-18.04' 2
Conversion in progress, this may take a few minutes...
For information on key differences with WSL 2 please visit https://aka.ms/wsl2
Conversion complete.

Check again:

wsl -l -v
  NAME            STATE           VERSION
* Ubuntu-18.04    Stopped         2

Restart Docker Desktop at this point and check again:

wsl -l -v
  NAME                   STATE           VERSION
* Ubuntu-18.04           Running         2
  docker-desktop         Running         2
  docker-desktop-data    Running         2
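The before/after checks with `wsl -l -v` can be scripted if the table is parsed first. This is a small sketch under stated assumptions: the function name `parse_wsl_list` is hypothetical, the input is already-decoded text, and distro names are assumed to contain no spaces. On many builds wsl.exe writes UTF-16LE, so bytes captured from the command may need decoding accordingly; verify that on your system.

```python
from typing import Dict, Tuple

# Output copied from the last check above.
SAMPLE = """\
  NAME                   STATE           VERSION
* Ubuntu-18.04           Running         2
  docker-desktop         Running         2
  docker-desktop-data    Running         2
"""


def parse_wsl_list(text: str) -> Dict[str, Tuple[str, int, bool]]:
    """Parse `wsl -l -v` text into {name: (state, version, is_default)}."""
    distros: Dict[str, Tuple[str, int, bool]] = {}
    for line in text.splitlines():
        fields = line.split()
        # A leading "*" marks the default distribution.
        is_default = bool(fields) and fields[0] == "*"
        if is_default:
            fields = fields[1:]
        # Skip the header row and anything that is not name/state/version.
        if len(fields) != 3 or not fields[2].isdigit():
            continue
        name, state, version = fields
        distros[name] = (state, int(version), is_default)
    return distros
```

For example, `[n for n, (_, v, _) in parse_wsl_list(SAMPLE).items() if v == 1]` lists any distributions still on WSL 1; with the output above it is empty, which matches the successful conversion.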

billgatesfan commented 4 years ago

I also have HyperV running a virtual switch, also have Cisco Umbrella client, but don't have Docker.

In the end, I managed to get it working by stopping the "SharedAccess"/Internet Connection Sharing (ICS) service. Yes, the service restarts by itself, but that was enough to reach the screen where I am prompted to choose a new UNIX username.

(using Win10 build 19041.329, upgraded from Win10 1909 where I already had Ubuntu WSL enabled)

darkanubis0100 commented 4 years ago

In my case the problem was solved when closing the DNS Server that was using port 53.

What I don't understand is what port 53 has to do with WSL.

SchwingSK commented 4 years ago

I have the same problem because of Cisco Umbrella, and company policy prevents me from killing it, so I'm completely stuck. Is there a planned patch to make this requirement in WSL2 an optional one or should I abandon all hope and move along?

darkanubis0100 commented 4 years ago

I have the same problem because of Cisco Umbrella, and company policy prevents me from killing it, so I'm completely stuck. Is there a planned patch to make this requirement in WSL2 an optional one or should I abandon all hope and move along?

I am also waiting for the same answer. I depend a lot on my private DNS server, and I still cannot explain what WSL2 uses the port for, such that it requires me to shut the server down.

DirtyJerz commented 4 years ago

For those with Cisco Umbrella RC interference, changing the "Umbrella Roaming Client" service startup type from Automatic to Automatic (Delayed Start) seems to have resolved the issue for me.

szymonos commented 4 years ago

In my case it seemed like there was a conflict between two internal networks on Hyper-V Switch. One I've created manually earlier for my Virtual Machines and the other created automatically for WSL. I've simply removed the first one leaving WSL as the only Internal network and problem gone.

The strange thing is that it has been working for some time without any issues and it seems the problem hit me when I first switched my Docker to Windows Containers and then switched back to Linux Containers - after restart WSL couldn't start.

Manually killing the processes on port 53 allowed me to start both WSL and Docker, but after a restart it was failing again, and only removing the other internal network fixed it permanently.

timoschd commented 4 years ago

I had the same issue. What solved it for me: uninstall Docker, deactivate compression as in this workaround https://github.com/microsoft/WSL/issues/4103#issuecomment-501511504 , reboot, uninstall and reinstall the Ubuntu distro, reinstall Docker.

singhdharmveer311 commented 4 years ago

I had the same issue. Solution: install CurrPorts (http://www.nirsoft.net/utils/cports.html#DownloadLinks) and see what is using port 53. In my case it was acrylicservice.exe. Then uninstall or stop that service.

SiegristJ commented 4 years ago

I ran into this issue earlier today when trying to upgrade to WSL2 and get Ubuntu 20.04 LTS working with it.

Based on the advice in moriarity9211's post above, I was able to find the service that was operating on port 53 using this PowerShell command:

Get-Process -Id (Get-NetTCPConnection -LocalPort 53).OwningProcess

If nothing is running on that port, then PowerShell will throw an error. In my case, the offending application turned out to be Cisco AnyConnect VPN Client (which is apparently using dnscrypt-proxy internally). Disabling the Windows service and restarting Windows cleared the port and allowed me to proceed with the Linux distro install (and also to finish installing Docker for Windows with the WSL2 feature).

rickywu commented 4 years ago

I have the same issue because I run CoreDNS, which uses port 53. But if I stop CoreDNS, start WSL, and then start CoreDNS again, both of them work well.

jerrion commented 4 years ago

I have had this issue on restart since updating to WSL 2. I have cisco anywhere connect installed but also endpoint security. I have noticed that if I start an Ubuntu terminal before endpoint starts, then I am able to use the WSL2 Ubuntu image. If not, I have to use the workaround posted here https://github.com/microsoft/WSL/issues/4364#issuecomment-638895281.

If I don't get the terminal up before endpoint starts, I use the workaround of finding the svchost PID that's using port 53 and force killing it. If I wait too long after force killing it, it will start again and block WSL2. If I start the Ubuntu terminal as soon as I kill the PID, everything works fine. It's annoying, but hopefully this workaround will help others.

fbhood commented 4 years ago

Same issue here. What fixed the issue in my case was to change the Acrylic DNS port from 53 to 55 in the config using the Acrylic DNS UI.

rickywu commented 4 years ago

This can solve the error, and it works well for me: if some application uses the ports that WSL needs, such as DNS port 53, start WSL first and then start the application in Windows.

MikeWilcoxMicrosoft commented 4 years ago

Same issue: Ubuntu says Error: 0xFFFFFFFF
Press any key to continue

wsl -l -v

NAME STATE VERSION

Windows 10 Enterprise Version 10.0.19041 Build 19041

I will look at what is bound to port 53 ..

MikeWilcoxMicrosoft commented 4 years ago

Note, today was the 1st time this issue came up for me... I did a restart and then Ubuntu was the 1st app I started and Ubuntu came up..

I am a Linux guy, so I assume the screenshot of the PowerShell output and process list correctly shows who is listening on port 53, i.e. svchost and dockerd (17248) listening on :53. Are we sure that it's port :53 that is blocking WSL Ubuntu startup?

image

meigallodixital commented 4 years ago

The question still is why are they using a system port ...

simondebbarma commented 4 years ago

Took the hint from https://github.com/microsoft/WSL/issues/5092#issuecomment-651034602 and I found the listening on the same port problem too. So, I removed Docker as I haven't been using it for a while. Then I stopped the svchost by finding it using the relevant PID for SharedAccess. Right after that, WSL was back.

So, I can confirm it is the same port listening that's causing these problems, @MikeWilcoxMicrosoft.

gatlinnewhouse commented 4 years ago

Technitium's DNS Service runs on the port 53 which will cause errors when trying to use the Ubuntu WSL 2 (failed to install it until I stopped the DnsService).

meigallodixital commented 4 years ago

All DNS software runs on port 53 because it is the standard port for that protocol. Anyone who has a DNS server will have problems.

harryprabowo commented 4 years ago

All DNS software runs on port 53 because it is the standard port for that protocol. Anyone who has a DNS server will have problems.

Can confirm - DNSCrypt service uses port 53, and has to be killed for WSL 2 to work normally again.