microsoft / WSL

Issues found on WSL
https://docs.microsoft.com/windows/wsl
MIT License

WSL 2 fails to start when mapped network drive is unreachable #10453

Closed andersonh-uta closed 3 months ago

andersonh-uta commented 1 year ago

Windows Version

Microsoft Windows [Version 10.0.22621.2134]

WSL Version

1.2.5.0

Are you using WSL 1 or WSL 2?

Kernel Version

5.15.90.1

Distro Version

Debian 11

Other Software

Windows Terminal

Repro Steps

  1. Map a network drive which requires a VPN connection to access.
  2. Disconnect from the VPN.
  3. Kill any existing WSL processes.
  4. Attempt to launch WSL.

The issue persists with any distro I've tried, not just Debian--it's happened with all available versions of Ubuntu as well.

Expected Behavior

WSL should start. Updated September 8: WSL should start as quickly as normal, within a few seconds.

Actual Behavior

~WSL does not start. When opened through Windows Terminal, it presents a black screen with no interaction. (I have waited about 10 minutes on several occasions to be sure, and it never gave me an interactive prompt).~

~PowerShell also does not launch after trying to open WSL (works fine if opening PowerShell before launching WSL), unless run as administrator; it never reaches an input prompt. If an instance of PowerShell is already running, all WSL commands run in PowerShell will hang and result in no output, and will not be reliably interruptible with Ctrl+C.~

Updated/corrected September 8: WSL takes 10+ minutes to start up, during which time PowerShell will usually fail to start if I have not already launched it since last rebooting. If I already have a PowerShell window open, all wsl commands except wsl --version will hang, providing no output and not responding to Ctrl+C interrupts, until WSL fully starts up. (Updated September 11) If I try to open a WSL window during this time, e.g. to get a bash prompt inside of my Debian instance, the Windows Terminal tab will be completely blank and have no interactivity for a long period of time (the 10+ minutes mentioned before). WSL seems to be hanging during startup at some point before it gets to an interactive bash prompt, and this seems to be causing all wsl commands run from PowerShell/cmd.exe to also hang until everything initializes.

(Updated September 15th) The 10+ minute startup is a one-time thing. After WSL is up and running, it behaves as normal, including opening new shells basically instantly.
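For anyone trying to quantify the delay, here is a minimal Python sketch that times a single command with an upper bound on how long to wait. The helper name `time_command` and the timeout values are illustrative assumptions, not part of WSL. It also assumes the hung process can actually be terminated when the timeout expires, which may not hold given the Ctrl+C behavior described above.

```python
import subprocess
import time


def time_command(cmd, timeout_s):
    """Run cmd and return (elapsed_seconds, stdout).

    stdout is None if the command did not finish within timeout_s seconds.
    """
    start = time.monotonic()
    try:
        result = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_s
        )
        output = result.stdout.strip()
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising this.
        output = None
    return time.monotonic() - start, output
```

Run from Windows right after `wsl --shutdown`, a call such as `time_command(["wsl", "echo", "ok"], timeout_s=900)` should give a rough cold-start time that can be compared with and without the VPN connected.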

Diagnostic Logs

Disconnecting all network drives, or re-connecting to the VPN, allows WSL and PowerShell to behave as expected. However, regularly disconnecting and reconnecting to the network drives is not a viable workaround.
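To check which mapped drives are currently unreachable before launching WSL, something like the following Python sketch might help. The function names, the 5-second timeout, and the example drive letters are hypothetical. The probe runs in a daemon thread because a path check against an unreachable network drive can itself block for a long time.

```python
import os
import threading


def is_reachable(path, timeout_s=5.0):
    """Return True if path exists and the check completes within timeout_s."""
    result = {}

    def probe():
        result["exists"] = os.path.exists(path)

    # Daemon thread: a probe stuck on a dead network drive will not
    # keep the interpreter alive at exit.
    worker = threading.Thread(target=probe, daemon=True)
    worker.start()
    worker.join(timeout_s)
    # If the probe has not finished, treat the path as unreachable.
    return result.get("exists", False)


def unreachable_drives(drive_letters, timeout_s=5.0):
    """Return the drive letters whose root cannot be reached in time."""
    return [
        letter
        for letter in drive_letters
        if not is_reachable(f"{letter}:\\", timeout_s)
    ]
```

For example, `unreachable_drives(["Y", "Z"])` would list whichever of those (hypothetical) mapped drives cannot be reached within the timeout.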

Launching WSL on the VPN, then disconnecting from the VPN, causes no issues. The issue occurs exclusively when first launching WSL.

My %userprofile%/.wslconfig file only has the debug console enabled. All other values are at defaults.

Within WSL, /etc/fstab has no entries; it's the default that came with the WSL instance.

This behavior has been around for me since Windows 10 (I forget the build number; just got a new laptop with Windows 11, and I no longer have access to the Windows 10 computer to check).

This behavior persists when doing a fresh install of any chosen WSL distribution.

~Possibly related~ (Updated September 8) Related issues: #9570, #9358 - both mention slow startup times when network drives are unreachable ~but for me, the WSL instance fails to start entirely~.

WSL logs: WslLogs-2023-09-07_09-21-16.zip

OneBlue commented 1 year ago

Thank you for reporting this @andersonh-uta. I'm seeing a lot of drvfs traffic in the logs. I wonder if your shell is causing the hang.

When you reproduce the issue, does "wsl echo ok" work ? If not, can you share the output of wsl strace echo ok ?

andersonh-uta commented 1 year ago

> Thank you for reporting this @andersonh-uta. I'm seeing a lot of drvfs traffic in the logs. I wonder if your shell is causing the hang.
>
> When you reproduce the issue, does "wsl echo ok" work ? If not, can you share the output of wsl strace echo ok ?

Maybe I hadn't been waiting long enough before, or something changed; after about 10 minutes, this time, WSL started up. I've updated the original message accordingly; looks like this might now be a duplicate of the two issues linked there.

All the same: I started WSL, waited about five minutes, and ran wsl echo ok and wsl strace echo ok in two different PowerShell tabs. Neither gave me any output until WSL fully started up.

Logs attached (collected up through when WSL was up and running): WslLogs-2023-09-08_15-20-13.zip

OneBlue commented 1 year ago

Thank you @andersonh-uta. Can you try to run "wsl --shutdown", and then "wsl --echo ok" ? I'd be curious to see if this is just blocked by something else.

Also can you share the content of /etc/wsl.conf, if any ?

andersonh-uta commented 1 year ago

@OneBlue (sorry for the slow response, busy weekend) Running wsl --shutdown does nothing when I re-create the issue (not until the whole WSL system boots at least, at which point it behaves as normal). wsl --echo ok does nothing, since it looks like --echo isn't a valid flag. (wsl echo ok does the same thing it did before, though).

Contents of /etc/wsl.conf:

[user]
default=(my username)

(actual username is masked in the above)

OneBlue commented 1 year ago

Oh sorry about that. I meant "wsl echo ok".

andersonh-uta commented 1 year ago

No worries. As mentioned, same behavior as before--no output at all until WSL eventually starts up, at which point it echoes back "ok".

OneBlue commented 1 year ago

Is that the case if you only run "wsl echo ok" while nothing else is running ?

Do you see the same issue if you convert your distribution to WSL1 ?

andersonh-uta commented 1 year ago

I can only run wsl echo ok and get the expected output ("ok" being echoed back to me) once WSL is up and running, either because I'm connected to the VPN and the network drive is visible, or because I waited the 10+ minutes for it to start up when off the VPN. If I run wsl --shutdown, disconnect from the VPN, then run wsl echo ok, I get the WSL debug console opening up, but nothing happens for the 10+ minutes. (During this time all the previously mentioned symptoms are happening--WSL commands hang with no output/information, trying to start a WSL session via Windows Terminal leads to a completely blank screen with nothing happening, etc.)

I tried it with WSL1 and there was no difference in behavior--everything still hangs for 10+ minutes when I'm not connected to the VPN and try to first start WSL. Whatever this is appears to be independent of the WSL version.

I've tried adding the network drive to my /etc/fstab file and setting a 30 second timeout, but this doesn't seem to change anything. It mounts correctly when I'm on the VPN.

I've also tried disabling automounting in my %userprofile%/.wslconfig file, and the issues still persist.

I've also noticed that Windows Explorer hangs when WSL is stuck in this long startup state. It launches, but it doesn't populate anything--no icons, nothing in the left-hand panel, and trying to manually type in a location in the location bar also does nothing--until WSL finishes starting up. I suspect this might be related to the WSL links added under the "Linux" section of the left-hand panel. This particular issue is new as of me moving to Windows 11--I never had this particular issue on Windows 10 before.

OneBlue commented 1 year ago

Interesting. Could you take a dump of the wsl processes while WSL is "hanging"? Maybe wait a minute or two after starting it so we can capture where things are stuck.

/dumps

microsoft-github-policy-service[bot] commented 1 year ago

Hello! Could you please provide logs and process dumps to help us better diagnose your issue?

To collect WSL logs and dumps, download and execute collect-wsl-logs.ps1 in an administrative powershell prompt:

Invoke-WebRequest -UseBasicParsing "https://raw.githubusercontent.com/microsoft/WSL/master/diagnostics/collect-wsl-logs.ps1" -OutFile collect-wsl-logs.ps1
Set-ExecutionPolicy Bypass -Scope Process -Force
.\collect-wsl-logs.ps1 -Dump

The script will output the path of the log file once done.

Once completed please upload the output files to this GitHub issue.


Thank you!

andersonh-uta commented 1 year ago

Hm, the zipped logs are ~200 MB and it won't let me upload them. Is there some way around this or some specific thing in the log files I should look for and report back on?

Details of what I did:

  1. Disconnect from the VPN, and run wsl --shutdown.
  2. Start collecting logs.
  3. Open a Debian (WSL 2) tab in Windows Terminal.
  4. Wait about 2-3 minutes.
  5. Run wsl echo ok in a PowerShell tab in Windows Terminal (it hangs, like before).
  6. Open Windows Explorer (which fails to populate any icons/shortcut options/etc, as before; it's just a completely blank screen instead of showing any files/folders/etc, and has a green progress bar moving across the location bar, but the progress bar crawls to a halt and never makes it all the way across).
  7. Wait another 2-3 minutes.
  8. Stop the logs.

This is with the automount functionality disabled in /etc/wsl.conf (realized it belongs there, not in the %userprofile%/.wslconfig file, after my previous messages).
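As a quick sanity check that the automount setting is actually being read from the right file, here is a small Python sketch that parses wsl.conf-style text. It assumes the documented `[automount]` section with an `enabled` key, and that automounting defaults to on when the key is absent. The helper name `automount_enabled` is made up for illustration.

```python
import configparser


def automount_enabled(conf_text):
    """Return the [automount] enabled value from wsl.conf text.

    Falls back to True (assumed WSL default) when the section or
    key is missing.
    """
    # interpolation=None so literal '%' characters in values are harmless.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    return parser.getboolean("automount", "enabled", fallback=True)
```

Inside the distribution, this could be called as `automount_enabled(open("/etc/wsl.conf").read())`. With only a `[user]` section present, as in the file shared earlier in this thread, it would report automounting as still enabled.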

Amending previous statement about Explorer: it does seem to eventually show entries, but it takes a few minutes.

OneBlue commented 1 year ago

The .dmp files are what I'd need to look at to see what WSL is stuck on.

Could you upload them to OneDrive / GDrive or equivalent ?

andersonh-uta commented 1 year ago

Ah, yes, there are other places than GitHub that files can be uploaded to. :)

Let me know if this link works: https://drive.google.com/file/d/1xYt3TrPxMdOhsM5aflXNrCi9s5hyqisM/view?usp=sharing

HarrisonPace commented 1 year ago

I'm experiencing the same behavior. I currently work around it by connecting to my work's VPN, ensuring all network drives are connected, then starting WSL. I believe the behavior is much worse recently. Running 22621.2215 22H2 Windows 11.

Happy to send any debugging information as well.

rliessum commented 1 year ago

I do not have any (inaccessible/network) drive mappings, but I still experienced the same 10-second+ wsl boot time issue. I followed the recommendation of setting automount=false in my wsl.conf file (provider flag workarounds are not relevant since I do not have them).

[    4.801133] systemd-journald[40]: Received client request to flush runtime journal.
[    5.572900] misc dxg: dxgk: dxgkio_query_adapter_info: Ioctl failed: -2
[    5.763168] misc dxg: dxgk: dxgkio_query_adapter_info: Ioctl failed: -2
[    5.763435] misc dxg: dxgk: dxgkio_query_adapter_info: Ioctl failed: -2
[    5.763616] misc dxg: dxgk: dxgkio_query_adapter_info: Ioctl failed: -2
[    5.763812] misc dxg: dxgk: dxgkio_query_adapter_info: Ioctl failed: -2
[    5.764003] misc dxg: dxgk: dxgkio_query_adapter_info: Ioctl failed: -2
[    5.764186] misc dxg: dxgk: dxgkio_query_adapter_info: Ioctl failed: -2
[    5.764372] misc dxg: dxgk: dxgkio_query_adapter_info: Ioctl failed: -2
[    5.764555] misc dxg: dxgk: dxgkio_query_adapter_info: Ioctl failed: -2
[    5.764739] misc dxg: dxgk: dxgkio_query_adapter_info: Ioctl failed: -2
[    5.764923] misc dxg: dxgk: dxgkio_query_adapter_info: Ioctl failed: -2
[    5.765109] misc dxg: dxgk: dxgkio_query_adapter_info: Ioctl failed: -2
[    5.765290] misc dxg: dxgk: dxgkio_query_adapter_info: Ioctl failed: -2
[    5.765472] misc dxg: dxgk: dxgkio_query_adapter_info: Ioctl failed: -2
[    6.160290] misc dxg: dxgk: dxgkio_query_adapter_info: Ioctl failed: -2

aiyou9 commented 1 year ago

I removed the network LightWeight Filter in my network connection settings.

The network LightWeight Filter had also blocked my network connection before I installed a distribution with WSL 2.

net stop wslservice
net start wslservice

After that, WSL 2 worked, so please check your network connection before using WSL 2.

andersonh-uta commented 1 year ago

@aiyou9 Thanks for finding that--I don't have a LightWeight Filter entry for my network adapters, though. But maybe someone else can confirm that this helps?

andersonh-uta commented 1 year ago

@OneBlue Just checking; were you able to access the logs in Google Drive?

alexander-nolan commented 11 months ago

I also just had this problem. One of my network drives no longer existed. That caused all WSL commands to hang indefinitely.

As soon as I unmapped the broken network drive and rebooted it all came back to life.

andersonh-uta commented 4 months ago

Update:

After a recent update--not sure if it was to WSL, Windows, or something else--this issue no longer seems to be happening. WSL starts in the expected amount of time (a few seconds) when the network drive is unreachable.

The original issue may have been a Windows issue, not a WSL issue, since Explorer would also not initialize for quite a long time if the network drive was unreachable. (It would show a completely blank window--no files/folders, nothing at all in the left-hand pane, etc). I suspect there's a setting somewhere for how long to wait for a drive to become available, and that this setting needs to be changed to fix the problem, but that's just speculation. I haven't been able to find where any such settings might be; probably somewhere in the registry, but I'm not familiar enough with that to know where to start looking. If someone else knows, that might be good information to share here.

For completeness:

OneBlue commented 3 months ago

The issue has indeed been fixed in 2.2.3.

Closing.