Open thielj opened 4 months ago
If this a feature request, please reply with '/feature'. If this is a question, reply with '/question'. Otherwise please attach logs by following the instructions below, your issue will not be reviewed unless they are added. These logs will help us understand what is going on in your machine.
Please view the issues below to see if they solve your problem, and if the issue describes your problem please consider closing this one and thumbs upping the other issue to help us prioritize it!
Note: You can give me feedback by thumbs upping or thumbs downing this comment.
The relevant log excerpts have already been included. Happy to provide additional information if necessary.
Maybe it's worth mentioning that setting vmIdleTimeout
to -1 or a sufficient large value has no effect.
If this a feature request, please reply with '/feature'. If this is a question, reply with '/question'. Otherwise please attach logs by following the instructions below, your issue will not be reviewed unless they are added. These logs will help us understand what is going on in your machine.
Please view the issues below to see if they solve your problem, and if the issue describes your problem please consider closing this one and thumbs upping the other issue to help us prioritize it!
Note: You can give me feedback by thumbs upping or thumbs downing this comment.
Thank you @thielj.
Based on the symptoms, I wonder if the issue is that the ubuntu distribution installer simply shuts down the WSL distribution after creating the user.
Do you see the same symptoms with let's say debian ?
@OneBlue Hi. Please note that the shutdown is initiated by the '10000ms watchdog' before user credentials are entered (21:46:49). It's not automatically restarting after this.
Only when you provide the default user name and hit enter the instance/distro is brought up again (21:47:00). Just wait long enough (e.g. 30 seconds) when prompted to enter the default username and you will see. The default user is created after this, and the instance/distro is then shutdown a second time before the default user is logged in.
I naturally tried Debian but it comes without cloud-init. I haven't investigated how to roll my own cloud-init enabled WSL distro but would think that with the observed '10000ms watchdog' this would be a waste of time. Even the standard 'apt upgrade' module wouldn't be safe to run.
@OneBlue Maybe this helps to determine if cloud-init is still running?
https://cloudinit.readthedocs.io/en/latest/howto/status.html
@thielj: I wonder if what you're experiencing is the normal VM idle time out if there's no activity. Let's try something: Can you put:
[wsl2]
vmIdleTimeout=-1
in .wslconfig
and see if that the solves the issue for you ?
I tried that already, and it's not. Sorry, I thought I had mentioned it.
On Tue, May 28, 2024, 19:48 Blue @.***> wrote:
@thielj https://github.com/thielj: I wonder if what you're experiencing is the normal VM idle time out if there's no activity. Let's try something: Can you put:
[wsl2] vmIdleTimeout=-1
in .wslconfig and see if that the solves the issue for you ?
— Reply to this email directly, view it on GitHub https://github.com/microsoft/WSL/issues/11602#issuecomment-2135902133, or unsubscribe https://github.com/notifications/unsubscribe-auth/AA6CUVLJLACGS4TSUIZ7OSLZETGQ3AVCNFSM6AAAAABIAMLEYOVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCMZVHEYDEMJTGM . You are receiving this because you were mentioned.Message ID: @.***>
Ok interesting. Could you capture logs with that .wslconfig value set ?
/logs
@OneBlue It was mentioned here: https://github.com/microsoft/WSL/issues/11602#issuecomment-2121227086
Neither -1 or a much longer timeout made any difference at all. Neither did anything else that looked even remotely related in .wslconfig
. It's also readily reproduceable on my laptop (with a 900000 timeout). I would be surprised to hear from anyone who can't reproduce this.
And it seems I just discovered a workaround: If I log into the distro from another terminal window, during the installation, as soon as this is possible, WaitForBootProcess
still chimes in, the typical root login happens -- but the shutdown does not and cloud-init completes as expected. Just having an interactive shell open seems to prevent it.
Maybe the following is related. The actual shutdown in my case also happens almost exactly 15s after WaitForBootProcess
comes in. https://github.com/microsoft/WSL/discussions/8659
The first timer is how long a WSL instance is idle before it is terminated. Think of this as a wsl --terminate
that occurs when there are no more running processes in it other than init. This is non-configurable, but https://github.com/microsoft/WSL/issues/6782 is a feature request for the ability to configure it. As far as I can tell, this is hard-coded at about 15 seconds.
Have you looked what is triggering your WaitForBootProcess
code? It is clearly setup with a fixed 10000ms timeout and then logs in as root. I don't see this error in the logs after the first/--install start.
If it's supposed to detect a running cloud-init: maybe it would be worth looking into that bit? And if not, my guess is that a service like cloud-init doesn't count as 'running process other than init'. A simple command -v cloud-init && cloud-init status --wait
before initiating the shutdown would allow cloud-init to complete and solve the issue.
@OneBlue And another update regarding that workaround: Looking at the debug console for cloud-init output or counting to 15 and then starting wsl ~ cloud-init status --wait
in another terminal does indeed work. Once you try to do something more than just sleeping, there's another problem though: /etc/resolv.conf
disappears, roughly around the time cloud-init is running apt upgrade
. It's probably triggered by your 10000ms watchdog. So sometimes the upgrade runs through, sometimes it fails. Additional packages installed after that always fail (unable to resolve archive.ubuntu.com). It's definitely no problem with my network and is also independent of networkingMode or dnsTunneling.
I can workaround this by replacing the symlink with an immutable copy of /mnt/wsl/resolv.conf
very early in the boot process, like so:
#cloud-config
bootcmd:
- chattr -i /etc/resolv.conf
- rm -f /etc/resolv.conf
- cp /mnt/wsl/resolv.conf /etc/resolv.conf
- chattr +i /etc/resolv.conf
package_update: true
package_upgrade: true
packages: [ btop ]
Your ConfigCreateResolvConfSymlink
isn't happy about that, which makes me think it might also be involved with its disappearance:
[ 25.822086] WSL (2) ERROR: WaitForBootProcess:3342: /sbin/init failed to start within 10000
[ 25.823081] WSL (2) ERROR: ConfigCreateResolvConfSymlink:1741: symlink(/mnt/wsl/resolv.conf
[ 25.833860] WSL (2) ERROR: ConfigCreateResolvConfSymlink:1741: symlink(/mnt/wsl/resolv.conf
I tried generateResolvConf=false
to no avail. The symlink would still disappear. I suppose /etc/wsl.conf is read before the distro is launched?
Windows Version
Microsoft Windows [Version 10.0.22631.3593]
WSL Version
2.2.4.0 (also tested with v2.1.5 and v2.0.5)
Are you using WSL 1 or WSL 2?
Kernel Version
5.15.153.1-microsoft-standard-WSL2
Distro Version
Ubuntu 24.04
Other Software
I uninstalled anything that could possibly interfer
Repro Steps
I made sure
.wslconfig
doesn't exist.Create a minimalist
.cloud-init/default.user-data
as follows:Run
wsl --install Ubuntu-24.04
. When prompted for the username, pause for about 30 seconds. Then continue as usual.Expected Behavior
I would expect that cloud-init would be able to complete an operation that takes longer than just a few seconds (e.g. installing a larger number of packages, which is how I noticed this first).
Actual Behavior
Some kind of watchdog kicks in 10s after the machine is started, logs in as root and executes a shutdown about 15s later, without letting cloud-init finish and leaving the system in an undefined state.
Diagnostic Logs
When logged in, run
journalctl
. You'll find entries like the following (some lines ~~~ omitted):The last two lines above are the last logged message and the first message once the machine is started again.
The shutdown (21:46:49) was most likely initiated by whatever the watchdog's root login was doing.
The start (21:48:00) was initiated interactively by entering the default user credentials.