yuk7 / ArchWSL

ArchLinux based WSL Distribution. Supports multiple install.
https://git.io/archwsl
MIT License
6.91k stars 201 forks source link

systemctl initializing #356

Open CoachYT1 opened 8 months ago

CoachYT1 commented 8 months ago

Describe the issue: After updating to the latest ArchWSL, systemctl is not working. `systemctl status` shows `initializing`.

To Reproduce: Update to the latest ArchWSL and make a clean installation.

Expected behavior: systemctl should start normally.

Screenshots: image

Environment:

9numbernine9 commented 8 months ago

This might be related to the Systemd announcement that they are dropping support for cgroups v1 "in a release after 2023" (ref). It's currently working in my Arch WSL environment but I explicitly disabled cgroups v1 support inside of WSL.

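You can try this yourself and see if it helps:

* `wsl --shutdown` to terminate all running WSL instances

* Add a `%USERPROFILE%\.wslconfig` file (or edit it if it already exists) and make sure that it contains:

      [wsl2]
      kernelCommandLine = cgroup_no_v1=all

* Wait 10 seconds or so, then restart your Arch WSL.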

CoachYT1 commented 8 months ago

> This might be related to the Systemd announcement that they are dropping support for cgroups v1 "in a release after 2023" (ref). It's currently working in my Arch WSL environment but I explicitly disabled cgroups v1 support inside of WSL.
>
> You can try this yourself and see if it helps:
>
> * `wsl --shutdown` to terminate all running WSL instances
>
> * Add a `%USERPROFILE%\.wslconfig` file (or edit it if it already exists) and make sure that it contains:
>
>       [wsl2]
>       kernelCommandLine = cgroup_no_v1=all
>
> * Wait 10 seconds or so, then restart your Arch WSL.
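
One way to confirm that the `kernelCommandLine = cgroup_no_v1=all` setting actually reached the WSL kernel is to inspect `/proc/cmdline` and the filesystem type mounted at `/sys/fs/cgroup` from inside the distro. The following is only a minimal sketch: the helper names are made up for illustration, and the mapping of `tmpfs` to "v1 or hybrid" is an assumption based on how cgroup hierarchies are usually mounted.

```shell
# Sketch: confirm that the kernelCommandLine setting from .wslconfig took effect.
# Helper names are hypothetical and not part of WSL or systemd.

# Succeeds if a kernel command line string carries the cgroup_no_v1=all flag.
has_cgroup_no_v1_all() {
    case " $1 " in
        *" cgroup_no_v1=all "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Maps the filesystem type mounted at /sys/fs/cgroup to a cgroup mode.
# cgroup2fs means the unified (v2) hierarchy; tmpfs usually means v1 or hybrid.
cgroup_mode_from_fstype() {
    case "$1" in
        cgroup2fs) echo "v2" ;;
        tmpfs)     echo "v1-or-hybrid" ;;
        *)         echo "unknown" ;;
    esac
}

# Run inside the distro:
#   has_cgroup_no_v1_all "$(cat /proc/cmdline)" && echo "flag present"
#   cgroup_mode_from_fstype "$(stat -fc %T /sys/fs/cgroup)"
```

If the flag is missing from `/proc/cmdline`, the `.wslconfig` change was probably not picked up, for example because WSL was not fully shut down first.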

image

Same

xuangeyouneihan commented 8 months ago

I have the same problem, and it does not work for me either 😰 The only thing that changed is that `Tainted: cgroupsv1` is gone.

xuangeyouneihan commented 8 months ago

Well, I found this and modified .wslconfig according to it, and then it worked. But when I renamed .wslconfig to .wslconfig1 (without modifying it) to re-enable cgroups v1, Systemd was somehow still working. Then I renamed .wslconfig1 back (again without modifying it) to disable cgroups v1, backed up the original ext4.vhdx, unregistered ArchWSL, and re-installed it with a new ext4.vhdx. Systemd stopped working again. Finally I deleted .wslconfig and replaced the new ext4.vhdx with the old one, and Systemd works. So why did it work with my old ext4.vhdx, and why didn't it work with a new ext4.vhdx?

9numbernine9 commented 8 months ago

I'm running into this issue as well when setting up an ArchWSL instance on a brand new Windows 10 installation (despite my earlier comments about potential workaround/solutions).

Trying to narrow this down a bit further, I started going back through ArchWSL releases:

* [24.3.11.0](https://github.com/yuk7/ArchWSL/releases/tag/24.3.11.0) ❌

* [24.2.24.0](https://github.com/yuk7/ArchWSL/releases/tag/24.2.24.0) ❌

* [22.10.16.0](https://github.com/yuk7/ArchWSL/releases/tag/22.10.16.0) ✔️

What's odd is that it works fine with the last 2022 release, and not only that: I can bring all the packages up to date with `pacman -Syu` and everything still works fine. I don't know a lot about how WSL distributions are created, but perhaps something changed in the initial configuration/bootstrapping process between those releases?

    C:\> wsl --version
    WSL version: 2.1.5.0
    Kernel version: 5.15.146.1-2
    WSLg version: 1.0.60
    MSRDC version: 1.2.5105
    Direct3D version: 1.611.1-81528511
    DXCore version: 10.0.25131.1002-220531-1700.rs-onecore-base2-hyp
    Windows version: 10.0.19045.4170

xuangeyouneihan commented 8 months ago

> I'm running into this issue as well when setting up an ArchWSL instance on a brand new Windows 10 installation (despite my earlier comments about potential workaround/solutions).
>
> Trying to narrow this down a bit further, I started going back through ArchWSL relesases:
>
> * [24.3.11.0](https://github.com/yuk7/ArchWSL/releases/tag/24.3.11.0) ❌
>
> * [24.2.24.0](https://github.com/yuk7/ArchWSL/releases/tag/24.2.24.0) ❌
>
> * [22.10.16.0](https://github.com/yuk7/ArchWSL/releases/tag/22.10.16.0) ✔️
>
> What's odd is that it works fine with the last 2022 release - and not only that, I can bring all the packages up-to-date with pacman -Syu and everything still works fine. I don't know a lot about how WSL distributions are created, but it's something that's changed in the initial configuration/bootstrapping processes between those releases?
>
>     C:\> wsl --version
>     WSL version: 2.1.5.0
>     Kernel version: 5.15.146.1-2
>     WSLg version: 1.0.60
>     MSRDC version: 1.2.5105
>     Direct3D version: 1.611.1-81528511
>     DXCore version: 10.0.25131.1002-220531-1700.rs-onecore-base2-hyp
>     Windows version: 10.0.19045.4170

Does wayland-0 exist in /run/user/$UID with version 22.10.16.0? I found that wayland-0 is missing in version 24.3.11.0 when Systemd is accidentally enabled; see #357.

9numbernine9 commented 8 months ago

> Does wayland-0 exist in /run/user/$UID with version 22.10.16.0? I found that wayland-0 is missing in version 24.3.11.0 when Systemd accidentally enabled, see #357

No, it doesn't.

rayae commented 8 months ago

I manually built a rootfs with Docker, and everything works well. I think this problem exists only in the repo's release. My build script (built with a China pacman mirror): create-rootfs.sh user-dbus-wayland-x11 user-systemctl-status system-systemctl-status

mrcaidev commented 8 months ago

None of these solutions work on my side. Only rolling back to version 22.10.16.0 works.

I'm using version 24.3.31.0 on Windows 11.

    wsl --version
    WSL version: 2.1.5.0
    Kernel version: 5.15.146.1-2
    WSLg version: 1.0.60
    MSRDC version: 1.2.5105
    Direct3D version: 1.611.1-81528511
    DXCore version: 10.0.25131.1002-220531-1700.rs-onecore-base2-hyp
    Windows version: 10.0.22631.3374

xuangeyouneihan commented 8 months ago

Does Systemd work normally in v24.3.31.0 released yesterday?

yuk7 commented 8 months ago

@xuangeyouneihan Sorry, nothing has changed on that front in that release

xuangeyouneihan commented 8 months ago

> @xuangeyouneihan Sorry, nothing has changed on that front in that release

Hope this will be fixed soon 😂 BTW, do you have any idea what caused this issue?

WH-2099 commented 8 months ago

> Does Systemd work normally in v24.3.31.0 released yesterday?

v24.3.31.0 still does not work in my environment.

    WSL Version: 2.2.1.0
    Kernel Version: 5.15.150.1-2
    WSLg Version: 1.0.60
    MSRDC Version: 1.2.5105
    Direct3D Version: 1.611.1-81528511
    DXCore Version: 10.0.25131.1002-220531-1700.rs-onecore-base2-hyp
    Windows Version: 10.0.22635.3420

WH-2099 commented 7 months ago

After testing combinations of Arch.exe, wsldl.exe, and rootfs.tar.gz, I now suspect that the problem is mainly related to rootfs.tar.gz, and most likely to systemd-firstboot.service. I'm continuing to troubleshoot the problem.

WH-2099 commented 7 months ago

I think I found the immediate cause and a temporary solution, but the deeper root cause is still up for debate. The direct cause is that the systemd boot process gets stuck on systemd-firstboot.service.

The treatment is simple:

  1. `systemctl list-jobs | grep 'systemd-firstboot.service'` to get the job ID of systemd-firstboot.service (its state should be `running`).
  2. `systemctl cancel <job-id>` to cancel the job.

After that systemd will run normally, even if you restart wsl.
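
The two steps above can be sketched as a small script. This is only an illustration under stated assumptions: it assumes `systemctl list-jobs` prints its columns in the order JOB, UNIT, TYPE, STATE, and the function names are hypothetical.

```shell
# Sketch: extract the job ID of a stuck unit from `systemctl list-jobs` output.
# Assumes the column order JOB UNIT TYPE STATE; function names are illustrative.

# Reads list-jobs output on stdin and prints the job ID(s) of unit $1
# whose state is "running".
running_job_ids() {
    awk -v unit="$1" '$2 == unit && $4 == "running" { print $1 }'
}

# Cancels every running job for the given unit (needs root on a live system).
cancel_running_jobs() {
    systemctl list-jobs --no-legend | running_job_ids "$1" |
        while read -r id; do
            systemctl cancel "$id"
        done
}

# Usage on an affected install:
#   cancel_running_jobs systemd-firstboot.service
```

Matching on the exact unit name in the second column avoids the kind of silent miss that a mistyped `grep` pattern produces.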


Based on my testing and extrapolation, there are two known issues:

  1. systemd-firstboot.service is not executing properly (I don't really know much about this, but from the timeline I suspect it's related to WSLg).
  2. The systemd compatibility layer in the WSL2 kernel has some problems in determining the first boot.
     a. Neither `systemd.firstboot=false` nor `systemd.condition-first-boot=false`, passed by rewriting the kernel command line arguments, prevented systemd-firstboot.service from starting. In fact, based on the results of `systemd-analyze condition 'ConditionFirstBoot=true'`, the kernel doesn't seem to be handling the relevant parameters correctly.

Also, according to the official systemd documentation, I recommend removing /etc/machine-id from rootfs.tar.gz in the distribution.

For operating system images which are created once and used on multiple machines, for example for containers or in the cloud, /etc/machine-id should be either missing or an empty file in the generic file system image (the difference between the two options is described under "First Boot Semantics" below). An ID will be generated during boot and saved to this file if possible.
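
The difference between a missing and an empty /etc/machine-id matters for first-boot detection. A rough sketch of the rules, assuming the "First Boot Semantics" section of machine-id(5) and ignoring the kernel command line override, might look like this. The function name is hypothetical, and WSL may not follow these rules exactly, as the discussion above suggests.

```shell
# Sketch of the "First Boot Semantics" rules from machine-id(5), ignoring the
# kernel command line override. The function name is illustrative.
# Prints "first-boot" or "not-first-boot" for the machine-id file at $1.
classify_machine_id() {
    file=$1
    if [ ! -e "$file" ]; then
        echo "first-boot"        # missing file: treated as a first boot
    elif [ ! -s "$file" ]; then
        echo "not-first-boot"    # empty file: an ID is generated, but not a first boot
    elif [ "$(cat "$file")" = "uninitialized" ]; then
        echo "first-boot"        # placeholder written during early boot
    else
        echo "not-first-boot"    # assumed to hold a valid machine ID
    fi
}

# Usage inside the distro:
#   classify_machine_id /etc/machine-id
```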


The information I refer to is as follows:

* https://www.freedesktop.org/software/systemd/man/latest/systemd-firstboot.html
* https://www.freedesktop.org/software/systemd/man/latest/machine-id.html
* https://www.freedesktop.org/software/systemd/man/latest/systemd.special.html
* https://www.freedesktop.org/software/systemd/man/latest/kernel-command-line.html
* https://learn.microsoft.com/en-us/windows/wsl/systemd

CoachYT1 commented 7 months ago

image

In my case, systemd-networkd-wait-online.service was also blocking the systemd boot process.

wswind commented 7 months ago

> I think I found the immediate cause and temporary solution, but the deeper root cause is still up for debate. The systemd boot process with systemd-fisrtboot.service stuck is the direct cause.
>
> The treatment is simple:
>
>   1. systemctl list-jobs | grep 'systemd-fisrtboot.service' Get the job-id corresponding to systemd-firstboot.service (its status should be running).
>   2. systemctl cancel <job-id> cancel the job
>
> After that systemd will run normally, even if you restart wsl.
>
> Based on my testing and extrapolation, there are two known issues:
>
>   1. systemd-fisrtboot.service is not executing properly (don't really know much about this, but from the timeline I suspect it's related to wslg)
>   2. The systemd compatibility layer in the WSL2 kernel has some problems in determining the first boot. a. Neither systemd.firstboot=false nor systemd.condition-first-boot=false prevented systemd-firstboot.service from booting by rewriting the kernel command line arguments. In fact, based on the results of systemd-analyze condition 'ConditonFirstBoot=true', the kernel doesn't seem to be handling the relevant parameters correctly.
>
> Also, according to the official systemd documentation, I recommend removing /etc/machine-id from rootfs.tar.gz in the distribution.
>
> For operating system images which are created once and used on multiple machines, for example for containers or in the cloud, /etc/machine-id should be either missing or an empty file in the generic file system image (the difference between the two options is described under "First Boot Semantics" below). An ID will be generated during boot and saved to this file if possible.
>
> The information I refer to is as follows: https://www.freedesktop.org/software/systemd/man/latest/systemd-firstboot.html https://www.freedesktop.org/software/systemd/man/latest/machine-id.html https://www.freedesktop.org/software/systemd/man/latest/systemd.special.html https://www.freedesktop.org/software/systemd/man/latest/kernel-command-line.html https://learn.microsoft.com/en-us/windows/wsl/systemd

Spelling error: it should be 'firstboot' instead of 'fisrtboot'.

This is how I fixed this issue:

  1. Cancel running jobs like systemd-firstboot.service
  2. Disable systemd-networkd-wait-online.service

    sudo systemctl list-jobs | grep running
    sudo systemctl cancel <job-number>
    sudo systemctl disable systemd-networkd-wait-online

image

In my testing, removing /etc/machine-id from rootfs.tar.gz did not fix this issue.

CnsMaple commented 7 months ago

> I manually built a rootfs with docker, everything works well. I think this problem just in the repo's release. My build script(built with China pacman mirror) create-rootfs.sh user-dbus-wayland-x11 user-systemctl-status system-systemctl-status

@rayae Thank you for your script, it's very useful.

mrcaidev commented 6 months ago

> This is how I fix this issue:
>
>   1. Cancel running jobs like systemd-firstboot.service
>   2. Disable systemd-networkd-wait-online.service
>
>     sudo systemctl list-jobs | grep running
>     sudo systemctl cancel <job-number>
>     sudo systemctl disable systemd-networkd-wait-online

This fixed my problem. I'm using v24.4.28.0.

shanoor commented 4 months ago

> This might be related to the Systemd announcement that they are dropping support for cgroups v1 "in a release after 2023" (ref). It's currently working in my Arch WSL environment but I explicitly disabled cgroups v1 support inside of WSL.
>
> You can try this yourself and see if it helps:
>
> * `wsl --shutdown` to terminate all running WSL instances
>
> * Add a `%USERPROFILE%\.wslconfig` file (or edit it if it already exists) and make sure that it contains:
>
>       [wsl2]
>       kernelCommandLine = cgroup_no_v1=all
>
> * Wait 10 seconds or so, then restart your Arch WSL.

I had an issue with a very long WSL boot and systemd not starting right away (with the infamous `Failed to connect to bus: No such file or directory`). I had to wait 30s and manually run `sudo systemctl start user@1000` every time to get systemd back. Your solution worked for me: it's now back to what it was before, fast again and working. Thanks!

> This is how I fix this issue:
>
>   1. Cancel running jobs like systemd-firstboot.service
>   2. Disable systemd-networkd-wait-online.service
>
>     sudo systemctl list-jobs | grep running
>     sudo systemctl cancel <job-number>
>     sudo systemctl disable systemd-networkd-wait-online

I also had to do this to get Docker working again. Thanks!

WH-2099 commented 4 months ago

> > I think I found the immediate cause and temporary solution, but the deeper root cause is still up for debate. The systemd boot process with systemd-fisrtboot.service stuck is the direct cause. The treatment is simple:
> >
> >   1. systemctl list-jobs | grep 'systemd-fisrtboot.service' Get the job-id corresponding to systemd-firstboot.service (its status should be running).
> >   2. systemctl cancel <job-id> cancel the job
> >
> > After that systemd will run normally, even if you restart wsl. Based on my testing and extrapolation, there are two known issues:
> >
> >   1. systemd-fisrtboot.service is not executing properly (don't really know much about this, but from the timeline I suspect it's related to wslg)
> >   2. The systemd compatibility layer in the WSL2 kernel has some problems in determining the first boot. a. Neither systemd.firstboot=false nor systemd.condition-first-boot=false prevented systemd-firstboot.service from booting by rewriting the kernel command line arguments. In fact, based on the results of systemd-analyze condition 'ConditonFirstBoot=true', the kernel doesn't seem to be handling the relevant parameters correctly.
> >
> > Also, according to the official systemd documentation, I recommend removing /etc/machine-id from rootfs.tar.gz in the distribution.
> >
> > For operating system images which are created once and used on multiple machines, for example for containers or in the cloud, /etc/machine-id should be either missing or an empty file in the generic file system image (the difference between the two options is described under "First Boot Semantics" below). An ID will be generated during boot and saved to this file if possible.
> >
> > The information I refer to is as follows: https://www.freedesktop.org/software/systemd/man/latest/systemd-firstboot.html https://www.freedesktop.org/software/systemd/man/latest/machine-id.html https://www.freedesktop.org/software/systemd/man/latest/systemd.special.html https://www.freedesktop.org/software/systemd/man/latest/kernel-command-line.html https://learn.microsoft.com/en-us/windows/wsl/systemd
>
> Spelling error should be 'firstboot' instead of 'fisrtboot'
>
> This is how I fix this issue:
>
> 1. Cancel running jobs like systemd-firstboot.service
>
> 2. Disable systemd-networkd-wait-online.service
>
>     sudo systemctl list-jobs | grep running
>     sudo systemctl cancel <job-number>
>     sudo systemctl disable systemd-networkd-wait-online
>
> image
>
> As I tested, remove /etc/machine-id from rootfs.tar.gz would not fix this issue.

thx

l3n4QAQ commented 3 months ago

I'm using v24.4.28.0.

Modify `ExecStart` in /usr/lib/systemd/system/systemd-networkd-wait-online.service.

The new ExecStart should be: `ExecStart=/usr/lib/systemd/systemd-networkd-wait-online -i eth0 --any --timeout=10`

Restart WSL: `wsl --shutdown`

Check again: `systemctl status`
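
Files under /usr/lib/systemd/system belong to the package and may be overwritten on the next update, so the same `ExecStart` change could instead be kept in a drop-in file. The sketch below is an alternative that is not taken from the comment above: the drop-in directory follows the usual systemd convention, and the function name is hypothetical.

```shell
# Sketch: write a drop-in override instead of editing the packaged unit file.
# The drop-in path follows the usual systemd convention; the function name
# and the choice of a drop-in are assumptions, not part of the original steps.
write_wait_online_override() {
    dir="${1:-/etc/systemd/system/systemd-networkd-wait-online.service.d}"
    mkdir -p "$dir"
    cat > "$dir/override.conf" <<'EOF'
[Service]
# An empty ExecStart= clears the packaged command before setting a new one.
ExecStart=
ExecStart=/usr/lib/systemd/systemd-networkd-wait-online -i eth0 --any --timeout=10
EOF
}

# Usage (as root), then restart WSL with `wsl --shutdown`:
#   write_wait_online_override
#   systemctl daemon-reload
```

The interface name `eth0` is carried over from the comment above and may differ on other setups.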

kloon15 commented 3 months ago

Proper workaround here: https://github.com/microsoft/WSL/issues/11857