microsoft / WSL

Issues found on WSL
https://docs.microsoft.com/windows/wsl
MIT License

`systemd` doesn't create user (d)bus #8842

Open mangkoran opened 1 year ago

mangkoran commented 1 year ago

Version

Windows version: 10.0.22621.521

WSL Version

Kernel Version

5.15.62.1-microsoft-standard-WSL2

Distro Version

ArchWSL

Other Software

Repro Steps

  1. Upgrade WSL to 0.67.6.0
  2. Enable systemd.
  3. sudo loginctl enable-linger <username> [1]
  4. wsl --shutdown
  5. Open WSL

Expected Behavior

The user systemd service should be launched.

Actual Behavior

The user systemd service is not launched. I checked with the commands below; neither succeeded.

❯ systemctl list-units --type=service --user
Failed to connect to bus: No such file or directory

❯ systemctl status --user
Failed to connect to bus: No such file or directory

Diagnostic Logs

I noticed that WSL now sets the variables below correctly.

❯ echo $XDG_RUNTIME_DIR\n$DBUS_SESSION_BUS_ADDRESS
/run/user/1000/
unix:path=/run/user/1000/bus

I suppose this is because init is now handled by systemd, which is the normal behavior. However, the user (d)bus socket is not present in that directory.

❯ exa -l /run/user/1000/
drwx------ - mangkoran 22 Sep 05:30 dbus-1
|rw------- 0 mangkoran 22 Sep 05:31 fish_universal_variables.notifier
srwx------ 0 mangkoran 22 Sep 05:31 lf.mangkoran.sock
drwx------ - mangkoran 22 Sep 05:30 pulse
srwxrwxrwx 0 mangkoran 22 Sep 05:30 wayland-0
.rw-rw---- 0 mangkoran 22 Sep 05:30 wayland-0.lock
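The mismatch described above (the address variable is set, but no socket exists at that path) can be checked mechanically. The following is a minimal sketch, not part of WSL or systemd: `bus_path_from_address` and `bus_socket_status` are hypothetical helper names, and only the `unix:path=` address form shown in this report is handled.

```shell
# Extract the socket path from a D-Bus address of the form "unix:path=/some/path".
# Prints nothing and returns 1 for any other address form.
bus_path_from_address() {
    case "$1" in
        unix:path=*)
            path=${1#unix:path=}
            # Drop any trailing ",key=value" options (e.g. ",guid=...").
            printf '%s\n' "${path%%,*}"
            ;;
        *)
            return 1
            ;;
    esac
}

# Report whether the socket named by the address actually exists.
# Returns 0 if present, 1 if missing, 2 if the address form is not handled.
bus_socket_status() {
    path=$(bus_path_from_address "$1") || { echo "unsupported address"; return 2; }
    if [ -S "$path" ]; then
        echo "present: $path"
    else
        echo "missing: $path"
        return 1
    fi
}
```

Calling `bus_socket_status "$DBUS_SESSION_BUS_ADDRESS"` in an affected shell should print a `missing:` line for the path that `systemctl --user` fails to connect to.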

cerebrate commented 1 year ago

This is an artifact of how WSL creates shells/commands when you run them; i.e., just spawning them directly.

It doesn't go through the normal Linux login process (or variants of same) and thus doesn't invoke PAM, and it's the systemd PAM module that creates a login session and spawns a user systemd (and thus session dbus) for it.

(You can see this if you run loginctl, which will tell you there are no established login sessions.)

I got around this in genie by installing systemd-machined, which then let me use wsl sudo machinectl shell username@.host to create a proper login session. This same workaround does the same under the Microsoft systemd implementation, and produces the expected user systemd and session dbus.

Hopefully in the future the WSL team can incorporate the needed PAM-triggering functionality directly into wsl.exe.

mangkoran commented 1 year ago

Thank you for your insight. Now we know which step is missing from the WSL invocation.

I do however see my user from loginctl 🤔

❯ loginctl
SESSION  UID USER      SEAT TTY
     c1 1000 mangkoran      pts/2

1 sessions listed.

Is it because I ran sudo loginctl enable-linger <username> earlier?

cerebrate commented 1 year ago

Yes, turning on linger will create the user session/user systemd instance/session bus as part of the systemd start process.

I believe you'll still need to use the/a workaround to get inside the user session before you can make (reliable) use of them, though.
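Whether linger is actually enabled for a user can be confirmed without a working session bus. This is a minimal sketch under one assumption: systemd-logind records lingering as a file named after the user under /var/lib/systemd/linger. `linger_enabled` is a hypothetical helper name, and its second argument exists only so the check can be pointed at another directory.

```shell
# Return 0 if lingering is enabled for the given user, 1 otherwise.
# Assumption: logind marks linger with a file named after the user
# in /var/lib/systemd/linger (overridable via the second argument).
linger_enabled() {
    user=$1
    linger_dir=${2:-/var/lib/systemd/linger}
    [ -e "$linger_dir/$user" ]
}
```

For example, `linger_enabled "$(id -un)" && echo "linger on"`. On a system where logind is reachable, `loginctl show-user <username> --property=Linger` should report the same state.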

bhoppi commented 1 year ago

The variables $XDG_RUNTIME_DIR and $DBUS_SESSION_BUS_ADDRESS are not created by systemd. The evidence is that the value of $XDG_RUNTIME_DIR has a trailing '/' character, which is not how systemd sets it.
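The trailing-slash observation can be turned into a quick check. The sketch below only inspects the string; it cannot prove which component set the variable. `has_trailing_slash` and `strip_trailing_slash` are hypothetical helper names.

```shell
# Return 0 if the value ends in "/" and is longer than a bare "/".
has_trailing_slash() {
    case "$1" in
        ?*/) return 0 ;;
        *) return 1 ;;
    esac
}

# Print the value with all trailing slashes removed (a bare "/" is kept).
strip_trailing_slash() {
    value=$1
    while has_trailing_slash "$value"; do
        value=${value%/}
    done
    printf '%s\n' "$value"
}
```

With the value reported in this issue, `has_trailing_slash "$XDG_RUNTIME_DIR"` succeeds, and `strip_trailing_slash` yields the form without the final slash.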

bhoppi commented 1 year ago

Hopefully in the future the WSL team can incorporate the needed PAM-triggering functionality directly into wsl.exe.

I think a better way is to give up /init and let systemd take over everything in the VM. Because by using systemd, most of /init's functionality is redundant. MS engineers just need to implement a few systemd services to be responsible for host interaction things such as sync and interop.

cerebrate commented 1 year ago

The variables $XDG_RUNTIME_DIR and $DBUS_SESSION_BUS_ADDRESS are not created by systemd. The evidence is that the value of $XDG_RUNTIME_DIR has a trailing '/' character, which is not how systemd sets it.

Actually, they are. For the latter, see /usr/lib/systemd/user/dbus.socket, where the DBUS_SESSION_BUS_ADDRESS variable is set in the session base environment; the former is set by pam_systemd during login (see here: https://www.freedesktop.org/software/systemd/man/pam_systemd.html).

It's not a problem with the Microsoft /init per se. It's the general problem that creating a process as a user does not necessarily create a user session for that user; you have to go through one of the routes that call upon PAM for that.

cerebrate commented 1 year ago

Still an issue in 0.68.2, although I note that WSL now automounts the /mnt/wslg/runtime-dir at /run/user/1000, which saves a peck of trouble.

Edited: unless something you do does start a systemd user session, at which point user-runtime-dir@.service blows the contents of that folder and thus /mnt/wslg/runtime-dir away. That's problematic.

psvo commented 1 year ago

I could make the user session partially work with native systemd support by providing /bin/login binary (it can be just a symlink to the actual login binary). It's launched by WSL only once, but it keeps the session running even without linger enabled.

It gives me running systemd user units and makes systemctl --user commands usable. Note that it's still not a complete solution, see #9213: PAM is not invoked for the actual shell, and there are some hardcoded environment variables, probably set by the WSL /init:

XDG_RUNTIME_DIR=/run/user/1000/
DBUS_SESSION_BUS_ADDRESS=unix:path=/run/user/1000/bus
$ strings /init | grep /run/user
/run/user/%d/
unix:path=/run/user/%d/bus

So, YMMV, but it works at least on my system:

>wsl --version
WSL version: 1.0.0.0
Kernel version: 5.15.74.2
WSLg version: 1.0.47
MSRDC version: 1.2.3575
Direct3D version: 1.606.4
DXCore version: 10.0.25131.1002-220531-1700.rs-onecore-base2-hyp
Windows version: 10.0.19044.2311
NixOS version: 22.05.20221125
Systemd version: 250 (250.8)

/etc/wsl.conf

[boot]
systemd=true

[automount]
enabled=true
mountFsTab=false
options=metadata,uid=1000,gid=100
root=/mnt/

[interop]
appendWindowsPath=false
enabled=true

[network]
generateHosts=true
generateResolvConf=true
hostname=nixos

[user]
default=nixos

psmolkin commented 1 year ago

wsl.exe -u root --shell-type login -- login -f <username> works for me (Fedora37)

UPD:

wsl --shutdown
wsl -u root -- su <username>
ls -al /run/user/1000

total 0
drwx------ 4 username username 120 Nov 30 15:57 .
drwxr-xr-x 4 root     root      80 Nov 30 15:57 ..
srw-rw-rw- 1 username username   0 Nov 30 15:57 bus
srw-rw-rw- 1 username username   0 Nov 30 15:57 pipewire-0
drwxr-xr-x 2 username username  60 Nov 30 15:57 pulse
drwxr-xr-x 5 username username 140 Nov 30 15:57 systemd

but when I start it normally:

wsl --shutdown
wsl -u <username>
ls -al /run/user/1000

total 0
drwx------ 4 username username 120 Nov 30 16:00 .
drwxr-xr-x 3 root     root      60 Nov 30 16:00 ..
drwx------ 3 username username  60 Nov 30 16:00 dbus-1
drwx------ 2 username username  80 Nov 30 16:00 pulse
srwxrwxrwx 1 username username   0 Nov 30 16:00 wayland-0
-rw-rw---- 1 username username   0 Nov 30 16:00 wayland-0.lock

D3vil0p3r commented 1 year ago

I have the same issue. When I install the WSL image (Arch Linux with systemd), dbus works correctly on the first login, and /run/user/1000 contains:

drwx------ athena 1000  180 B Mon Apr  3 20:42:38 2023  ./
drwxr-xr-x root   root   60 B Mon Apr  3 20:42:38 2023  ../
srw-rw-rw- athena users   0 B Mon Apr  3 20:42:38 2023  bus=
drwx------ athena 1000   60 B Mon Apr  3 20:42:35 2023  dbus-1/
drwx------ athena users  60 B Mon Apr  3 20:42:38 2023  gcr/
drwx------ athena users 140 B Mon Apr  3 20:42:38 2023  gnupg/
drwx------ athena users  60 B Mon Apr  3 20:42:38 2023  keyring/
drwx------ athena 1000   80 B Mon Apr  3 20:42:38 2023  pulse/
drwxr-xr-x athena users 140 B Mon Apr  3 20:42:38 2023  systemd/

Then, if I run exit followed by wsl --shutdown and check the files in /run/user/1000 again, the bus file is still there:

drwx------ athena 1000  180 B Mon Apr  3 20:44:50 2023  ./
drwxr-xr-x root   root   60 B Mon Apr  3 20:44:50 2023  ../
srw-rw-rw- athena users   0 B Mon Apr  3 20:44:50 2023  bus=
drwx------ athena 1000   60 B Mon Apr  3 20:44:48 2023  dbus-1/
drwx------ athena users  60 B Mon Apr  3 20:44:50 2023  gcr/
drwx------ athena users 140 B Mon Apr  3 20:44:50 2023  gnupg/
drwx------ athena users  60 B Mon Apr  3 20:44:50 2023  keyring/
drwx------ athena 1000   80 B Mon Apr  3 20:44:50 2023  pulse/
drwxr-xr-x athena users 140 B Mon Apr  3 20:44:50 2023  systemd/

Then, if I run exit followed by wsl --unregister <distro-name>, install the WSL image again, and check the files in /run/user/1000, bus is gone and I have:

drwx------ athena 1000  140 B Tue Apr  4 01:08:56 2023  .
drwxr-xr-x root   root   60 B Tue Apr  4 01:08:51 2023  ..
drwx------ athena 1000   60 B Tue Apr  4 01:08:49 2023  dbus-1
drwx------ athena users  40 B Tue Apr  4 01:08:56 2023  gnupg
drwx------ athena 1000   80 B Tue Apr  4 01:08:49 2023  pulse

So why is the bus file not created after wsl --unregister <distro-name>?

D3vil0p3r commented 1 year ago

I have noticed an additional behavior. Sometimes, when I install the WSL image from scratch, I get the right bus and systemd files in /run/user/1000. Other times, also from scratch, I don't get them. I don't understand what logic decides whether bus and systemd are created.

D3vil0p3r commented 1 year ago

I solved it by running sudo systemctl restart user@1000. After this command, bus and systemd appear again in the /run/user/1000 directory.
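The restart above hardcodes UID 1000. A minimal sketch for deriving the unit name from the current user follows, assuming the user@<UID>.service naming that appears in the systemctl status output elsewhere in this thread. `user_manager_unit` is a hypothetical helper name.

```shell
# Print the systemd user-manager unit name for a numeric UID.
# Defaults to the UID of the calling user; rejects non-numeric input.
user_manager_unit() {
    uid=${1:-$(id -u)}
    case "$uid" in
        ''|*[!0-9]*)
            echo "not a numeric UID: $uid" >&2
            return 1
            ;;
    esac
    printf 'user@%s.service\n' "$uid"
}
```

The workaround then becomes `sudo systemctl restart "$(user_manager_unit)"`, which should behave the same as the command above for a user with UID 1000.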

cerebrate commented 1 year ago

@psvo

As I understand the current situation, WSL always tries to set up a user session by running /bin/login - but what it doesn't do is check whether or not systemd has started up enough to be able to answer the request generated by PAM via login to set up the user session/runtime directory/systemd/etc.

The upshot of this is that on a fast and/or lightly-loaded machine, you always get a user session, but the more heavily loaded your machine is, the more likely it is that this login will go off before systemd is ready for it and you'll get nothing.

(Not a problem that can occur on native Linux, of course, since you can't even try to log in until systemd has already passed the relevant startup phase.)

Among the various things my hacks do is ensure that no WSL session is started (it wait-loops) before systemd is ready for it.

SvenVD commented 1 year ago

Which hack are you describing with "do is ensure that no WSL session is started (it wait-loops) before systemd is ready for it"?

I had a problem starting vscode inside wsl2 (Rocky Linux release 9.1 (Blue Onyx)): it requires /run/user/1000, which was missing.

I do have the /mnt/wsl/run/user/1000, but I do not have /run/user/1000

Running sudo systemctl restart user@1000 in .bash_profile did solve my issue, but I want to understand what is going wrong.

cat /etc/wsl.conf
[user]
default=sven

[network]
generateResolvConf = false

[boot]
systemd=true

ahupp commented 1 year ago

Based on the theory that it's a race during startup, starting wsl as root before running my own shell fixes it:

wsl --user root true
wsl

cerebrate commented 1 year ago

@SvenVD This function:

https://github.com/arkane-systems/bottle-imp/blob/master/binsrc/imp/__main__.py#L58-L111

in bottle-imp handles this situation. The first loop (lines 61-78) waits until the system dbus socket is available, because you need the system dbus to query systemd for its state. The second loop (lines 80-103) keeps querying systemd as to its state (via https://github.com/arkane-systems/bottle-imp/blob/master/binsrc/imp/helpers.py#L24 ) until it has reached the running state (i.e., all the units are successfully started up).

This delay means that when we need to ask systemd to do something - such as when login (via PAM) asks systemd-logind to create a user session, runtime directory, etc., etc., - it's ready to do so. WSL doesn't include this delay natively, so if your computer isn't fast enough to start all the relevant bits of systemd up before it gets around to making a login session, it's not prepared and listening to the instruction to create it, and so you don't get one.
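The two-stage wait described above can be approximated in shell. This is a minimal sketch and not the bottle-imp code: `wait_until`, `system_bus_ready`, and `systemd_running` are hypothetical names, and /run/dbus/system_bus_socket is assumed to be the system bus socket path on the distribution in use.

```shell
# Run a command repeatedly until it succeeds or the attempts run out.
# Usage: wait_until <max_attempts> <delay_seconds> <command> [args...]
wait_until() {
    attempts=$1
    delay=$2
    shift 2
    i=0
    while [ "$i" -lt "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}

# Stage 1 probe: the system bus socket exists (path is an assumption).
system_bus_ready() {
    [ -S "${1:-/run/dbus/system_bus_socket}" ]
}

# Stage 2 probe: systemd reports the "running" state.
systemd_running() {
    [ "$(systemctl is-system-running 2>/dev/null)" = "running" ]
}
```

A login wrapper could run `wait_until 60 1 system_bus_ready && wait_until 60 1 systemd_running` before starting the shell. A system with any failed unit reports "degraded" rather than "running", so the second stage would time out there unless the probe is relaxed.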

moo1210 commented 10 months ago

I still really need this to run systemctl --user commands. I just ran into this today. Logging in twice doesn't seem like a reasonable solution to me.

HarbingerNight commented 6 months ago

I have had success working around the wsl pam_env and systemctl --user issue by changing the wsl startup command to launch a shell through sudo, i.e.

wsl -u root sudo -u username zsh

Whether this will help OP with `loginctl enable-linger username`, I cannot say for sure.

khei4 commented 3 months ago

I ran into this problem while building the WebKit GTK port, with the following WSL version installed by wsl --install.

$ wsl --version
WSL version: 2.1.5.0
Kernel version: 5.15.146.1-2
WSLg version: 1.0.60
MSRDC version: 1.2.5105
Direct3D version: 1.611.1-81528511
DXCore version: 10.0.25131.1002-220531-1700.rs-onecore-base2-hyp
Windows version: 10.0.22621.3447 
[08:30] :WebKit (main %) | systemctl status --user
● XXX
    State: running
     Jobs: 0 queued
   Failed: 0 units
    Since: Thu 2024-04-18 08:17:10 JST; 13min ago
   CGroup: /user.slice/user-1000.slice/user@1000.service
           └─init.scope
             ├─413 /lib/systemd/systemd --user
             └─414 (sd-pam)
[08:30] XXX:WebKit (main %) | cat /etc/wsl.conf
[boot]
systemd=true 

and the sudo systemctl restart user@1000 workaround doesn't work for me. I also tried wsl --shutdown and rebooting the Windows host.

[09:17]  | sudo systemctl restart user@1000
[09:18] | ls -al /run/user/1000
total 0
drwx------ 6 khei4 khei4 220 Apr 18 09:18 ./
drwxr-xr-x 4 root  root   80 Apr 18 09:17 ../
drwxr-xr-x 3 khei4 khei4  60 Apr 18 09:17 dbus-1/
prw------- 1 khei4 khei4   0 Apr 18 09:17 fish_universal_variables.notifier|
drwx------ 2 khei4 khei4 140 Apr 18 09:18 gnupg/
srw-rw-rw- 1 khei4 khei4   0 Apr 18 09:18 pk-debconf-socket=
drwxr-xr-x 2 khei4 khei4  80 Apr 18 09:17 pulse/
srw-rw-rw- 1 khei4 khei4   0 Apr 18 09:18 snapd-session-agent.socket=
drwxr-xr-x 5 khei4 khei4 140 Apr 18 09:18 systemd/
lrwxrwxrwx 1 root  root   31 Apr 18 09:17 wayland-0 -> /mnt/wslg/runtime-dir/wayland-0=
lrwxrwxrwx 1 root  root   36 Apr 18 09:17 wayland-0.lock -> /mnt/wslg/runtime-dir/wayland-0.lock 

Control Panel > Programs and Features > Turn Windows features on or off

Virtual Machine Platform and Windows Subsystem for Linux are turned on

[11:27] X:WebKit (main %) | Tools/Scripts/update-webkitgtk-libs
Ensuring the local Flatpak repository is not corrupted
[19/20] Verifying webkit-sdk:runtime/org.webkit.Platform/x86_64/23.08…
Checking remotes...
Pruning objects
Updating Flatpak environment
Looking for updates…

        ID                                           Branch         Op         Remote             Download
 1. [✗] org.freedesktop.Platform.GL.default          23.08          i          webkit-sdk         1.0 kB / 484.9 MB

Warning: While pulling runtime/org.freedesktop.Platform.GL.default/x86_64/23.08 from remote webkit-sdk: Invalid checksum for static delta 9c2522fc508be5d80b50a7ae93daa626ed23462bb8d374480c92dc937187c401
Installation complete.
SDK version: 277469@main
Updating icecc/sccache standalone toolchain archives
Error connecting: Could not connect: No such file or directory
Failed to get a11y address Command '('gdbus', 'call', '-e', '-d', 'org.a11y.Bus', '-o', '/org/a11y/bus', '-m', 'org.a11y.Bus.GetAddress')' returned non-zero exit status 1.
bwrap: Can't find source path /run/user/1000/bus: No such file or directory

The following command returned a non-zero exit status: flatpak run --user --die-with-parent --filesystem=host --allow=devel --talk-name=org.gtk.vfs --talk-name=org.gtk.vfs.* --device=all --device=dri --share=ipc --share=network --socket=pulseaudio --socket=session-bus --socket=system-bus --socket=wayland --socket=x11 --system-talk-name=org.a11y.Bus --system-talk-name=org.freedesktop.GeoClue2 --talk-name=org.freedesktop.Flatpak --talk-name=org.freedesktop.secrets --env=TEST_RUNNER_INJECTED_BUNDLE_FILENAME=/app/webkit/WebKitBuild/Release/lib/libTestRunnerInjectedBundle.so --env=PATH=/usr/lib/sdk/llvm16/bin:/usr/bin:/usr/lib/sdk/rust-stable/bin/ --env=TZ=America/Los_Angeles --env=WAYLAND_DISPLAY=wayland-0 --env=DISPLAY=:0 --command=/usr/bin/which org.webkit.Sdk gcc
Output: b''
Died at Tools/Scripts/update-webkitgtk-libs line 30. 

But I was finally able to see bus after removing the Ubuntu 22.04 installed by wsl --install (using wsl --unregister Ubuntu), removing the app from the Windows Start menu, downloading Ubuntu 22.04 LTS from the Microsoft Store, and rebooting the Windows host.

[08:33] | ls -al /run/user/1000
total 0
drwxr-xr-x 12 khei4 khei4 480 Apr 19 07:57 .
drwxr-xr-x  3 root  root   60 Apr 18 18:04 ..
drwxr-xr-x  2 khei4 khei4 200 Apr 19 08:05 .dbus-proxy
drwxr-xr-x  4 khei4 khei4  80 Apr 19 08:05 .flatpak
drwx------  3 khei4 khei4  80 Apr 18 18:49 .flatpak-helper
drwx------  2 khei4 khei4  60 Apr 18 18:49 at-spi
srw-rw-rw-  1 khei4 khei4   0 Apr 18 18:04 bus
drwxr-xr-x  4 khei4 khei4  80 Apr 18 18:04 dbus-1
drwx------  2 khei4 khei4  60 Apr 18 18:12 dconf
dr-x------  2 khei4 khei4   0 Jan  1  1970 doc
drwx------  2 khei4 khei4 140 Apr 18 18:04 gnupg
srw-rw-rw-  1 khei4 khei4   0 Apr 18 18:04 pipewire-0
-rw-rw----  1 khei4 khei4   0 Apr 18 18:04 pipewire-0.lock
srw-rw-rw-  1 khei4 khei4   0 Apr 18 18:04 pk-debconf-socket
drwxr-xr-x  2 khei4 khei4  80 Apr 18 18:04 pulse
srw-rw-rw-  1 khei4 khei4   0 Apr 18 18:04 snapd-session-agent.socket
drwxr-xr-x  6 khei4 khei4 160 Apr 18 18:49 systemd
srwxr-xr-x  1 khei4 khei4   0 Apr 19 07:57 vscode-git-12b25b9439.sock
srwxr-xr-x  1 khei4 khei4   0 Apr 18 22:46 vscode-ipc-2f1f8efb-de93-454c-8e94-1f9928d062b2.sock
srwxr-xr-x  1 khei4 khei4   0 Apr 18 22:39 vscode-ipc-8cbd36d5-3b2a-4e02-9b85-4c506b6be3e4.sock
srwxr-xr-x  1 khei4 khei4   0 Apr 19 07:57 vscode-ipc-9a53415a-c39b-4a39-a971-4f12fed57ce9.sock
srwxr-xr-x  1 khei4 khei4   0 Apr 19 07:56 vscode-ipc-b59dd6fe-d8e1-4a6f-af26-4a86ec114c49.sock
lrwxrwxrwx  1 root  root   31 Apr 18 18:04 wayland-0 -> /mnt/wslg/runtime-dir/wayland-0
lrwxrwxrwx  1 root  root   36 Apr 18 18:04 wayland-0.lock -> /mnt/wslg/runtime-dir/wayland-0.lock

nithin-mk commented 1 month ago

I get the same error when I try to start a Homebrew service in Ubuntu 22.04 on WSL 2.

❯ brew services start minio --debug
/home/linuxbrew/.linuxbrew/Homebrew/Library/Homebrew/brew.rb (Formulary::FromAPILoader): loading minio
/usr/bin/env /home/linuxbrew/.linuxbrew/Homebrew/Library/Homebrew/shims/shared/git --version
==> /home/linuxbrew/.linuxbrew/bin/systemctl --user status homebrew.minio

Failed to connect to bus: No such file or directory
Error: Failure while executing; `/home/linuxbrew/.linuxbrew/bin/systemctl --user daemon-reload` exited with 1.
/home/linuxbrew/.linuxbrew/Homebrew/Library/Homebrew/extend/kernel.rb:263:in `safe_system'
/home/linuxbrew/.linuxbrew/Homebrew/Library/Taps/homebrew/homebrew-services/lib/service/services_cli.rb:332:in `install_service_file'
/home/linuxbrew/.linuxbrew/Homebrew/Library/Taps/homebrew/homebrew-services/lib/service/services_cli.rb:112:in `block in start'
/home/linuxbrew/.linuxbrew/Homebrew/Library/Taps/homebrew/homebrew-services/lib/service/services_cli.rb:97:in `each'
/home/linuxbrew/.linuxbrew/Homebrew/Library/Taps/homebrew/homebrew-services/lib/service/services_cli.rb:97:in `start'
/home/linuxbrew/.linuxbrew/Homebrew/Library/Taps/homebrew/homebrew-services/lib/service/commands/start.rb:12:in `run'
/home/linuxbrew/.linuxbrew/Homebrew/Library/Taps/homebrew/homebrew-services/cmd/services.rb:139:in `run'
/home/linuxbrew/.linuxbrew/Homebrew/Library/Homebrew/brew.rb:92:in `<main>'

What's the solution?