microsoft / WSL

Issues found on WSL
https://docs.microsoft.com/windows/wsl

WSL login nukes systemd/dbus user session / contents of /run/user/1000 #10205

Open sarim opened 1 year ago

sarim commented 1 year ago

Windows Version

Microsoft Windows [Version 10.0.22621.1778]

WSL Version

1.3.10.0

Are you using WSL 1 or WSL 2?

  • [x] WSL 2
  • [ ] WSL 1

Kernel Version

5.15.90.2-microsoft-standard-WSL2

Distro Version

Ubuntu 22.04

Other Software

No response

Repro Steps

Make sure guiApplications=true is set in .wslconfig.

  1. Open WSL from Windows Terminal.
  2. Run systemctl --user status
  3. Or run ls /run/user/1000

Expected Behavior

I expect the systemd user session to be present and systemctl --user to be able to connect to it.

This is related to #8842, but it is not the same issue. It is not the race condition; rather, the WSL login removes bus and the other sockets from the /run/user/1000 directory. See the attached video demonstration.

To reproduce, follow these steps:

  1. In PowerShell, run wsl --shutdown a few times to be sure.
  2. Run wsl -u root -e /bin/bash, which logs WSL in as root, so WSL doesn't touch user 1000 (named gittu). This user has linger enabled, so systemd creates the user session on its own.
  3. While logged in as root, ls /run/user/1000 shows the expected bus, systemd, and other sockets.
  4. Now open WSL as user gittu (the default user) by opening a new tab in Windows Terminal.
  5. Observe that the line [ 29.574698] WSL (2): Creating login session for gittu appears in the dmesg output, confirming that WSL created a login session for gittu, nuking the previously good user session created by systemd.
  6. Now the output of ls /run/user/1000/ no longer contains the bus, systemd, and other sockets.

If I disable WSLg by setting guiApplications=false, the issue goes away: WSL doesn't nuke the contents of /run/user/1000. From this observation, my conclusion is that WSLg is nuking the contents of /run/user/1000 and then placing only its own files there.
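
The directory check from steps 3 and 6 can be scripted. This is only a minimal sketch: the function name missing_runtime_entries is made up for illustration, and the two entry names it looks for (bus and systemd) are simply the ones named in this report, not an exhaustive list.

```shell
# Hypothetical helper: list expected entries that are absent from a runtime dir.
# The entry names (bus, systemd) are the ones mentioned in this report.
missing_runtime_entries() {
    local dir="${1:-/run/user/$(id -u)}"
    local entry
    for entry in bus systemd; do
        # -e is true for sockets, directories and regular files alike
        [ -e "$dir/$entry" ] || echo "$entry"
    done
}

# Prints nothing when both entries are present
missing_runtime_entries "/run/user/1000"
```

Running it once from the root shell and once from the default user's shell should show whether the entries disappeared between the two logins.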

https://github.com/microsoft/WSL/assets/1235888/d32511fa-967f-4337-8da8-f08ea9468856

Actual Behavior

↪ ~ ➤ systemctl --user status
Failed to connect to bus: No such file or directory
↪ ~ ➤ ls /run/user/1000 -l
total 0
drwx------ 3 gittu gittu 60 Jun 16 00:24 dbus-1
drwx------ 2 gittu gittu 80 Jun 16 00:24 pulse
lrwxrwxrwx 1 root  root  31 Jun 16 00:25 wayland-0 -> /mnt/wslg/runtime-dir/wayland-0
-rw-rw---- 1 gittu gittu  0 Jun 16 00:24 wayland-0.lock

Diagnostic Logs

No response

benhillis commented 1 year ago

@OneBlue - another one related to your /run/user/ change.

OneBlue commented 1 year ago

Thank you for reporting this @sarim.

Interestingly, I can't reproduce the issue. Can you share the output of mount before and after opening WSL with the gittu user?

WSL does mount an overlayfs on /run/user/X when the session is created, but that happens regardless of whether GUI apps are enabled or not, so I wonder if there's something else happening here.

sarim commented 1 year ago

.wslconfig

[wsl2]
kernelCommandLine=cgroup_no_v1=all
memory=16GB
swap=0
guiApplications=true
debugConsole=false
#vmIdleTimeout=-1

networkingMode=bridged
vmSwitch=WSLBridged
dhcp=false
macAddress=0E:00:00:00:00:00
ipv6=true

/etc/wsl.conf

[user]
default=gittu
[boot]
systemd=true
[network]
hostname = GITTUW11WSL
generateResolvConf=false

/etc/fstab

cgroup2 /sys/fs/cgroup cgroup2 rw,nosuid,nodev,noexec,relatime,nsdelegate 0 0

After wsl --shutdown, I log into WSL as root with wsl -u root -e /bin/bash, then run mount and save the output. Then I open a new tab in the terminal, which logs in as gittu, run mount again, and save the output. The two text files are attached. Here's the diff.

diff mount-before.txt mount-after.txt

41a42,45
> none on /mnt/wslg/run/user/1000/rw type tmpfs (rw,relatime)
> none on /run/user/1000/rw type tmpfs (rw,relatime)
> none on /mnt/wslg/run/user/1000 type overlay (rw,relatime,lowerdir=/mnt/wslg/runtime-dir,upperdir=/mnt/wslg/run/user/1000/rw/upper,workdir=/mnt/wslg/run/user/1000/rw/work)
> none on /run/user/1000 type overlay (rw,relatime,lowerdir=/mnt/wslg/runtime-dir,upperdir=/mnt/wslg/run/user/1000/rw/upper,workdir=/mnt/wslg/run/user/1000/rw/work)

mount-after.txt mount-before.txt

By the way, if I run sudo systemctl restart user@1000, the /run/user/1000/ directory gets restored: both the WSLg sockets and the systemd sockets and other files are present. That's what I've been doing: after starting WSL, I run the command once. Let me know if you need any more info, @OneBlue.
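
To narrow a before/after comparison like the one above to the mounts that matter, the mount output can be filtered first. A rough sketch, assuming the mount points have the same shape as in the diff (/run/user/UID and its /mnt/wslg mirror); runtime_mounts is a made-up name.

```shell
# Hypothetical helper: read `mount` output on stdin and keep only the mounts
# at or below /run/user/UID, including the /mnt/wslg mirror seen in the diff.
runtime_mounts() {
    local uid="${1:-1000}"
    grep -E " on (/mnt/wslg)?/run/user/${uid}(/[^ ]*)? type "
}

# Usage: capture once from the root shell, once after the default user logs in.
# mount | runtime_mounts 1000 > runtime-before.txt
# mount | runtime_mounts 1000 > runtime-after.txt
# diff runtime-before.txt runtime-after.txt
```

The pattern is anchored on " on " and " type " so that paths appearing inside the overlay options (lowerdir, upperdir, workdir) do not produce false matches.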

sarim commented 1 year ago

This might also be relevant.

root@GITTUW11WSL:~# ls /run/user/1000/rw
ls: cannot access '/run/user/1000/rw': No such file or directory
root@GITTUW11WSL:~# ls /mnt/wslg/run/user/1000/rw
ls: cannot access '/mnt/wslg/run/user/1000/rw': No such file or directory
root@GITTUW11WSL:~# ls /mnt/wslg/run/user/1000
dbus-1  pulse  wayland-0  wayland-0.lock

Edit:

WSL does mount an overlayfs on /run/user/X when the session is created, but happens regardless of whether GUI apps are enabled or not so I wonder if there's something else happening here.

I don't understand. When guiApplications=false, the "system"/"wslg" distro doesn't get launched. The /mnt/wslg directory is shared with that system distro, right? So as far as I understand, the behavior definitely changes.

The outputs below are with guiApplications=false. Notice that the mount output has no /mnt/wslg-related entries, though I don't understand how the /mnt/wslg directory gets created when there is no such entry in the mount output.

↪ ~ ➤ sudo tree /mnt/wslg/
/mnt/wslg/
└── run
    └── user
        └── 1000

3 directories, 0 files
↪ ~ ➤ ls /run/user/1000/
bus  dbus-1  gnupg  pipewire-0  pipewire-0.lock  pk-debconf-socket  podman  systemd
↪ ~ ➤ mount
none on /mnt/wsl type tmpfs (rw,relatime)
none on /usr/lib/wsl/drivers type 9p (ro,nosuid,nodev,noatime,dirsync,aname=drivers;fmask=222;dmask=222,mmap,access=client,msize=65536,trans=fd,rfd=7,wfd=7)
/dev/sdb on / type ext4 (rw,relatime,discard,errors=remount-ro,data=ordered)
none on /usr/lib/wsl/lib type overlay (rw,nosuid,nodev,noatime,lowerdir=/gpu_lib_packaged:/gpu_lib_inbox,upperdir=/gpu_lib/rw/upper,workdir=/gpu_lib/rw/work)
rootfs on /init type rootfs (ro,size=8186832k,nr_inodes=2046708)
none on /dev type devtmpfs (rw,nosuid,relatime,size=8186860k,nr_inodes=2046715,mode=755)
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,noatime)
proc on /proc type proc (rw,nosuid,nodev,noexec,noatime)
devpts on /dev/pts type devpts (rw,nosuid,noexec,noatime,gid=5,mode=620,ptmxmode=000)
none on /run type tmpfs (rw,nosuid,nodev,mode=755)
none on /run/lock type tmpfs (rw,nosuid,nodev,noexec,noatime)
none on /run/shm type tmpfs (rw,nosuid,nodev,noatime)
none on /dev/shm type tmpfs (rw,nosuid,nodev,noatime)
none on /run/user type tmpfs (rw,nosuid,nodev,noexec,noatime,mode=755)
binfmt_misc on /proc/sys/fs/binfmt_misc type binfmt_misc (rw,relatime)
tmpfs on /sys/fs/cgroup type tmpfs (rw,nosuid,nodev,noexec,relatime,mode=755)
cgroup2 on /sys/fs/cgroup/unified type cgroup2 (rw,nosuid,nodev,noexec,relatime,nsdelegate)
cgroup2 on /sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime,nsdelegate)
drvfs on /mnt/c type 9p (rw,noatime,dirsync,aname=drvfs;path=C:\;uid=1000;gid=1000;symlinkroot=/mnt/,mmap,access=client,msize=262144,trans=virtio)
drvfs on /mnt/d type 9p (rw,noatime,dirsync,aname=drvfs;path=D:\;uid=1000;gid=1000;symlinkroot=/mnt/,mmap,access=client,msize=262144,trans=virtio)
drvfs on /mnt/e type 9p (rw,noatime,dirsync,aname=drvfs;path=E:\;uid=1000;gid=1000;symlinkroot=/mnt/,mmap,access=client,msize=262144,trans=virtio)
drvfs on /mnt/h type 9p (rw,noatime,dirsync,aname=drvfs;path=H:\;uid=1000;gid=1000;symlinkroot=/mnt/,mmap,access=client,msize=262144,trans=virtio)
drvfs on /mnt/i type 9p (rw,noatime,dirsync,aname=drvfs;path=I:\;uid=1000;gid=1000;symlinkroot=/mnt/,mmap,access=client,msize=262144,trans=virtio)
/dev/sdb on /run/user type ext4 (rw,relatime,discard,errors=remount-ro,data=ordered)
hugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime,pagesize=2M)
mqueue on /dev/mqueue type mqueue (rw,nosuid,nodev,noexec,relatime)
tmpfs on /run/qemu type tmpfs (rw,nosuid,nodev,relatime,mode=755)
debugfs on /sys/kernel/debug type debugfs (rw,nosuid,nodev,noexec,relatime)
tracefs on /sys/kernel/tracing type tracefs (rw,nosuid,nodev,noexec,relatime)
fusectl on /sys/fs/fuse/connections type fusectl (rw,nosuid,nodev,noexec,relatime)
tmpfs on /run/user/1000 type tmpfs (rw,nosuid,nodev,relatime,size=1638024k,nr_inodes=409506,mode=700,uid=1000,gid=1000)

higordearaujo-bsy commented 1 year ago

For those that need guiApplications=true, one of these might fix things for you until someone gets this fixed in WSL:

sudo systemctl restart user@1000

or

mkdir ~/run_user_1000
mv /run/user/1000/* ~/run_user_1000/
sudo umount /run/user/1000
mv ~/run_user_1000/* /run/user/1000/
rm -rf ~/run_user_1000/

sarim commented 1 year ago

I just wrote this and added it to my .bashrc to restart the user session. The command is allowed in the sudoers file.

function check_and_restart_session {
    # Check if "/run/user/1000/bus" exists
    if [ -e "/run/user/1000/bus" ]; then
        return 0
    fi

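    # Try to avoid a race between shells: wait a random 50-349 ms.
    # Zero-pad the value so that 50 becomes 0.050 rather than 0.50.
    sleep "$(printf '0.%03d' $(( RANDOM % 300 + 50 )))"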

    # Check if "/tmp/gittuRestartSession" exists
    if [ -e "/tmp/gittuRestartSession" ]; then
        echo "/tmp/gittuRestartSession exists"
        return 0
    fi

    # If neither condition is true, restart the session
    touch /tmp/gittuRestartSession
    sudo /usr/bin/systemctl restart user@1000.service
    echo "Restart User Session"
}

jakebailey commented 11 months ago

This just started happening to me today, where I actually have a working system for a bit but then it breaks. In journalctl -b0 I can see:

Sep 19 12:14:55 JABAILE-DESK02 systemd[1]: dmesg.service: Deactivated successfully.
Sep 19 12:14:55 JABAILE-DESK02 sudo[736]:  jabaile : TTY=pts/0 ; PWD=/home/jabaile ; USER=root ; COMMAND=/usr/bin/ls -lh /run/user
Sep 19 12:14:55 JABAILE-DESK02 sudo[736]: pam_unix(sudo:session): session opened for user root(uid=0) by (uid=1000)
Sep 19 12:14:55 JABAILE-DESK02 sudo[736]: pam_unix(sudo:session): session closed for user root
Sep 19 12:15:05 JABAILE-DESK02 sudo[747]:  jabaile : TTY=pts/0 ; PWD=/home/jabaile ; USER=root ; COMMAND=/usr/bin/ls -lh /run/user
Sep 19 12:15:05 JABAILE-DESK02 sudo[747]: pam_unix(sudo:session): session opened for user root(uid=0) by (uid=1000)
Sep 19 12:15:05 JABAILE-DESK02 sudo[747]: pam_unix(sudo:session): session closed for user root
Sep 19 12:15:09 JABAILE-DESK02 systemd[1]: systemd-timedated.service: Deactivated successfully.
Sep 19 12:15:11 JABAILE-DESK02 sudo[773]:  jabaile : TTY=pts/0 ; PWD=/home/jabaile ; USER=root ; COMMAND=/usr/bin/ls -lh /run/user
Sep 19 12:15:11 JABAILE-DESK02 sudo[773]: pam_unix(sudo:session): session opened for user root(uid=0) by (uid=1000)
Sep 19 12:15:11 JABAILE-DESK02 sudo[773]: pam_unix(sudo:session): session closed for user root
Sep 19 12:15:22 JABAILE-DESK02 kernel: hv_balloon: Max. dynamic memory size: 32652 MB
Sep 19 12:16:37 JABAILE-DESK02 systemd-networkd-wait-online[187]: Timeout occurred while waiting for network connectivity.
Sep 19 12:16:37 JABAILE-DESK02 systemd[1]: systemd-networkd-wait-online.service: Main process exited, code=exited, status=1/FAILURE
Sep 19 12:16:37 JABAILE-DESK02 systemd[1]: systemd-networkd-wait-online.service: Failed with result 'exit-code'.
Sep 19 12:16:37 JABAILE-DESK02 systemd[1]: Failed to start Wait for Network to be Configured.

I'm running sudo ls -lh /run/user, and the first runs show 1000. But then the system finishes starting up, and everything that uses this directory breaks.

Running sudo systemctl restart user@1000 works, though this means I have to wait for the failure to occur and then run that command and ensure I have restarted all of my terminals, as I have various programs like fnm which use that dir.

I thought this might be due to 2.0.0.0, but I don't have it yet:

WSL version: 1.2.2.0
Kernel version: 5.15.90.1

Perhaps this is a different issue than this thread; apologies if it is, but this is perfectly reproducible for me so I'm totally happy to try things out.

thwint commented 11 months ago

Not sure if it is the same issue I see. Right after startup is finished WSL seems to stop the User Manager. This is also visible in syslog:

Sep 21 07:32:51 w00wmi systemd[1]: Startup finished in 2min 1.606s.
Sep 21 07:33:00 w00wmi systemd[1]: Stopping User Manager for UID 1000...
Sep 21 07:33:00 w00wmi systemd[1828]: Stopped target Main User Target.
Sep 21 07:33:00 w00wmi systemd[1828]: Stopping D-Bus User Message Bus...
Sep 21 07:33:00 w00wmi systemd[1828]: Stopping PipeWire Media Session Manager...
Sep 21 07:33:00 w00wmi systemd[1828]: Stopped D-Bus User Message Bus.
Sep 21 07:33:00 w00wmi systemd[1828]: Stopped PipeWire Media Session Manager.
Sep 21 07:33:00 w00wmi systemd[1828]: Stopping PipeWire Multimedia Service...
Sep 21 07:33:00 w00wmi systemd[1828]: Stopped PipeWire Multimedia Service.
Sep 21 07:33:00 w00wmi systemd[1828]: Removed slice User Core Session Slice.
Sep 21 07:33:00 w00wmi systemd[1828]: Stopped target Basic System.
Sep 21 07:33:00 w00wmi systemd[1828]: Stopped target Paths.
Sep 21 07:33:00 w00wmi systemd[1828]: Stopped target Sockets.
Sep 21 07:33:00 w00wmi systemd[1828]: Stopped target Timers.
Sep 21 07:33:00 w00wmi systemd[1828]: Closed D-Bus User Message Bus Socket.
Sep 21 07:33:00 w00wmi systemd[1828]: Closed GnuPG network certificate management daemon.
Sep 21 07:33:00 w00wmi systemd[1828]: Closed GnuPG cryptographic agent and passphrase cache (access for web browsers).
Sep 21 07:33:00 w00wmi systemd[1828]: Closed GnuPG cryptographic agent and passphrase cache (restricted).
Sep 21 07:33:00 w00wmi systemd[1828]: Closed GnuPG cryptographic agent (ssh-agent emulation).
Sep 21 07:33:00 w00wmi systemd[1828]: Closed GnuPG cryptographic agent and passphrase cache.
Sep 21 07:33:00 w00wmi systemd[1828]: Closed PipeWire Multimedia System Socket.
Sep 21 07:33:00 w00wmi systemd[1828]: Closed debconf communication socket.
Sep 21 07:33:00 w00wmi systemd[1828]: Closed REST API socket for snapd user session agent.
Sep 21 07:33:00 w00wmi systemd[1828]: Removed slice User Application Slice.
Sep 21 07:33:00 w00wmi systemd[1828]: Reached target Shutdown.
Sep 21 07:33:00 w00wmi systemd[1828]: Finished Exit the Session.
Sep 21 07:33:00 w00wmi systemd[1828]: Reached target Exit the Session.
Sep 21 07:33:00 w00wmi systemd[1]: user@1000.service: Deactivated successfully.
Sep 21 07:33:00 w00wmi systemd[1]: user@1000.service: Deactivated successfully.
Sep 21 07:33:00 w00wmi systemd[1]: Stopped User Manager for UID 1000.
Sep 21 07:33:00 w00wmi systemd[1]: Stopping User Runtime Directory /run/user/1000...
Sep 21 07:33:00 w00wmi systemd[1]: run-user-1000.mount: Deactivated successfully.
Sep 21 07:33:00 w00wmi systemd[1]: user-runtime-dir@1000.service: Deactivated successfully.
Sep 21 07:33:00 w00wmi systemd[1]: Stopped User Runtime Directory /run/user/1000.
Sep 21 07:33:00 w00wmi systemd[1]: Removed slice User Slice of UID 1000.

After this, once I run sudo systemctl restart user@1000, it is never stopped again until I restart WSL. I am not sure exactly when this started in my case, but it worked before.
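
To check whether the same stop event shows up on another machine, the journal can be filtered for the message seen in the log above. A minimal sketch; user_manager_stops is a made-up name, and the pattern is copied from the log lines in this comment, so it may not match other systemd versions or locales.

```shell
# Hypothetical helper: read journal text on stdin and print the lines where
# systemd (PID 1) stops a user manager. The message text is copied from the
# log in this comment.
user_manager_stops() {
    grep -E 'systemd\[1\]: Stopping User Manager for UID [0-9]+'
}

# Usage (current boot): journalctl -b0 | user_manager_stops
```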

WSL version: 1.2.5.0
Kernel version: 5.15.90.1
WSLg version: 1.0.51
MSRDC version: 1.2.3770
Direct3D version: 1.608.2-61064218
DXCore version: 10.0.25131.1002-220531-1700.rs-onecore-base2-hyp
Windows version: 10.0.19045.3448

thwint commented 11 months ago

Just found another interesting article: https://serverfault.com/questions/1139283/systemd-stops-user-manager-and-kills-all-user-processes

In my case I am running Ubuntu 22.04. After enabling lingering for my user it does not seem to be stopped anymore.

So my workaround: loginctl enable-linger 1000
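
Whether linger actually got enabled can be checked afterwards. A small sketch, assuming logind's usual state directory /var/lib/systemd/linger, where a marker file named after the user exists while linger is on; linger_enabled is a made-up name, and the directory can be overridden if your system differs.

```shell
# Hypothetical helper: succeed if a linger marker exists for the given user.
# Assumes logind's default state directory; pass another directory to override.
linger_enabled() {
    local user="${1:-$(id -un)}"
    local dir="${2:-/var/lib/systemd/linger}"
    [ -e "$dir/$user" ]
}

if linger_enabled; then
    echo "linger is enabled"
else
    echo "linger is not enabled; try: loginctl enable-linger $(id -un)"
fi
```

Note that the marker file is named after the user name, even if linger was enabled by UID as in the command above.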

jonathan-f-silva commented 9 months ago

sudo systemctl restart user@1000

This solved a "Failed to connect to bus: No such file or directory" issue for me. I'm using guiApplications=false here. Thanks @higordearaujo-bsy! 👍

bgupta commented 2 months ago

Workaround for WSL2 Ubuntu 22.04 Systemd Issue

For anyone running WSL2 Ubuntu 22.04 using systemd and encountering this issue, I found a very simple workaround.

  1. sudo loginctl enable-linger $USER.
  2. Disable systemd in /etc/wsl.conf.
  3. Restart WSL using wsl --shutdown.
  4. Re-enable systemd in /etc/wsl.conf.
  5. Restart WSL again.
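
Steps 2 and 4 can be done with a small helper instead of editing the file by hand. This is a sketch only: set_systemd_flag is a made-up name, and it assumes /etc/wsl.conf already contains a systemd=... line under [boot], as in the configuration posted earlier in this thread.

```shell
# Hypothetical helper: rewrite the systemd= line of a wsl.conf-style file.
# Assumes the file already has a systemd=... line; other lines are untouched.
set_systemd_flag() {
    local value="$1"
    local file="${2:-/etc/wsl.conf}"
    local tmp
    tmp="$(mktemp)" || return 1
    if sed -E "s/^[[:space:]]*systemd[[:space:]]*=.*/systemd=${value}/" "$file" > "$tmp"; then
        # Write back through the existing file to keep its owner and mode
        cat "$tmp" > "$file"
    fi
    rm -f "$tmp"
}

# Usage (as root inside the distro; the restarts happen on the Windows side):
# set_systemd_flag false   # then: wsl --shutdown
# set_systemd_flag true    # then: wsl --shutdown again
```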

Note that I had enabled systemd while running 20.04 and it was working fine. This issue only surfaced when I did an in-place upgrade to 22.04. I'll note that before I figured this out, running sudo systemctl restart user@1000 did fix it, but I had to run it every time I restarted the WSL VM. However, enabling linger alone was having no effect.

I guess "Have you tried turning it off and on again?" never stops being sage advice.

polyzen commented 1 month ago

Just found another interesting article: https://serverfault.com/questions/1139283/systemd-stops-user-manager-and-kills-all-user-processes

In my case I am running Ubuntu 22.04. After enabling lingering for my user it does not seem to be stopped anymore.

So my workaround: loginctl enable-linger 1000

For me this started occurring after disabling GUI application support via the WSL Settings app. Enabling linger for the user seems to have resolved the issue.

Edit: I also disabled Hyper-V Firewall at the same time, but based on the comments in this thread, I figure that's unrelated.

Stanzilla commented 1 month ago

Also see https://github.com/microsoft/WSL/issues/8879 for more systemd issues

ELISSAWII commented 1 month ago

Hi, my response is not related to your problem, but I ran into a problem running one of the scripts because I'm a total noob.

How can I get root login access in Windows?

maxb commented 3 weeks ago

I've just run across this myself in a newly installed Ubuntu 22.04 setup. It's a pretty big out-of-the-box flaw for anyone who actually wants to use user systemd. Is anyone working on it on the WSL team?

TTcheng commented 6 days ago

Workaround for WSL2 Ubuntu 22.04 Systemd Issue

For anyone running WSL2 Ubuntu 22.04 using systemd and encountering this issue, I found a very simple workaround.

1. `sudo loginctl enable-linger $USER`.

2. Disable systemd in `/etc/wsl.conf`.

3. Restart WSL using `wsl --shutdown`.

4. Re-enable systemd in `/etc/wsl.conf`.

5. Restart WSL again.

Note that I had enabled systemd while running 20.04 and it was working fine. This issue only surfaced when I did an in-place upgrade to 22.04. I'll note that before I figured this out, running sudo systemctl restart user@1000 did fix it, but I had to run it every time I restarted the WSL VM. However, enabling linger alone was having no effect.

I guess "Have you tried turning it off and on again?" never stops being sage advice.

sudo loginctl enable-linger $USER works for me, and it still works after restarting WSL.