nix-community / NixOS-WSL

NixOS on WSL(2) [maintainer=@nzbr]
Apache License 2.0

Systemctl with user flag don't work #375

Open aikooo7 opened 10 months ago

aikooo7 commented 10 months ago

Bug description

Running any systemctl command with the --user flag, such as systemctl list-unit-files --user --state=enabled, while native systemd support is enabled fails with the error: Failed to connect to bus: No such file or directory

To Reproduce

Steps to reproduce the behavior:

  1. Enable native systemd by adding wsl.nativeSystemd = true; to /etc/nixos/configuration.nix
  2. Run systemctl list-unit-files --user --state=enabled

Expected behavior

The command executes without errors.

For the example command, it should list all enabled unit files.

Logs

Similar to #165: running systemctl list-unit-files --user --state=enabled with native systemd support turned off works perfectly:

UNIT FILE                STATE   PRESET
emacs.service            enabled enabled
nixos-activation.service enabled enabled
dbus.socket              enabled enabled
gpg-agent-ssh.socket     enabled enabled
gpg-agent.socket         enabled enabled

5 unit files listed.

With native systemd support turned on, however, it fails:

Failed to connect to bus: No such file or directory

SuperSandro2000 commented 10 months ago

Are you using WSL2?

aikooo7 commented 10 months ago

I am

nzbr commented 10 months ago

This is a bug/missing feature in WSL. My Ubuntu distro with systemd enabled, for example, behaves exactly the same. syschdemd included a workaround that made this work, but Microsoft's systemd implementation (which we call native) does not have it.

nzbr commented 10 months ago

I'll leave this open, because I do in fact consider this a bug, but in WSL, not here. There might be possible workarounds, but I'd much rather see Microsoft fix it

aikooo7 commented 10 months ago

Alright, I will open an issue in the WSL repo and keep you and this issue updated.

wyndon commented 2 months ago

The issue isn't always present. Right now I do not have the issue, whereas some days ago I was encountering it.

It's worth noting, though, that services take a while to start up (at least for me, even nix-daemon), so if you try to reproduce the issue immediately after the shell becomes available, you'll encounter it 100% of the time. Before attempting to reproduce, check that startup has actually finished by using htop or similar; you should see a number of processes starting.
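One way to script that check, as a minimal sketch: the helper below (boot_finished is a hypothetical name, not part of NixOS-WSL) classifies the state string printed by systemctl is-system-running. It assumes that "running" or "degraded" means the system manager has finished starting; it says nothing about the user manager itself.

```shell
#!/bin/sh
# boot_finished STATE
# STATE is the string printed by `systemctl is-system-running`.
# Returns 0 if the system manager has finished starting
# ("running", or "degraded" when some unit failed), 1 otherwise
# (for example "initializing" or "starting").
boot_finished() {
    case "$1" in
        running|degraded) return 0 ;;
        *) return 1 ;;
    esac
}

# Example use inside the distro:
#   if boot_finished "$(systemctl is-system-running)"; then
#       systemctl --user list-unit-files --state=enabled
#   fi
```

On systemd versions that support it, systemctl is-system-running --wait blocks until startup has completed, which avoids polling altogether.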

paperdev-code commented 2 months ago

I'm experiencing this too: programs.ssh.startAgent wasn't working for me. Setting nativeSystemd = false; solves the issue, but that really isn't ideal...

$ wsl --version
WSL version: 2.2.4.0
Kernel version: 5.15.153.1-2
WSLg version: 1.0.61
MSRDC version: 1.2.5326
Direct3D version: 1.611.1-81528511
DXCore version: 10.0.26091.1-240325-1447.ge-release
Windows version: 10.0.22631.4037

Is there an active issue for this on the WSL(g?) repo?

nialov commented 2 months ago

This is an issue for me as well with, e.g., ssh-agent from home-manager. The user systemd unit is not enabled/working with native systemd enabled.

systemctl --user
# Failed to connect to bus: No such file or directory

WSL version: 2.2.4.0
Kernel version: 5.15.153.1-2
WSLg version: 1.0.61
MSRDC version: 1.2.5326
Direct3D version: 1.611.1-81528511
DXCore version: 10.0.26091.1-240325-1447.ge-release
Windows version: 10.0.19045.4651

Prince213 commented 1 month ago

Related WSL2 issues:

xieve commented 1 month ago

My workaround is to set this as the shell command in my terminal (run as root):

/run/current-system/sw/bin/zsh -c \
"until [ -S /run/dbus/system_bus_socket ]; \
 do sleep 1; \
done; \
systemctl restart user@1000; \
export DBUS_SESSION_BUS_ADDRESS='unix:path=/run/user/1000/bus'; \
exec sudo --preserve-env=DBUS_SESSION_BUS_ADDRESS --user xieve zsh"

It should be possible to incorporate a similar workaround into NixOS-WSL, if that is deemed appropriate. I also tried using login, which would be the "cleaner" approach in my eyes, but it resets PATH to include only Unix binaries.
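The until loop in the command above never gives up if the socket does not appear. A possible variant with a timeout, as a sketch only: wait_for_socket is a hypothetical helper name, and the 60-second default is an arbitrary assumption.

```shell
#!/bin/sh
# wait_for_socket PATH [TIMEOUT_SECONDS]
# Polls once per second until PATH exists and is a Unix socket.
# Returns 0 as soon as the socket is there, 1 if the timeout
# (default: 60 seconds, an arbitrary choice) elapses first.
wait_for_socket() {
    path=$1
    timeout=${2:-60}
    elapsed=0
    until [ -S "$path" ]; do
        if [ "$elapsed" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0
}

# Example use (as root inside the distro; UID 1000 as in the command above):
#   wait_for_socket /run/dbus/system_bus_socket 60 \
#       && systemctl restart user@1000
```

With this, a terminal profile could fall back to a plain shell when the system bus never comes up, instead of hanging.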

antoineco commented 1 month ago

Is this still relevant? I just performed a fresh installation of NixOS-WSL with all the default settings (in particular the native systemd integration), and user scope is working without issues:

$ systemctl list-unit-files --user --state=enabled
UNIT FILE                STATE   PRESET
nixos-activation.service enabled enabled
ssh-agent.service        enabled enabled
dbus.socket              enabled enabled
$ systemctl status --user ssh-agent.service
● ssh-agent.service - SSH Agent
     Loaded: loaded (/etc/systemd/user/ssh-agent.service; enabled; preset: enabled)
     Active: active (running) since Fri 2024-09-13 13:31:15 UTC; 5min ago
    Process: 288 ExecStartPre=/nix/store/k71apxkm38m3g34k01sb6zhysi0y7gph-coreutils-9.5/bin/rm -f /run/user/1000/ssh-agent (code=exited, status=0/SUCCESS)
    Process: 290 ExecStart=/nix/store/78mv13w9mgh0s0rd7rnr6ff4d7a39bpd-openssh-9.7p1/bin/ssh-agent -a /run/user/1000/ssh-agent (code=exited, status=0/SUCCESS)
   Main PID: 297 (ssh-agent)
     CGroup: /user.slice/user-1000.slice/user@1000.service/app.slice/ssh-agent.service
             └─297 /nix/store/78mv13w9mgh0s0rd7rnr6ff4d7a39bpd-openssh-9.7p1/bin/ssh-agent -a /run/user/1000/ssh-agent

Sep 13 13:31:15 calavera systemd[272]: Starting SSH Agent...
Sep 13 13:31:15 calavera systemd[272]: Started SSH Agent.

> wsl --version
WSL version: 2.2.4.0
Kernel version: 5.15.153.1-2
WSLg version: 1.0.61
MSRDC version: 1.2.5326
Direct3D version: 1.611.1-81528511
DXCore version: 10.0.26091.1-240325-1447.ge-release
Windows version: 10.0.22631.4112

xieve commented 1 month ago

Yes, this is still relevant (see the open issues on WSL). This is a race condition that depends on how long systemd takes to fully start up, which is shorter on fast, lightweight systems.
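To see which side of the race a given shell landed on, a small check along these lines may help. It is only a sketch: user_bus_address and user_bus_ready are hypothetical helper names, and the /run/user/<uid>/bus path is assumed from the DBUS_SESSION_BUS_ADDRESS value used in my workaround above.

```shell
#!/bin/sh
# user_bus_address [UID]
# Prints the session bus address for the given UID
# (default: the current user), assuming the usual
# /run/user/<uid>/bus location.
user_bus_address() {
    uid=${1:-$(id -u)}
    printf 'unix:path=/run/user/%s/bus\n' "$uid"
}

# user_bus_ready [UID]
# Returns 0 if the user's bus socket already exists, 1 otherwise.
user_bus_ready() {
    uid=${1:-$(id -u)}
    [ -S "/run/user/$uid/bus" ]
}

# Example use:
#   if user_bus_ready; then
#       export DBUS_SESSION_BUS_ADDRESS="$(user_bus_address)"
#   else
#       echo "user bus not up yet; the user manager is probably still starting" >&2
#   fi
```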

go-colin commented 3 weeks ago

Thank you for this workaround. I guess this was the root of the problem I've been running into since I updated WSL. I was 95% of the way to restoring my environment, and this was the missing link! It was annoying to have to sudo things that I didn't need to, and it was breaking a lot of integrations with VS Code, dev flakes, Docker, etc.

A little annoying to have to run those 3 commands every time I fire up wsl, but it's better than the alternative of disrupting my overall workflow.

🙏

Now if only I could resolve the read-only file system error when running nixos-rebuild switch. But that's fine: nixos-rebuild boot followed by terminating WSL isn't too painful, since I don't update often.

------- edit:

I've resorted to disabling nativeSystemd for now, as it's just less painful at the moment. This does seem likely to be an upstream issue with the latest WSL updates.

lucdew commented 2 weeks ago

I also face the same issue on WSL. It disappears if I disable native systemd (wsl.nativeSystemd = false), but then startup takes 40 s on my 13th-gen i7 with an SSD, 32 GB of RAM, and a .wslconfig that reserves half of the cores and memory.

My wsl version:

WSL version: 2.3.24.0
Kernel version: 5.15.153.1-2
WSLg version: 1.0.65
MSRDC version: 1.2.5620
Direct3D version: 1.611.1-81528511
DXCore version: 10.0.26100.1-240331-1435.ge-release
Windows version: 10.0.22631.4317

xieve's workaround works for me, except that it slows down WSL startup while it waits for the system D-Bus socket to be created. It takes as long as startup with native systemd support disabled. I ran the script with:

wsl -d NixOS -u root  /run/current-system/sw/bin/zsh ... exec sudo --preserve-env=DBUS_SESSION_BUS_ADDRESS --user mysuser zsh"

Another way to work around the issue is to enable native systemd and also enable systemd lingering for the user in the NixOS machine configuration:

systemd.tmpfiles.rules = [
  "f /var/lib/systemd/linger/myusername"
];

Then, when I log in quickly as the user, the user's systemd services are not started yet, but if I wait a couple of seconds more, the user systemd instance eventually starts.

systemctl --user status
nixos
    State: running 
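
To confirm that the tmpfiles rule actually took effect, one could check for the marker file it creates. This is a rough sketch: linger_enabled is a hypothetical helper name, and the optional second argument exists only so the directory can be overridden.

```shell
#!/bin/sh
# linger_enabled USER [LINGER_DIR]
# Returns 0 if a linger marker file exists for USER, 1 otherwise.
# LINGER_DIR defaults to /var/lib/systemd/linger, the directory
# used in the tmpfiles rule above.
linger_enabled() {
    user=$1
    dir=${2:-/var/lib/systemd/linger}
    [ -e "$dir/$user" ]
}

# Example use:
#   linger_enabled myusername && echo "lingering is enabled"
```

loginctl show-user myusername --property=Linger should report the same information from logind's point of view.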

Edit1: Another drawback of enabling systemd lingering for the user is that, when logged in as the user, the sudo command is broken:

sudo: effective uid is not 0, is /run/wrappers/bin/sudo on a file system with the 'nosuid' option set or an NFS file system without root privileges?

This is because the symlink /run/wrappers/bin, which is in the PATH, points to a directory that has not been created yet. That directory contains executables with the setuid bit set, such as sudo. In my case it is a matter of 1 to 3 seconds. So I either have to start another shell (zsh in my case) or run the rehash command in my zsh shell. It is also possible to configure zsh to always rehash on completion.
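
A quick way to tell whether the wrappers are in place yet, as a minimal sketch: sudo_wrapper_ready is a hypothetical helper name, and the default path is taken from the error message above.

```shell
#!/bin/sh
# sudo_wrapper_ready [PATH]
# Returns 0 if PATH exists and has the setuid bit set, 1 otherwise.
# PATH defaults to /run/wrappers/bin/sudo, the location named in
# the sudo error message.
sudo_wrapper_ready() {
    wrapper=${1:-/run/wrappers/bin/sudo}
    [ -u "$wrapper" ]
}

# Example use:
#   sudo_wrapper_ready || echo "wrappers not ready yet; wait a few seconds, then rehash" >&2
```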

doronbehar commented 1 week ago

Thanks to everyone for the investigation efforts. I too encountered the mentioned issue:

sudo: effective uid is not 0, is /run/wrappers/bin/sudo on a file system with the 'nosuid' option set or an NFS file system without root privileges?

Just wanted to share my TL;DR, which is to run rehash (using Zsh). Most of the time, though, WSL will already be running in the background, so this won't be needed. On top of that, I usually enter a tmux session right afterwards, which gives the wrappers enough time to load, so I shouldn't notice this issue.