nix-community / home-manager

Manage a user environment using Nix [maintainer=@rycee]
https://nix-community.github.io/home-manager/
MIT License

bug: sd-switch panic on switch #5025

Closed seqizz closed 4 months ago

seqizz commented 8 months ago

Are you following the right branch?

Is there an existing issue for this?

Issue description

Sorry for yet another copy of this recurring ghost issue, but after opting to use sd-switch, I see the following at the end of every home-manager switch operation:

thread 'main' panicked at src/main.rs:151:6:
Error switching: Process org.freedesktop.systemd1 exited with status 1
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

This is the somewhat well-known D-Bus-related error seen in other issues, but none of them reached a resolution, and in those reports it disappeared on its own. On my system, sadly, it is reproducible.

The problem is that the system does have an active D-Bus session (and, as far as I can see, nothing else unusual). I collected some information from the system by following the troubleshooting steps of similar reports (e.g. https://github.com/nix-community/home-manager/issues/371).

Thanks for any tips!

Maintainer CC

No response

System information

- system: `"x86_64-linux"`
- host os: `Linux 6.7.4, NixOS, 23.11 (Tapir), 23.11.20240211.809cca7`
- multi-user?: `yes`
- sandbox: `yes`
- version: `nix-env (Nix) 2.18.1`
- channels(root): `"nixos-23.11"`
- channels(gurkan): `"home-manager-23.11.tar.gz"`
- nixpkgs: `/etc/nix/path/nixpkgs`

rycee commented 8 months ago

Yeah, something is amiss in the way sd-switch works; that is the reason we haven't been able to make it the default. I plan to more or less rewrite it relatively soon, which should hopefully make it more robust. At the very least, it should give more useful error messages.

At the moment I'm focusing on getting https://github.com/nix-community/home-manager/pull/5024 and https://github.com/nix-community/home-manager/pull/4976 in. After that I'll get to sd-switch.

rycee commented 7 months ago

@seqizz Would you mind trying out the switch-to-zbus branch of sd-switch and seeing if it gives a more helpful error message?

If you are using a Nix Flake based setup, you can override the existing sd-switch using an overlay, similar to how I do it here: https://git.sr.ht/~rycee/configurations/commit/34b13ff0054a8a3a26b5b74b83fd703fbf467de7#flake.nix

seqizz commented 7 months ago

Sadly, building it failed with:

       last 10 log lines:
       > Finished cargoSetupPostPatchHook
       > Running phase: updateAutotoolsGnuConfigScriptsPhase
       > Running phase: configurePhase
       > Running phase: buildPhase
       > Executing cargoBuildHook
       > ++ env CC_X86_64_UNKNOWN_LINUX_GNU=/nix/store/i6zjqpawh725z1lyg3alglzlabnzbjx7-gcc-wrapper-12.3.0/bin/cc CXX_X86_64_UNKNOWN_LINUX_GNU=/nix/store/i6zjqpawh725z1lyg3alglzlabnzbjx7-gcc-wrapper-12.3.0/bin/c++ CARGO_TARGET_X86_64_UNKNOWN_LINUX_GNU_LINKER=/nix/store/i6zjqpawh725z1lyg3alglzlabnzbjx7-gcc-wrapper-12.3.0/bin/cc CC_X86_64_UNKNOWN_LINUX_GNU=/nix/store/i6zjqpawh725z1lyg3alglzlabnzbjx7-gcc-wrapper-12.3.0/bin/cc CXX_X86_64_UNKNOWN_LINUX_GNU=/nix/store/i6zjqpawh725z1lyg3alglzlabnzbjx7-gcc-wrapper-12.3.0/bin/c++ CARGO_TARGET_X86_64_UNKNOWN_LINUX_GNU_LINKER=/nix/store/i6zjqpawh725z1lyg3alglzlabnzbjx7-gcc-wrapper-12.3.0/bin/cc CARGO_BUILD_TARGET=x86_64-unknown-linux-gnu HOST_CC=/nix/store/i6zjqpawh725z1lyg3alglzlabnzbjx7-gcc-wrapper-12.3.0/bin/cc HOST_CXX=/nix/store/i6zjqpawh725z1lyg3alglzlabnzbjx7-gcc-wrapper-12.3.0/bin/c++ cargo build -j 8 --target x86_64-unknown-linux-gnu --frozen --profile release
       > error: package `zvariant_derive v4.0.0` cannot be built because it requires rustc 1.75 or newer, while the currently active rustc version is 1.73.0
       > Either upgrade to rustc 1.75 or newer, or use
       > cargo update -p zvariant_derive@4.0.0 --precise ver
       > where `ver` is the latest version of `zvariant_derive` supporting rustc 1.73.0

I also tried it with the unstable repo, but do I also need to override Rust with an even newer version?

rycee commented 7 months ago

@seqizz Hmm, it should build OK with a recent nixpkgs-unstable; I'm using that in my setup:

nixpkgs-unstable.url = "github:NixOS/nixpkgs/nixpkgs-unstable";

sd-switch = {
  url = "sourcehut:~rycee/sd-switch/switch-to-zbus";
  inputs.nixpkgs.follows = "nixpkgs-unstable";
};

And running inside a nixpkgs-unstable checkout:

$ git log -1
commit f33dd27a47ebdf11dc8a5eb05e7c8fbdaf89e73f (HEAD, origin/nixpkgs-unstable)
Merge: fa15b53dbea5 47abf0334033
Author: Bobby Rong <rjl931189261@126.com>
Date:   Tue Feb 20 13:36:14 2024 +0800

    Merge pull request #288704 from Aleksanaa/cinnamon.cinnamon-control-center

    cinnamon.cinnamon-control-center: fix tls support in online accounts

$ nix run .#rustc -- --version
rustc 1.75.0 (82e1608df 2023-12-21) (built from a source tarball)
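The build failure above comes down to a minimum compiler version: zvariant_derive v4.0.0 requires rustc 1.75 or newer, while the nixpkgs in use provided 1.73.0. A minimal shell sketch for checking a rustc against that floor; the function names are hypothetical, and it assumes GNU sort with the -V (version sort) option is available.

```shell
#!/bin/sh
# Hypothetical helpers; not part of home-manager, sd-switch or nixpkgs.

# Read `rustc --version` output on stdin and print the version field,
# e.g. "rustc 1.75.0 (82e1608df 2023-12-21)" -> "1.75.0".
rustc_version_from() {
    awk 'NR == 1 { print $2 }'
}

# Succeed if version $1 >= version $2.
# Assumes GNU `sort -V`, which compares dot-separated numeric fields.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Check the rustc found on PATH against a minimum (default 1.75).
rustc_at_least() {
    have=$(rustc --version 2>/dev/null | rustc_version_from)
    [ -n "$have" ] && version_ge "$have" "${1:-1.75}"
}
```

For example, nix run .#rustc -- --version | rustc_version_from extracts the version from the output shown above, and rustc_at_least 1.75 && echo ok checks whichever rustc is on PATH.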

rycee commented 7 months ago

You can also let it use its own nixpkgs (i.e., not including the follows line)…

seqizz commented 7 months ago

I am clearly doing something wrong in my flake setup; I even had to fix the cargoHash...

I removed the follows line; it should work, since the whole point of flakes is to avoid this kind of issue: https://git.gurkan.in/gurkan/nixos-system-flake/commit/ac0cbb38055e6376f9cedc7e17fabdad5088fdb6

(btw thanks for taking a stab at this)

rycee commented 7 months ago

Hmm, that looks a bit too complicated. I think replacing the whole sd-switch = prev.sd-switch… thing with something like

sd-switch = inputs.sd-switch-src.packages.${final.system}.default

may work better.

seqizz commented 7 months ago

Yep, it did the trick, thanks!

Now I got:

Error: Error switching

Caused by:
    0: Failed to create systemd manager proxy
    1: org.freedesktop.DBus.Error.Spawn.ChildExited: Process org.freedesktop.systemd1 exited with status 1

rycee commented 7 months ago

Thanks! That's quite helpful!

I wonder whether the systemd user session is actually listening on /tmp/dbus-pCtHZePHSo at all. Typically, I think it prefers /run/user/1000/bus. Could you check whether there is a socket file at /run/user/1000/bus?

Also, could you check if you have dbus-run-session running? Something like ps ax | grep dbus-run-session.
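The two checks requested here can be wrapped in a small shell sketch. It lists socket files at the per-user bus path and any /tmp/dbus-* sockets (more than one result hints at more than one session bus), and it looks for dbus-run-session with pgrep -f, which, unlike ps ax | grep, never matches the grep itself. The function names are hypothetical; XDG_RUNTIME_DIR is assumed to point at /run/user/<uid> when set, and sockets are not filtered by owner.

```shell
#!/bin/sh
# Hypothetical diagnostic helpers; not part of home-manager or sd-switch.

# Print every D-Bus socket at the per-user bus path and under a tmp
# directory (default /tmp). Two or more results hint at two session buses.
list_bus_sockets() {
    runtime_dir=${XDG_RUNTIME_DIR:-/run/user/$(id -u)}
    runtime_dir=${runtime_dir%/}   # tolerate a trailing slash, e.g. "/run/user/1000/"
    tmp_dir=${1:-/tmp}
    for sock in "$runtime_dir/bus" "$tmp_dir"/dbus-*; do
        # -S is true only for sockets; an unmatched glob stays literal and fails it.
        if [ -S "$sock" ]; then
            printf '%s\n' "$sock"
        fi
    done
}

# Print PID and full command line of this user's dbus-run-session processes.
# -f matches the full command line: the kernel truncates process names to
# 15 characters ("dbus-run-sessio"), so an exact name match would miss it.
dbus_run_session_procs() {
    pgrep -a -u "$(id -u)" -f dbus-run-session
}
```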

seqizz commented 7 months ago
~> ps aux | grep run-session
gurkan      2123  0.0  0.0   3968  1920 ?        S    Feb20   0:00 /nix/store/qxvy6vc2x65f1lj49pxvdsnc2y4d6772-dbus-1.14.10/bin/dbus-run-session /nix/store/rz4n14d75fghwdf1l4jn5viri6k4yl4h-myAwesome-master/bin/awesome

Also

~> ls -la /run/user/1000/bus
srw-rw-rw- 1 gurkan gurkan 0 Feb 20 20:27 /run/user/1000/bus=

But at the same time:

~> sudo find /tmp -type s -name "dbus*" -exec ls -la {} \;
srwxrwxrwx 1 gurkan gurkan 0 Feb 20 20:27 /tmp/dbus-H9aaJRBiM5

Not sure why this one exists 🤔

Nebucatnetzer commented 7 months ago

I have the same error on my Ubuntu 22.04 WSL install, but the file doesn't exist:

ls: cannot access '/run/user/1000/bus': No such file or directory

And no dbus-run-session process is running (the only match is the grep itself):

zweili@co-ws-con4:~$ ps aux | grep run-session
zweili     32704  0.0  0.0   4028  2128 pts/2    S+   17:33   0:00 grep run-session

rycee commented 7 months ago

Thanks for the feedback! Could you both check where systemd thinks the session bus is? For example, on my system:

$ systemctl --user show-environment | grep DBUS_SESSION_BUS_ADDRESS
DBUS_SESSION_BUS_ADDRESS=unix:path=/run/user/1000/bus

Alternatively, this directly shows the socket that systemd actually has open:

$ sudo lsof -a -U -p $(pgrep -U $UID 'systemd')
…
systemd 2269 rycee   10u  unix 0xffffa263cce7dd80      0t0 7662 /run/user/1000/bus type=STREAM (LISTEN)
…
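Both commands answer the same question: which socket does the systemd user instance consider the session bus? A sketch of extracting that path from the first command's output: it reads KEY=VALUE lines and takes the path= key of a unix: D-Bus address. These are hypothetical helpers under simplifying assumptions: values are not shell-quoted, percent-escapes in the address are not decoded, only the first address of a ;-separated list is considered, and an abstract socket (unix:abstract=...) yields nothing.

```shell
#!/bin/sh
# Hypothetical helpers for inspecting `systemctl --user show-environment`.

# Print the value of key $1 from KEY=VALUE lines on stdin (first match only).
env_value() {
    awk -v key="$1" 'index($0, key "=") == 1 { print substr($0, length(key) + 2); exit }'
}

# Print the socket path of a "unix:path=..." D-Bus address given as $1.
# Returns 1 for other transports or when no path= key is present.
bus_path_from_address() {
    found=
    first=${1%%;*}            # D-Bus allows several ';'-separated addresses
    case $first in
        unix:*) ;;
        *) return 1 ;;        # e.g. tcp: addresses have no socket path
    esac
    old_ifs=$IFS
    IFS=,                     # key=value pairs are ','-separated
    set -f                    # no globbing during the unquoted expansion below
    for kv in ${first#unix:}; do
        case $kv in
            path=*) found=${kv#path=}; break ;;
        esac
    done
    set +f
    IFS=$old_ifs
    [ -n "$found" ] && printf '%s\n' "$found"
}
```

Combined, systemctl --user show-environment | env_value DBUS_SESSION_BUS_ADDRESS gives the address systemd knows about, and passing that to bus_path_from_address gives the socket path to compare against the lsof output.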

rycee commented 7 months ago

@Nebucatnetzer Could you double-check that your zweili user has user ID 1000?

$ echo $UID
1000

$ echo $XDG_RUNTIME_DIR 
/run/user/1000

Nebucatnetzer commented 7 months ago

Sure, no problem. The first command didn't print anything, but echoing the variable did work.

❯ systemctl --user show-environment | grep DBUS_SESSION_BUS_ADDRESS

❯ echo $DBUS_SESSION_BUS_ADDRESS
unix:path=/run/user/1000/bus
❯ sudo lsof -a -U -p $(pgrep -U $UID 'systemd')
COMMAND PID   USER   FD   TYPE             DEVICE SIZE/OFF  NODE NAME
systemd 391 zweili    1u  unix 0xffff898e0001bfc0      0t0 77567 type=STREAM
systemd 391 zweili    2u  unix 0xffff898e0001bfc0      0t0 77567 type=STREAM
systemd 391 zweili    3u  unix 0xffff898de0708cc0      0t0 77585 type=DGRAM
systemd 391 zweili   16u  unix 0xffff898de064cc80      0t0 79009 /run/user/1000/systemd/notify type=DGRAM
systemd 391 zweili   17u  unix 0xffff898de064d0c0      0t0 79010 type=DGRAM
systemd 391 zweili   18u  unix 0xffff898dd300a200      0t0 79012 /run/user/1000/systemd/private type=STREAM
systemd 391 zweili   19u  unix 0xffff898dd300a640      0t0 79014 type=STREAM
systemd 391 zweili   21u  unix 0xffff898de064a200      0t0 79011 type=DGRAM
systemd 391 zweili   22u  unix 0xffff898dd3008880      0t0 79024 /run/user/1000/gnupg/S.gpg-agent.ssh type=STREAM
systemd 391 zweili   25u  unix 0xffff898dd300aec0      0t0 79028 /run/user/1000/pk-debconf-socket type=STREAM
systemd 391 zweili   26u  unix 0xffff898dd3009dc0      0t0 79018 /run/user/1000/gnupg/S.dirmngr type=STREAM
systemd 391 zweili   27u  unix 0xffff898dd300b740      0t0 79030 /run/user/1000/snapd-session-agent.socket type=STREAM
systemd 391 zweili   28u  unix 0xffff898dd300aa80      0t0 79020 /run/user/1000/gnupg/S.gpg-agent.browser type=STREAM
systemd 391 zweili   29u  unix 0xffff898dd300c400      0t0 79026 /run/user/1000/gnupg/S.gpg-agent type=STREAM
systemd 391 zweili   30u  unix 0xffff898dd3008cc0      0t0 79022 /run/user/1000/gnupg/S.gpg-agent.extra type=STREAM
❯ echo $UID
1000

~
❯ echo $XDG_RUNTIME_DIR
/run/user/1000/

seqizz commented 7 months ago

On my side:

~> systemctl --user show-environment | grep DBUS_SESSION_BUS_ADDRESS
DBUS_SESSION_BUS_ADDRESS=unix:path=/run/user/1000/bus

And

~> sudo lsof -a -U -p $(pgrep -U $UID 'systemd')
...
systemd 1935 gurkan   28u  unix 0xffff9682023c0880      0t0   6812 /run/user/1000/bus type=STREAM (LISTEN)
...

rycee commented 7 months ago

@seqizz Could you try updating the sd-switch flake and see if it makes a difference? I added a commit that makes it prefer /run/user/$UID/bus if it exists and otherwise use DBUS_SESSION_BUS_ADDRESS. A bit hacky, but it might work 🙂
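The preference order described here can be illustrated in shell. This is only a sketch of the idea, not sd-switch's actual code (which is written in Rust); the function name and its optional root-directory argument are hypothetical, the latter added so the logic can be exercised without touching /run/user.

```shell
#!/bin/sh
# Hypothetical illustration of the selection order described above:
# prefer <root>/<uid>/bus if it exists, else fall back to DBUS_SESSION_BUS_ADDRESS.
choose_bus_address() {
    root=${1:-/run/user}
    candidate="$root/$(id -u)/bus"
    if [ -e "$candidate" ]; then
        # -e mirrors "if it exists"; -S would also require a socket.
        printf 'unix:path=%s\n' "$candidate"
    elif [ -n "${DBUS_SESSION_BUS_ADDRESS:-}" ]; then
        printf '%s\n' "$DBUS_SESSION_BUS_ADDRESS"
    else
        return 1    # no usable bus address at all
    fi
}
```

In a two-bus setup like the one in this thread, the per-user socket wins even when DBUS_SESSION_BUS_ADDRESS points at a bus under /tmp started by dbus-run-session.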

@Nebucatnetzer Hmm, that is very interesting. You have

$ echo $DBUS_SESSION_BUS_ADDRESS
unix:path=/run/user/1000/bus

but /run/user/1000/bus does not exist? What does busctl --user --list | grep systemd1 say? Do you have the dbus-user-session package installed? Are you able to run regular systemctl commands, like systemctl --user status?

seqizz commented 7 months ago

Yep, no crashes 🎉

...
Creating home file links in /home/gurkan
Activating onFilesChange
Activating reloadSystemd
Starting units: hm-graphical-session.target, tray.target

So the problem is that home-manager can't pick up DBUS_SESSION_BUS_ADDRESS for some reason? We can dig into it if you'd like (e.g. dump the environment from sd-switch itself), but since the current workaround works, that's fine with me too. Thanks!

rycee commented 7 months ago

@seqizz I think the issue is that you actually have two user D-Bus sessions. One is started at login and located at /run/user/1000/bus; this is the one systemd uses. My guess is that if you log in on the Linux console, you will get that address in DBUS_SESSION_BUS_ADDRESS.

But when your graphical session starts up, it also starts a new D-Bus session using dbus-run-session. That is the one that ends up at /tmp/dbus-pCtHZePHSo, and it overwrites the "correct" DBUS_SESSION_BUS_ADDRESS. You can see this in the output you pasted earlier:

~> ps aux | grep run-session
gurkan      2123  0.0  0.0   3968  1920 ?        S    Feb20   0:00 /nix/store/qxvy6vc2x65f1lj49pxvdsnc2y4d6772-dbus-1.14.10/bin/dbus-run-session /nix/store/rz4n14d75fghwdf1l4jn5viri6k4yl4h-myAwesome-master/bin/awesome

I think the proper solution is to remove the use of dbus-run-session, but for now the hack I added to sd-switch may work. I imagine you are not the only one with this issue.

In principle I think all occurrences of dbus-run-session should be removed from Nixpkgs, except possibly in some test cases.

Edit: To summarize, sd-switch can read DBUS_SESSION_BUS_ADDRESS just fine; the problem is that the variable lies, and systemd is connected to a different D-Bus address.
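A quick way to spot the mismatch described here is to compare the address in the current shell with the one in the systemd user manager's environment. A minimal sketch with hypothetical function names; it compares address strings exactly, so two spellings of the same socket (for example one carrying an extra guid= key) would be reported as a mismatch.

```shell
#!/bin/sh
# Hypothetical helpers; not part of home-manager or sd-switch.

# Compare the shell's bus address ($1) with systemd's ($2).
# Exit status: 0 = same, 1 = different, 2 = systemd has no address.
compare_bus_addresses() {
    if [ -z "$2" ]; then
        echo "systemd: no DBUS_SESSION_BUS_ADDRESS in manager environment"
        return 2
    elif [ "$1" = "$2" ]; then
        echo "ok: $1"
        return 0
    else
        echo "mismatch: shell=$1 systemd=$2"
        return 1
    fi
}

# Run the comparison against the live session.
check_bus_mismatch() {
    systemd_addr=$(systemctl --user show-environment 2>/dev/null |
        awk 'index($0, "DBUS_SESSION_BUS_ADDRESS=") == 1 { print substr($0, 26); exit }')
    compare_bus_addresses "${DBUS_SESSION_BUS_ADDRESS:-}" "$systemd_addr"
}
```

On a setup like seqizz's before the fix, this would be expected to report a /tmp address against unix:path=/run/user/1000/bus; on a system where show-environment prints no address, as in Nebucatnetzer's output, it would take the status-2 branch.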

Nebucatnetzer commented 7 months ago

zweili@co-ws-con4:~$ busctl --user --list | grep systemd1
Failed to connect to bus: No such file or directory

dbus-user-session is not installed

Systemd works fine as far as I can tell.

zweili@co-ws-con4:~$ systemctl --user status
● co-ws-con4
    State: running
     Jobs: 0 queued
   Failed: 0 units
    Since: Fri 2024-02-23 08:37:23 CET; 1min 28s ago
   CGroup: /user.slice/user-1000.slice/user@1000.service
           ├─app.slice
           │ ├─ssh-agent.service
           │ │ └─430 /nix/store/9g3y8bvpp39z5f18v80znnbh49vc281a-openssh-9.6p1/bin/ssh-agent -D -a /run/user/1000/ssh-agent
           │ └─emacs.service
           └─init.scope

rycee commented 7 months ago

@Nebucatnetzer OK, it seems systemctl and systemd use /run/user/1000/systemd/private to communicate when no user D-Bus session is available. I think the only way for sd-switch to work on such a system would be to run systemctl commands and parse their output. If you are able to, could you try installing dbus-user-session and see if that helps?

I'm somewhat reluctant to move away from using D-Bus to communicate with systemd, since it feels more robust. But maybe as a fallback for systems without D-Bus? 😕
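Which channel to the user manager exists can be read off the runtime directory: the bus socket is what a D-Bus based tool like sd-switch needs, while systemd/private is the socket systemctl uses when there is no session bus. A hypothetical sketch of that check; it only looks for socket files and does not verify that anything is listening on them.

```shell
#!/bin/sh
# Hypothetical helper: report which channel to the systemd user manager is
# available, based only on which socket files exist in the runtime directory.
systemd_user_channel() {
    dir=${1:-${XDG_RUNTIME_DIR:-/run/user/$(id -u)}}
    dir=${dir%/}
    if [ -S "$dir/bus" ]; then
        echo "dbus $dir/bus"
    elif [ -S "$dir/systemd/private" ]; then
        # No session bus: only private-socket access (as systemctl does) would work.
        echo "private $dir/systemd/private"
    else
        echo "none"
        return 1
    fi
}
```

Going by the lsof output above, where /run/user/1000/systemd/private is open but no bus socket is, this would be expected to print the private branch on that system before dbus-user-session was installed.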

Nebucatnetzer commented 7 months ago

After installing the package, it works fine. 👍

seqizz commented 7 months ago

I tested this properly and you're 100% right: I removed "dbus-run-session" from xsession.windowManager.command and everything still worked, with a single D-Bus socket. I assume that was an old mistake of mine.

Anyway, thanks again for digging into this. And yes, other systems will probably be rescued from similar multi-bus confusion by this check 👍

Since this one is linked to other magically resolved issues and the workaround will ship with v0.4.0, feel free to close this issue.

stale[bot] commented 4 months ago

Thank you for your contribution! I marked this issue as stale due to inactivity. Please be considerate of people watching this issue and receiving notifications before commenting 'I have this issue too'. We welcome additional information that will help resolve this issue. Please read the relevant sections below before commenting.

If you are the original author of the issue

* If this is resolved, please consider closing it so that the maintainers know not to focus on this.
* If this might still be an issue, but you are not interested in promoting its resolution, please consider closing it while encouraging others to take over and reopen an issue if they care enough.
* If you know how to solve the issue, please consider submitting a Pull Request that addresses this issue.

If you are not the original author of the issue

* If you are also experiencing this issue, please add details of your situation to help with the debugging process.
* If you know how to solve the issue, please consider submitting a Pull Request that addresses this issue.

Memorandum on closing issues

Don't be afraid to manually close an issue, even if it holds valuable information. Closed issues stay in the system for people to search, read, cross-reference, or even reopen – nothing is lost! Closing obsolete issues is an important way to help maintainers focus their time and effort.

rycee commented 4 months ago

Nixpkgs unstable now has sd-switch version 0.4.0, which hopefully resolves this issue. I'll close, please comment if the issue remains.