nix-community / home-manager

Manage a user environment using Nix [maintainer=@rycee]
https://nix-community.github.io/home-manager/
MIT License
7.13k stars 1.84k forks source link

bug: `home-manager-$user.service` fails on boot - "Failed to connect to bus: No medium found" #3172

Open oati opened 2 years ago

oati commented 2 years ago

Are you following the right branch?

Is there an existing issue for this?

Issue description

I started noticing that home manager was failing on my boot screen. Restarting the service does not result in any failures.

○ home-manager-erin.service - Home Manager environment for erin
     Loaded: loaded (/etc/systemd/system/home-manager-erin.service; enabled; preset: enabled)
     Active: inactive (dead) since Mon 2022-08-22 08:48:42 CDT; 23min ago
   Main PID: 1849 (code=exited, status=1/FAILURE)
        CPU: 906ms

Aug 22 08:48:41 erin-laptop hm-activate-erin[2486]: dbus-daemon[2486]: [session uid=1000 pid=2486] Successfully activated service 'ca.desrt.dconf'
Aug 22 08:48:41 erin-laptop hm-activate-erin[1849]: Activating linkGeneration
Aug 22 08:48:41 erin-laptop hm-activate-erin[1849]: Cleaning up orphan links from /home/erin
Aug 22 08:48:41 erin-laptop hm-activate-erin[1849]: Creating profile generation 1146
Aug 22 08:48:41 erin-laptop hm-activate-erin[1849]: Creating home file links in /home/erin
Aug 22 08:48:42 erin-laptop hm-activate-erin[1849]: Activating onFilesChange
Aug 22 08:48:42 erin-laptop hm-activate-erin[2885]: Failed to connect to bus: No medium found
Aug 22 08:48:42 erin-laptop systemd[1]: home-manager-erin.service: Main process exited, code=exited, status=1/FAILURE
Aug 22 08:48:42 erin-laptop systemd[1]: home-manager-erin.service: Failed with result 'exit-code'.
Aug 22 08:48:42 erin-laptop systemd[1]: Failed to start Home Manager environment for erin.

I'm running tmpfs on root, which might be a part of why it fails every time.

Maintainer CC

No response

System information

this path will be fetched (0.06 MiB download, 0.30 MiB unpacked):
  /nix/store/425q8ya3a22qjinlipsxw66jwsi4wqa2-bash-interactive-5.1-p16-dev
copying path '/nix/store/425q8ya3a22qjinlipsxw66jwsi4wqa2-bash-interactive-5.1-p16-dev' from 'https://cache.nixos.org'...
 - system: `"x86_64-linux"`
 - host os: `Linux 5.18.18, NixOS, 22.11 (Raccoon), 22.11.20220819.495b19d`
 - multi-user?: `yes`
 - sandbox: `yes`
 - version: `nix-env (Nix) 2.10.3`
 - channels(root): `"nixos"`
 - nixpkgs: `/nix/store/gxyn8m9xsbl6w1904ig697x2d7w0hhgz-source`
oati commented 2 years ago

Maybe this is as simple as adding dbus.service as a dependency to the systemd service declaration in nixos/default.nix.

Also, if dbus is a dependency of home-manager, shouldn't it be setting services.dbus.enable = true in nixos?

berbiche commented 2 years ago

Hi,

Does the error reproduce reliably?

oati commented 2 years ago

As far as I've tested, it happens on every boot.

oati commented 2 years ago

This might be a race condition between the (system) home manager service and the user dbus service.

somasis commented 2 years ago

I think I've experienced this issue in the past due to sloppy home.activation hooks.

If memory serves, it was something like...

  home.activation."autorandr" = lib.hm.dag.entryAfter [ "writeBoundary" ] ''
    if ${pkgs.systemd}/bin/systemctl --user is-active -q graphical-session.target; then
        $DRY_RUN_CMD ${hook} -c || :
    fi
  '';

but

  home.activation."autorandr" = lib.hm.dag.entryAfter [ "writeBoundary" ] ''
    if $DRY_RUN_CMD ${pkgs.systemd}/bin/systemctl --user is-active -q graphical-session.target; then
        $DRY_RUN_CMD ${hook} -c || :
    fi

would not.

NickCao commented 2 years ago

This might be a race condition between the (system) home manager service and the user dbus service.

This is indeed the case, hm-set-env https://github.com/nix-community/home-manager/blob/586ac1fd58d2de10b926ce3d544b3179891e58cb/nixos/default.nix#L66 imports dbus related environment variables from the system user session, thus the user dbus services has to be active before hm get activated.

Myaats commented 2 years ago

I have a similar error on unstable where it complains about dbus-daemon (executable or socket?) not found, instead of no medium during the dconfSettings activation script.

Nov 13 11:11:10 shun systemd[1]: Starting Home Manager environment for mats...
Nov 13 11:11:10 shun hm-activate-mats[703]: Starting Home Manager activation
Nov 13 11:11:10 shun hm-activate-mats[703]: Activating checkFilesChanged
Nov 13 11:11:10 shun hm-activate-mats[703]: Activating checkLinkTargets
Nov 13 11:11:11 shun hm-activate-mats[703]: Activating writeBoundary
Nov 13 11:11:11 shun hm-activate-mats[703]: Activating installPackages
Nov 13 11:11:11 shun hm-activate-mats[703]: Activating dconfSettings
Nov 13 11:11:11 shun hm-activate-mats[1104]: dbus-run-session: failed to execute message bus daemon 'dbus-daemon': No such file or directory
Nov 13 11:11:11 shun hm-activate-mats[1102]: dbus-run-session: EOF reading address from bus daemon
Nov 13 11:11:11 shun systemd[1]: home-manager-mats.service: Main process exited, code=exited, status=127/n/a
Nov 13 11:11:11 shun systemd[1]: home-manager-mats.service: Failed with result 'exit-code'.
Nov 13 11:11:11 shun systemd[1]: Failed to start Home Manager environment for mats.

I do have systemd in initrd enabled to allow TPM based LUKS unlocking which may be related to the issue.

austinbutler commented 2 years ago

I also get failed to execute message bus daemon 'dbus-daemon': No such file or directory. It happens during a nixos-rebuild switch for the other user on my laptop. I use encryption but it's not automatically unlocked via TPM.

NomisIV commented 2 years ago

I get the exact same error as @Myaats on boot, but the service can be started manually after. So it sounds like a race condition.

NomisIV commented 2 years ago

I think this PR might be related, and could solve the problem: https://github.com/nix-community/home-manager/pull/3405

NickCao commented 2 years ago

The race is inherent as dbus is started by the respective systemd user session, while home-manager activation is not.

stale[bot] commented 1 year ago

Thank you for your contribution! I marked this issue as stale due to inactivity. Please be considerate of people watching this issue and receiving notifications before commenting 'I have this issue too'. We welcome additional information that will help resolve this issue. Please read the relevant sections below before commenting.

If you are the original author of the issue

* If this is resolved, please consider closing it so that the maintainers know not to focus on this. * If this might still be an issue, but you are not interested in promoting its resolution, please consider closing it while encouraging others to take over and reopen an issue if they care enough. * If you know how to solve the issue, please consider submitting a Pull Request that addresses this issue.

If you are not the original author of the issue

* If you are also experiencing this issue, please add details of your situation to help with the debugging process. * If you know how to solve the issue, please consider submitting a Pull Request that addresses this issue.

Memorandum on closing issues

Don't be afraid to manually close an issue, even if it holds valuable information. Closed issues stay in the system for people to search, read, cross-reference, or even reopen – nothing is lost! Closing obsolete issues is an important way to help maintainers focus their time and effort.

testplayername commented 8 months ago

I am getting the same error as the original issue on NixOS:

> journalctl --boot --no-pager --unit home-manager-mycoolusername.service
Mar 19 14:08:09 nixosmachine systemd[1]: Starting Home Manager environment for mycoolusername...
Mar 19 14:08:09 nixosmachine hm-activate-mycoolusername[1052]: Starting Home Manager activation
Mar 19 14:08:10 nixosmachine hm-activate-mycoolusername[1052]: Activating checkFilesChanged
Mar 19 14:08:10 nixosmachine hm-activate-mycoolusername[1052]: Activating checkLinkTargets
Mar 19 14:08:10 nixosmachine hm-activate-mycoolusername[1052]: Activating unmountPersistentStoragePaths
Mar 19 14:08:10 nixosmachine hm-activate-mycoolusername[1052]: Activating createAndMountPersistentStoragePaths
Mar 19 14:08:10 nixosmachine hm-activate-mycoolusername[1052]: Activating writeBoundary
Mar 19 14:08:10 nixosmachine hm-activate-mycoolusername[1052]: Activating createGpgHomedir
Mar 19 14:08:10 nixosmachine hm-activate-mycoolusername[1052]: Activating linkGeneration
Mar 19 14:08:10 nixosmachine hm-activate-mycoolusername[1052]: Creating profile generation 1
Mar 19 14:08:10 nixosmachine hm-activate-mycoolusername[1052]: Creating home file links in /home/mycoolusername
Mar 19 14:08:10 nixosmachine hm-activate-mycoolusername[1052]: Activating batCache
Mar 19 14:08:10 nixosmachine hm-activate-mycoolusername[1717]: No themes were found in '/home/mycoolusername/.config/bat/themes', using the default set
Mar 19 14:08:10 nixosmachine hm-activate-mycoolusername[1717]: No syntaxes were found in '/home/mycoolusername/.config/bat/syntaxes', using the default set.
Mar 19 14:08:11 nixosmachine hm-activate-mycoolusername[1717]: Writing theme set to /home/mycoolusername/.cache/bat/themes.bin ... okay
Mar 19 14:08:11 nixosmachine hm-activate-mycoolusername[1717]: Writing syntax set to /home/mycoolusername/.cache/bat/syntaxes.bin ... okay
Mar 19 14:08:11 nixosmachine hm-activate-mycoolusername[1717]: Writing metadata to folder /home/mycoolusername/.cache/bat ... okay
Mar 19 14:08:11 nixosmachine hm-activate-mycoolusername[1052]: Activating createXdgUserDirectories
Mar 19 14:08:11 nixosmachine hm-activate-mycoolusername[1052]: Activating installPackages
Mar 19 14:08:11 nixosmachine hm-activate-mycoolusername[1052]: Activating onFilesChange
Mar 19 14:08:11 nixosmachine systemctl[1779]: Failed to connect to bus: No medium found
Mar 19 14:08:11 nixosmachine systemd[1]: home-manager-mycoolusername.service: Main process exited, code=exited, status=1/FAILURE
Mar 19 14:08:11 nixosmachine systemd[1]: home-manager-mycoolusername.service: Failed with result 'exit-code'.
Mar 19 14:08:11 nixosmachine systemd[1]: Failed to start Home Manager environment for mycoolusername.
Mar 19 14:08:11 nixosmachine systemd[1]: home-manager-mycoolusername.service: Consumed 1.514s CPU time, no IP traffic.

My configuration is similar to that of @oati and the symptoms are the same:

Other notes:

It does look like a race of some sort, which becomes deterministic with root on tmpfs for some reason. The error still shows up when a very long sleep is inserted right after onFilesChange in Home Manager's activation script:

# Same error appears with this in the user's Home Manger configuration
home.activation = {
    sleepyTime = inputs.home-manager.lib.hm.dag.entryBetween ["onFilesChange"] ["installPackages"] ''
        echo "Sleeping..."
        sleep 100
    '';
};

The race is inherent as dbus is started by the respective systemd user session, while home-manager activation is not.

If this is the case, the contents of the onFilesChange section in the Home Manager activation script (a systemd system service) would need to be run after the user's DBus service is ready (a systemd user service). A straightforward solution would be to move onFilesChange logic to its own systemd user service, and set it to run after the user DBus service, but is that possible or desired? I am unsure if that change might have side effects or break something else.

arthsmn commented 7 months ago

I'm able to reproduce this error using a normal installation (without tmpfs). I'm using dbus-broker as the dbus implementation. I also have apparmor enabled and I'm setting services.dbus.apparmor = "enabled".

testplayername commented 7 months ago

Doing some more digging and testing for my specific issue, I found the lines in my Home Manager configuration that causes the Failed to connect to bus: No medium found error:

xdg.configFile."fnott/fnott.ini" = {
    onChange = "${inputs.nixpkgs.systemd}/bin/systemctl --user restart fnott";
};

xdg.configFile."path/to/home-manager-managed/file".onChange = "${nixpkgs.systemd}/bin/systemctl --user <any subcommand>" causes the error in my config. The issue is in the Home Manager activation script and the effects of three sections inside:

  1. DAG entry checkFilesChanged always marks $HOME/fnott/fnott.ini as changed, because the machine has impertinent /home and $HOME/fnott/fnott.ini does not exist yet. This is important in a later step.

Relevent code in the Home Manager activation script:

_iNote "Activating %s" "checkFilesChanged"
function _cmp() {
  if [[ -d $1 && -d $2 ]]; then
    diff -rq "$1" "$2" &> /dev/null
  else
    cmp --quiet "$1" "$2"
  fi
}
declare -A changedFiles
_cmp '/nix/store/zgk3xkxn2pqqdwf1v18a7l89ys0fxi5r-fnott.ini' '/home/mycoolusername'/'.config/fnott/fnott.ini' \
  && changedFiles['.config/fnott/fnott.ini']=0 \
  || changedFiles['.config/fnott/fnott.ini']=1
unset -f _cmp

The important line is changedFiles['.config/fnott/fnott.ini']=1 which always runs.

  1. $HOME/fnott/fnott.ini is actually created later, in DAG entry linkGeneration, but this is too late because the file is already marked as changed.

  2. Later, in DAG entry onFilesChange, the systemctl --user command runs and fails because the systemd user session has not started yet. Note that the systemctl --user command is inside an if condition ${changedFiles['.config/fnott/fnott.ini']} == 1, but that condition is always true because of changedFiles['.config/fnott/fnott.ini']=1 from DAG entry checkFilesChanged earlier.

_iNote "Activating %s" "onFilesChange"
if (( ${changedFiles['.config/fnott/fnott.ini']} == 1 )); then
  if [[ -v DRY_RUN || -v VERBOSE ]]; then
    echo "Running onChange hook for" '.config/fnott/fnott.ini'
  fi
  if [[ ! -v DRY_RUN ]]; then
    /nix/store/4npvfi1zh3igsgglxqzwg0w7m2h7sr9b-systemd-255.4/bin/systemctl --user restart fnott
  fi
fi

Two possible solutions:

  1. Deal with the absense of the systemd user session: run the systemctl --user command only if the systemd user daemon is running. This does mean the command will never run at boot, even if the fnott config file changed for any reason, but this is perfectly fine because fnott has not even started yet, and fnott will read the config file when it does start later. Simple enough to implement, but feels like a workaround because the second solution is better:
    # The commands that check whether the systemd user daemon is running are inspired by https://github.com/nix-community/home-manager/blob/057117a401a34259c9615ce62218aea7afdee4d3/modules/systemd.nix#L343-L350
    xdg.configFile."fnott/fnott.ini" = {
        onChange = let
            ensureRuntimeDir = "XDG_RUNTIME_DIR=\${XDG_RUNTIME_DIR:-/run/user/$(id -u)}";
            systemctlUserCmd = "${ensureRuntimeDir} ${inputs.nixpkgs.systemd}/bin/systemctl --user";

            realCmd = "${systemctlUserCmd} restart fnott";

        in ''
            systemdUserStatus=$(${systemctlUserCmd} is-system-running 2>&1 || true)

            if [[ $systemdUserStatus == 'running' || $systemdUserStatus == 'degraded' ]]; then
                ${realCmd}
            else
                echo "systemd user session not running; skipping fnott config reload"
            fi
        '';
    };
  1. Deal with impertinent file always being marked as changed: move the onFilesChange logic to a systemd user service. The new user service will certainly activate after the user systemd daemon is running, and can be set to activate after all impermanence files/folders are linked. This is the better solution, attacking closer to the root cause, but is more complicated to implement. This issue has discussion about doing this, and this pull request looks promising.
theRoboxx commented 4 months ago

I'm also experience this issue but only use the xdg-utils as a package without extra configuration. Manual restarting the service works without a problem.

Version: NixOS 24.11 unstable

$ journalctl -eu home-manager-usyer.service
Jun 30 17:55:54 latopler hm-activate-usyer[1172]: Starting Home Manager activation
Jun 30 17:55:54 latopler hm-activate-usyer[1172]: Activating checkFilesChanged
Jun 30 17:55:54 latopler hm-activate-usyer[1172]: Activating checkLinkTargets
Jun 30 17:55:54 latopler hm-activate-usyer[1172]: Activating writeBoundary
Jun 30 17:55:54 latopler hm-activate-usyer[1172]: Activating installPackages
Jun 30 17:55:54 latopler hm-activate-usyer[1172]: Activating dconfSettings
Jun 30 17:55:55 latopler hm-activate-usyer[1633]: dbus-daemon[1633]: [session uid=1000 pid=1633] Activating service name='ca.desrt.dconf' requested by ':1.0' (uid=1000 pid=1637 comm="/nix/store/l93v1f908siryjwp0s5b>
Jun 30 17:55:55 latopler hm-activate-usyer[1633]: dbus-daemon[1633]: [session uid=1000 pid=1633] Successfully activated service 'ca.desrt.dconf'
Jun 30 17:55:55 latopler hm-activate-usyer[1172]: Activating linkGeneration
Jun 30 17:55:55 latopler hm-activate-usyer[1172]: Cleaning up orphan links from /home/usyer
Jun 30 17:55:55 latopler hm-activate-usyer[1172]: No change so reusing latest profile generation 276
Jun 30 17:55:55 latopler hm-activate-usyer[1172]: Creating home file links in /home/usyer
Jun 30 17:55:52 latopler systemd[1]: Starting Home Manager environment for usyer...
Jun 30 17:55:55 latopler hm-activate-usyer[1172]: Activating onFilesChange
Jun 30 17:55:55 latopler hm-activate-usyer[1172]: Activating reloadSystemd
Jun 30 17:55:55 latopler hm-activate-usyer[1172]: User systemd daemon not running. Skipping reload.
Jun 30 17:55:55 latopler hm-activate-usyer[1172]: Activating setupEtc
Jun 30 17:55:55 latopler systemctl[2275]: Failed to connect to bus: No medium found
Jun 30 17:55:55 latopler systemd[1]: home-manager-usyer.service: Main process exited, code=exited, status=1/FAILURE
Jun 30 17:55:55 latopler systemd[1]: home-manager-usyer.service: Failed with result 'exit-code'.
Jun 30 17:55:55 latopler systemd[1]: Failed to start Home Manager environment for usyer.
Jun 30 17:55:55 latopler systemd[1]: home-manager-usyer.service: Consumed 1.175s CPU time, no IP traffic.