nix-community / home-manager

Manage a user environment using Nix [maintainer=@rycee]
https://nix-community.github.io/home-manager/
MIT License

bug: borgmatic service borked permissions #5749

Open sdaqo opened 3 weeks ago

sdaqo commented 3 weeks ago

Issue description

The borgmatic service is failing since 2024-08-11 with this error:

× borgmatic.service - borgmatic backup
     Loaded: loaded (/home/paul/.config/systemd/user/borgmatic.service; linked; preset: enabled)
     Active: failed (Result: exit-code) since Sat 2024-08-17 16:07:06 CEST; 18min ago
TriggeredBy: ● borgmatic.timer
    Process: 698364 ExecStartPre=/nix/store/xfm4mg874w5n39zbqx24yiw7hmka94n7-coreutils-9.5/bin/sleep 3m (code=exited, status=214/SETSCHEDULER)
        CPU: 329us

Aug 17 16:07:06 nixos-desktop systemd[3651]: Starting borgmatic backup...
Aug 17 16:07:06 nixos-desktop (sleep)[698364]: borgmatic.service: Failed to set up CPU scheduling: Operation not permitted
Aug 17 16:07:06 nixos-desktop systemd[3651]: borgmatic.service: Control process exited, code=exited, status=214/SETSCHEDULER
Aug 17 16:07:06 nixos-desktop systemd[3651]: borgmatic.service: Failed with result 'exit-code'.
Aug 17 16:07:06 nixos-desktop systemd[3651]: Failed to start borgmatic backup.

Maintainer CC

@DamienCassou

System information

- system: `"x86_64-linux"`
 - host os: `Linux 6.6.44, NixOS, 24.05 (Uakari), 24.05.20240810.a781ff3`
 - multi-user?: `yes`
 - sandbox: `yes`
 - version: `nix-env (Nix) 2.18.5`
 - channels(root): `"nixos-24.05"`
 - nixpkgs: `/nix/store/61502m9h1jl40j7sxbcyjdm6qi1k7x97-source`

DamienCassou commented 3 weeks ago

I'm unfortunately suffering from the same problem and haven't found a solution yet.

sdaqo commented 3 weeks ago

I think one of these is triggering the perm errors, but who knows why it happened so suddenly. https://github.com/nix-community/home-manager/blob/086f619dd991a4d355c07837448244029fc2d9ab/modules/services/borgmatic.nix#L48C1-L53C1

sdaqo commented 3 weeks ago

I saw this: https://github.com/Nefelim4ag/systemd-swap/issues/160

AmeerTaweel commented 3 weeks ago

I'm facing the same issue. I would love to help but I have no idea how. I would appreciate it if you give me some pointers.

sdaqo commented 3 weeks ago

You could try commenting out the lines I marked in the link, but this is probably not the root cause. Beyond that, I am none the wiser. I would try it myself, but I do not have access to a computer right now.

DamienCassou commented 3 weeks ago

Another thing worth trying: go back through the history and tell me which commit of nixpkgs or home-manager introduced the problem.

@sdaqo @AmeerTaweel are you on NixOS or another distribution (and which one)?

AmeerTaweel commented 3 weeks ago

Yes, I'm using NixOS 24.11. I will try to find the commit using git bisect tonight or tomorrow.

AmeerTaweel commented 3 weeks ago

I did some bisecting and figured out that nixpkgs commit b9fb14ccf62a39df41a2a92e2bad2777c62ab914 is good. Now git bisect wants me to test 4c086d8ee04004eceadc2e9f3bc2e08163cdd677 but building the derivation is taking forever. I guess some core package was modified or something.

Note: I kept a static home-manager commit (4fcd54df7cbb1d79cbe81209909ee8514d6b17a4) during the process.

I created a minimal config for the tests. I test if the config works or not using nixos-rebuild build-vm. I confirmed that I face the issue in 52ec9ac3b12395ad677e8b62106f0b98c1f8569d.

So current state of work:

sdaqo commented 3 weeks ago

Awesome! Maybe just run git bisect skip on the one that doesn't build. Sorry that I can't help out, but I think we are getting somewhere!

AmeerTaweel commented 2 weeks ago

Thanks @sdaqo, I didn't know about git bisect skip. I used it to skip two commits that were taking long to build. I finished the process and I found that 4ca52fdf5f0da995fc26e7d07c6c30a710ed4f8a is the first bad commit.

I'm attaching the configuration I used for testing: test.zip

I hope this helps @DamienCassou

JuanGarcia345 commented 2 weeks ago

> I think one of these is triggering the perm errors, but who knows why it happened so suddenly. https://github.com/nix-community/home-manager/blob/086f619dd991a4d355c07837448244029fc2d9ab/modules/services/borgmatic.nix#L48C1-L53C1

According to Table 9 in the systemd.exec man page, the service's exit status (status=214/SETSCHEDULER) is directly linked to the CPUSchedulingPolicy parameter. I can't say how or why the systemd or kernel settings changed to cause this behavior (on NixOS 24.05), but sched_setscheduler(2) does not specify which privileges are required to change the CPU scheduling policy.

I'd say the Nice parameter is enough to set the execution priority for this service (19 gives a very low priority), so CPUSchedulingPolicy can be safely removed. I tested this by duplicating the systemd units and checking the priority of the spawned processes.
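
For illustration, dropping the policy while leaving Nice untouched could be sketched at the home-manager level roughly as follows. This is a minimal sketch and not the upstream fix: it assumes the module defines the unit as `systemd.user.services.borgmatic`, and that an empty string is rendered as an empty `CPUSchedulingPolicy=` assignment, which systemd should treat as unset.

```nix
# Hypothetical fragment for a home-manager configuration (e.g. home.nix).
# Assumption: the borgmatic module generates systemd.user.services.borgmatic.
{ lib, ... }:

{
  systemd.user.services.borgmatic.Service = {
    # mkForce overrides the value set by the module; the empty string is
    # expected to end up as "CPUSchedulingPolicy=" in the generated unit.
    CPUSchedulingPolicy = lib.mkForce "";
    # Nice is deliberately not touched here, so whatever priority the
    # module already sets stays in effect.
  };
}
```

After switching to the new generation, `systemctl --user cat borgmatic.service` shows the generated unit, which makes it easy to check whether the setting is really gone.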

mputz86 commented 2 weeks ago

Maybe this helps someone: it creates an override for the default service definition and unsets the problematic part. It works on my side. (Yes, it's a workaround :D A fix would be better, but a system without a working backup is a no-go, right? :D)

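  # Workaround: a systemd drop-in for the user-level borgmatic.service
  # generated by home-manager.
  home.file."${config.xdg.configHome}/systemd/user/borgmatic.service.d/override.conf".text = ''
    [Service]
    # An empty assignment unsets the policy configured in the main unit.
    CPUSchedulingPolicy=
  '';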

AmeerTaweel commented 1 week ago

Thanks for the workaround. It works for me. I will keep using it until we have a fix.

DamienCassou commented 1 week ago

Thank you for your investigation @AmeerTaweel. Unfortunately, I have no clue why this commit would cause any issue: https://github.com/NixOS/nixpkgs/commit/4ca52fdf5f0da995fc26e7d07c6c30a710ed4f8a.

I confirm the workaround. Would any of you like to submit a PR removing this line?