hashicorp / vault

A tool for secrets management, encryption as a service, and privileged access management
https://www.vaultproject.io/

Packaging switch from valid `simple` unit to `notify` unit results in abandoned processes. #27935

Open jboero opened 3 months ago

jboero commented 3 months ago

[UPDATE] This problem reproduces with systemd v255.10.

Hi, a while ago the packaged systemd unit was apparently changed from `simple` to `notify` type, but I'm seeing problems with processes from failed starts that are never cleaned up. I don't think the notifications are being handled correctly. Has anybody else noticed this?

Commit by @RickyGrassmuck; please advise: https://github.com/hashicorp/vault/commit/b09f3c014883e574236cee9921b52b5421177149

Failed processes are not cleaned up, and restarting a failed service just forks another instance.

> sudo systemctl status vault
× vault.service - "HashiCorp Vault - A tool for managing secrets"
     Loaded: loaded (/usr/lib/systemd/system/vault.service; disabled; preset: disabled)
    Drop-In: /usr/lib/systemd/system/service.d
             └─10-timeout-abort.conf
     Active: failed (Result: exit-code) since Thu 2024-08-01 13:08:45 BST; 23min ago
       Docs: https://developer.hashicorp.com/vault/docs
    Process: 3888769 ExecStart=/usr/bin/vault server -config=/etc/vault.d/vault.hcl (code=exited, status=1/FAILURE)
   Main PID: 3888769 (code=exited, status=1/FAILURE)
      Tasks: 9 (limit: 18782)
     Memory: 34.4M (peak: 60.3M)
        CPU: 269ms
     CGroup: /system.slice/vault.service
             ├─3830852 /usr/bin/dbus-daemon --syslog --fork --print-pid 4 --print-address 6 --session
             ├─3831258 /usr/bin/dbus-daemon --syslog --fork --print-pid 4 --print-address 6 --session
             ├─3831624 /usr/bin/dbus-daemon --syslog --fork --print-pid 4 --print-address 6 --session
             ├─3875357 /usr/bin/dbus-daemon --syslog --fork --print-pid 4 --print-address 6 --session
             ├─3875781 /usr/bin/dbus-daemon --syslog --fork --print-pid 4 --print-address 6 --session
             ├─3876153 /usr/bin/dbus-daemon --syslog --fork --print-pid 4 --print-address 6 --session
             ├─3888016 /usr/bin/dbus-daemon --syslog --fork --print-pid 4 --print-address 6 --session
             ├─3888417 /usr/bin/dbus-daemon --syslog --fork --print-pid 4 --print-address 6 --session
             └─3888814 /usr/bin/dbus-daemon --syslog --fork --print-pid 4 --print-address 6 --session

Aug 01 13:08:45 xps systemd[1]: vault.service: Unit process 3830852 (dbus-daemon) remains running after unit stopped.
Aug 01 13:08:45 xps systemd[1]: vault.service: Unit process 3831258 (dbus-daemon) remains running after unit stopped.
Aug 01 13:08:45 xps systemd[1]: vault.service: Unit process 3831624 (dbus-daemon) remains running after unit stopped.
Aug 01 13:08:45 xps systemd[1]: vault.service: Unit process 3875357 (dbus-daemon) remains running after unit stopped.
Aug 01 13:08:45 xps systemd[1]: vault.service: Unit process 3875781 (dbus-daemon) remains running after unit stopped.
Aug 01 13:08:45 xps systemd[1]: vault.service: Unit process 3876153 (dbus-daemon) remains running after unit stopped.
Aug 01 13:08:45 xps systemd[1]: vault.service: Unit process 3888016 (dbus-daemon) remains running after unit stopped.

And the `ps` output:

> ps -ef | grep vault
vault    3830852       1  0 12:54 ?        00:00:00 /usr/bin/dbus-daemon --syslog --fork --print-pid 4 --print-address 6 --session
vault    3831258       1  0 12:54 ?        00:00:00 /usr/bin/dbus-daemon --syslog --fork --print-pid 4 --print-address 6 --session
vault    3831624       1  0 12:54 ?        00:00:00 /usr/bin/dbus-daemon --syslog --fork --print-pid 4 --print-address 6 --session
vault    3875357       1  0 13:05 ?        00:00:00 /usr/bin/dbus-daemon --syslog --fork --print-pid 4 --print-address 6 --session
vault    3875781       1  0 13:05 ?        00:00:00 /usr/bin/dbus-daemon --syslog --fork --print-pid 4 --print-address 6 --session
vault    3876153       1  0 13:05 ?        00:00:00 /usr/bin/dbus-daemon --syslog --fork --print-pid 4 --print-address 6 --session
vault    3888016       1  0 13:08 ?        00:00:00 /usr/bin/dbus-daemon --syslog --fork --print-pid 4 --print-address 6 --session
vault    3888417       1  0 13:08 ?        00:00:00 /usr/bin/dbus-daemon --syslog --fork --print-pid 4 --print-address 6 --session
vault    3888814       1  0 13:08 ?        00:00:00 /usr/bin/dbus-daemon --syslog --fork --print-pid 4 --print-address 6 --session
user   3989592  503396  0 13:32 pts/1    00:00:00 grep --color=auto vault

When using `notify` type, `ExecStart` should point to a script that handles the D-Bus hooks, not the raw service binary: https://askubuntu.com/questions/1120023/how-to-use-systemd-notify

[UPDATE] D-Bus support was added to Vault, so this is no longer a problem.

jboero commented 2 months ago

Any update on this? This is a serious problem with a very quick fix.

jboero commented 2 months ago

OK, I'll do it myself. The original systemd unit I wrote left the type at its default (`simple`); this time it's explicitly set to `simple`, as it should be.

https://github.com/hashicorp/vault/pull/28029

ryancragun commented 2 months ago

Hello @jboero,

The systemd unit for Vault is type `notify` because Vault has made use of systemd's notify functionality for some time now.

The dangling dbus processes you're seeing are likely caused by an upstream dependency which has not been fixed yet.

A workaround that ought to resolve it for you would be to define the DBUS_SESSION_BUS_ADDRESS environment variable before launching Vault, e.g.:

# /etc/vault.d/vault.env
DBUS_SESSION_BUS_ADDRESS=$XDG_RUNTIME_DIR/bus

For further information you can refer to our support article.

If that doesn't resolve it, please let us know.

jboero commented 2 months ago

Hi @ryancragun, thanks for the update. I didn't realize you had added native D-Bus support to Vault.

I just tried to reproduce this on various versions of systemd, and it's not a problem on older RHEL 8 environments. It started happening after a systemd v255.10 update arrived on my Fedora test environment, so I think this may become a wider issue in the future. Setting my own unit back to `Type=simple` solves the problem for me.

Installed Packages
Name         : systemd
Version      : 255.10
Release      : 3.fc40
Architecture : x86_64
Size         : 16 M
Source       : systemd-255.10-3.fc40.src.rpm
Repository   : @System
From repo    : updates

jboero commented 1 month ago

Excellent, thank you for investigating. Well done, Mike Oprea.