vmware-archive / salt-pack

Salt Package Builder
Apache License 2.0

gpg-agent failing to start in supervised mode on Ubuntu 18.04 #645

Closed dmurphy18 closed 5 years ago

dmurphy18 commented 5 years ago

Currently gpg-agent fails to start in supervised mode (required to properly sign packages, import keys, and handle passphrases) on Ubuntu 18.04, which uses gpg-agent v2.2.x, which is based on systemd internally. This will probably also be an issue for the upcoming Buster (Debian 10).

The issue has been isolated to the daemonize_if function in salt/utils/process.py, which calls daemonize(False) and therefore does not redirect stdout and stderr. Faking out daemonize_if with a 'return' lets gpg-agent start correctly, but the highstate then fails because nothing is daemonized when it is required.

The complication is the Python multiprocessing issue with redirection highlighted in the daemonize function: the first multiprocessing daemon redirects successfully, but a subsequent redirection by a multiprocessing daemon started from that daemon will blow up. Hence we need a single-use indication (in opts) that Salt is starting up, so that daemonize_if allows the initial daemon to redirect this once.
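A minimal sketch of what such a single-use indication could look like, assuming opts is a plain dict. The option name initial_daemonize_redirect and the helper should_redirect_once are hypothetical and are not existing Salt names; the actual redirection of stdout and stderr is left out.

```python
def should_redirect_once(opts, key="initial_daemonize_redirect"):
    """Return True only the first time this is called for a given opts dict.

    `key` is a hypothetical option name used for illustration only; it is
    not an existing Salt option.
    """
    # pop() removes the flag, so a later call on the same dict finds it
    # absent and gets False.
    return bool(opts.pop(key, False))


opts = {"initial_daemonize_redirect": True}
first = should_redirect_once(opts)   # initial daemon: redirect this once
second = should_redirect_once(opts)  # any later daemon: skip redirection
```

The flag has to be consumed before the next process is forked, since a forked child gets its own copy of opts and would otherwise still see the flag set.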

The initial daemonization during init is a holdover from the old System V init process. With systemd now prevalent on most Linux distributions, we should add a check for whether systemd is present and, if so, let it daemonize the Salt startup instead of the init process. That is for the develop branch, though.
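One way the systemd check could be sketched is the test documented for sd_booted(3), which looks for the /run/systemd/system directory. The function name booted_with_systemd is an assumption for illustration, not something that exists in Salt, and the path is a parameter only so the check can be exercised against another directory.

```python
import os


def booted_with_systemd(runtime_dir="/run/systemd/system"):
    """Return True if the system appears to have been booted with systemd.

    Mirrors the directory test described in sd_booted(3). The function
    name is hypothetical and not part of Salt.
    """
    return os.path.isdir(runtime_dir)


# Hypothetical use: only self-daemonize when systemd is not managing startup.
self_daemonize = not booted_with_systemd()
```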

dmurphy18 commented 5 years ago

Upon a night's reflection, daemonize_if may not be the problem, although the symptoms look similar, given that the issue occurs with a steady-state master and minion and is not related to startup.

Perhaps there is some way to determine whether the daemonization is being done from another multiprocessing daemonized process, and so avoid the issue described above.
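As a rough sketch of one possible check, the standard library exposes the daemon flag of the current multiprocessing process. This assumes the parent was started through multiprocessing with daemon=True; it says nothing about processes daemonized by a plain os.fork(), so it may not cover every case discussed here. The helper name in_daemonic_child is hypothetical.

```python
import multiprocessing


def in_daemonic_child():
    """Return True when running inside a multiprocessing daemon process.

    Only reflects multiprocessing's own `daemon` flag, not Unix-style
    double-fork daemonization. Hypothetical helper, not part of Salt.
    """
    return bool(multiprocessing.current_process().daemon)


# In the main process the flag is unset, so redirection would be allowed.
allow_redirect = not in_daemonic_child()
```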

Further, this occurs with the method of invocation on the master, so something has to be passed to the minion in the command messaging.

dmurphy18 commented 5 years ago

Revised gpg-agent_kill.sh to also check that the script's pid is not the ppid of the gpg-agent process being killed, that is, to check both pid and ppid, since the script can have a sub-shell. Also updated it to use $BASHPID as opposed to just $$, which is a better check for the script pid. Previously it only checked the pid, so when testing on an empty machine I noticed a kill -9 where the ppid was that of the kill script.

This could explain the randomness of gpg-agent sometimes working and sometimes not, depending on the list returned from $(ps -ef | grep -v 'grep' | grep gpg-agent).
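The pid and ppid check described above can be sketched in Python as a filter over ps -ef output. This is not the actual gpg-agent_kill.sh; the function name, the sample process list, and its pids are made up for illustration, and the column layout assumed is the usual UID PID PPID C STIME TTY TIME CMD.

```python
def agents_safe_to_kill(ps_lines, script_pids):
    """Return pids of gpg-agent matches that do not belong to the kill script.

    ps_lines: lines of `ps -ef` output; script_pids: pids of the kill
    script itself and any sub-shell. Hypothetical helper for illustration.
    """
    victims = []
    for line in ps_lines:
        fields = line.split(None, 7)  # keep CMD (8th column) in one piece
        if len(fields) < 8 or not fields[1].isdigit() or not fields[2].isdigit():
            continue  # header or malformed line
        pid, ppid, cmd = int(fields[1]), int(fields[2]), fields[7]
        if "gpg-agent" not in cmd:
            continue
        # Skip the script itself and anything whose parent is the script.
        if pid in script_pids or ppid in script_pids:
            continue
        victims.append(pid)
    return victims


# Made-up sample: a real agent, the kill script, and the script's sub-shell.
sample = [
    "root  1200     1  0 09:00 ?      00:00:00 gpg-agent --homedir /root/.gnupg --supervised",
    "root  4321  4000  0 09:05 pts/0  00:00:00 /bin/bash ./gpg-agent_kill.sh",
    "root  4322  4321  0 09:05 pts/0  00:00:00 /bin/bash ./gpg-agent_kill.sh",
]
print(agents_safe_to_kill(sample, {4321}))  # prints [1200]
```

Checking the pid alone would leave the sub-shell (pid 4322, ppid 4321) in the list, because its command line also matches gpg-agent.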

Testing with nightly builds over the weekend.