vmware-archive / salt-pack

Salt Package Builder
Apache License 2.0

Centos6 python2-pycryptodomex with salt-api defunct processes when package upgraded or removed #561

Closed Ch3LL closed 5 years ago

Ch3LL commented 6 years ago

On a CentOS 6 system running salt-api with the optional python2-pycryptodomex package installed, upgrading or removing the pycryptodomex package leaves a defunct salt-api process behind.

Replication Steps:

  1. Install Salt 2017.7.5 or 2017.7.6.
  2. Install python2-pycryptodomex version 3.4: yum install python2-pycryptodomex
  3. Set up salt-master, salt-minion and salt-api, then check the process list:
[root@li1004-10 ~]# ps aux | grep -i salt-api
root     27787  0.7  1.8 390012 38448 ?        S    21:59   0:00 /usr/bin/python2.7 /usr/bin/salt-api -d
root     27790  1.9  2.0 1777700 41788 ?       Sl   21:59   0:00 /usr/bin/python2.7 /usr/bin/salt-api -d
  4. Upgrade pycryptodomex to 3.6: yum upgrade python2-pycryptodomex
  5. Check the process list again:
[root@li1004-10 ~]# ps aux | grep -i salt-api
root     27787  0.0  1.8 390012 38516 ?        S    21:59   0:00 /usr/bin/python2.7 /usr/bin/salt-api -d
root     27973  0.2  1.8 349108 37392 ?        S    22:00   0:00 /usr/bin/python2.7 /usr/bin/salt-api -d
root     27975  1.4  1.8 1719632 37212 ?       Sl   22:00   0:00 /usr/bin/python2.7 /usr/bin/salt-api -d
root     28100  1.8  0.0      0     0 ?        Z    22:01   0:00 [salt-api] <defunct>

WORKAROUND: manually kill the salt-api process. Then restart salt-api process and everything works.
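
A quick way to spot the defunct process in output like the above is to look for state Z in the STAT column. The following is only a minimal sketch: the helper name find_defunct is hypothetical, and it assumes the standard 11-column layout of ps aux as shown in the listings in this report.

```python
def find_defunct(ps_aux_output, name="salt-api"):
    """Return PIDs of zombie (state Z) processes whose command mentions `name`."""
    pids = []
    for line in ps_aux_output.splitlines():
        # ps aux columns: USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
        fields = line.split(None, 10)
        if len(fields) < 11:
            continue  # shell prompt, blank line, or truncated row
        pid, stat, command = fields[1], fields[7], fields[10]
        if not pid.isdigit():
            continue  # header row
        if stat.startswith("Z") and name in command:
            pids.append(int(pid))
    return pids


# Process list copied from step 5 of the replication steps.
SAMPLE = """\
root     27787  0.0  1.8 390012 38516 ?        S    21:59   0:00 /usr/bin/python2.7 /usr/bin/salt-api -d
root     27973  0.2  1.8 349108 37392 ?        S    22:00   0:00 /usr/bin/python2.7 /usr/bin/salt-api -d
root     27975  1.4  1.8 1719632 37212 ?       Sl   22:00   0:00 /usr/bin/python2.7 /usr/bin/salt-api -d
root     28100  1.8  0.0      0     0 ?        Z    22:01   0:00 [salt-api] <defunct>
"""

print(find_defunct(SAMPLE))  # prints [28100]
```

The sketch only reports the zombie; it does not attempt to kill or restart anything.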

Ch3LL commented 6 years ago

I manually tested this on CentOS 7 to see whether we were hitting the same thing even though our automated tests were passing. After upgrading pycryptodomex from 3.4 to 3.6, there were now 3 salt-api processes showing up, but none of them ever became a defunct process, and salt-api continued to work. Also, when I restart the service it correctly kills the processes. I don't need to manually kill anything and can use the normal service manager to restart.

dmurphy18 commented 6 years ago

The problem also shows up if pycryptodomex is removed: a dangling child process of the original salt-api process is left behind. The same happens with Salt 2018.3.1 on both CentOS 7 and CentOS 6. salt-api keeps using the pycryptodomex version that is still in memory, even after yum erases it, and hence will not start using the basic pycrypto until it is restarted.

This requires further examination of salt-api and how it handles the upgrade and removal of underlying packages that it is using.

dmurphy18 commented 6 years ago

@Ch3LL The problem is that Salt's use of its cryptographic package is not hot-plug capable. That is, adding a preferred cryptographic package does not mean it is used immediately. Similarly for its removal: parts of the removal are detected, and a new user of the package can be re-spun, leading to additional processes.

After adding or removing a cryptographic package, it is best to restart all Salt components which are currently installed and active, for example: salt-minion, salt-master, salt-api. This is similar to changing a configuration parameter in a config file and having to restart the Salt component which uses that config file.

Preferred order of cryptographic packages utilized by Salt:

  1. M2Crypto
  2. pycryptodomex
  3. pycrypto

If the preferred cryptographic package is unavailable, the next in the list is tried. Note: pycrypto is a required dependency, that is, at a minimum pycrypto must be available for Salt to install.
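
The fallback order described above can be pictured roughly as follows. This is a sketch only and not Salt's actual selection code: the function name first_available is hypothetical, the top-level import names (M2Crypto, Cryptodome, Crypto) are my mapping of the three packages, and it uses importlib.util.find_spec, which needs Python 3.4 or later rather than the Python 2.7 shown in this report.

```python
import importlib.util

# Import names assumed for illustration:
# M2Crypto -> "M2Crypto", pycryptodomex -> "Cryptodome", pycrypto -> "Crypto".
PREFERRED = ["M2Crypto", "Cryptodome", "Crypto"]


def first_available(candidates):
    """Return the first importable module name in `candidates`, or None."""
    for name in candidates:
        try:
            spec = importlib.util.find_spec(name)
        except (ImportError, ValueError):
            spec = None  # treat lookup errors as "not available"
        if spec is not None:
            return name
    return None


# The result depends on what is installed on the machine running this.
print(first_available(PREFERRED))
```

The point of the sketch is that a choice like this is made when the process starts. A process that is already running keeps whatever it picked, which is why installing or erasing one of the packages has no clean effect until the Salt component is restarted.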

dmurphy18 commented 6 years ago

On removal of python2-pycryptodomex, the restart is triggered by CherryPy, whose Autoreloader monitors the files of the loaded system modules; see https://github.com/cherrypy/cherrypy/blob/master/cherrypy/process/plugins.py#L689-L690

which results in the following occurring (from systemd journal output):

Jun 20 11:43:00 localhost.localdomain salt-api[10721]: [20/Jun/2018:11:43:00] ENGINE Restarting because /usr/lib64/python2.7/site-packages/Cryptodome/IO/PEM.py changed.
Jun 20 11:43:00 localhost.localdomain salt-api[10721]: [20/Jun/2018:11:43:00] ENGINE Stopped thread 'Autoreloader'.
Jun 20 11:43:00 localhost.localdomain salt-api[10721]: [20/Jun/2018:11:43:00] ENGINE Bus STOPPING
Jun 20 11:43:00 localhost.localdomain salt-api[10721]: [20/Jun/2018:11:43:00] ENGINE HTTP Server cherrypy._cpwsgi_server.CPWSGIServer(('0.0.0.0', 8000)) shut down
Jun 20 11:43:00 localhost.localdomain salt-api[10721]: [20/Jun/2018:11:43:00] ENGINE Stopped thread '_TimeoutMonitor'.
Jun 20 11:43:00 localhost.localdomain salt-api[10721]: [20/Jun/2018:11:43:00] ENGINE Bus STOPPED
Jun 20 11:43:00 localhost.localdomain salt-api[10721]: [20/Jun/2018:11:43:00] ENGINE Bus EXITING
Jun 20 11:43:00 localhost.localdomain salt-api[10721]: [20/Jun/2018:11:43:00] ENGINE Bus EXITED
Jun 20 11:43:00 localhost.localdomain salt-api[10721]: [20/Jun/2018:11:43:00] ENGINE Waiting for child threads to terminate...
Jun 20 11:43:00 localhost.localdomain salt-api[10721]: [20/Jun/2018:11:43:00] ENGINE Re-spawning /usr/bin/salt-api
Jun 20 11:43:01 localhost.localdomain salt-api[10721]: [20/Jun/2018:11:43:01] ENGINE Listening for SIGHUP.
Jun 20 11:43:01 localhost.localdomain salt-api[10721]: [20/Jun/2018:11:43:01] ENGINE Listening for SIGTERM.
Jun 20 11:43:01 localhost.localdomain salt-api[10721]: [20/Jun/2018:11:43:01] ENGINE Listening for SIGUSR1.
Jun 20 11:43:01 localhost.localdomain salt-api[10721]: [20/Jun/2018:11:43:01] ENGINE Bus STARTING

Hence the previous recommendation of restarting salt-xxxx components after adding or removing a cryptographic package which Salt utilizes.
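
For anyone wondering why a package change triggers "Restarting because ... changed", here is a simplified illustration of the general idea of file-based reloading: record the modification time of every file behind sys.modules and compare later. This is not CherryPy's actual code; the helper names (module_files, read_mtimes, changed) are hypothetical.

```python
import os
import sys


def module_files():
    """Collect the file paths of all currently loaded modules."""
    files = set()
    for module in list(sys.modules.values()):
        path = getattr(module, "__file__", None)
        if path:
            files.add(path)
    return files


def read_mtimes(paths):
    """Map each path to its modification time, or None if the file is gone."""
    mtimes = {}
    for path in paths:
        try:
            mtimes[path] = os.stat(path).st_mtime
        except OSError:
            mtimes[path] = None  # for example, the package was erased
    return mtimes


def changed(before, after):
    """Return the paths whose recorded mtime differs between two snapshots."""
    return sorted(p for p in before if after.get(p) != before[p])


before = read_mtimes(module_files())
# ... a package upgrade or removal would happen here ...
after = read_mtimes(module_files())
for path in changed(before, after):
    print("Restarting because %s changed." % path)
```

Because the check covers every loaded module, any package that rewrites or deletes a file the process has imported would be noticed by a watcher of this kind, not only Cryptodome.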

@Ch3LL With this information, can this be closed and the docs updated to reflect the need to restart the Salt components?

dmurphy18 commented 6 years ago

Note that the code in CherryPy is not limited to Cryptodome, since it iterates over sys.modules.items(). This is demonstrated on an upgrade from 2017.7.6 to 2018.3.1, where ldap.py is reported as changed in the systemd journal output:

Jun 20 14:32:07 localhost.localdomain salt-api[21440]: [20/Jun/2018:14:32:07] ENGINE Restarting because /usr/lib/python2.7/site-packages/salt/auth/ldap.py changed.
Jun 20 14:32:07 localhost.localdomain salt-api[21440]: [20/Jun/2018:14:32:07] ENGINE Stopped thread 'Autoreloader'.
Jun 20 14:32:07 localhost.localdomain salt-api[21440]: [20/Jun/2018:14:32:07] ENGINE Bus STOPPING
Jun 20 14:32:07 localhost.localdomain salt-api[21440]: [20/Jun/2018:14:32:07] ENGINE HTTP Server cherrypy._cpwsgi_server.CPWSGIServer(('0.0.0.0', 8000)) shut down
Jun 20 14:32:07 localhost.localdomain salt-api[21440]: [20/Jun/2018:14:32:07] ENGINE Stopped thread '_TimeoutMonitor'.
Jun 20 14:32:07 localhost.localdomain salt-api[21440]: [20/Jun/2018:14:32:07] ENGINE Bus STOPPED
Jun 20 14:32:07 localhost.localdomain salt-api[21440]: [20/Jun/2018:14:32:07] ENGINE Bus EXITING
Jun 20 14:32:07 localhost.localdomain salt-api[21440]: [20/Jun/2018:14:32:07] ENGINE Bus EXITED
Jun 20 14:32:07 localhost.localdomain salt-api[21440]: [20/Jun/2018:14:32:07] ENGINE Waiting for child threads to terminate...
Jun 20 14:32:07 localhost.localdomain salt-api[21440]: [20/Jun/2018:14:32:07] ENGINE Re-spawning /usr/bin/salt-api

This is going to make any hot-plug changes interesting: how many other packages contain similar code which monitors what is in use and detects changes? Note that installing Salt packages for components that are already running causes a restart to be performed to pick up any configuration changes, hence we are back to two salt-api processes in this instance.

root@localhost:~# ps -ef | grep salt-api
root     24436     1  0 14:32 ?        00:00:00 /usr/bin/python /usr/bin/salt-api
root     24612 24436  0 14:32 ?        00:00:00 /usr/bin/python /usr/bin/salt-api
root     25848 23810  0 14:33 pts/3    00:00:00 grep --color=auto salt-api
root@localhost:~#

Ch3LL commented 6 years ago

My concern isn't that you need to restart the service; I understand that requirement. My concern is that the process becomes defunct on CentOS 6, which does not occur on CentOS 7. This requires manually stopping the process with a kill signal. Restarting the service with service salt-api restart does not work, as the defunct process stays around.

dmurphy18 commented 6 years ago

A zombie process. I guess we need to generate a list of packages used by Salt for which, if you are going to install or upgrade them, you should first stop Salt's running processes, then upgrade, then restart the stopped processes. The alternative is to install packages xyz before installing Salt.

The best analogy is that you cannot change the engine while driving down the freeway.
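
The stop, upgrade, start sequence could be scripted along these lines. Treat it as a rough sketch under assumptions: the helper names upgrade_plan and run are hypothetical, the service list and its ordering are only an example, and it defaults to a dry run that prints the commands instead of executing them.

```python
import subprocess


def upgrade_plan(package, services):
    """Build the command sequence: stop services, upgrade, start services."""
    plan = [["service", name, "stop"] for name in services]
    plan.append(["yum", "-y", "upgrade", package])
    plan.extend(["service", name, "start"] for name in services)
    return plan


def run(plan, dry_run=True):
    """Print each command, or execute it when dry_run is False."""
    for argv in plan:
        if dry_run:
            print(" ".join(argv))
        else:
            subprocess.check_call(argv)  # raises if a command fails


# Example service list; adjust to whichever Salt components are installed.
run(upgrade_plan("python2-pycryptodomex",
                 ["salt-api", "salt-master", "salt-minion"]))
```

Stopping everything before touching the package means no running process still holds the old modules when the files change, so there is nothing for a file watcher to react to.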

dmurphy18 commented 5 years ago

@Ch3LL Is this still a concern, or can we close it? Since we don't do hot-plug, the correct usage is to shut down first and then start again.