saltstack / salt

Software to automate the management and configuration of any infrastructure or application at scale. Get access to the Salt software package repository here:
https://repo.saltproject.io/
Apache License 2.0

[BUG] Some minions crash with the message "AttributeError: 'MinionManager' object has no attribute 'reload'" #62956

Open saiaprameya opened 1 year ago

saiaprameya commented 1 year ago

Description of Issue

Hi SALTSTACK Team,

I have about 500 salt-minions running and I keep seeing an intermittent issue where some of them crash after running for a while, with the following backtrace:

[DEBUG ] Minion of '172.16.0.1' is handling event tag '_minion_mine'
[DEBUG ] Initializing new AsyncAuth for ('/etc/salt/pki/minion', 'C01-B07-VM6', 'tcp://172.16.0.1:4506')
[DEBUG ] Minion return retry timer set to 7 seconds (randomized)
[DEBUG ] Closing AsyncReqChannel instance
[DEBUG ] Closing IPCMessageClient instance
[DEBUG ] schedule.handle_func: Removing /var/cache/salt/minion/proc/20221020223936633621
[DEBUG ] Subprocess Schedule(name=__mine_interval, jid=20221020223936633621) cleaned up
Process MinionKeepAlive:
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/usr/local/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/root/virt-py3/lib/python3.8/site-packages/salt/scripts.py", line 142, in minion_process
    minion.start()
  File "/root/virt-py3/lib/python3.8/site-packages/salt/cli/daemons.py", line 325, in start
    self._real_start()
  File "/root/virt-py3/lib/python3.8/site-packages/salt/cli/daemons.py", line 337, in _real_start
    self.minion.tune_in()
  File "/root/virt-py3/lib/python3.8/site-packages/salt/minion.py", line 1182, in tune_in
    self.io_loop.start()
  File "/root/virt-py3/lib/python3.8/site-packages/salt/ext/tornado/ioloop.py", line 865, in start
    event_pairs = self._impl.poll(poll_timeout)
  File "/root/virt-py3/lib/python3.8/site-packages/salt/scripts.py", line 109, in handle_hup
    manager.minion.reload()
AttributeError: 'MinionManager' object has no attribute 'reload'
(virt-py3) [root@C01-B07-VM6 ~]#

All minions are running salt-minion 3005.1.

Can you please let me know whether this is a known problem and whether there is a solution for it?

Setup


About 500 VMs are running, plus a server that can reach all of them. The minions run on the 500 VMs and salt-master runs on the server. Commands used:
salt-master -l debug
salt-minion -l debug

Steps to Reproduce Issue


Versions Report


salt --versions-report
/root/virtualenv3.7.7/lib/python3.7/site-packages/_distutils_hack/__init__.py:33: UserWarning: Setuptools is replacing distutils.
  warnings.warn("Setuptools is replacing distutils.")
Salt Version:
          Salt: 3005.1

Dependency Versions:
          cffi: 1.15.1
      cherrypy: Not Installed
      dateutil: 2.8.2
     docker-py: Not Installed
         gitdb: Not Installed
     gitpython: Not Installed
        Jinja2: 3.1.2
       libgit2: Not Installed
      M2Crypto: Not Installed
          Mako: Not Installed
       msgpack: 1.0.4
  msgpack-pure: Not Installed
  mysql-python: Not Installed
     pycparser: 2.21
      pycrypto: 3.9.8
  pycryptodome: 3.15.0
        pygit2: Not Installed
        Python: 3.7.7 (default, Sep 16 2022, 13:08:04)
  python-gnupg: Not Installed
        PyYAML: 5.3.1
         PyZMQ: 20.0.0
         smmap: Not Installed
       timelib: Not Installed
       Tornado: 4.5.3
           ZMQ: 4.3.3

System Versions:
          dist: centos 8 Core
        locale: UTF-8
       machine: x86_64
       release: 4.18.0-193.el8.x86_64
        system: Linux
       version: CentOS Linux 8 Core

All minions are running: salt-minion 3005.1

OrangeDog commented 1 year ago

Seen before but with no resolution: #44816

What are your minions doing? Is the error always after handling that mine event?

saiaprameya commented 1 year ago

Hi, I am implementing a traffic tool in which the master dispatches scripts to the minions to execute and collects the statistics. The problem shows up when I leave the minions running, come back after a day or so, and run a ping: some of the minions do not respond. When I check those machines, salt-minion is no longer running and the logs show the backtrace that I pasted above.

Let me know if you need more details.

saiaprameya commented 1 year ago

salt-master is not always running. That is, I don't run salt-master in the background; I run it only when I am using the traffic tools, where I execute the custom modules on the minions. It might exit due to a terminal timeout, since I SSH to the server and run salt-master -l debug.

OrangeDog commented 1 year ago

handle_hup would indicate that it's trying to reload because your terminal disconnects.

salt-master and salt-minion are designed to run as persistent system daemons, not ad-hoc CLI tools. The non-daemon interfaces are salt, salt-call, and salt-run.
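The failure in the traceback (handle_hup calling manager.minion.reload() on an object that has no reload attribute) can be reproduced in isolation. The following is only a minimal sketch, not Salt's actual code: FakeMinion, FakeMinionManager, FakeDaemon, and both handler factories are hypothetical names invented for illustration, and the guarded variant is one possible defensive pattern, not a confirmed fix. The handler is called directly instead of sending a real SIGHUP so the example behaves the same everywhere.

```python
import signal

# SIGHUP is not defined on Windows; fall back to its usual POSIX number.
SIGHUP = getattr(signal, "SIGHUP", 1)


class FakeMinion:
    """Hypothetical stand-in for an object that does support reload()."""

    def __init__(self):
        self.reloads = 0

    def reload(self):
        self.reloads += 1


class FakeMinionManager:
    """Hypothetical stand-in for an object that has no reload() method."""


class FakeDaemon:
    """Hypothetical wrapper exposing a .minion attribute, as in the traceback."""

    def __init__(self, minion):
        self.minion = minion


def make_unguarded_handler(manager):
    """Handler that assumes manager.minion always has reload()."""

    def handle_hup(signum, frame):
        manager.minion.reload()

    return handle_hup


def make_guarded_handler(manager):
    """Handler that only reloads when the object actually supports it."""

    def handle_hup(signum, frame):
        reload_fn = getattr(manager.minion, "reload", None)
        if callable(reload_fn):
            reload_fn()

    return handle_hup


def simulate_hup(handler):
    """Invoke the handler the way the interpreter would on SIGHUP."""
    try:
        handler(SIGHUP, None)
    except AttributeError as exc:
        return f"crash: {exc}"
    return "ok"


if __name__ == "__main__":
    broken = FakeDaemon(FakeMinionManager())
    print(simulate_hup(make_unguarded_handler(broken)))
    print(simulate_hup(make_guarded_handler(broken)))
```

In a real process the handler would be registered with signal.signal(signal.SIGHUP, handler), and an exception raised inside it surfaces in whatever the main thread happens to be executing, which would be consistent with the AttributeError appearing under io_loop.start() in the traceback.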

saiaprameya commented 1 year ago

Hi James, salt-minion is running in the background and is persistent. Moreover, out of 500 minions I see only 5-6 crashing intermittently with this backtrace, and it happens when I leave them idle for about 2 days. If a terminal disconnect were the cause, all of the minions should have crashed.

saiaprameya commented 1 year ago

Hi James, do you have any updates on this one?