saltstack / salt


RHEL8 minion does not run states after installation of minion #63223

Open infantvin opened 1 year ago

infantvin commented 1 year ago

Description

I am creating a new RHEL8 salt-minion using salt-cloud, which automatically installs the latest 3005.x salt-minion.

Once Salt is configured, the minion does not execute any state sent from the master. The minion log shows this error:

2022-12-06 16:21:18,435 [salt.utils.event :821 ][DEBUG ][4602] Sending event: tag = _salt_error; data = {'message': 'The minion function caused an exception', 'args': ('The minion function caused an exception',)

I am attaching a debug based output of the minion log to this report so that all the details are available.

Setup

Fresh install of salt-master 3005.1-2 on RHEL9. Trying to use salt-cloud to create a new RHEL8 minion (RHEL 8.0)

The platform is VMware and the salt-master and minion are both VMware virtual machines.

There is no firewall running on either master or minion.

Both are on the same network, so there is no VLAN or similar configuration between them.

I am using onedir 3005 as the bootstrap arguments. The same thing happens with any other install method for the minion (git, stable, etc.).


Steps to Reproduce the behavior

Attaching the logs to this report.

Expected behavior

The state should execute successfully. This never happens.


Versions Report

$ salt-master --versions-report
Salt Version:
    Salt: 3005.1

Dependency Versions:
    cffi: 1.14.5
    cherrypy: Not Installed
    dateutil: 2.8.1
    docker-py: Not Installed
    gitdb: Not Installed
    gitpython: Not Installed
    Jinja2: 2.11.3
    libgit2: 1.3.0
    M2Crypto: 0.38.0
    Mako: Not Installed
    msgpack: 1.0.3
    msgpack-pure: Not Installed
    mysql-python: Not Installed
    pycparser: 2.20
    pycrypto: 3.16.0
    pycryptodome: 3.14.0
    pygit2: 1.7.1
    Python: 3.9.14 (main, Nov 7 2022, 00:00:00)
    python-gnupg: Not Installed
    PyYAML: 5.4.1
    PyZMQ: 22.3.0
    smmap: Not Installed
    timelib: Not Installed
    Tornado: 4.5.3
    ZMQ: 4.3.4

System Versions:
    dist: rhel 9.0 Plow
    locale: utf-8
    machine: x86_64
    release: 5.14.0-70.13.1.el9_0.x86_64
    system: Linux
    version: Red Hat Enterprise Linux 9.0 Plow

# salt-minion --versions-report
Salt Version:
    Salt: 3005.1

Dependency Versions:
    cffi: 1.14.6
    cherrypy: 18.6.1
    dateutil: 2.8.1
    docker-py: Not Installed
    gitdb: Not Installed
    gitpython: Not Installed
    Jinja2: 3.1.0
    libgit2: Not Installed
    M2Crypto: Not Installed
    Mako: Not Installed
    msgpack: 1.0.2
    msgpack-pure: Not Installed
    mysql-python: Not Installed
    pycparser: 2.21
    pycrypto: Not Installed
    pycryptodome: 3.9.8
    pygit2: Not Installed
    Python: 3.9.15 (main, Nov 8 2022, 03:47:03)
    python-gnupg: 0.4.8
    PyYAML: 5.4.1
    PyZMQ: 23.2.0
    smmap: Not Installed
    timelib: 0.2.4
    Tornado: 4.5.3
    ZMQ: 4.3.4

System Versions:
    dist: rhel 8.0 Ootpa
    locale: utf-8
    machine: x86_64
    release: 4.18.0-80.el8.x86_64
    system: Linux
    version: Red Hat Enterprise Linux 8.0 Ootpa

Both master and minion are at same level.

any simple sls file can be tried. Say a small sls to install gcc. In our case, we have a startup state to rename the host based on the VM name in VMware.

[salt-master-exception.txt](https://github.com/saltstack/salt/files/10169518/salt-master-exception.txt)
[salt-minion-log.txt](https://github.com/saltstack/salt/files/10169521/salt-minion-log.txt)

Additional context

We have a master running Salt 3003 and everything works fine there. This seems to be specific to 3005. We need 3005 in order to move our master to the newer version, but this problem is blocking us.

Please let me know if you need any other information from me.

OrangeDog commented 1 year ago

I am attaching a debug based output of the minion log to this report so that all the details are available.

Where is it? In particular, the actual exception should be logged somewhere.

infantvin commented 1 year ago

Hi, I was sure I had attached those text files from the master and minion.

I am adding them again. Please confirm when you see them.

salt-master-exception.txt salt-minion-log.txt

infantvin commented 1 year ago

I am adding the exception seen on the master here. It is also present in the txt file attached earlier, if needed.

The minion function caused an exception: Traceback (most recent call last):
  File "/usr/lib/python3.9/site-packages/salt/minion.py", line 1935, in _thread_return
    return_data = minion_instance._execute_job_function(
  File "/usr/lib/python3.9/site-packages/salt/minion.py", line 1894, in _execute_job_function
    return_data = self.executors[fname](opts, data, func, args, kwargs)
  File "/usr/lib/python3.9/site-packages/salt/loader/lazy.py", line 149, in __call__
    return self.loader.run(run_func, *args, **kwargs)
  File "/usr/lib/python3.9/site-packages/salt/loader/lazy.py", line 1228, in run
    return self._last_context.run(self._run_as, _func_or_method, *args, **kwargs)
  File "/usr/lib/python3.9/site-packages/salt/loader/lazy.py", line 1243, in _run_as
    return _func_or_method(*args, **kwargs)
  File "/usr/lib/python3.9/site-packages/salt/executors/direct_call.py", line 10, in execute
    return func(*args, **kwargs)
  File "/usr/lib/python3.9/site-packages/salt/loader/lazy.py", line 149, in __call__
    return self.loader.run(run_func, *args, **kwargs)
  File "/usr/lib/python3.9/site-packages/salt/loader/lazy.py", line 1228, in run
    return self._last_context.run(self._run_as, _func_or_method, *args, **kwargs)
  File "/usr/lib/python3.9/site-packages/salt/loader/lazy.py", line 1243, in _run_as
    return _func_or_method(*args, **kwargs)
  File "/usr/lib/python3.9/site-packages/salt/modules/state.py", line 793, in apply_
    return sls(mods, **kwargs)
  File "/usr/lib/python3.9/site-packages/salt/modules/state.py", line 1394, in sls
    high_, errors = st_.render_highstate({opts["saltenv"]: mods})
  File "/usr/lib/python3.9/site-packages/salt/state.py", line 4461, in render_highstate
    statefiles = fnmatch.filter(self.avail[saltenv], sls_match)
  File "/usr/lib/python3.9/site-packages/salt/state.py", line 3562, in __getitem__
    self._avail[saltenv] = self._hs.client.list_states(saltenv)
  File "/usr/lib/python3.9/site-packages/salt/fileclient.py", line 379, in list_states
    for path in self.file_list(saltenv):
  File "/usr/lib/python3.9/site-packages/salt/fileclient.py", line 1363, in file_list
    return self.channel.send(load)
  File "/usr/lib/python3.9/site-packages/salt/utils/asynchronous.py", line 125, in wrap
    raise exc_info[1].with_traceback(exc_info[2])
  File "/usr/lib/python3.9/site-packages/salt/utils/asynchronous.py", line 131, in _target
    result = io_loop.run_sync(lambda: getattr(self.obj, key)(*args, **kwargs))
  File "/usr/lib/python3.9/site-packages/salt/ext/tornado/ioloop.py", line 459, in run_sync
    return future_cell[0].result()
  File "/usr/lib/python3.9/site-packages/salt/ext/tornado/concurrent.py", line 249, in result
    raise_exc_info(self._exc_info)
  File "<string>", line 4, in raise_exc_info
  File "/usr/lib/python3.9/site-packages/salt/ext/tornado/gen.py", line 1064, in run
    yielded = self.gen.throw(*exc_info)
  File "/usr/lib/python3.9/site-packages/salt/channel/client.py", line 295, in send
    ret = yield self._crypted_transfer(load, timeout=timeout, raw=raw)
  File "/usr/lib/python3.9/site-packages/salt/ext/tornado/gen.py", line 1056, in run
    value = future.result()
  File "/usr/lib/python3.9/site-packages/salt/ext/tornado/concurrent.py", line 249, in result
    raise_exc_info(self._exc_info)
  File "<string>", line 4, in raise_exc_info
  File "/usr/lib/python3.9/site-packages/salt/ext/tornado/gen.py", line 1064, in run
    yielded = self.gen.throw(*exc_info)
  File "/usr/lib/python3.9/site-packages/salt/channel/client.py", line 252, in _crypted_transfer
    ret = yield _do_transfer()
  File "/usr/lib/python3.9/site-packages/salt/ext/tornado/gen.py", line 1056, in run
    value = future.result()
  File "/usr/lib/python3.9/site-packages/salt/ext/tornado/concurrent.py", line 249, in result
    raise_exc_info(self._exc_info)
  File "<string>", line 4, in raise_exc_info
  File "/usr/lib/python3.9/site-packages/salt/ext/tornado/gen.py", line 1064, in run
    yielded = self.gen.throw(*exc_info)
  File "/usr/lib/python3.9/site-packages/salt/channel/client.py", line 233, in _do_transfer
    data = yield self.transport.send(
  File "/usr/lib/python3.9/site-packages/salt/ext/tornado/gen.py", line 1056, in run
    value = future.result()
  File "/usr/lib/python3.9/site-packages/salt/ext/tornado/concurrent.py", line 249, in result
    raise_exc_info(self._exc_info)
  File "<string>", line 4, in raise_exc_info
  File "/usr/lib/python3.9/site-packages/salt/ext/tornado/gen.py", line 1064, in run
    yielded = self.gen.throw(*exc_info)
  File "/usr/lib/python3.9/site-packages/salt/transport/zeromq.py", line 914, in send
    ret = yield self.message_client.send(load, timeout=timeout)
  File "/usr/lib/python3.9/site-packages/salt/ext/tornado/gen.py", line 1056, in run
    value = future.result()
  File "/usr/lib/python3.9/site-packages/salt/ext/tornado/concurrent.py", line 249, in result
    raise_exc_info(self._exc_info)
  File "<string>", line 4, in raise_exc_info
  File "/usr/lib/python3.9/site-packages/salt/ext/tornado/gen.py", line 1064, in run
    yielded = self.gen.throw(*exc_info)
  File "/usr/lib/python3.9/site-packages/salt/transport/zeromq.py", line 624, in send
    recv = yield future
  File "/usr/lib/python3.9/site-packages/salt/ext/tornado/gen.py", line 1056, in run
    value = future.result()
  File "/usr/lib/python3.9/site-packages/salt/ext/tornado/concurrent.py", line 249, in result
    raise_exc_info(self._exc_info)
  File "<string>", line 4, in raise_exc_info
salt.exceptions.SaltReqTimeoutError: Message timed out
ERROR: Minions returned with non-zero exit code

infantvin commented 1 year ago

Update:

I tried again to create a RHEL 8.0 VM-based minion, this time with an older minion version (3003.4), and even that has the same problem. The salt master returns the same error as above after a few minutes of waiting when a state.apply is executed.

OrangeDog commented 1 year ago

This might be another case of #62881. There are a lot of timeout errors in that log.

Are you sure the minion can contact the master on both ports? You might need to adjust firewall rules.

infantvin commented 1 year ago

Thanks for pointing me to that issue.

In our case, the firewalls on the OS are disabled on both master and minion.

I will verify whether the ports are reachable and report what I find.

If it helps, these VMs are in the same VLAN in vSphere, and that VLAN has no restrictions. Just as in #62881, we have 2 salt masters in the same infrastructure running 3003.4. They currently manage all the build machines and are working properly.

The intention of moving to 3005.x is to be able to use it for RHEL9, which our Dev and QA teams need. We thought it would be a good idea to move to new master servers on RHEL9 too. I found this problem during that testing and reported it.

I will check what you suggested and get back to you soon.

Thank you



OrangeDog commented 1 year ago

Ah, two masters? #62577, #62318

infantvin commented 1 year ago

Hi,

Thanks for the references again.

Our infrastructure has independent masters. They are not connected to each other.

We have different masters in the sense that one master is used only for testing. No production build machines are attached to it. The second one is exclusively for the production based infrastructure. Nothing goes into this unless it gets tested and validated on the first one.

They are independent of each other and the only common ground is that they have the same versions of the salt packages and the underlying OS.

So in reference to the way the documentation puts it, we have a single master based environment.

I hope this explains things clearly.

I am going to check the ports we spoke about and get back to you soon.



infantvin commented 1 year ago

I just verified that both ports on the master are reachable from the minion:

telnet salt-master-3005 4506
Trying xxx...
Connected to salt-master-3005.
Escape character is '^]'.
^]

telnet salt-master-3005 4505
Trying xxxx...
Connected to salt-master-3005.
Escape character is '^]'.
^]
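For anyone repeating this check on a host without telnet, the same reachability test can be sketched in Python. This is a minimal sketch, not part of Salt: `check_ports` is a hypothetical helper name, and it assumes the master uses the default ports 4505 and 4506 seen in the telnet test. Like telnet, a successful connect only shows that the TCP port accepts connections; it says nothing about whether Salt requests are answered.

```python
import socket

# Ports used in the telnet test above (Salt master defaults).
SALT_PORTS = (4505, 4506)


def check_ports(host, ports=SALT_PORTS, timeout=3.0):
    """Return {port: bool} telling whether a TCP connect to host:port succeeds.

    Hypothetical helper for illustration; it only opens and closes a socket.
    """
    results = {}
    for port in ports:
        try:
            # create_connection resolves the name and tries each address in turn.
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:
            # Covers refused connections, timeouts and name resolution failures.
            results[port] = False
    return results
```

Calling `check_ports("salt-master-3005")` from the minion would return a dictionary such as `{4505: True, 4506: True}` when both ports accept connections.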

There is also a connection established from the minion side, as per the netstat output. However, the communication is not taking place.

ps -ef|grep salt
root 926 1 0 14:38 ? 00:00:00 /usr/libexec/platform-python /usr/bin/salt-minion
root 1423 926 0 14:38 ? 00:00:00 /usr/libexec/platform-python /usr/bin/salt-minion
root 1427 1423 0 14:38 ? 00:00:00 /usr/libexec/platform-python /usr/bin/salt-minion
root 1880 1423 0 14:38 ? 00:00:00 /usr/libexec/platform-python /usr/bin/salt-minion

netstat -anp|grep pyt
tcp 0 0 10.246.66.86:47148 10.246.67.21:4505 ESTABLISHED 1423/platform-pytho
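When this check has to be repeated across many test runs, the netstat output can be filtered with a few lines of Python. This is a rough sketch under assumptions: `established_salt_connections` is a hypothetical helper, and it expects the column layout shown above (protocol, two queue counters, local address, remote address, state, PID/program).

```python
def established_salt_connections(netstat_output, ports=(4505, 4506)):
    """Return (remote_host, remote_port) pairs for ESTABLISHED connections
    whose remote port is one of `ports`.

    Hypothetical helper; assumes `netstat -anp` style columns.
    """
    found = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip headers, listening sockets and anything not fully established.
        if len(fields) < 6 or fields[5] != "ESTABLISHED":
            continue
        # rpartition splits on the last colon, so IPv6 addresses stay intact.
        remote_host, _, remote_port = fields[4].rpartition(":")
        if remote_port.isdigit() and int(remote_port) in ports:
            found.append((remote_host, int(remote_port)))
    return found
```

Fed the single netstat line above, this reports one established connection to port 4505 on 10.246.67.21 and none to 4506, which matches the observation that 4506 is only connected while a job is running.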

And whenever we try something like test.ping, a connection is established on port 4506 of the master as well.

So something is going wrong in the communication between the master and the minion when a state is run. The question is: what?

infantvin commented 1 year ago

Hi

After a long weekend of continuous tests, this looks like an IPv6-related problem. If IPv6 is enabled on the master, this request timeout error is seen whenever a minion is created using salt-cloud. This happens even when master.conf has IPv6 set to off.

After disabling IPv6 completely in the network configuration of the ethernet card on the master node, the minion works normally.

I think I have to investigate whether the IPv6 network (which is actively used in our environment) has any problems. If it has none, I will open a new issue with you again.
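One possible starting point for that investigation is to see which address families the master's hostname resolves to from the minion, and whether a TCP connect works over each family separately. The sketch below is an assumption-laden illustration, not Salt code: `resolve_by_family` and `can_connect` are hypothetical helper names, and the check only exercises plain TCP, not the ZeroMQ traffic that actually timed out.

```python
import socket

FAMILY_NAMES = {socket.AF_INET: "IPv4", socket.AF_INET6: "IPv6"}


def resolve_by_family(host, port):
    """Return {"IPv4": [...], "IPv6": [...]} with the addresses host resolves to."""
    result = {"IPv4": [], "IPv6": []}
    for family, _type, _proto, _canon, sockaddr in socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM
    ):
        name = FAMILY_NAMES.get(family)
        if name and sockaddr[0] not in result[name]:
            result[name].append(sockaddr[0])
    return result


def can_connect(address, port, family, timeout=3.0):
    """Try a TCP connect to one literal address over one address family."""
    try:
        with socket.socket(family, socket.SOCK_STREAM) as sock:
            sock.settimeout(timeout)
            sock.connect((address, port))
        return True
    except OSError:
        # Refused, timed out, unreachable, or the family is not supported.
        return False
```

If `resolve_by_family("salt-master-3005", 4506)` lists an IPv6 address and `can_connect(addr, 4506, socket.AF_INET6)` fails or hangs until the timeout while the IPv4 address connects, that would point at the IPv6 path rather than at Salt itself.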

Thanks