Open amalaguti opened 7 months ago
Update, Still not very deterministic, but I was able to reproduce it twice, by keeping the salt-master service down for about 30 minutes, then starting the master and twice I've seen one of the minions to fail the same way
I see this on the master which seems to be a "normal" debug message
[DEBUG ] Failed to authenticate message
[DEBUG ] Minion failed to auth to master. Since the payload is encrypted, it is not known which minion failed to authenticate. It is likely that this is a transient failure due to the master rotating its public key.
[DEBUG ] salt.crypt.get_rsa_pub_key: Loading public key
[DEBUG ] Sending event: tag = minion_ping; data = {'id': 'vesselsim-win-ems-1', 'cmd': '_minion_event', 'pretag': None, 'data': 'ping', 'tag': 'minion_ping', '_stamp': '2024-04-08T15:18:22.457113'}
But on the minion:
2024-04-08 15:14:40,442 [salt.crypt :1486][DEBUG ][4660] Failed to authenticate message
2024-04-08 15:14:40,457 [tornado.application:353 ][ERROR ][4660] Future <salt.ext.tornado.concurrent.Future object at 0x00000187D74D7EE0> exception was never retrieved: Traceback (most recent call last):
File "C:\Program Files\Salt Project\Salt\lib\site-packages\salt\channel\client.py", line 294, in _crypted_transfer
ret = yield _do_transfer()
File "C:\Program Files\Salt Project\Salt\lib\site-packages\salt\ext\tornado\gen.py", line 1056, in run
value = future.result()
File "C:\Program Files\Salt Project\Salt\lib\site-packages\salt\ext\tornado\concurrent.py", line 249, in result
raise_exc_info(self._exc_info)
File "<string>", line 4, in raise_exc_info
File "C:\Program Files\Salt Project\Salt\lib\site-packages\salt\ext\tornado\gen.py", line 1070, in run
yielded = self.gen.send(value)
File "C:\Program Files\Salt Project\Salt\lib\site-packages\salt\channel\client.py", line 284, in _do_transfer
data = self.auth.crypticle.loads(data, raw, nonce=nonce)
File "C:\Program Files\Salt Project\Salt\lib\site-packages\salt\crypt.py", line 1522, in loads
data = self.decrypt(data)
File "C:\Program Files\Salt Project\Salt\lib\site-packages\salt\crypt.py", line 1487, in decrypt
raise AuthenticationError("message authentication failed")
salt.exceptions.AuthenticationError: message authentication failed
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Program Files\Salt Project\Salt\lib\site-packages\salt\ext\tornado\gen.py", line 1064, in run
yielded = self.gen.throw(*exc_info)
File "C:\Program Files\Salt Project\Salt\lib\site-packages\salt\minion.py", line 2727, in handle_event
yield _minion.req_channel.send(
File "C:\Program Files\Salt Project\Salt\lib\site-packages\salt\ext\tornado\gen.py", line 1056, in run
value = future.result()
File "C:\Program Files\Salt Project\Salt\lib\site-packages\salt\ext\tornado\concurrent.py", line 249, in result
raise_exc_info(self._exc_info)
File "<string>", line 4, in raise_exc_info
File "C:\Program Files\Salt Project\Salt\lib\site-packages\salt\ext\tornado\gen.py", line 1064, in run
yielded = self.gen.throw(*exc_info)
File "C:\Program Files\Salt Project\Salt\lib\site-packages\salt\channel\client.py", line 341, in send
ret = yield self._crypted_transfer(load, timeout=timeout, raw=raw)
File "C:\Program Files\Salt Project\Salt\lib\site-packages\salt\ext\tornado\gen.py", line 1056, in run
value = future.result()
File "C:\Program Files\Salt Project\Salt\lib\site-packages\salt\ext\tornado\concurrent.py", line 249, in result
raise_exc_info(self._exc_info)
File "<string>", line 4, in raise_exc_info
File "C:\Program Files\Salt Project\Salt\lib\site-packages\salt\ext\tornado\gen.py", line 1064, in run
yielded = self.gen.throw(*exc_info)
File "C:\Program Files\Salt Project\Salt\lib\site-packages\salt\channel\client.py", line 298, in _crypted_transfer
ret = yield _do_transfer()
File "C:\Program Files\Salt Project\Salt\lib\site-packages\salt\ext\tornado\gen.py", line 1056, in run
value = future.result()
File "C:\Program Files\Salt Project\Salt\lib\site-packages\salt\ext\tornado\concurrent.py", line 249, in result
raise_exc_info(self._exc_info)
File "<string>", line 4, in raise_exc_info
File "C:\Program Files\Salt Project\Salt\lib\site-packages\salt\ext\tornado\gen.py", line 1070, in run
yielded = self.gen.send(value)
File "C:\Program Files\Salt Project\Salt\lib\site-packages\salt\channel\client.py", line 284, in _do_transfer
data = self.auth.crypticle.loads(data, raw, nonce=nonce)
File "C:\Program Files\Salt Project\Salt\lib\site-packages\salt\crypt.py", line 1522, in loads
data = self.decrypt(data)
File "C:\Program Files\Salt Project\Salt\lib\site-packages\salt\crypt.py", line 1487, in decrypt
raise AuthenticationError("message authentication failed")
salt.exceptions.AuthenticationError: message authentication failed
2024-04-08 15:14:40,457 [tornado.general :590 ][ERROR ][4660] Uncaught exception in ZMQStream callback
Traceback (most recent call last):
File "C:\Program Files\Salt Project\Salt\lib\site-packages\zmq\eventloop\zmqstream.py", line 584, in _run_callback
f = callback(*args, **kwargs)
File "C:\Program Files\Salt Project\Salt\lib\site-packages\salt\transport\zeromq.py", line 628, in handle_reply
future.set_result(data)
AttributeError: 'NoneType' object has no attribute 'set_result'
2024-04-08 15:14:40,457 [tornado.general :645 ][ERROR ][4660] Uncaught exception in zmqstream callback
Traceback (most recent call last):
File "C:\Program Files\Salt Project\Salt\lib\site-packages\zmq\eventloop\zmqstream.py", line 634, in _handle_events
self._handle_recv()
File "C:\Program Files\Salt Project\Salt\lib\site-packages\zmq\eventloop\zmqstream.py", line 663, in _handle_recv
self._run_callback(callback, msg)
File "C:\Program Files\Salt Project\Salt\lib\site-packages\zmq\eventloop\zmqstream.py", line 584, in _run_callback
f = callback(*args, **kwargs)
File "C:\Program Files\Salt Project\Salt\lib\site-packages\salt\transport\zeromq.py", line 628, in handle_reply
future.set_result(data)
AttributeError: 'NoneType' object has no attribute 'set_result'
2024-04-08 15:14:40,457 [tornado.application:640 ][ERROR ][4660] Exception in callback (<zmq.Socket(zmq.REQ) at 0x187d5dee740>, <function wrap.<locals>.null_wrapper at 0x00000187D5E02EF0>)
Traceback (most recent call last):
File "C:\Program Files\Salt Project\Salt\lib\site-packages\salt\ext\tornado\ioloop.py", line 890, in start
handler_func(fd_obj, events)
File "C:\Program Files\Salt Project\Salt\lib\site-packages\salt\ext\tornado\stack_context.py", line 278, in null_wrapper
return fn(*args, **kwargs)
File "C:\Program Files\Salt Project\Salt\lib\site-packages\zmq\eventloop\zmqstream.py", line 634, in _handle_events
self._handle_recv()
File "C:\Program Files\Salt Project\Salt\lib\site-packages\zmq\eventloop\zmqstream.py", line 663, in _handle_recv
self._run_callback(callback, msg)
File "C:\Program Files\Salt Project\Salt\lib\site-packages\zmq\eventloop\zmqstream.py", line 584, in _run_callback
f = callback(*args, **kwargs)
File "C:\Program Files\Salt Project\Salt\lib\site-packages\salt\transport\zeromq.py", line 628, in handle_reply
future.set_result(data)
AttributeError: 'NoneType' object has no attribute 'set_result'
The difference in this minion is the following settings, but I'm not sure if it is related
ping_interval: 5
auth_safemode: True
@amalaguti Please provide the output of salt --versions-report.
@amalaguti Please provide the output of
salt --versions-report.
(MASTER)
Salt Version: Salt: 3006.4
Python Version: Python: 3.10.13 (main, Oct 4 2023, 21:54:22) [GCC 11.2.0]
Dependency Versions: cffi: 1.14.6 cherrypy: unknown dateutil: 2.8.1 docker-py: Not Installed gitdb: Not Installed gitpython: Not Installed Jinja2: 3.1.2 libgit2: Not Installed looseversion: 1.0.2 M2Crypto: Not Installed Mako: Not Installed msgpack: 1.0.2 msgpack-pure: Not Installed mysql-python: Not Installed packaging: 22.0 pycparser: 2.21 pycrypto: Not Installed pycryptodome: 3.9.8 pygit2: Not Installed python-gnupg: 0.4.8 PyYAML: 6.0.1 PyZMQ: 23.2.0 relenv: 0.13.12 smmap: Not Installed timelib: 0.2.4 Tornado: 4.5.3 ZMQ: 4.3.4
System Versions: dist: rhel 9.1 Plow locale: utf-8 machine: x86_64 release: 5.14.0-162.23.1.el9_1.x86_64 system: Linux version: Red Hat Enterprise Linux 9.1 Plow
Salt Version: Salt: 3006.6
Python Version: Python: 3.10.13 (heads/main:5600dd9, Nov 15 2023, 04:42:06) [MSC v.1937 64 bit (AMD64)]
Dependency Versions: cffi: 1.14.6 cherrypy: 18.6.1 dateutil: 2.8.1 docker-py: Not Installed gitdb: 4.0.7 gitpython: Not Installed Jinja2: 3.1.3 libgit2: Not Installed looseversion: 1.0.2 M2Crypto: Not Installed Mako: Not Installed msgpack: 1.0.2 msgpack-pure: Not Installed mysql-python: Not Installed packaging: 22.0 pycparser: 2.21 pycrypto: Not Installed pycryptodome: 3.19.1 pygit2: Not Installed python-gnupg: 0.4.8 PyYAML: 6.0.1 PyZMQ: 25.0.2 relenv: 0.14.2 smmap: 4.0.0 timelib: 0.2.4 Tornado: 4.5.3 ZMQ: 4.3.4
System Versions: dist: locale: cp1252 machine: AMD64 release: 2019Server system: Windows version: 2019Server 10.0.17763 SP0 Multiprocessor Free
Looks very similar to my issue that I created 2 weeks ago #66288 @dwoz can you have a look at that issue too? Have yet to receive a response from anyone of the salt team.
Looks very similar to my issue that I created 2 weeks ago #66288 @dwoz can you have a look at that issue too? Have yet to receive a response from anyone of the salt team.
Agreed, looks similar, I just wasn't sure and found it after I've created it this one
Description I started to notice this recently on my lab environment,
after starting master service due it was left down for some time (a bunch of hours) (over the weekend, or over night due some maintenance, it's a lab server) The minions (Windows, the problem does not seem to affect the minion running on the master) don't seem to establish a proper connection to the server and when running test.ping from the master, it behaves erraticaly showing non consistent responses, and even weirder is that the same minion ID is shown multiple times in the output of the test.ping command The response may vary True or False
A following test.ping may get a different behavior. Running salt-call test.ping from the minions seem to be fine, but the issue does not go away till restarting the salt-minion on Windows so it re-establish a proper connection to the master again
Master:
Minion:
Setup Please be as specific as possible and give set-up details. 1 master - salt 3006.4 (Sulfur) 1 minion running on master - 3006.4 2 Windows minion - 3006.4
Steps to Reproduce the behavior (Include debug logs if possible and relevant)
Expected behavior A clear and concise description of what you expected to happen.
Screenshots If applicable, add screenshots to help explain your problem.
Versions Report
salt --versions-report
(Provided by running salt --versions-report. Please also mention any differences in master/minion versions.) ```yaml PASTE HERE ```Additional context Add any other context about the problem here.