Open amalaguti opened 5 months ago
Including 3007.0, same behavior,
Fails the same way, during the scheduler execution looks like it fails to contact the other master, and so the job fails
salt.exceptions.SaltReqTimeoutError: Request client send timedout
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\utils\schedule.py", line 880, in handle_func
ret["return"] = self.functions[func](*args, **kwargs)
File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\loader\lazy.py", line 160, in __call__
ret = self.loader.run(run_func, *args, **kwargs)
File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\loader\lazy.py", line 1233, in run
return self._last_context.run(self._run_as, _func_or_method, *args, **kwargs)
File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\loader\lazy.py", line 1248, in _run_as
return _func_or_method(*args, **kwargs)
File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\modules\state.py", line 1425, in sls
st_ = salt.state.HighState(
File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\state.py", line 5007, in __init__
BaseHighState.__init__(self, opts)
File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\state.py", line 3764, in __init__
self.opts = self.__gen_opts(opts)
File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\state.py", line 3787, in __gen_opts
mopts = self.client.master_opts()
File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\fileclient.py", line 1477, in master_opts
return self._channel_send(
File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\fileclient.py", line 1147, in _channel_send
return self.channel.send(
File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\utils\asynchronous.py", line 139, in wrap
raise exc_info[1].with_traceback(exc_info[2])
File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\utils\asynchronous.py", line 147, in _target
result = io_loop.run_sync(lambda: getattr(self.obj, key)(*args, **kwargs))
File "C:\Program Files\Salt Project\Salt\Lib\site-packages\tornado\ioloop.py", line 527, in run_sync
return future_cell[0].result()
File "C:\Program Files\Salt Project\Salt\Lib\site-packages\tornado\gen.py", line 780, in run
yielded = self.gen.throw(exc)
File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\channel\client.py", line 340, in send
ret = yield self._crypted_transfer(load, timeout=timeout, raw=raw)
File "C:\Program Files\Salt Project\Salt\Lib\site-packages\tornado\gen.py", line 767, in run
value = future.result()
File "C:\Program Files\Salt Project\Salt\Lib\site-packages\tornado\gen.py", line 780, in run
yielded = self.gen.throw(exc)
File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\channel\client.py", line 291, in _crypted_transfer
yield self.auth.authenticate()
File "C:\Program Files\Salt Project\Salt\Lib\site-packages\tornado\gen.py", line 767, in run
value = future.result()
File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\crypt.py", line 832, in _authenticate
creds = yield self.sign_in(channel=channel)
File "C:\Program Files\Salt Project\Salt\Lib\site-packages\tornado\gen.py", line 767, in run
value = future.result()
File "C:\Program Files\Salt Project\Salt\Lib\site-packages\tornado\gen.py", line 780, in run
yielded = self.gen.throw(exc)
File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\crypt.py", line 951, in sign_in
raise SaltClientError(
salt.exceptions.SaltClientError: Attempt to authenticate with the salt master failed with timeout error
2024-04-16 14:28:15,021 [salt.utils.schedule :963 ][DEBUG ][3568] schedule.handle_func: Removing C:\ProgramData\Salt Project\Salt\var\cache\salt\minion\proc\20240416142628380999
2024-04-16 14:28:15,881 [salt.utils.process :1157][DEBUG ][1168] Subprocess Schedule(name=schedule_new_task, jid=20240416142628380999) cleaned up
Issue also valid for 3006.8
2024-05-10 01:56:55,036 [salt.utils.schedule :411 ][INFO ][584] Updating job settings for scheduled job: schedule_new_task
2024-05-10 01:56:55,052 [salt.minion :2744][INFO ][584] Minion of '172.21.0.10' is handling event tag '/salt/minion/minion_schedule_add_complete'
2024-05-10 01:56:55,052 [salt.minion :2744][INFO ][584] Minion of '172.21.0.11' is handling event tag '/salt/minion/minion_schedule_add_complete'
2024-05-10 01:56:55,052 [salt.minion :2744][INFO ][584] Minion of '172.21.0.10' is handling event tag '/salt/minion/minion_schedule_add_complete'
2024-05-10 01:56:55,052 [salt.minion :2744][INFO ][584] Minion of '172.21.0.11' is handling event tag '/salt/minion/minion_schedule_add_complete'
2024-05-10 01:56:55,052 [salt.state :319 ][INFO ][7648] {'schedule_new_task': 'added'}
2024-05-10 01:56:55,052 [salt.state :2492][INFO ][7648] Completed state [schedule_new_task] at time 01:56:55.052369 (duration_in_ms=171.872)
2024-05-10 01:56:55,083 [salt.loaded.int.module.cmdmod :438 ][INFO ][7648] Executing command attrib in directory 'C:\Users\adrian'
2024-05-10 02:01:59,665 [salt.utils.schedule :1792][INFO ][584] Running scheduled job: schedule_new_task (runtime missed by 0.665372 seconds) with jid 20240510020159665372
2024-05-10 02:02:46,320 [salt.utils.schedule :913 ][ERROR ][8128] Unhandled exception running state.sls
Traceback (most recent call last):
File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\crypt.py", line 768, in sign_in
payload = yield channel.send(sign_in_payload, tries=tries, timeout=timeout)
File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\ext\tornado\gen.py", line 1056, in run
value = future.result()
File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\ext\tornado\concurrent.py", line 249, in result
raise_exc_info(self._exc_info)
File "<string>", line 4, in raise_exc_info
File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\ext\tornado\gen.py", line 1064, in run
yielded = self.gen.throw(*exc_info)
File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\channel\client.py", line 337, in send
ret = yield self._uncrypted_transfer(load, timeout=timeout)
File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\ext\tornado\gen.py", line 1056, in run
value = future.result()
File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\ext\tornado\concurrent.py", line 249, in result
raise_exc_info(self._exc_info)
File "<string>", line 4, in raise_exc_info
File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\ext\tornado\gen.py", line 1064, in run
yielded = self.gen.throw(*exc_info)
File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\channel\client.py", line 308, in _uncrypted_transfer
ret = yield self.transport.send(
File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\ext\tornado\gen.py", line 1056, in run
value = future.result()
File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\ext\tornado\concurrent.py", line 249, in result
raise_exc_info(self._exc_info)
File "<string>", line 4, in raise_exc_info
File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\ext\tornado\gen.py", line 1064, in run
yielded = self.gen.throw(*exc_info)
File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\transport\zeromq.py", line 916, in send
ret = yield self.message_client.send(load, timeout=timeout)
File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\ext\tornado\gen.py", line 1056, in run
value = future.result()
File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\ext\tornado\concurrent.py", line 249, in result
raise_exc_info(self._exc_info)
File "<string>", line 4, in raise_exc_info
File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\ext\tornado\gen.py", line 1064, in run
yielded = self.gen.throw(*exc_info)
File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\transport\zeromq.py", line 595, in send
recv = yield future
File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\ext\tornado\gen.py", line 1056, in run
value = future.result()
File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\ext\tornado\concurrent.py", line 249, in result
raise_exc_info(self._exc_info)
File "<string>", line 4, in raise_exc_info
salt.exceptions.SaltReqTimeoutError: Message timed out
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\utils\schedule.py", line 880, in handle_func
ret["return"] = self.functions[func](*args, **kwargs)
File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\loader\lazy.py", line 159, in __call__
ret = self.loader.run(run_func, *args, **kwargs)
File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\loader\lazy.py", line 1245, in run
return self._last_context.run(self._run_as, _func_or_method, *args, **kwargs)
File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\loader\lazy.py", line 1260, in _run_as
return _func_or_method(*args, **kwargs)
File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\modules\state.py", line 1425, in sls
st_ = salt.state.HighState(
File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\state.py", line 4954, in __init__
BaseHighState.__init__(self, opts)
File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\state.py", line 3754, in __init__
self.opts = self.__gen_opts(opts)
File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\state.py", line 3777, in __gen_opts
mopts = self.client.master_opts()
File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\fileclient.py", line 1481, in master_opts
return self._channel_send(
File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\fileclient.py", line 1151, in _channel_send
return self.channel.send(
File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\utils\asynchronous.py", line 124, in wrap
raise exc_info[1].with_traceback(exc_info[2])
File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\utils\asynchronous.py", line 130, in _target
result = io_loop.run_sync(lambda: getattr(self.obj, key)(*args, **kwargs))
File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\ext\tornado\ioloop.py", line 459, in run_sync
return future_cell[0].result()
File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\ext\tornado\concurrent.py", line 249, in result
raise_exc_info(self._exc_info)
File "<string>", line 4, in raise_exc_info
File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\ext\tornado\gen.py", line 1064, in run
yielded = self.gen.throw(*exc_info)
File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\channel\client.py", line 340, in send
ret = yield self._crypted_transfer(load, timeout=timeout, raw=raw)
File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\ext\tornado\gen.py", line 1056, in run
value = future.result()
File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\ext\tornado\concurrent.py", line 249, in result
raise_exc_info(self._exc_info)
File "<string>", line 4, in raise_exc_info
File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\ext\tornado\gen.py", line 1064, in run
yielded = self.gen.throw(*exc_info)
File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\channel\client.py", line 290, in _crypted_transfer
yield self.auth.authenticate()
File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\ext\tornado\gen.py", line 1056, in run
value = future.result()
File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\ext\tornado\concurrent.py", line 249, in result
raise_exc_info(self._exc_info)
File "<string>", line 4, in raise_exc_info
File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\crypt.py", line 657, in _authenticate
creds = yield self.sign_in(channel=channel)
File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\ext\tornado\gen.py", line 1056, in run
value = future.result()
File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\ext\tornado\concurrent.py", line 249, in result
raise_exc_info(self._exc_info)
File "<string>", line 4, in raise_exc_info
File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\ext\tornado\gen.py", line 1064, in run
yielded = self.gen.throw(*exc_info)
File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\crypt.py", line 776, in sign_in
raise SaltClientError(
salt.exceptions.SaltClientError: Attempt to authenticate with the salt master failed with timeout error
Description Windows minion 3006.1 (and 3006.7, possibly all 3006.x), configured in multimaster, having connected to the first master in list, a scheduled job is triggered when this connected master is offline, the minion fails to run the job due it seems can't connect to the other available master. Looks like it remains stick to the first master connected, and does not flip the connection to the other master.
In many occassions the minion ssems to stay in not a healthy state, requiring the minion service to be restarted to re-establish healthy connection.
The failure is erratic and it's not consistent, some times it connects and runs fine in all minions, some times a limited set of the minions succeed while other group fail.
Tried multiple times, with different settings like ping_interval, master_tries, auth_safemode, and return_job: False no luck to make it consistent. No setting seem to be able to definitely fix it.
Clearly the problem is due the schedule job runnning state.sls which requires to establish connection to the master. If the job is tweaked to run something else that does not require master connection, it works fine
Even an exception can be seen while the minion tries to connect to the master(s), it's not always enough to determine the issue, despite the exception, sometimes it works fine
Setup Setup a multimaster minion Run a state to save a schedule job to run a few minutes later that calls state.sls Put the active master offline, usually the first one in the minion master list Wait for the schedule job to run (may or may not work) It's convenient to try with multiple minions, some may work, some may not.
Please be as specific as possible and give set-up details.
Steps to Reproduce the behavior (Include debug logs if possible and relevant)
Expected behavior A clear and concise description of what you expected to happen.
Screenshots If applicable, add screenshots to help explain your problem.
Versions Report
salt --versions-report
(Provided by running salt --versions-report. Please also mention any differences in master/minion versions.) ```yaml PASTE HERE ```Additional context Add any other context about the problem here.