Open z900collector opened 7 months ago
Can you try stopping your Salt master and minion, then run find /opt/saltstack/salt -type f -iname "*.pyc" -print -delete
?Then start them back up. I believe that will resolve this issue.
@doesitblend , thanks for the oneliner script to cleanup the .pyc. I am still getting this error message from minion startup. Any suggestion ?
salt-minion[1567838]: [ERROR ] Publish server binding pub to /var/run/salt/minion/minion_event_0f4efbad65_pub.ipc ssl=None
Tried the one liner and now the minion reports:
2024-03-07 22:32:33,860 [salt.transport.zeromq:396 ][ERROR ][1756164] Exception while running callback Traceback (most recent call last): File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/transport/zeromq.py", line 394, in consume await callback(msg) File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/channel/client.py", line 484, in wrap_callback await callback(decoded) TypeError: object NoneType can't be used in 'await' expression
The master reports (lots and lots of these):
2024-03-07 22:23:13,633 [salt.transport.tcp:311 ][WARNING ][1699260] TCP Publish Client encountered an exception while connecting to /var/run/salt/master/master_event_pub.ipc: StreamClosedError('Stream is closed'), will reconnect in 1 seconds - File "/bin/salt-run", line 11, in
File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/scripts.py", line 456, in salt_run client.run()
File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/cli/run.py", line 34, in run ret = runner.run(full_return=True)
File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/runner.py", line 303, in run ret = self._proc_function(
File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/client/mixins.py", line 546, in _proc_function return instance.low(fun, low, full_return=full_return)
File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/client/mixins.py", line 383, in low data["return"] = func(*args, **kwargs)
File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/loader/lazy.py", line 160, in call ret = self.loader.run(run_func, *args, **kwargs)
File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/loader/lazy.py", line 1233, in run return self._last_context.run(self._run_as, _func_or_method, *args, **kwargs)
File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/loader/lazy.py", line 1248, in _run_as return _func_or_method(*args, **kwargs)
File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/runners/state.py", line 310, in event return statemod["state.event"](
File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/loader/lazy.py", line 160, in call ret = self.loader.run(run_func, *args, **kwargs)
File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/loader/lazy.py", line 1233, in run return self._last_context.run(self._run_as, _func_or_method, *args, **kwargs)
File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/loader/lazy.py", line 1248, in _run_as return _func_or_method(*args, **kwargs)
File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/modules/state.py", line 2576, in event with salt.utils.event.get_event(
File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/utils/event.py", line 127, in get_event return MasterEvent(
File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/utils/event.py", line 928, in init super().init(
File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/utils/event.py", line 265, in init self.connect_pub()
File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/utils/event.py", line 323, in connect_pub self.subscriber = salt.utils.asynchronous.SyncWrapper(
File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/utils/asynchronous.py", line 77, in init self.obj = cls(*args, **kwargs)
File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/transport/base.py", line 210, in ipc_publish_client return publish_client(opts, io_loop, **kwargs)
File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/transport/base.py", line 152, in publish_client return salt.transport.tcp.PublishClient(
File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/transport/tcp.py", line 219, in init super().init(opts, io_loop, **kwargs)
File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/transport/base.py", line 398, in init super().init()
@z900collector , the one-liner worked for me to remove *.pyc files and I don't have your message above. I only have following error message from "systemctl status salt-minion" per another ticket https://github.com/saltstack/salt/issues/66179
salt-minion[1567838]: [ERROR ] Publish server binding pub to /var/run/salt/minion/minion_event_0f4efbad65_pub.ipc ssl=None
After running the oneliner on both master and minion, it is working again. Nevertheless I can still see a lot of occurence like this one on minion side:
2024-03-10 19:22:50,833 [salt.transport.zeromq:396 ][ERROR ][32857] Exception while running callback
Traceback (most recent call last):
File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/transport/zeromq.py", line 394, in consume
await callback(msg)
File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/channel/client.py", line 484, in wrap_callback
await callback(decoded)
TypeError: object NoneType can't be used in 'await' expression
But it does not prevent the minion from returning properly.
I'm facing the same issue, after deleting .pyc files the minions get online. However these error lines still flood the minion logs.
rocky9t01a test VM upgraded from 3006.6 to 3007.0. I thought next version 3006.8 will be latest one with this bug fix but latest one is 3007 without this bug fix.
[me@rocky9t01 salt]$ sudo salt -L rocky9t01 cmd.run "/usr/bin/tail -10 /var/log/salt/minion"
rocky9t01:
2024-04-09 16:40:43,823 [salt.loaded.int.module.cmdmod:1347][ERROR ][571240] Command 'systemctl' failed with return code: 1
2024-04-09 16:40:43,823 [salt.loaded.int.module.cmdmod:1352][ERROR ][571240] output: Job for salt-minion.service canceled.
2024-04-09 16:40:44,840 [salt.transport.tcp:1405][ERROR ][571328] Publish server binding pub to /var/run/salt/minion/minion_event_69362de666_pub.ipc ssl=None
2024-04-09 16:40:49,491 [salt.transport.zeromq:396 ][ERROR ][571328] Exception while running callback
Traceback (most recent call last):
File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/transport/zeromq.py", line 394, in consume
await callback(msg)
File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/channel/client.py", line 484, in wrap_callback
await callback(decoded)
TypeError: object NoneType can't be used in 'await' expression
[me@rocky9t01 salt]$
find /opt/saltstack/salt -type f -iname "*.pyc" -print -delete
This worked, but only for two days, errors returned on three minions so far.
Hi all.
I have the same issues with saltminion 3007.0-0
2024-03-12 10:27:41,901 [salt.transport.tcp:1410][ERROR ][16652] Publish server binding pub to 127.0.0.1:4510 ssl=None
Both on Linux as on windows
Hi
I see there are comments that this is a 'log flood' issue. I think it's a bit more than that - my minions disconnect from the master.
This issue has been opened for two weeks and it is still not gaining enough attention to be classified as bug.
Hi, my minions seem to become slow to respond and commands seem to have timeouts.
When I delete the .pyc files all is fast and happy. A few days later things go slow and time out again.
Log when minion does not respond, doing basic test.ping:
2024-03-19 14:58:45,810 [salt.transport.zeromq:396 ][ERROR ][817] Exception while running callback
Traceback (most recent call last):
File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/transport/zeromq.py", line 394, in consume
await callback(msg)
File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/channel/client.py", line 484, in wrap_callback
await callback(decoded)
TypeError: object NoneType can't be used in 'await' expression
2024-03-19 14:58:55,939 [salt.minion :2763][ERROR ][817] Timeout encountered while sending {'cmd': '_return', 'id': 'cpt-ter-blahblah1.atomic.ac', 'success': True, 'return': True, 'retcode': 0, 'jid': '20240319125840713889', 'fun': 'test.ping', 'fun_args': [], 'user': 'root', '_stamp': '2024-03-19T12:58:40.915616', 'nonce': '39f602421d8c4c9bb844bca46a7a8f83'} request
I've reinstall 3006.7 on all my minions but kept the Salt Master at 3007.0 until this is fixed.
Sid
On Tue, 19 Mar 2024, 23:05 SwimGeek, @.***> wrote:
Hi, my minions seem to become slow to respond and commands seem to have timeouts.
When I delete the .pyc files all is fast and happy. A few days later things go slow and time out again.
Log when minion does not respond, doing basic test.ping:
2024-03-19 14:58:45,810 [salt.transport.zeromq:396 ][ERROR ][817] Exception while running callback Traceback (most recent call last): File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/transport/zeromq.py", line 394, in consume await callback(msg) File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/channel/client.py", line 484, in wrap_callback await callback(decoded) TypeError: object NoneType can't be used in 'await' expression 2024-03-19 14:58:55,939 [salt.minion :2763][ERROR ][817] Timeout encountered while sending {'cmd': '_return', 'id': 'cpt-ter-blahblah1.atomic.ac', 'success': True, 'return': True, 'retcode': 0, 'jid': '20240319125840713889', 'fun': 'test.ping', 'fun_args': [], 'user': 'root', '_stamp': '2024-03-19T12:58:40.915616', 'nonce': '39f602421d8c4c9bb844bca46a7a8f83'} request
— Reply to this email directly, view it on GitHub https://github.com/saltstack/salt/issues/66177#issuecomment-2007126338, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABRLBVGBWHGUSQGLJO25CHLYZAZX3AVCNFSM6AAAAABEKLF7COVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDAMBXGEZDMMZTHA . You are receiving this because you were mentioned.Message ID: @.***>
Update, if I use --async things work ok... so something is causing minions to take very long time to respond.... but they all complete their jobs in the end.
In my case , on both 3006.7 and 3007 masters, I need to restart salt-master, just to get master process works.
[me@salt01 ~]$ sudo salt -N testminions test.ping
[ERROR ] Request client send timedout
Salt request timed out. The master is not responding. You may need to run your command with `--async` in order to bypass the congested event bus. With `--async`, the CLI tool will print the job id (jid) and exit immediately without listening for responses. You can then use `salt-run jobs.lookup_jid` to look up the results of the job in the job cache later.
[me@salt01 ~]$
We are also seeing TypeError: object NoneType can't be used in 'await' expression
in our minion logs after an upgrade from 3006.7 to 3007 on AlmaLinux 8.
Am also seeing this error with salt-minion 3007.0 on Debian 12.
Anyone has issue of salt-mater not responding to simple test.ping? my current workaround is restart salt-master and wait for say 60secs and * test.ping will work again(until it doesn't)
@dwoz , Thanks for taking on this issue. Do you need more reports to track down the root cause ?
Seems to me this is basically what's happening, see lines 473-487 of client.py:
>>> # This function seems to be a decorator for registering callbacks for receiving data
>>> def on_recv(callback):
... async def wrap_callback(messages):
... # decode messages
... await callback('decoded')
... return wrap_callback
...
>>> # The decorator is being used on a non-async function that returns None (instead of something awaitable)
>>> @on_recv
... def non_async_func(decoded):
... print('Non-async function called, returning None')
... return None
...
>>> # Data is received, and the callback is called
>>> import asyncio
>>> asyncio.run(non_async_func('messages'))
Non-async function called, returning None
Traceback (most recent call last):
File "<pyshell#22>", line 1, in <module>
asyncio.run(non_async_func('messages'))
File "C:\Program Files\Python312\Lib\asyncio\runners.py", line 194, in run
return runner.run(main)
File "C:\Program Files\Python312\Lib\asyncio\runners.py", line 118, in run
return self._loop.run_until_complete(task)
File "C:\Program Files\Python312\Lib\asyncio\base_events.py", line 684, in run_until_complete
return future.result()
File "<pyshell#18>", line 4, in wrap_callback
await callback('decoded')
TypeError: object NoneType can't be used in 'await' expression
Doing some debugging, I don't see on_recv
being used as a decorator, but a salt.minion.Minion().pub_channel
, which is a salt.channel.client.AsyncPubChannel
, has a on_recv()
method that is being called with the Minion's _handle_payload
method. The _handle_payload
method is not async and always returns None. The wrap_callback
function generated in AsyncPubChannel.on_recv()
tries to await the result of Minion._handle_payload
, which is None.
Would it be as simple as defining salt.minion.Minion._handle_payload
as async or wrapping it in an async function?
I'm not sure about others on this issue, but this issue seems to be a huge blocker. At first, we would get [No Response] and the minion log entries when running salt 'server01' state.apply
on the master and just run it again to get it to work. Now, just about every call from the master times out and we see the minion log entries as described above.
Doesn't seem like just a log entry but actually seems to be broken.
@KiyoIchikawa , I have to restart salt-master to get the master time out to respond again. Running salt commands again won't fix salt-master issue. Can you change to log level to debug ? I am doing this and hoping to get the error message in /var/log/salt/master log next time salt-master time out.
We just restarted our masters (we use the new load balancing feature) and things seem working again. We'll try changing our log level to catch any errors.
See my PR #66277 that does away with the errors (so far) for me.
Seeing similar issue. I upgraded both my master and minion to 3007 and even cleared all the .pyc files as mentioned previously.
See my PR #66277 that does away with the errors (so far) for me.
Thanks for this PR, I will try it out once my salt masters stable (without time out , not responding to salt commands issues).
We just restarted our masters (we use the new load balancing feature) and things seem working again. We'll try changing our log level to catch any errors.
Once I enable log level to debug, in /etc/salt/master, I saw my localfs path has some link files have no destinations. I removed all those bad links. so far , my salt-master is up for 20h straight without the need to be rebooted due to not responding.
Active: active (running) since Tue 2024-03-26 09:21:42 CDT; 20h ago
HI Guys, just throwing my hat into this, I found that on one of my minions, what was causing it to die anytime I tried to start it up, was a config file in my /etc/salt/minion.d folder. once I removed it, the minion was stable.
The culprit for me was /etc/salt/minion.d/_scheduled.conf with the contents:
schedule: __mine_interval: {enabled: true, function: mine.update, jid_include: true, maxrunning: 2, minutes: 60, return_job: false, run_on_start: true}
I am concerned this issue may not be the root cause of minion's going offline. Can someone please test the changes from #66335 that has been experiencing minion outages? @tjyang @z900collector
@dwoz ,
AFAIK from my testing, "... await expression" in salt minion log doesn't impact salt-minion connection issue with salt-master.
I opened another issue to track my salt-master issue that not able to send out commands via 4505 publish port. when port 4505 malfunction after restart for a while , salt-minion still can pull down formulas. Pls see https://github.com/saltstack/salt/issues/66282 for details.
any news about that? a simple state.apply take ages (~5-10minutes), also the load grows to 1.0
this simple run tooks more than 10minutes
salt 'sympa*' state.apply
sympa.my.company.de:
Name: /etc/bash.bashrc - Function: file.append - Result: Changed - Started: 14:16:39.928980 - Duration: 744540.32 ms
Name: salt-minion - Function: service.running - Result: Changed - Started: 14:29:17.972168 - Duration: 326.765 ms
Name: create_host_to_zabbix - Function: zabbix_host.present - Result: Changed - Started: 14:29:18.319065 - Duration: 2267.021 ms
Name: /etc/apt/keyrings/GPG-KEY-elasticsearch - Function: file.managed - Result: Changed - Started: 14:29:31.059698 - Duration: 44.924 ms
Name: filebeat - Function: service.running - Result: Changed - Started: 14:29:35.289241 - Duration: 144.684 ms
Summary for sympa.my.company.de
-------------
Succeeded: 76 (changed=5)
Failed: 0
-------------
Total states run: 76
Total run time: 776.464 s
the debug output repeats always:
[DEBUG ] Using selector: EpollSelector
[DEBUG ] The functions from module 'jinja' are being loaded by dir() on the loaded module
[DEBUG ] LazyLoaded jinja.render
[DEBUG ] The functions from module 'yaml' are being loaded by dir() on the loaded module
[DEBUG ] LazyLoaded yaml.render
[INFO ] Starting a new job 20240415122238903939 with PID 126506
[DEBUG ] The functions from module 'direct_call' are being loaded by dir() on the loaded module
[DEBUG ] The functions from module 'splay' are being loaded by dir() on the loaded module
[DEBUG ] Could not LazyLoad direct_call.allow_missing_func: 'direct_call.allow_missing_func' is not available.
[DEBUG ] The functions from module 'saltutil' are being loaded by dir() on the loaded module
[DEBUG ] LazyLoaded saltutil.find_job
[INFO ] Returning information for job: 20240415122238903939
[DEBUG ] Using selector: EpollSelector
[DEBUG ] Using selector: EpollSelector
[DEBUG ] Publisher connecting to /var/run/salt/minion/minion_event_de194ab817_pull.ipc
[DEBUG ] Minion of 'salt' is handling event tag '__master_req_channel_payload/salt'
[DEBUG ] Minion return retry timer set to 6 seconds (randomized)
[DEBUG ] Closing _TCPPubServerPublisher instance
[DEBUG ] minion return: {'success': True, 'return': {'pid': 126060, 'fun': 'state.apply', 'arg': [], 'tgt': 'sympa*', 'jid': '20240415121633003410', 'ret': '', 'tgt_type': 'glob', 'user': 'sudo_user'}, 'retcode': 0, 'jid': '20240415122238903939', 'fun': 'saltutil.find_job', 'fun_args': ['20240415121633003410'], 'user': 'sudo_user'}
[DEBUG ] Subprocess ProcessPayload(jid=20240415122238903939) cleaned up
[DEBUG ] PubChannel received: {'enc': 'aes', 'load': {'fun': 'saltutil.find_job', 'arg': ['20240415121633003410'], 'tgt': ['sympa.my.company.de'], 'jid': '20240415122248933074', 'ret': '', 'tgt_type': 'list', 'user': 'sudo_user'}, 'sig': b'\xab_\xee\xc7;Y\x0c:\xb8\xe4\xa7c.\xf6\xb9ic\x8e\x0ev\xfev\x14y4,\xfe\xd1\x07PBHLV\x069\x82\xacZ\xb3cv\x02t\xf9\xde\xf3\x81G[)m7\xbb\x1f\xd67\x11\xc1\xbe\xc7\xd7\xf4\xf0\x03\xc5\x1ea\xdf\xc0\xba\xd4\x9c(\x18\xcbsM\x9b?\xd0\x7foJ\xe5H\xbapVw\x16\xeb\xd4\xce8\xa2\x1b"\x1e\x04\xa3|Kjn]\xea\xcb5\'\x12\x8a\x88\xc0/\x8c\xd7H\x82\x0c\xe1\x1b_.$\xc3\xbf\xc6i\xad\xd7\xfa\x17\x11\x1f{vV\xf4\x96\x9d\xdd\x87\x84A\x1db\xbdZ\xb6\x84\xc7\xfb\xe0\x0fJ\x18r\x83b\x01\xd4\'\xcf)h\x1cK\xc3\x1c\xfb#>\x1a\xb4[d\xb5#\x83s\x12\xc6\x81\xe9\x95\xe4\xd5@l\x8a\x156} \x8d\xfb\xfe\x85f\xd9\n\x88$b\x82\x8d:\x02\x1f!d\x8f$\xa0\x14\x88\xe6KB\x12\x82\xe6\x1a\xc6\x83\x88\xcc\xf3\x83f\x03\x0b\x0er\x94q\xa9!\x99\x12\xab@+\x9c\xae\xf9N^4\x81\r\xb2\xa8wI'} <bound method Minion._handle_payload of <salt.minion.Minion object at 0x7f0e68b06cb0>>
[INFO ] User sudo_user Executing command saltutil.find_job with jid 20240415122248933074
[DEBUG ] Command details {'fun': 'saltutil.find_job', 'arg': ['20240415121633003410'], 'tgt': ['sympa.my.company.de'], 'jid': '20240415122248933074', 'ret': '', 'tgt_type': 'list', 'user': 'sudo_user'}
[DEBUG ] Subprocess ProcessPayload(jid=20240415122248933074) added
[ERROR ] Exception while running callback
Traceback (most recent call last):
File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/transport/zeromq.py", line 394, in consume
await callback(msg)
File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/channel/client.py", line 484, in wrap_callback
await callback(decoded)
TypeError: object NoneType can't be used in 'await' expression
Please tag this as a high priority bug.
same problem
[salt.transport.zeromq:396 ][ERROR ][13970] Exception while running callback
Traceback (most recent call last):
File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/transport/zeromq.py", line 394, in consume
await callback(msg)
File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/channel/client.py", line 484, in wrap_callback
await callback(decoded)
TypeError: object NoneType can't be used in 'await' expression
[salt.transport.zeromq:396 ][ERROR ][13970] Exception while running callback
Traceback (most recent call last):
File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/transport/zeromq.py", line 394, in consume
await callback(msg)
File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/channel/client.py", line 484, in wrap_callback
await callback(decoded)
TypeError: object NoneType can't be used in 'await' expression
Even a simple test.ping
is affected. Doing the ping from the master to an idle server on the same lan usually takes 0.6 seconds but every other ping takes 6 seconds or even hangs.
mika!2006# time salt 'cal*' test.ping
calvados:
True
real 0m5.786s
user 0m0.526s
sys 0m0.071s
mika!2006# time salt 'cal*' test.ping
calvados:
True
real 0m0.639s
user 0m0.500s
sys 0m0.049s
mika!2006# time salt 'cal*' test.ping
calvados:
True
real 0m5.697s
user 0m0.529s
sys 0m0.066s
mika!2006# time salt 'cal*' test.ping
calvados:
True
real 0m0.650s
user 0m0.500s
sys 0m0.061s
When the minion becomes unresponsive then stopping salt-minion on it takes very long and does not make a clean stop:
calvados!2039# systemctl stop salt-minion
calvados!2041# pgrep -a salt
76678 /usr/bin/python /usr/bin/salt-minion
79575 /usr/bin/python /usr/bin/salt-minion
So there is obviously something hanging.
calvados!2044# journalctl -fu salt-minion
Apr 29 08:15:32 calvados systemd[1]: salt-minion.service: State 'stop-sigterm' timed out. Killing.
Apr 29 08:15:32 calvados systemd[1]: salt-minion.service: Killing process 76675 (salt-minion) with signal SIGKILL.
Apr 29 08:15:32 calvados systemd[1]: salt-minion.service: Main process exited, code=killed, status=9/KILL
Apr 29 08:15:32 calvados systemd[1]: salt-minion.service: Failed with result 'timeout'.
Apr 29 08:15:32 calvados systemd[1]: salt-minion.service: Unit process 76678 (salt-minion) remains running after unit stopped.
Apr 29 08:15:32 calvados systemd[1]: salt-minion.service: Unit process 79575 (salt-minion) remains running after unit stopped.
Apr 29 08:15:32 calvados systemd[1]: Stopped The Salt Minion.
Apr 29 08:15:32 calvados systemd[1]: salt-minion.service: Consumed 6.111s CPU time, 112.6M memory peak, 0B memory swap peak.
Apr 29 08:17:43 calvados salt-minion[76678]: Minion received a SIGTERM. Exiting.
Apr 29 08:17:43 calvados salt-minion[76678]: The Salt Minion is shutdown. Minion received a SIGTERM. Exited.
@StefanTT
I just want to provide my following test.ping case.
-------------------------------------------
Summary
-------------------------------------------
# of minions targeted: 15
# of minions returned: 15
# of minions that did not return: 0
# of minions with errors: 0
-------------------------------------------
real 0m5.792s user 0m1.072s sys 0m0.154s [me@salt01 ~]$
- This is on a fresh test server and minion lab, same subnet.
[me@rocky8t01 ~]$ time sudo salt -L rocky8t01,rocky8t02 test.ping rocky8t01: True rocky8t02: True
real 0m1.035s user 0m0.013s sys 0m0.012s [me@rocky8t01 ~]$
Sorry to be pushy but is there any eta on this defect? Having a "high" latency connection + this defect renders salt unusable. Had to downgrade to 3006 because of this, but on arch, python was raised to 3.12 two weeks ago and 3006 has issues with 3.12 so I'm kinda stuck in a one legged limbo right now. Would really appreciate it if this is tackled rather soon. That said, I'm happy with a patch or workaround as well in the meantime.
I believe this is fixed with #66335 being merged.
Things improved in 3007.1, but about 10% of minions still randomly have timeout issues.
Once I found out one can bring down 3007.x STS server by following command. I downgraded salt-master to LTS version which is much more stable andn less bug
for i in {1..30}; do openssl s_client -connect $1:4506 -tls1_2 </dev/null; sleep
.2; done
Upgraded 3006.7 to 3007 on both salt-master and a single salt-minion Salt minion log is recording the following errors constantly
/var/log/salt/minion 2024-03-07 15:53:49,411 [salt.transport.zeromq:396 ][ERROR ][2127705] Exception while running callback Traceback (most recent call last): File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/transport/zeromq.py", line 394, in consume await callback(msg) File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/channel/client.py", line 484, in wrap_callback await callback(decoded) TypeError: object NoneType can't be used in 'await' expression
/var/log/salt/master 2024-03-07 15:54:59,908 [salt.loader.lazy :531 ][ERROR ][1084696] Module/package collision: '/opt/saltstack/salt/lib/python3.10/site-packages/salt/utils/pycache/vault.cpython-310.pyc' and '/opt/saltstack/salt/lib/python3.10/site-packages/salt/utils/vault' 2024-03-07 15:54:59,979 [salt.loader.lazy :531 ][ERROR ][1084696] Module/package collision: '/opt/saltstack/salt/lib/python3.10/site-packages/salt/utils/pycache/vault.cpython-310.pyc' and '/opt/saltstack/salt/lib/python3.10/site-packages/salt/utils/vault' 2024-03-07 15:54:59,980 [salt.loader.lazy :531 ][ERROR ][1084696] Module/package collision: '/opt/saltstack/salt/lib/python3.10/site-packages/salt/utils/pycache/vault.cpython-310.pyc' and '/opt/saltstack/salt/lib/python3.10/site-packages/salt/utils/vault'
Initially spotted when doing a manual salt-call looked in /opt/saltstack/salt/lib/python3.10/site-packages/salt/utils/pycache/ and spotted an old file after upgrade:
-rw-r--r-- 1 root root 9832 Mar 7 14:12 user.cpython-310.pyc -rw-r--r-- 1 root root 440 Mar 7 14:12 value.cpython-310.pyc -rw-r--r-- 1 root root 14016 Feb 26 09:08 vault.cpython-310.pyc -rw-r--r-- 1 root root 15913 Mar 7 14:12 verify.cpython-310.pyc -rw-r--r-- 1 root root 14925 Mar 7 14:12 versions.cpython-310.pyc
Salt Version: Salt: 3007.0
Python Version: Python: 3.10.13 (main, Feb 19 2024, 03:31:20) [GCC 11.2.0]
Dependency Versions: cffi: 1.16.0 cherrypy: 18.8.0 dateutil: 2.8.2 docker-py: Not Installed gitdb: Not Installed gitpython: Not Installed Jinja2: 3.1.3 libgit2: Not Installed looseversion: 1.3.0 M2Crypto: Not Installed Mako: Not Installed msgpack: 1.0.7 msgpack-pure: Not Installed mysql-python: Not Installed packaging: 23.1 pycparser: 2.21 pycrypto: Not Installed pycryptodome: 3.19.1 pygit2: Not Installed python-gnupg: 0.5.2 PyYAML: 6.0.1 PyZMQ: 25.1.2 relenv: 0.15.1 smmap: Not Installed timelib: 0.3.0 Tornado: 6.3.3 ZMQ: 4.3.4
Salt Package Information: Package Type: onedir
System Versions: dist: oracle 8.9 locale: utf-8 machine: x86_64 release: 5.15.0-101.103.2.1.el8uek.x86_64 system: Linux version: Oracle Linux Server 8.9