saltstack / salt

Software to automate the management and configuration of any infrastructure or application at scale. Get access to the Salt software package repository here:
https://repo.saltproject.io/
Apache License 2.0
14.13k stars 5.47k forks source link

installed 3007 TypeError: object NoneType can't be used in 'await' expression #66177

Open z900collector opened 7 months ago

z900collector commented 7 months ago

Upgraded 3006.7 to 3007 on both salt-master and a single salt-minion Salt minion log is recording the following errors constantly

/var/log/salt/minion 2024-03-07 15:53:49,411 [salt.transport.zeromq:396 ][ERROR ][2127705] Exception while running callback Traceback (most recent call last): File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/transport/zeromq.py", line 394, in consume await callback(msg) File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/channel/client.py", line 484, in wrap_callback await callback(decoded) TypeError: object NoneType can't be used in 'await' expression

/var/log/salt/master 2024-03-07 15:54:59,908 [salt.loader.lazy :531 ][ERROR ][1084696] Module/package collision: '/opt/saltstack/salt/lib/python3.10/site-packages/salt/utils/pycache/vault.cpython-310.pyc' and '/opt/saltstack/salt/lib/python3.10/site-packages/salt/utils/vault' 2024-03-07 15:54:59,979 [salt.loader.lazy :531 ][ERROR ][1084696] Module/package collision: '/opt/saltstack/salt/lib/python3.10/site-packages/salt/utils/pycache/vault.cpython-310.pyc' and '/opt/saltstack/salt/lib/python3.10/site-packages/salt/utils/vault' 2024-03-07 15:54:59,980 [salt.loader.lazy :531 ][ERROR ][1084696] Module/package collision: '/opt/saltstack/salt/lib/python3.10/site-packages/salt/utils/pycache/vault.cpython-310.pyc' and '/opt/saltstack/salt/lib/python3.10/site-packages/salt/utils/vault'

Initially spotted when doing a manual salt-call looked in /opt/saltstack/salt/lib/python3.10/site-packages/salt/utils/pycache/ and spotted an old file after upgrade:

-rw-r--r-- 1 root root 9832 Mar 7 14:12 user.cpython-310.pyc -rw-r--r-- 1 root root 440 Mar 7 14:12 value.cpython-310.pyc -rw-r--r-- 1 root root 14016 Feb 26 09:08 vault.cpython-310.pyc -rw-r--r-- 1 root root 15913 Mar 7 14:12 verify.cpython-310.pyc -rw-r--r-- 1 root root 14925 Mar 7 14:12 versions.cpython-310.pyc

Salt Version: Salt: 3007.0

Python Version: Python: 3.10.13 (main, Feb 19 2024, 03:31:20) [GCC 11.2.0]

Dependency Versions: cffi: 1.16.0 cherrypy: 18.8.0 dateutil: 2.8.2 docker-py: Not Installed gitdb: Not Installed gitpython: Not Installed Jinja2: 3.1.3 libgit2: Not Installed looseversion: 1.3.0 M2Crypto: Not Installed Mako: Not Installed msgpack: 1.0.7 msgpack-pure: Not Installed mysql-python: Not Installed packaging: 23.1 pycparser: 2.21 pycrypto: Not Installed pycryptodome: 3.19.1 pygit2: Not Installed python-gnupg: 0.5.2 PyYAML: 6.0.1 PyZMQ: 25.1.2 relenv: 0.15.1 smmap: Not Installed timelib: 0.3.0 Tornado: 6.3.3 ZMQ: 4.3.4

Salt Package Information: Package Type: onedir

System Versions: dist: oracle 8.9 locale: utf-8 machine: x86_64 release: 5.15.0-101.103.2.1.el8uek.x86_64 system: Linux version: Oracle Linux Server 8.9

doesitblend commented 7 months ago

Can you try stopping your Salt master and minion, then run find /opt/saltstack/salt -type f -iname "*.pyc" -print -delete?Then start them back up. I believe that will resolve this issue.

tjyang commented 7 months ago

@doesitblend , thanks for the oneliner script to cleanup the .pyc. I am still getting this error message from minion startup. Any suggestion ?

salt-minion[1567838]: [ERROR ] Publish server binding pub to /var/run/salt/minion/minion_event_0f4efbad65_pub.ipc ssl=None

z900collector commented 7 months ago

Tried the one liner and now the minion reports:

2024-03-07 22:32:33,860 [salt.transport.zeromq:396 ][ERROR ][1756164] Exception while running callback Traceback (most recent call last): File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/transport/zeromq.py", line 394, in consume await callback(msg) File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/channel/client.py", line 484, in wrap_callback await callback(decoded) TypeError: object NoneType can't be used in 'await' expression

The master reports (lots and lots of these): 2024-03-07 22:23:13,633 [salt.transport.tcp:311 ][WARNING ][1699260] TCP Publish Client encountered an exception while connecting to /var/run/salt/master/master_event_pub.ipc: StreamClosedError('Stream is closed'), will reconnect in 1 seconds - File "/bin/salt-run", line 11, in sys.exit(salt_run())

File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/scripts.py", line 456, in salt_run client.run()

File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/cli/run.py", line 34, in run ret = runner.run(full_return=True)

File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/runner.py", line 303, in run ret = self._proc_function(

File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/client/mixins.py", line 546, in _proc_function return instance.low(fun, low, full_return=full_return)

File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/client/mixins.py", line 383, in low data["return"] = func(*args, **kwargs)

File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/loader/lazy.py", line 160, in call ret = self.loader.run(run_func, *args, **kwargs)

File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/loader/lazy.py", line 1233, in run return self._last_context.run(self._run_as, _func_or_method, *args, **kwargs)

File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/loader/lazy.py", line 1248, in _run_as return _func_or_method(*args, **kwargs)

File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/runners/state.py", line 310, in event return statemod["state.event"](

File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/loader/lazy.py", line 160, in call ret = self.loader.run(run_func, *args, **kwargs)

File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/loader/lazy.py", line 1233, in run return self._last_context.run(self._run_as, _func_or_method, *args, **kwargs)

File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/loader/lazy.py", line 1248, in _run_as return _func_or_method(*args, **kwargs)

File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/modules/state.py", line 2576, in event with salt.utils.event.get_event(

File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/utils/event.py", line 127, in get_event return MasterEvent(

File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/utils/event.py", line 928, in init super().init(

File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/utils/event.py", line 265, in init self.connect_pub()

File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/utils/event.py", line 323, in connect_pub self.subscriber = salt.utils.asynchronous.SyncWrapper(

File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/utils/asynchronous.py", line 77, in init self.obj = cls(*args, **kwargs)

File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/transport/base.py", line 210, in ipc_publish_client return publish_client(opts, io_loop, **kwargs)

File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/transport/base.py", line 152, in publish_client return salt.transport.tcp.PublishClient(

File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/transport/tcp.py", line 219, in init super().init(opts, io_loop, **kwargs)

File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/transport/base.py", line 398, in init super().init()

tjyang commented 7 months ago

@z900collector , the one-liner worked for me to remove *.pyc files and I don't have your message above. I only have following error message from "systemctl status salt-minion" per another ticket https://github.com/saltstack/salt/issues/66179

salt-minion[1567838]: [ERROR   ] Publish server binding pub to /var/run/salt/minion/minion_event_0f4efbad65_pub.ipc ssl=None
doc75 commented 6 months ago

After running the oneliner on both master and minion, it is working again. Nevertheless I can still see a lot of occurence like this one on minion side:

2024-03-10 19:22:50,833 [salt.transport.zeromq:396 ][ERROR   ][32857] Exception while running callback
Traceback (most recent call last):
  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/transport/zeromq.py", line 394, in consume
    await callback(msg)
  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/channel/client.py", line 484, in wrap_callback
    await callback(decoded)
TypeError: object NoneType can't be used in 'await' expression

But it does not prevent the minion from returning properly.

ZsBT commented 6 months ago

I'm facing the same issue, after deleting .pyc files the minions get online. However these error lines still flood the minion logs.

tjyang commented 6 months ago

rocky9t01a test VM upgraded from 3006.6 to 3007.0. I thought next version 3006.8 will be latest one with this bug fix but latest one is 3007 without this bug fix.

[me@rocky9t01 salt]$ sudo salt -L rocky9t01 cmd.run "/usr/bin/tail -10 /var/log/salt/minion"
rocky9t01:
    2024-04-09 16:40:43,823 [salt.loaded.int.module.cmdmod:1347][ERROR   ][571240] Command 'systemctl' failed with return code: 1
    2024-04-09 16:40:43,823 [salt.loaded.int.module.cmdmod:1352][ERROR   ][571240] output: Job for salt-minion.service canceled.
    2024-04-09 16:40:44,840 [salt.transport.tcp:1405][ERROR   ][571328] Publish server binding pub to /var/run/salt/minion/minion_event_69362de666_pub.ipc ssl=None
    2024-04-09 16:40:49,491 [salt.transport.zeromq:396 ][ERROR   ][571328] Exception while running callback
    Traceback (most recent call last):
      File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/transport/zeromq.py", line 394, in consume
        await callback(msg)
      File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/channel/client.py", line 484, in wrap_callback
        await callback(decoded)
    TypeError: object NoneType can't be used in 'await' expression
[me@rocky9t01 salt]$ 
[me@rocky9t01 salt]$ sudo salt-minion -V Salt Version: Salt: 3007.0 Python Version: Python: 3.10.13 (main, Feb 19 2024, 03:31:20) [GCC 11.2.0] Dependency Versions: cffi: 1.16.0 cherrypy: 18.8.0 dateutil: 2.8.2 docker-py: Not Installed gitdb: Not Installed gitpython: Not Installed Jinja2: 3.1.3 libgit2: Not Installed looseversion: 1.3.0 M2Crypto: Not Installed Mako: Not Installed msgpack: 1.0.7 msgpack-pure: Not Installed mysql-python: Not Installed packaging: 23.1 pycparser: 2.21 pycrypto: Not Installed pycryptodome: 3.19.1 pygit2: Not Installed python-gnupg: 0.5.2 PyYAML: 6.0.1 PyZMQ: 25.1.2 relenv: 0.15.1 smmap: Not Installed timelib: 0.3.0 Tornado: 6.3.3 ZMQ: 4.3.4 Salt Package Information: Package Type: onedir System Versions: dist: rocky 9.3 Blue Onyx locale: utf-8 machine: x86_64 release: 5.14.0-362.18.1.el9_3.0.1.x86_64 system: Linux version: Rocky Linux 9.3 Blue Onyx [me@rocky9t01 salt]$
SwimGeek commented 6 months ago

find /opt/saltstack/salt -type f -iname "*.pyc" -print -delete

This worked, but only for two days, errors returned on three minions so far.

aloosnetmatch commented 6 months ago

Hi all.

I have the same issues with saltminion 3007.0-0

2024-03-12 10:27:41,901 [salt.transport.tcp:1410][ERROR ][16652] Publish server binding pub to 127.0.0.1:4510 ssl=None

Both on Linux as on windows

SwimGeek commented 6 months ago

Hi

I see there are comments that this is a 'log flood' issue. I think it's a bit more than that - my minions disconnect from the master.

tjyang commented 6 months ago

This issue has been opened for two weeks and it is still not gaining enough attention to be classified as bug.

SwimGeek commented 6 months ago

Hi, my minions seem to become slow to respond and commands seem to have timeouts.

When I delete the .pyc files all is fast and happy. A few days later things go slow and time out again.

Log when minion does not respond, doing basic test.ping:

2024-03-19 14:58:45,810 [salt.transport.zeromq:396 ][ERROR   ][817] Exception while running callback
Traceback (most recent call last):
  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/transport/zeromq.py", line 394, in consume
    await callback(msg)
  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/channel/client.py", line 484, in wrap_callback
    await callback(decoded)
TypeError: object NoneType can't be used in 'await' expression
2024-03-19 14:58:55,939 [salt.minion      :2763][ERROR   ][817] Timeout encountered while sending {'cmd': '_return', 'id': 'cpt-ter-blahblah1.atomic.ac', 'success': True, 'return': True, 'retcode': 0, 'jid': '20240319125840713889', 'fun': 'test.ping', 'fun_args': [], 'user': 'root', '_stamp': '2024-03-19T12:58:40.915616', 'nonce': '39f602421d8c4c9bb844bca46a7a8f83'} request
z900collector commented 6 months ago

I've reinstall 3006.7 on all my minions but kept the Salt Master at 3007.0 until this is fixed.

Sid

On Tue, 19 Mar 2024, 23:05 SwimGeek, @.***> wrote:

Hi, my minions seem to become slow to respond and commands seem to have timeouts.

When I delete the .pyc files all is fast and happy. A few days later things go slow and time out again.

Log when minion does not respond, doing basic test.ping:

2024-03-19 14:58:45,810 [salt.transport.zeromq:396 ][ERROR ][817] Exception while running callback Traceback (most recent call last): File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/transport/zeromq.py", line 394, in consume await callback(msg) File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/channel/client.py", line 484, in wrap_callback await callback(decoded) TypeError: object NoneType can't be used in 'await' expression 2024-03-19 14:58:55,939 [salt.minion :2763][ERROR ][817] Timeout encountered while sending {'cmd': '_return', 'id': 'cpt-ter-blahblah1.atomic.ac', 'success': True, 'return': True, 'retcode': 0, 'jid': '20240319125840713889', 'fun': 'test.ping', 'fun_args': [], 'user': 'root', '_stamp': '2024-03-19T12:58:40.915616', 'nonce': '39f602421d8c4c9bb844bca46a7a8f83'} request

— Reply to this email directly, view it on GitHub https://github.com/saltstack/salt/issues/66177#issuecomment-2007126338, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABRLBVGBWHGUSQGLJO25CHLYZAZX3AVCNFSM6AAAAABEKLF7COVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDAMBXGEZDMMZTHA . You are receiving this because you were mentioned.Message ID: @.***>

SwimGeek commented 6 months ago

Update, if I use --async things work ok... so something is causing minions to take very long time to respond.... but they all complete their jobs in the end.

tjyang commented 6 months ago

In my case , on both 3006.7 and 3007 masters, I need to restart salt-master, just to get master process works.

[me@salt01 ~]$ sudo salt -N testminions test.ping
[ERROR   ] Request client send timedout
Salt request timed out. The master is not responding. You may need to run your command with `--async` in order to bypass the congested event bus. With `--async`, the CLI tool will print the job id (jid) and exit immediately without listening for responses. You can then use `salt-run jobs.lookup_jid` to look up the results of the job in the job cache later.
[me@salt01 ~]$
themowski commented 6 months ago

We are also seeing TypeError: object NoneType can't be used in 'await' expression in our minion logs after an upgrade from 3006.7 to 3007 on AlmaLinux 8.

merlinz01 commented 6 months ago

Am also seeing this error with salt-minion 3007.0 on Debian 12.

tjyang commented 6 months ago

Anyone has issue of salt-mater not responding to simple test.ping? my current workaround is restart salt-master and wait for say 60secs and * test.ping will work again(until it doesn't)

tjyang commented 6 months ago

@dwoz , Thanks for taking on this issue. Do you need more reports to track down the root cause ?

merlinz01 commented 6 months ago

Seems to me this is basically what's happening, see lines 473-487 of client.py:

>>> # This function seems to be a decorator for registering callbacks for receiving data
>>> def on_recv(callback):
...     async def wrap_callback(messages):
...         # decode messages
...         await callback('decoded')
...     return wrap_callback
... 
>>> # The decorator is being used on a non-async function that returns None (instead of something awaitable)
>>> @on_recv
... def non_async_func(decoded):
...     print('Non-async function called, returning None')
...     return None
... 
>>> # Data is received, and the callback is called
>>> import asyncio
>>> asyncio.run(non_async_func('messages'))
Non-async function called, returning None
Traceback (most recent call last):
  File "<pyshell#22>", line 1, in <module>
    asyncio.run(non_async_func('messages'))
  File "C:\Program Files\Python312\Lib\asyncio\runners.py", line 194, in run
    return runner.run(main)
  File "C:\Program Files\Python312\Lib\asyncio\runners.py", line 118, in run
    return self._loop.run_until_complete(task)
  File "C:\Program Files\Python312\Lib\asyncio\base_events.py", line 684, in run_until_complete
    return future.result()
  File "<pyshell#18>", line 4, in wrap_callback
    await callback('decoded')
TypeError: object NoneType can't be used in 'await' expression
merlinz01 commented 6 months ago

Doing some debugging, I don't see on_recv being used as a decorator, but a salt.minion.Minion().pub_channel, which is a salt.channel.client.AsyncPubChannel, has a on_recv() method that is being called with the Minion's _handle_payload method. The _handle_payload method is not async and always returns None. The wrap_callback function generated in AsyncPubChannel.on_recv() tries to await the result of Minion._handle_payload, which is None.

Would it be as simple as defining salt.minion.Minion._handle_payload as async or wrapping it in an async function?

KiyoIchikawa commented 6 months ago

I'm not sure about others on this issue, but this issue seems to be a huge blocker. At first, we would get [No Response] and the minion log entries when running salt 'server01' state.apply on the master and just run it again to get it to work. Now, just about every call from the master times out and we see the minion log entries as described above.

Doesn't seem like just a log entry but actually seems to be broken.

tjyang commented 6 months ago

@KiyoIchikawa , I have to restart salt-master to get the master time out to respond again. Running salt commands again won't fix salt-master issue. Can you change to log level to debug ? I am doing this and hoping to get the error message in /var/log/salt/master log next time salt-master time out.

KiyoIchikawa commented 6 months ago

We just restarted our masters (we use the new load balancing feature) and things seem working again. We'll try changing our log level to catch any errors.

merlinz01 commented 6 months ago

See my PR #66277 that does away with the errors (so far) for me.

firdous22 commented 6 months ago

Seeing similar issue. I upgraded both my master and minion to 3007 and even cleared all the .pyc files as mentioned previously.

tjyang commented 6 months ago

See my PR #66277 that does away with the errors (so far) for me.

Thanks for this PR, I will try it out once my salt masters stable (without time out , not responding to salt commands issues).

tjyang commented 6 months ago

We just restarted our masters (we use the new load balancing feature) and things seem working again. We'll try changing our log level to catch any errors.

Once I enable log level to debug, in /etc/salt/master, I saw my localfs path has some link files have no destinations. I removed all those bad links. so far , my salt-master is up for 20h straight without the need to be rebooted due to not responding.

sgbell commented 6 months ago

HI Guys, just throwing my hat into this, I found that on one of my minions, what was causing it to die anytime I tried to start it up, was a config file in my /etc/salt/minion.d folder. once I removed it, the minion was stable. The culprit for me was /etc/salt/minion.d/_scheduled.conf with the contents: schedule: __mine_interval: {enabled: true, function: mine.update, jid_include: true, maxrunning: 2, minutes: 60, return_job: false, run_on_start: true}

dwoz commented 5 months ago

I am concerned this issue may not be the root cause of minion's going offline. Can someone please test the changes from #66335 that has been experiencing minion outages? @tjyang @z900collector

tjyang commented 5 months ago

@dwoz ,

AFAIK from my testing, "... await expression" in salt minion log doesn't impact salt-minion connection issue with salt-master.

I opened another issue to track my salt-master issue that not able to send out commands via 4505 publish port. when port 4505 malfunction after restart for a while , salt-minion still can pull down formulas. Pls see https://github.com/saltstack/salt/issues/66282 for details.

colttt commented 5 months ago

any news about that? a simple state.apply take ages (~5-10minutes), also the load grows to 1.0

this simple run tooks more than 10minutes

salt 'sympa*' state.apply
sympa.my.company.de:
  Name: /etc/bash.bashrc - Function: file.append - Result: Changed - Started: 14:16:39.928980 - Duration: 744540.32 ms
  Name: salt-minion - Function: service.running - Result: Changed - Started: 14:29:17.972168 - Duration: 326.765 ms
  Name: create_host_to_zabbix - Function: zabbix_host.present - Result: Changed - Started: 14:29:18.319065 - Duration: 2267.021 ms
  Name: /etc/apt/keyrings/GPG-KEY-elasticsearch - Function: file.managed - Result: Changed - Started: 14:29:31.059698 - Duration: 44.924 ms
  Name: filebeat - Function: service.running - Result: Changed - Started: 14:29:35.289241 - Duration: 144.684 ms

Summary for sympa.my.company.de
-------------
Succeeded: 76 (changed=5)
Failed:     0
-------------
Total states run:     76
Total run time:  776.464 s

the debug output repeats always:

[DEBUG   ] Using selector: EpollSelector
[DEBUG   ] The functions from module 'jinja' are being loaded by dir() on the loaded module
[DEBUG   ] LazyLoaded jinja.render
[DEBUG   ] The functions from module 'yaml' are being loaded by dir() on the loaded module
[DEBUG   ] LazyLoaded yaml.render
[INFO    ] Starting a new job 20240415122238903939 with PID 126506
[DEBUG   ] The functions from module 'direct_call' are being loaded by dir() on the loaded module
[DEBUG   ] The functions from module 'splay' are being loaded by dir() on the loaded module
[DEBUG   ] Could not LazyLoad direct_call.allow_missing_func: 'direct_call.allow_missing_func' is not available.
[DEBUG   ] The functions from module 'saltutil' are being loaded by dir() on the loaded module
[DEBUG   ] LazyLoaded saltutil.find_job
[INFO    ] Returning information for job: 20240415122238903939
[DEBUG   ] Using selector: EpollSelector
[DEBUG   ] Using selector: EpollSelector
[DEBUG   ] Publisher connecting to /var/run/salt/minion/minion_event_de194ab817_pull.ipc
[DEBUG   ] Minion of 'salt' is handling event tag '__master_req_channel_payload/salt'
[DEBUG   ] Minion return retry timer set to 6 seconds (randomized)
[DEBUG   ] Closing _TCPPubServerPublisher instance
[DEBUG   ] minion return: {'success': True, 'return': {'pid': 126060, 'fun': 'state.apply', 'arg': [], 'tgt': 'sympa*', 'jid': '20240415121633003410', 'ret': '', 'tgt_type': 'glob', 'user': 'sudo_user'}, 'retcode': 0, 'jid': '20240415122238903939', 'fun': 'saltutil.find_job', 'fun_args': ['20240415121633003410'], 'user': 'sudo_user'}
[DEBUG   ] Subprocess ProcessPayload(jid=20240415122238903939) cleaned up
[DEBUG   ] PubChannel received: {'enc': 'aes', 'load': {'fun': 'saltutil.find_job', 'arg': ['20240415121633003410'], 'tgt': ['sympa.my.company.de'], 'jid': '20240415122248933074', 'ret': '', 'tgt_type': 'list', 'user': 'sudo_user'}, 'sig': b'\xab_\xee\xc7;Y\x0c:\xb8\xe4\xa7c.\xf6\xb9ic\x8e\x0ev\xfev\x14y4,\xfe\xd1\x07PBHLV\x069\x82\xacZ\xb3cv\x02t\xf9\xde\xf3\x81G[)m7\xbb\x1f\xd67\x11\xc1\xbe\xc7\xd7\xf4\xf0\x03\xc5\x1ea\xdf\xc0\xba\xd4\x9c(\x18\xcbsM\x9b?\xd0\x7foJ\xe5H\xbapVw\x16\xeb\xd4\xce8\xa2\x1b"\x1e\x04\xa3|Kjn]\xea\xcb5\'\x12\x8a\x88\xc0/\x8c\xd7H\x82\x0c\xe1\x1b_.$\xc3\xbf\xc6i\xad\xd7\xfa\x17\x11\x1f{vV\xf4\x96\x9d\xdd\x87\x84A\x1db\xbdZ\xb6\x84\xc7\xfb\xe0\x0fJ\x18r\x83b\x01\xd4\'\xcf)h\x1cK\xc3\x1c\xfb#>\x1a\xb4[d\xb5#\x83s\x12\xc6\x81\xe9\x95\xe4\xd5@l\x8a\x156} \x8d\xfb\xfe\x85f\xd9\n\x88$b\x82\x8d:\x02\x1f!d\x8f$\xa0\x14\x88\xe6KB\x12\x82\xe6\x1a\xc6\x83\x88\xcc\xf3\x83f\x03\x0b\x0er\x94q\xa9!\x99\x12\xab@+\x9c\xae\xf9N^4\x81\r\xb2\xa8wI'} <bound method Minion._handle_payload of <salt.minion.Minion object at 0x7f0e68b06cb0>>
[INFO    ] User sudo_user Executing command saltutil.find_job with jid 20240415122248933074
[DEBUG   ] Command details {'fun': 'saltutil.find_job', 'arg': ['20240415121633003410'], 'tgt': ['sympa.my.company.de'], 'jid': '20240415122248933074', 'ret': '', 'tgt_type': 'list', 'user': 'sudo_user'}
[DEBUG   ] Subprocess ProcessPayload(jid=20240415122248933074) added
[ERROR   ] Exception while running callback
Traceback (most recent call last):
  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/transport/zeromq.py", line 394, in consume
    await callback(msg)
  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/channel/client.py", line 484, in wrap_callback
    await callback(decoded)
TypeError: object NoneType can't be used in 'await' expression
tuxthepenguin84 commented 5 months ago

Please tag this as a high priority bug.

rimskij commented 5 months ago

same problem

[salt.transport.zeromq:396 ][ERROR   ][13970] Exception while running callback
Traceback (most recent call last):
  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/transport/zeromq.py", line 394, in consume
    await callback(msg)
  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/channel/client.py", line 484, in wrap_callback
    await callback(decoded)
TypeError: object NoneType can't be used in 'await' expression
 [salt.transport.zeromq:396 ][ERROR   ][13970] Exception while running callback
Traceback (most recent call last):
  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/transport/zeromq.py", line 394, in consume
    await callback(msg)
  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/channel/client.py", line 484, in wrap_callback
    await callback(decoded)
TypeError: object NoneType can't be used in 'await' expression
StefanTT commented 5 months ago

Even a simple test.ping is affected. Doing the ping from the master to an idle server on the same lan usually takes 0.6 seconds but every other ping takes 6 seconds or even hangs.

mika!2006# time salt 'cal*' test.ping
calvados:
    True

real    0m5.786s
user    0m0.526s
sys 0m0.071s
mika!2006# time salt 'cal*' test.ping
calvados:
    True

real    0m0.639s
user    0m0.500s
sys 0m0.049s
mika!2006# time salt 'cal*' test.ping
calvados:
    True

real    0m5.697s
user    0m0.529s
sys 0m0.066s
mika!2006# time salt 'cal*' test.ping
calvados:
    True

real    0m0.650s
user    0m0.500s
sys 0m0.061s

When the minion becomes unresponsive then stopping salt-minion on it takes very long and does not make a clean stop:

calvados!2039# systemctl stop salt-minion
calvados!2041# pgrep -a salt
76678 /usr/bin/python /usr/bin/salt-minion
79575 /usr/bin/python /usr/bin/salt-minion

So there is obviously something hanging.

calvados!2044# journalctl -fu salt-minion
Apr 29 08:15:32 calvados systemd[1]: salt-minion.service: State 'stop-sigterm' timed out. Killing.
Apr 29 08:15:32 calvados systemd[1]: salt-minion.service: Killing process 76675 (salt-minion) with signal SIGKILL.
Apr 29 08:15:32 calvados systemd[1]: salt-minion.service: Main process exited, code=killed, status=9/KILL
Apr 29 08:15:32 calvados systemd[1]: salt-minion.service: Failed with result 'timeout'.
Apr 29 08:15:32 calvados systemd[1]: salt-minion.service: Unit process 76678 (salt-minion) remains running after unit stopped.
Apr 29 08:15:32 calvados systemd[1]: salt-minion.service: Unit process 79575 (salt-minion) remains running after unit stopped.
Apr 29 08:15:32 calvados systemd[1]: Stopped The Salt Minion.
Apr 29 08:15:32 calvados systemd[1]: salt-minion.service: Consumed 6.111s CPU time, 112.6M memory peak, 0B memory swap peak.
Apr 29 08:17:43 calvados salt-minion[76678]: Minion received a SIGTERM. Exiting.
Apr 29 08:17:43 calvados salt-minion[76678]: The Salt Minion is shutdown. Minion received a SIGTERM. Exited.
tjyang commented 5 months ago

@StefanTT

I just want to provide my following test.ping case.

real 0m5.792s user 0m1.072s sys 0m0.154s [me@salt01 ~]$

- This is on a fresh test server and minion lab, same subnet.

[me@rocky8t01 ~]$ time sudo salt -L rocky8t01,rocky8t02 test.ping rocky8t01: True rocky8t02: True

real 0m1.035s user 0m0.013s sys 0m0.012s [me@rocky8t01 ~]$

Dr-Bone commented 4 months ago

Sorry to be pushy but is there any eta on this defect? Having a "high" latency connection + this defect renders salt unusable. Had to downgrade to 3006 because of this, but on arch, python was raised to 3.12 two weeks ago and 3006 has issues with 3.12 so I'm kinda stuck in a one legged limbo right now. Would really appreciate it if this is tackled rather soon. That said, I'm happy with a patch or workaround as well in the meantime.

merlinz01 commented 4 months ago

I believe this is fixed with #66335 being merged.

SwimGeek commented 3 months ago

Things improved in 3007.1, but about 10% of minions still randomly have timeout issues.

tjyang commented 3 months ago

Once I found out one can bring down 3007.x STS server by following command. I downgraded salt-master to LTS version which is much more stable andn less bug

for i in {1..30}; do openssl s_client -connect $1:4506 -tls1_2 </dev/null; sleep
 .2; done