Open ecstuchi opened 1 year ago
This is still happening with 3006.
```yaml
$ salt -V
Salt Version:
  Salt: 3006.0

Python Version:
  Python: 3.10.11 (main, Apr 14 2023, 05:57:16) [GCC 11.2.0]

Dependency Versions:
  cffi: 1.14.6
  cherrypy: unknown
  dateutil: 2.8.1
  docker-py: Not Installed
  gitdb: Not Installed
  gitpython: Not Installed
  Jinja2: 3.1.2
  libgit2: Not Installed
  looseversion: 1.0.2
  M2Crypto: Not Installed
  Mako: Not Installed
  msgpack: 1.0.2
  msgpack-pure: Not Installed
  mysql-python: Not Installed
  packaging: 22.0
  pycparser: 2.21
  pycrypto: Not Installed
  pycryptodome: 3.9.8
  pygit2: Not Installed
  python-gnupg: 0.4.8
  PyYAML: 5.4.1
  PyZMQ: 23.2.0
  relenv: 0.11.2
  smmap: Not Installed
  timelib: 0.2.4
  Tornado: 4.5.3
  ZMQ: 4.3.4

System Versions:
  dist: rhel 7.9 Maipo
  locale: utf-8
  machine: x86_64
  release: 3.10.0-1160.88.1.el7.x86_64
  system: Linux
  version: Red Hat Enterprise Linux Server 7.9 Maipo
```
Tailing the log:
```
    yielded = next(result)
  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/transport/zeromq.py", line 431, in handle_message
    payload = self.decode_payload(payload)
  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/transport/zeromq.py", line 455, in decode_payload
    payload = salt.payload.loads(payload[0])
  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/payload.py", line 121, in loads
    raise SaltDeserializationError(exc_msg) from exc
salt.exceptions.SaltDeserializationError: Could not deserialize msgpack message. See log for more info.
2023-05-02 12:53:42,369 [salt.client :1906][ERROR ][19750] Message timed out
2023-05-02 12:54:42,907 [salt.client :1906][ERROR ][20667] Message timed out
```
Do you need me to collect more data for this?
Same for us with 3006.1. It crashes every few days and breaks a ton of our automation all the time. I'm gonna have to schedule a cron job that looks at the /var/log/salt/master file to automatically restart the service when this happens.
I had to do that. I've set up a cron job that runs like a watchdog every 5 minutes and restarts the service when the issue happens. It also emails my team when it happens. It's crazy how often it happens, and it doesn't matter how many minions are on the master.
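For anyone who wants to try the same workaround, here is a minimal sketch of such a watchdog in Python. It is not the script used above. The log path is the one mentioned in this thread; the error marker, the state file location, and the `salt-master` systemd unit name are assumptions you would need to adapt. It only restarts the service and leaves out the email notification.

```python
import subprocess
from pathlib import Path

# Log path taken from this thread. Everything else below is an assumption:
# the marker string, the state file location, and the systemd unit name.
LOG_PATH = Path("/var/log/salt/master")
ERROR_MARKER = "SaltDeserializationError"
STATE_PATH = Path("/var/tmp/salt_master_watchdog.offset")


def find_new_errors(log_text, offset, marker=ERROR_MARKER):
    """Return (lines containing marker after offset, new offset)."""
    if offset > len(log_text):
        # The log was rotated or truncated, so start again from the top.
        offset = 0
    new_text = log_text[offset:]
    hits = [line for line in new_text.splitlines() if marker in line]
    return hits, len(log_text)


def main():
    if not LOG_PATH.exists():
        return
    # Remember how far we read last time so old errors do not retrigger.
    offset = int(STATE_PATH.read_text()) if STATE_PATH.exists() else 0
    log_text = LOG_PATH.read_text(errors="replace")
    hits, new_offset = find_new_errors(log_text, offset)
    STATE_PATH.write_text(str(new_offset))
    if hits:
        # Assumes the master runs as the systemd unit "salt-master".
        subprocess.run(["systemctl", "restart", "salt-master"], check=True)


if __name__ == "__main__":
    main()
```

Run from cron every 5 minutes, this restarts the master only when a new matching line has appeared since the previous run. On the very first run there is no saved offset, so it scans the whole log and may restart once because of old errors.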
This sounds related to https://github.com/saltstack/salt/issues/64061
I've been seeing this same issue with some of our masters.
We are using the postgres jsonb returner, and I found that if I disable it, everything suddenly starts working fine.
Description I have a big Salt infrastructure with individual Salt Masters, each managing between 16 and 3000 minions, and all of them are also Syndics. There is a Master of Masters on top of them. All of the masters hit the same problem intermittently: apparently the message bus has issues and the Salt Master becomes a "zombie" that no longer works. Restarting the service fixes the issue, but Salt is unreliable this way.
Setup (Please provide relevant configs and/or SLS files (be sure to remove sensitive info). There is no general set-up of Salt.)
Please be as specific as possible and give set-up details.
Steps to Reproduce the behavior I cannot reproduce it, but it happens with almost all masters intermittently.
Expected behavior Salt Master being reliable and not having these issues anymore.
Screenshots If applicable, add screenshots to help explain your problem.
Versions Report
```yaml
[root@saltmaster ~]# salt --versions-report
Salt Version:
  Salt: 3005.1

Dependency Versions:
  cffi: Not Installed
  cherrypy: Not Installed
  dateutil: Not Installed
  docker-py: Not Installed
  gitdb: Not Installed
  gitpython: Not Installed
  Jinja2: 2.11.1
  libgit2: Not Installed
  M2Crypto: 0.35.2
  Mako: Not Installed
  msgpack: 0.6.2
  msgpack-pure: Not Installed
  mysql-python: Not Installed
  pycparser: Not Installed
  pycrypto: Not Installed
  pycryptodome: Not Installed
  pygit2: Not Installed
  Python: 3.6.8 (default, Aug 13 2020, 07:46:32)
  python-gnupg: Not Installed
  PyYAML: 3.13
  PyZMQ: 18.0.1
  smmap: Not Installed
  timelib: Not Installed
  Tornado: 4.5.3
  ZMQ: 4.1.4

System Versions:
  dist: rhel 7.9 Maipo
  locale: UTF-8
  machine: x86_64
  release: 3.10.0-1160.36.2.el7.x86_64
  system: Linux
  version: Red Hat Enterprise Linux Server 7.9 Maipo
```
Additional context