saltstack / salt

Software to automate the management and configuration of any infrastructure or application at scale. Get access to the Salt software package repository here:
https://repo.saltproject.io/
Apache License 2.0

[BUG] TCP Publish Client encountered an exception while connecting to /var/run/salt/master/master_event_pub.ipc #66563

Open BeehiveSystems opened 1 month ago

BeehiveSystems commented 1 month ago

Description
Improper startup followed by Python errors.

Setup
New Rocky Linux 9 master also running a minion on itself for testing. FirewallD disabled, SELinux set to permissive.

Steps to Reproduce the behavior
Install 3007 onedir version for RHEL 9 following the instructions here:
https://docs.saltproject.io/salt/install-guide/en/latest/topics/install-by-operating-system/rhel.html#install-salt-on-redhat-rhel-9-x86-64

Expected behavior
Normal startup of the salt-master.

Screenshots
See log output below.

Versions Report

salt --versions-report
(Provided by running salt --versions-report. Please also mention any differences in master/minion versions.)

```yaml
Salt Version:
  Salt: 3007.0

Python Version:
  Python: 3.10.13 (main, Feb 19 2024, 03:31:20) [GCC 11.2.0]

Dependency Versions:
  cffi: 1.16.0
  cherrypy: unknown
  dateutil: 2.8.2
  docker-py: Not Installed
  gitdb: Not Installed
  gitpython: Not Installed
  Jinja2: 3.1.3
  libgit2: Not Installed
  looseversion: 1.3.0
  M2Crypto: Not Installed
  Mako: Not Installed
  msgpack: 1.0.7
  msgpack-pure: Not Installed
  mysql-python: Not Installed
  packaging: 23.1
  pycparser: 2.21
  pycrypto: Not Installed
  pycryptodome: 3.19.1
  pygit2: Not Installed
  python-gnupg: 0.5.2
  PyYAML: 6.0.1
  PyZMQ: 25.1.2
  relenv: 0.15.1
  smmap: Not Installed
  timelib: 0.3.0
  Tornado: 6.3.3
  ZMQ: 4.3.4

Salt Package Information:
  Package Type: onedir

System Versions:
  dist: rocky 9.4 Blue Onyx
  locale: utf-8
  machine: x86_64
  release: 5.14.0-427.16.1.el9_4.x86_64
  system: Linux
  version: Rocky Linux 9.4 Blue Onyx
```

Additional context

2024-05-20 22:14:33,428 [salt.transport.tcp:311 ][WARNING ][24352] TCP Publish Client encountered an exception while connecting to /var/run/salt/master/master_event_pub.ipc: StreamClosedError('Stream is closed'), will reconnect in 1 seconds -   File "/usr/bin/salt-master", line 11, in <module>
    sys.exit(salt_master())

  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/scripts.py", line 88, in salt_master
    master.start()

  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/cli/daemons.py", line 224, in start
    self.master.start()

  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/master.py", line 823, in start
    self.process_manager.add_process(

  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/utils/process.py", line 530, in add_process
    process.start()

  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/utils/process.py", line 1099, in start
    super().start()

  File "/opt/saltstack/salt/lib/python3.10/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)

  File "/opt/saltstack/salt/lib/python3.10/multiprocessing/context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)

  File "/opt/saltstack/salt/lib/python3.10/multiprocessing/context.py", line 281, in _Popen
    return Popen(process_obj)

  File "/opt/saltstack/salt/lib/python3.10/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)

  File "/opt/saltstack/salt/lib/python3.10/multiprocessing/popen_fork.py", line 71, in _launch
    code = process_obj._bootstrap(parent_sentinel=child_r)

  File "/opt/saltstack/salt/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()

  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/utils/process.py", line 994, in wrapped_run_func
    return run_func()

  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/master.py", line 995, in run
    with salt.utils.event.get_master_event(

  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/utils/event.py", line 150, in get_master_event
    return MasterEvent(

  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/utils/event.py", line 928, in __init__
    super().__init__(

  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/utils/event.py", line 265, in __init__
    self.connect_pub()

  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/utils/event.py", line 348, in connect_pub
    self.subscriber = salt.transport.ipc_publish_client(

  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/transport/base.py", line 210, in ipc_publish_client
    return publish_client(opts, io_loop, **kwargs)

  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/transport/base.py", line 152, in publish_client
    return salt.transport.tcp.PublishClient(

  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/transport/tcp.py", line 219, in __init__
    super().__init__(opts, io_loop, **kwargs)

  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/transport/base.py", line 398, in __init__
    super().__init__()

2024-05-20 22:14:33,432 [salt.transport.tcp:1405][ERROR   ][24350] Publish server binding pub to /var/run/salt/master/master_event_pub.ipc ssl=None

curry684 commented 2 weeks ago

Running into the same issue on a years-old installation. Any ideas how to fix it?

BeehiveSystems commented 2 weeks ago

> Running into the same issue on a years-old installation. Any ideas how to fix it?

What Salt master version?

I ended up using Salt from the Rocky repo (3005) instead of 3007 from the Salt repo.

curry684 commented 2 weeks ago

Mmm a reboot did some of the job. I found a bunch of reports elsewhere that 3007 changed the permission model to run more salt components in userland instead of as root, which more or less leaves /var/run/salt in a broken state until a full restart.
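
If the problem really is leftover root-owned files under /var/run/salt, a quick ownership scan can confirm it before resorting to a reboot. The following is a minimal sketch, not anything shipped with Salt: the helper name `find_ownership_mismatches` is hypothetical, and the `salt` account name is an assumption (check the `user` setting in /etc/salt/master for the account your master actually runs as).

```python
import os
import pwd
from pathlib import Path


def find_ownership_mismatches(root, expected_uid):
    """Return (path, owner_uid) pairs under root that expected_uid does not own."""
    mismatches = []
    root = Path(root)
    if not root.exists():
        return mismatches
    # Check the directory itself as well as everything below it.
    for path in [root, *sorted(root.rglob("*"))]:
        try:
            owner = path.lstat().st_uid
        except OSError:
            continue  # entry disappeared while scanning
        if owner != expected_uid:
            mismatches.append((str(path), owner))
    return mismatches


if __name__ == "__main__":
    # Assumption: the master runs as a dedicated "salt" account.
    # Fall back to the current user if that account does not exist.
    try:
        expected_uid = pwd.getpwnam("salt").pw_uid
    except KeyError:
        expected_uid = os.getuid()
    for path, owner in find_ownership_mismatches("/var/run/salt", expected_uid):
        print(f"{path}: owned by uid {owner}, expected {expected_uid}")
```

Any path it prints is a candidate for the "broken state" described above; an empty result suggests the cause lies elsewhere.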

Minion still broken on the master though, looking into that now.

curry684 commented 2 weeks ago

Minion is still spamming tons of errors but resumed working on its own shortly after reboot:

2024-06-18 08:50:05,586 [salt.transport.zeromq:396 ][ERROR   ][776] Exception while running callback
Traceback (most recent call last):
  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/transport/zeromq.py", line 394, in consume
    await callback(msg)
  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/channel/client.py", line 484, in wrap_callback
    await callback(decoded)
TypeError: object NoneType can't be used in 'await' expression
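
For anyone puzzled by that last line: this TypeError is what Python raises when code awaits the return value of a plain (non-async) function that returned None. The sketch below reproduces the pattern in isolation and shows one defensive alternative. It is not Salt's code: `sync_callback`, `consume_naive`, and `consume_tolerant` are hypothetical names, and whether this matches the actual cause in `salt/channel/client.py` is an assumption.

```python
import asyncio
import inspect


def sync_callback(msg):
    """A plain function: calling it returns None, which is not awaitable."""
    return None


async def consume_naive(callback, msg):
    # Same shape as the failing frames: await whatever the callback returns.
    await callback(msg)


async def consume_tolerant(callback, msg):
    # Await only when the callback actually produced an awaitable.
    result = callback(msg)
    if inspect.isawaitable(result):
        result = await result
    return result


async def main():
    try:
        await consume_naive(sync_callback, "payload")
    except TypeError as exc:
        print(f"naive: {exc}")
    print("tolerant:", await consume_tolerant(sync_callback, "payload"))


if __name__ == "__main__":
    asyncio.run(main())
```

The tolerant variant accepts both `def` and `async def` callbacks, which is one common way to avoid this class of error when a callback's type is not guaranteed.
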
BeehiveSystems commented 2 weeks ago

I can confirm this is still an issue on Ubuntu 22.04 and 3007.1. I originally submitted this bug under CentOS in my home environment but ran into the issue at work as well on a clean server.

I attempted fixing it with:

sudo rm -f /var/run/salt/master/master_event_pub.ipc
sudo systemctl restart salt-master

But that did not work. Like you said, a full reboot brought the service back to a healthy state.
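
To tell a missing socket apart from a stale or inaccessible one after a restart, it can help to probe the IPC path directly. This is a rough sketch under the assumption that the event publisher listens on a stream-type Unix domain socket at the path from the log; `probe_unix_socket` is a hypothetical helper, not a Salt utility.

```python
import socket


def probe_unix_socket(path, timeout=1.0):
    """Return 'ok' if something accepts connections on path, else the error class name."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect(path)
        return "ok"
    except OSError as exc:
        # FileNotFoundError: no socket file; ConnectionRefusedError: file exists
        # but nothing is listening; PermissionError: ownership or mode problem.
        return type(exc).__name__
    finally:
        sock.close()


if __name__ == "__main__":
    # Path taken from the warning in the original report.
    print(probe_unix_socket("/var/run/salt/master/master_event_pub.ipc"))
```

Run it as the same user the master runs as; a `PermissionError` there would point back at the ownership theory, while `ConnectionRefusedError` would suggest a stale socket file.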

mshields-frtservices commented 2 weeks ago

I'm seeing the same issue. Both the Salt master and minion are on v3007.0 on Rocky Linux 8. Rebooting both didn't clear the messages.