saltstack / salt

[BUG] Problem with reactor: SaltReqTimeoutError: Message timed out #65438

Open corentin-dev opened 11 months ago

corentin-dev commented 11 months ago

Description

I have a master and a minion on different computers (both run 3006.3; AlmaLinux 9 on the master, Fedora 38 on the minion). When the event is fired, the reactor fails with a timeout, but if I run the command manually, it does not fail.

Setup

I set up a reactor:

reactor:
  - 'salt/minion/*/start':
    - /srv/reactor/start.sls

and my /srv/reactor/start.sls:

{% set minion_id = data['id'] %}
highstate-for-{{ minion_id }}:
  local.state.apply:
    - tgt: {{ minion_id }}

When my minion fires a start event, the reactor appears to start correctly but then fails with "Salt request timed out". I tried disabling the firewall, with no success. I also tried a simpler reactor, but no luck.

Running salt 'minion-name' state.apply manually completes in 0.1s, so I have no idea why the reactor fails.

I ran another test to make sure the problem was not coming from my state file:

reactor:
  - 'sayhello':
    - /srv/reactor/sayhello.sls

with /srv/reactor/sayhello.sls:

sayhello:
  local.cmd.run:
    - tgt: pc-corentin-test
    - arg:
      - echo hello > /tmp/hello

Steps to Reproduce the behavior

Probably: set up a master and a minion running Salt 3006.3, with a reactor that runs a local function.

Expected behavior

I expected my highstate to run; instead, it fails with a timeout.

Versions Report

salt --versions-report

```yaml
Salt Version:
  Salt: 3006.0rc3

Python Version:
  Python: 3.10.10 (main, Mar 28 2023, 22:53:52) [GCC 11.2.0]

Dependency Versions:
  cffi: 1.14.6
  cherrypy: unknown
  dateutil: 2.8.1
  docker-py: Not Installed
  gitdb: Not Installed
  gitpython: Not Installed
  Jinja2: 3.1.2
  libgit2: Not Installed
  looseversion: 1.0.2
  M2Crypto: Not Installed
  Mako: Not Installed
  msgpack: 1.0.2
  msgpack-pure: Not Installed
  mysql-python: Not Installed
  packaging: 22.0
  pycparser: 2.21
  pycrypto: Not Installed
  pycryptodome: 3.9.8
  pygit2: Not Installed
  python-gnupg: 0.4.8
  PyYAML: 5.4.1
  PyZMQ: 23.2.0
  relenv: 0.10.1
  smmap: Not Installed
  timelib: 0.2.4
  Tornado: 4.5.3
  ZMQ: 4.3.4

System Versions:
  dist: almalinux 9.2 Turquoise Kodkod
  locale: utf-8
  machine: x86_64
  release: 5.14.0-284.30.1.el9_2.x86_64
  system: Linux
  version: AlmaLinux 9.2 Turquoise Kodkod
```

```yaml
Salt Version:
  Salt: 3006.3

Python Version:
  Python: 3.11.6 (main, Oct 3 2023, 00:00:00) [GCC 13.2.1 20230728 (Red Hat 13.2.1-1)]

Dependency Versions:
  cffi: Not Installed
  cherrypy: Not Installed
  dateutil: 2.8.2
  docker-py: Not Installed
  gitdb: Not Installed
  gitpython: Not Installed
  Jinja2: 3.0.3
  libgit2: Not Installed
  looseversion: 1.3.0
  M2Crypto: Not Installed
  Mako: Not Installed
  msgpack: 1.0.4
  msgpack-pure: Not Installed
  mysql-python: Not Installed
  packaging: 23.0
  pycparser: Not Installed
  pycrypto: Not Installed
  pycryptodome: 3.19.0
  pygit2: Not Installed
  python-gnupg: Not Installed
  PyYAML: 6.0
  PyZMQ: 24.0.1
  relenv: Not Installed
  smmap: Not Installed
  timelib: Not Installed
  Tornado: 4.5.3
  ZMQ: 4.3.4

System Versions:
  dist: fedora 38
  locale: utf-8
  machine: x86_64
  release: 6.5.6-200.fc38.x86_64
  system: Linux
  version: Fedora Linux 38
```

Additional context

[INFO    ] Got return from pc-corentin-test for job 20231020131817079186
[ERROR   ] Message timed out
[ERROR   ] Reactor 'sayhello' failed to execute local 'cmd.run'
Traceback (most recent call last):
  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/client/__init__.py", line 1904, in pub
    payload = channel.send(payload_kwargs, timeout=timeout)
  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/utils/asynchronous.py", line 125, in wrap
    raise exc_info[1].with_traceback(exc_info[2])
  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/utils/asynchronous.py", line 131, in _target
    result = io_loop.run_sync(lambda: getattr(self.obj, key)(*args, **kwargs))
  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/ext/tornado/ioloop.py", line 459, in run_sync
    return future_cell[0].result()
  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/ext/tornado/concurrent.py", line 249, in result
    raise_exc_info(self._exc_info)
  File "<string>", line 4, in raise_exc_info
  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/ext/tornado/gen.py", line 1064, in run
    yielded = self.gen.throw(*exc_info)
  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/channel/client.py", line 292, in send
    ret = yield self._uncrypted_transfer(load, timeout=timeout)
  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/ext/tornado/gen.py", line 1056, in run
    value = future.result()
  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/ext/tornado/concurrent.py", line 249, in result
    raise_exc_info(self._exc_info)
  File "<string>", line 4, in raise_exc_info
  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/ext/tornado/gen.py", line 1064, in run
    yielded = self.gen.throw(*exc_info)
  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/channel/client.py", line 267, in _uncrypted_transfer
    ret = yield self.transport.send(
  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/ext/tornado/gen.py", line 1056, in run
    value = future.result()
  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/ext/tornado/concurrent.py", line 249, in result
    raise_exc_info(self._exc_info)
  File "<string>", line 4, in raise_exc_info
  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/ext/tornado/gen.py", line 1064, in run
    yielded = self.gen.throw(*exc_info)
  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/transport/zeromq.py", line 915, in send
    ret = yield self.message_client.send(load, timeout=timeout)
  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/ext/tornado/gen.py", line 1056, in run
    value = future.result()
  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/ext/tornado/concurrent.py", line 249, in result
    raise_exc_info(self._exc_info)
  File "<string>", line 4, in raise_exc_info
  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/ext/tornado/gen.py", line 1064, in run
    yielded = self.gen.throw(*exc_info)
  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/transport/zeromq.py", line 625, in send
    recv = yield future
  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/ext/tornado/gen.py", line 1056, in run
    value = future.result()
  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/ext/tornado/concurrent.py", line 249, in result
    raise_exc_info(self._exc_info)
  File "<string>", line 4, in raise_exc_info
salt.exceptions.SaltReqTimeoutError: Message timed out
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/client/__init__.py", line 387, in run_job
    pub_data = self.pub(
  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/client/__init__.py", line 1907, in pub
    raise SaltReqTimeoutError(
salt.exceptions.SaltReqTimeoutError: Salt request timed out. The master is not responding. You may need to run your command with `--async` in order to bypass the congested event bus. With `--async`, the CLI tool will print the job id (jid) and exit immediately without listening for responses. You can then use `salt-run jobs.lookup_jid` to look up the results of the job in the job cache later.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/utils/reactor.py", line 436, in run
    ret = l_fun(*args, **kwargs)
  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/utils/reactor.py", line 476, in local
    self.client_cache["local"].cmd_async(tgt, fun, **kwargs)
  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/client/__init__.py", line 494, in cmd_async
    pub_data = self.run_job(
  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/client/__init__.py", line 409, in run_job
    raise SaltClientError(general_exception)
salt.exceptions.SaltClientError: Salt request timed out. The master is not responding. You may need to run your command with `--async` in order to bypass the congested event bus. With `--async`, the CLI tool will print the job id (jid) and exit immediately without listening for responses. You can then use `salt-run jobs.lookup_jid` to look up the results of the job in the job cache later.

corentin-dev commented 11 months ago

Stupid question: I reinstalled using bootstrap, did not configure anything on the master, and now it is working. I looked at /etc/salt/master and everything is set to the default (nothing is uncommented), meaning that interface is 0.0.0.0.

I had set interface to my IP address. Could that be the problem?
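
One way to probe the interface question is to check, from the master itself, which addresses accept TCP connections on the master ports. The sketch below is only a diagnostic aid and uses nothing from Salt; 4505 and 4506 are Salt's default publish and return ports, while the function names `can_connect` and `report` are made up for this example. Whether an unreachable address actually explains the reactor timeout is not confirmed anywhere in this thread.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unroutable addresses.
        return False


def report(hosts, ports=(4505, 4506), timeout: float = 3.0):
    """Return one 'host:port open|unreachable' line per host/port pair.

    The default ports are Salt's default publish (4505) and return (4506)
    ports; pass other values if publish_port or ret_port were changed.
    """
    lines = []
    for host in hosts:
        for port in ports:
            state = "open" if can_connect(host, port, timeout) else "unreachable"
            lines.append(f"{host}:{port} {state}")
    return lines
```

Run on the master, something like `print("\n".join(report(["127.0.0.1", "<address set as interface>"])))` would show whether the ports answer on loopback, on the configured address, or on both, once with the explicit interface setting and once with the default 0.0.0.0.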