saltstack / salt

Software to automate the management and configuration of any infrastructure or application at scale. Get access to the Salt software package repository here:
https://repo.saltproject.io/
Apache License 2.0
13.98k stars 5.47k forks source link

[BUG] [3007.0] Channels are closed while listened to #66568

Open AppCrashExpress opened 1 month ago

AppCrashExpress commented 1 month ago

Description

Hello!

We are currently trying to update Saltstack to 3007.0.

This issue was tested on the commit ID: 31c9d0df191009207c72ea73abfd3a1e3a0e6425

When using TCP transport to run something like salt-call grains.get fqdn, logs will contain two types of errors:

Setup This is shortened configuration, since we use patched custom installation and I'm not sure what I'm allowed to show:

master:
  - salt-master.example.org

ipv6: true
transport: tcp 
random_master: True

log_level: debug

Should you find it insufficient, please let me know.

Steps to Reproduce the behavior

  1. Install salt-master 3007 on remote
  2. Install salt-minion 3007 locally
  3. Run salt-call -l info grains.get fqdn on local machine

Running it you should see something akin to:

[ERROR   ] Caught exception in PubChannel connect callback AttributeError("'NoneType' object has no attribute 'fire_event'")
Traceback (most recent call last):
  File "/opt/salt/channel/client.py", line 542, in connect_callback
    self.event.fire_event({"master": self.opts["master"]}, "__master_connected")
    ^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'fire_event'
...
local:
    salt-minion.example.org
Task was destroyed but it is pending!
task: <Task pending name='Task-7' coro=<RequestClient._stream_return() running at /opt/salt/transport/tcp.py:1761> wait_for=<Future finished exception=StreamClosedError('Stream is closed')>>

Expected behavior Closing channels and transports should properly wait on underlying streams before closing them fully.

Screenshots Probably inapplicable.

Versions Report

salt --versions-report (Provided by running salt --versions-report. Please also mention any differences in master/minion versions.) ```yaml Salt Version: Salt: 3007.0 Python Version: Python: 3.12.3 (main, May 13 2024, 10:19:24) [Clang 16.0.6 ] Dependency Versions: cffi: 1.16.0 cherrypy: Not Installed dateutil: 2.9.0.post0 docker-py: 7.0.0 gitdb: Not Installed gitpython: Not Installed Jinja2: 3.1.4 libgit2: Not Installed looseversion: 1.3.0 M2Crypto: 0.38.0 Mako: 1.3.3 msgpack: 1.0.8 msgpack-pure: Not Installed mysql-python: 1.4.6 packaging: 21.3 pycparser: 2.22 pycrypto: Not Installed pycryptodome: Not Installed pygit2: Not Installed python-gnupg: 0.5.2 PyYAML: 5.4.1 PyZMQ: 25.1.2 relenv: Not Installed smmap: 5.0.1 timelib: 0.3.0 Tornado: 6.4 ZMQ: 4.1.2 Salt Package Information: Package Type: Not Installed System Versions: dist: ubuntu 22.04 jammy locale: utf-8 machine: x86_64 release: 5.15.0-30-generic system: Linux version: Ubuntu 22.04 jammy ```

Additional context

I was able to trace examples of code lines, where the errors happen.

For PubChannel, it happens in SMinion during master evaluation. Closing channel will both close underlying transport and destroy event queue as seen here: https://github.com/saltstack/salt/blob/v3007.0/salt/channel/client.py#L463-L470 However connect_callback and, by extension, self.event.fire_event still seem to run, causing None.fire_event

Somewhat same thing happens with SyncWrapper, where, as seen in RemotePillar, it both calls close method on AsyncReqChannel and destroys io_loop passed to it. However, as seen in error, underlying TCP transport fails to process StreamClosedError in time, leaving it as a Task in closed io_loop

Technically it also happens in TCP transport, but try catch wrapper simply catches StreamClosedError and puts it out as debug message.

AppCrashExpress commented 1 month ago

Also a seperate issue that doesn't really deserve a full write up (in my eyes), but TCP RequestClient needs to set self._closing = True to avoid TransportWarning: Unclosed transport! in logs