saltstack / salt

Software to automate the management and configuration of any infrastructure or application at scale. Get access to the Salt software package repository here:
https://repo.saltproject.io/
Apache License 2.0

[BUG] Regression in Salt 3007: Minion fails to start on Windows when using IPv6 #66603

Open smarsching opened 1 month ago

smarsching commented 1 month ago

Description There is a regression in Salt 3007 (both 3007.0 and 3007.1) that prevents the minion from starting correctly when IPv6 is enabled (ipv6: true is set in the options).

Setup

Steps to Reproduce the behavior

Add a configuration file C:\ProgramData\Salt Project\Salt\conf\minion.d\minion-custom-config.conf with the following line:

ipv6: true

Then restart the Salt minion. Observe the following error message in the minion log (and that the minion is not reachable from the master):

2024-05-29 19:38:17,810 [tornado.application:758 ][ERROR   ][6672] Exception in callback functools.partial(<bound method IOLoop._discard_future_result of <tornado.platform.asyncio.AsyncIOMainLoop object at 0x000002B17F0291E0>>, <Task finished name='Task-1' coro=<PublishServer.publisher() done, defined at C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\transport\tcp.py:1385> exception=gaierror(11001, 'getaddrinfo failed')>)
Traceback (most recent call last):
  File "C:\Program Files\Salt Project\Salt\Lib\site-packages\tornado\ioloop.py", line 738, in _run_callback
    ret = callback()
  File "C:\Program Files\Salt Project\Salt\Lib\site-packages\tornado\ioloop.py", line 762, in _discard_future_result
    future.result()
  File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\transport\tcp.py", line 1421, in publisher
    sock.bind((self.pub_host, self.pub_port))
socket.gaierror: [Errno 11001] getaddrinfo failed
2024-05-29 19:38:17,857 [salt.transport.tcp:312 ][WARNING ][6672] TCP Publish Client encountered an exception while connecting to 127.0.0.1:4510: StreamClosedError('Stream is closed'), will reconnect in 1 seconds -   File "C:\Program Files\Salt Project\Salt\Lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,

  File "C:\Program Files\Salt Project\Salt\Lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)

  File "C:\Program Files\Salt Project\Salt\salt-minion.exe\__main__.py", line 7, in <module>
    sys.exit(salt_minion())

  File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\scripts.py", line 185, in salt_minion
    minion.start()

  File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\cli\daemons.py", line 344, in start
    self._real_start()

  File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\cli\daemons.py", line 356, in _real_start
    self.minion.tune_in()

  File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\minion.py", line 1193, in tune_in
    self._bind()

  File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\minion.py", line 1064, in _bind
    self.event = salt.utils.event.get_event(

  File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\utils\event.py", line 135, in get_event
    return SaltEvent(

  File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\utils\event.py", line 265, in __init__
    self.connect_pub()

  File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\utils\event.py", line 348, in connect_pub
    self.subscriber = salt.transport.ipc_publish_client(

  File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\transport\base.py", line 210, in ipc_publish_client
    return publish_client(opts, io_loop, **kwargs)

  File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\transport\base.py", line 152, in publish_client
    return salt.transport.tcp.PublishClient(

  File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\transport\tcp.py", line 220, in __init__
    super().__init__(opts, io_loop, **kwargs)

  File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\transport\base.py", line 398, in __init__
    super().__init__()

2024-05-29 19:38:20,004 [salt.transport.tcp:312 ][WARNING ][6672] TCP Publish Client encountered an exception while connecting to 127.0.0.1:4510: StreamClosedError('Stream is closed'), will reconnect in 1 seconds -   File "C:\Program Files\Salt Project\Salt\Lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,

  File "C:\Program Files\Salt Project\Salt\Lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)

  File "C:\Program Files\Salt Project\Salt\salt-minion.exe\__main__.py", line 7, in <module>
    sys.exit(salt_minion())

  File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\scripts.py", line 185, in salt_minion
    minion.start()

  File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\cli\daemons.py", line 344, in start
    self._real_start()

  File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\cli\daemons.py", line 356, in _real_start
    self.minion.tune_in()

  File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\minion.py", line 1193, in tune_in
    self._bind()

  File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\minion.py", line 1064, in _bind
    self.event = salt.utils.event.get_event(

  File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\utils\event.py", line 135, in get_event
    return SaltEvent(

  File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\utils\event.py", line 265, in __init__
    self.connect_pub()

  File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\utils\event.py", line 348, in connect_pub
    self.subscriber = salt.transport.ipc_publish_client(

  File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\transport\base.py", line 210, in ipc_publish_client
    return publish_client(opts, io_loop, **kwargs)

  File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\transport\base.py", line 152, in publish_client
    return salt.transport.tcp.PublishClient(

  File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\transport\tcp.py", line 220, in __init__
    super().__init__(opts, io_loop, **kwargs)

  File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\transport\base.py", line 398, in __init__
    super().__init__()

Expected behavior The minion should start without error, like it does when using version 3006.8 with the same configuration.

Versions Report

salt --versions-report (Provided by running salt --versions-report. Please also mention any differences in master/minion versions.)

```yaml
Salt Version:
  Salt: 3007.1

Python Version:
  Python: 3.10.14 (heads/main:c1ec015, Apr 3 2024, 21:36:37) [MSC v.1938 64 bit (AMD64)]

Dependency Versions:
  cffi: 1.16.0
  cherrypy: 18.8.0
  dateutil: 2.8.2
  docker-py: Not Installed
  gitdb: 4.0.10
  gitpython: Not Installed
  Jinja2: 3.1.4
  libgit2: Not Installed
  looseversion: 1.3.0
  M2Crypto: Not Installed
  Mako: Not Installed
  msgpack: 1.0.7
  msgpack-pure: Not Installed
  mysql-python: Not Installed
  packaging: 23.1
  pycparser: 2.21
  pycrypto: Not Installed
  pycryptodome: 3.19.1
  pygit2: Not Installed
  python-gnupg: 0.5.2
  PyYAML: 6.0.1
  PyZMQ: 25.1.2
  relenv: 0.16.0
  smmap: 5.0.1
  timelib: 0.3.0
  Tornado: 6.3.3
  ZMQ: 4.3.4

Salt Package Information:
  Package Type: onedir

System Versions:
  dist:
  locale: utf-8
  machine: AMD64
  release: 2022Server
  system: Windows
  version: 2022Server 10.0.20348 SP0 Multiprocessor Free
```

Additional context By adding some debugging code, I found out that the problem is caused by trying to bind a socket using the AF_INET6 address family to the IP address 127.0.0.1. Windows does not allow binding an IPv6 socket to an IPv4 address.

@dwoz, @garethgreenaway I suspect that this bug was introduced in 6320f769ea8, which you authored:

Before, the IPC publisher was created in salt.minion.MinionManager._bind() by calling salt.utils.event.AsyncEventPublisher(), which delegated to salt.transport.ipc.IPCMessagePublisher.

Now, it is created by calling salt.transport.ipc_publish_server(), which indirectly delegates to salt.transport.tcp.PublishServer.

The code in salt.transport.ipc.IPCMessagePublisher always creates a socket using AF_INET, while the code in salt.transport.tcp.PublishServer uses AF_INET6 when opts['ipv6'] is True. On Windows, however, a socket created with AF_INET6 cannot be bound to an IPv4 address.
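The mismatch described above can be checked outside of Salt with a few lines of plain Python. This is only a minimal sketch using the standard socket module. The helper name can_bind is made up for illustration and is not part of Salt. It tries to bind a TCP socket of a given address family to a given host and reports whether that worked.

```python
import socket


def can_bind(family: int, host: str) -> bool:
    """Return True if a TCP socket of `family` can bind to `host`.

    Hypothetical helper for illustration only; not part of Salt.
    """
    try:
        with socket.socket(family, socket.SOCK_STREAM) as sock:
            sock.bind((host, 0))  # port 0 lets the OS pick a free port
        return True
    except OSError:
        # socket.gaierror is a subclass of OSError, so a failed address
        # lookup (as in the traceback in this report) is caught here too.
        return False


if __name__ == "__main__":
    # Matching family and address.
    print("AF_INET  + 127.0.0.1:", can_bind(socket.AF_INET, "127.0.0.1"))
    # The combination this report is about.
    print("AF_INET6 + 127.0.0.1:", can_bind(socket.AF_INET6, "127.0.0.1"))
    # Matching family and address (needs IPv6 support on the host).
    print("AF_INET6 + ::1      :", can_bind(socket.AF_INET6, "::1"))
```

The second combination is the one that corresponds to what the minion does on Windows when ipv6: true is set.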

I can see three possible fixes for this:

  1. Revert the code in salt.minion.MinionManager._bind() back to using salt.utils.event.AsyncEventPublisher().
  2. Change the code in salt.transport.tcp.PublishServer to accept an additional flag that enforces AF_INET, even if opts['ipv6'] is set, and pass this flag from salt.transport.base.ipc_publish_server().
  3. Change the code in ipc_publish_server() to use ::1 instead of 127.0.0.1 when opts['ipv6'] is set. However, it might be necessary to also change this in other places (where the client socket that connects to this server is created).

The first fix looks like the simplest one, but I assume that you did this refactoring for a good reason, so it might not be desirable.

The second one makes the code somewhat more complex, while the third one is pretty straightforward but might necessitate changes in other places.
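To make the third option more concrete, here is a rough sketch of the idea. The function ipc_loopback is hypothetical and does not exist in Salt. It only shows how the address family and the loopback address could be chosen together, so that they can never disagree. How this would be wired into ipc_publish_server() and the matching client code is an open question.

```python
import socket


def ipc_loopback(opts: dict) -> tuple:
    """Pick a matching (address family, loopback address) pair for local IPC.

    Hypothetical helper for illustration only; not part of Salt.
    """
    if opts.get("ipv6"):
        # An IPv6 socket needs the IPv6 loopback address.
        return socket.AF_INET6, "::1"
    # Default: IPv4 socket with the IPv4 loopback address.
    return socket.AF_INET, "127.0.0.1"


if __name__ == "__main__":
    print(ipc_loopback({"ipv6": True}))
    print(ipc_loopback({"ipv6": False}))
```

Both the publish server and the client that connects to it would have to use the same selection; otherwise the client would still try to reach 127.0.0.1:4510, as seen in the log above.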

As I do not know the reasoning behind the refactoring, I cannot assess which of the three options is the most reasonable one, so I would appreciate your input.

dwoz commented 1 month ago

duplicate of #66567?

smarsching commented 1 month ago

@dwoz That bug might be related, but I do not think that it is the same bug.

#66567 seems to happen when explicitly setting interface to :: and setting transport to tcp, while this bug happens when interface and transport are not set explicitly.

The error message also is slightly different:

For #66567, it is gaierror(-2, 'Name or service not known'), while for this bug it is gaierror(11001, 'getaddrinfo failed').

If I understand the code in ipc_publish_server() correctly, #66567 should not be triggered when creating the IPC service (the interface option is not used for that and the address 127.0.0.1 is hard-coded instead). I believe that #66567 rather happens when salt.transport.base.publish_server() is called and transport is tcp.