amalaguti opened 5 months ago
I have also experienced this issue on Windows minions (v3006.9). The master status checks always fail, causing the minions to re-run the highstate every master_alive_interval.
It looks like the problem is caused by this code incorrectly checking the local address and port, rather than the remote address and port, of the connection: https://github.com/saltstack/salt/blob/master/salt/modules/win_status.py#L500-L501
As a quick test I patched my local copy of win_status.py, and that seems to fix the problem.
```diff
@@ -497,8 +497,8 @@
     for conn in conns:
         if conn.status == psutil.CONN_ESTABLISHED:
-            if conn.laddr.port == port:
-                connected_ips.add(conn.laddr.ip)
+            if conn.raddr.port == port:
+                connected_ips.add(conn.raddr.ip)
     return connected_ips
```
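The corrected logic can be illustrated standalone. Below is a minimal sketch, assuming psutil is installed; `connected_master_ips` is a hypothetical helper name and 4505 is Salt's default publish port. For an outbound minion-to-master connection, the master's port appears on the remote side of the socket (`raddr`), while `laddr` holds the minion's ephemeral local port, which is why the original check never matched:

```python
import psutil


def connected_master_ips(port=4505):
    """Return the set of remote IPs with an ESTABLISHED TCP connection
    whose remote port matches `port` (hypothetical helper for illustration)."""
    connected_ips = set()
    for conn in psutil.net_connections(kind="tcp"):
        # raddr is empty for listening/unconnected sockets, so guard
        # before reading its fields.
        if conn.status == psutil.CONN_ESTABLISHED and conn.raddr:
            if conn.raddr.port == port:
                connected_ips.add(conn.raddr.ip)
    return connected_ips
```

With the `laddr` version, the check only succeeds if the *minion* happens to own port 4505 locally, which it never does for an outbound connection; hence `status.master` always returned False.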
Before patching win_status.py:

```
PS > salt-call status.master master=saltha-1.foo.com
[INFO ] Got list of available master addresses: ['saltha-1.foo.com', 'saltha-2.foo.com', 'saltha-3.foo.com']
local:
    False
PS > salt-call status.master master=saltha-2.foo.com
[INFO ] Got list of available master addresses: ['saltha-1.foo.com', 'saltha-2.foo.com', 'saltha-3.foo.com']
local:
    False
PS > salt-call status.master master=saltha-3.foo.com
[INFO ] Got list of available master addresses: ['saltha-1.foo.com', 'saltha-2.foo.com', 'saltha-3.foo.com']
local:
    False
```
After patching win_status.py:

```
PS > salt-call status.master master=saltha-1.foo.com
[INFO ] Got list of available master addresses: ['saltha-1.foo.com', 'saltha-2.foo.com', 'saltha-3.foo.com']
local:
    True
PS > salt-call status.master master=saltha-2.foo.com
[INFO ] Got list of available master addresses: ['saltha-1.foo.com', 'saltha-2.foo.com', 'saltha-3.foo.com']
local:
    False
PS > salt-call status.master master=saltha-3.foo.com
[INFO ] Got list of available master addresses: ['saltha-1.foo.com', 'saltha-2.foo.com', 'saltha-3.foo.com']
local:
    False
```
Description

3006.8 Windows minion configured in multimaster failover: when the minion starts, after each master_alive_interval it reports "Connection to master (second) lost" even when that master is still available, and tries to connect to the next master in the list. This loop continues after each master_alive_interval, switching the connection from master to master in the masters list.
It also generates multiple /start events (on each master), even though the minion service is not restarted, so there should be no /start events after the initial connection.
IMPORTANT: The same behavior is not seen with a Windows minion on 3006.7.
Minion log (log messages have been changed from log.debug to log.info in the salt/minion.py module to avoid enabling debug logging)
Setup

3006.8 master and Windows minion
Steps to Reproduce the behavior

Configure the minion for multimaster failover, start the minion, and check the minion log and event bus after each master_alive_interval.
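For reference, a minimal minion configuration for multimaster failover might look like the following sketch. The hostnames are the ones from the outputs above; the interval value is illustrative, not the one used in the report:

```yaml
# minion config file - illustrative values
master:
  - saltha-1.foo.com
  - saltha-2.foo.com
  - saltha-3.foo.com
master_type: failover
master_alive_interval: 30  # seconds between status.master liveness checks
```

With master_type set to failover, the minion connects to one master at a time and uses status.master every master_alive_interval to decide whether to fail over, which is why the broken check triggers the constant master switching described here.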
Expected behavior

The minion should stay connected to the second master when the master_alive_interval check runs, and no additional /start event should be seen after the initial minion start.
Versions Report

3006.8