Open clayoster opened 2 months ago
Yes, the timeouts for the publish port should match the 'publish_session' setting. The docs need to be updated. This has been in my backlog for some time.
@dwoz Would you recommend leaving publish_session at the default of 24 hours, or setting it to something lower? Additionally, would it be a good idea to set ping_on_rotate: True in this configuration?
I'm also trying to follow this tutorial and can't get the minion to "sign_in" with one of the masters, even if I directly run the salt-minion with the salt-master in question. @clayoster are your minion configs simply setting master: salt.example.com, or are there other settings needed?
@anthonyra Correct, that is the only master-related setting in my minion config file. Using your example, salt.example.com points to my HAProxy server, which load balances connections to 3 masters. My HAProxy config follows the example in the master cluster tutorial aside from using 12h instead of 1m for the timeout values, as mentioned in this issue.
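For readers following along, the change described above might look roughly like the fragment below. This is a sketch under assumptions, not a copy of the tutorial's configuration: the frontend, backend, and server names and all IP addresses are hypothetical, and only the publish port (4505) is shown. The only detail taken from this thread is the use of 12h in place of 1m for the client and server timeouts.

```
# Hypothetical names and addresses; only the 12h timeout values come from this thread.
frontend salt-master-pub
    mode tcp
    bind 10.0.0.10:4505
    timeout client 12h
    default_backend salt-master-pub-backend

backend salt-master-pub-backend
    mode tcp
    balance roundrobin
    timeout connect 10s
    timeout server 12h
    server master1 10.0.0.11:4505 check
    server master2 10.0.0.12:4505 check
    server master3 10.0.0.13:4505 check
```

The same timeout change would presumably apply to the request server port as well, since the issue description mentions both ports.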
Do the minion logs give an indication of what the issue may be?
I think it was a combination of the master default config setting the user as salt while the minion was setting the user as root, and the bootstrap script auto-starting the daemons (so when I changed the config I didn't restart the service, I just started it). I was able to get it working. Thank you for the help/feedback!
Description
I have been testing the suggested HAProxy configuration from the Master Cluster tutorial and have found that the suggested client/server timeout values of 1m cause unstable minion communication, specifically with the publisher port (4505).
I am using the default transport with 3 masters and 50 minions, all running 3007.1. My HAProxy version is 2.6.12-1 (Debian 12), though I have tested older and newer versions with the same results.
Adjusting TCP keepalive values on the masters and minions does not seem to affect HAProxy closing TCP sessions after 1 minute of inactivity. Reducing tcp_keepalive_idle does speed up minions reconnecting after HAProxy closes the connection, though.
It seems that no matter how frequently the master and minions send keepalives, HAProxy will close the sessions after 1 minute if no data is sent through the session. If I run something like salt '*' test.ping every 30 seconds, this keeps the sessions with the publisher port alive for longer than 1 minute.
To Reproduce:
Run watch "netstat -nalpt | grep 4505" and watch for TCP sessions to switch from "ESTABLISHED" to "TIME_WAIT". This should happen within a minute.
Run salt '*' test.ping from the master while sessions are in this condition and you'll see minions fail to respond, as they did not see the event published from the master.
I am currently using timeout values of 12h on the publisher and request server ports to reduce how often TCP sessions are killed off. While this probably isn't the best solution, it does keep minion communication stable, as it greatly reduces how often minions have to re-establish their connection with the master.
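The netstat check from the reproduction steps can also be summarized as counts per TCP state, which makes the shift from ESTABLISHED to TIME_WAIT easier to spot with 50 minions. The following is a minimal sketch, assuming the usual Linux net-tools column layout (protocol, two queue columns, local address, remote address, state); the function names count_tcp_states and read_netstat are hypothetical helpers and not part of Salt or HAProxy.

```python
import subprocess
from collections import Counter


def count_tcp_states(netstat_output: str, port: int = 4505) -> Counter:
    """Count TCP states for connections whose local or remote port is `port`."""
    states = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Assumed layout: proto recv-q send-q local remote state [pid/program]
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue  # skip headers and non-TCP lines
        local_port = fields[3].rsplit(":", 1)[-1]
        remote_port = fields[4].rsplit(":", 1)[-1]
        if str(port) in (local_port, remote_port):
            states[fields[5]] += 1
    return states


def read_netstat() -> str:
    """Return the output of `netstat -nat` (requires the net-tools package)."""
    result = subprocess.run(
        ["netstat", "-nat"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

Calling count_tcp_states(read_netstat()) on a master or the HAProxy host once a minute should show the ESTABLISHED count dropping when the proxy closes idle sessions, if the behavior described in this issue is present.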
Suggested Fix
Is there other configuration that is expected to be set on the master and minion to allow stable minion communication with the suggested 1 minute timeouts in HAProxy?
Type of documentation
Tutorial
Location or format of documentation
https://docs.saltproject.io/en/latest/topics/tutorials/master-cluster.html