Description
When a minion is configured with a list of masters and the master_type is set to failover, the minion will connect to every master in the list on every state item completion. This increases the overall state run time several fold. If any master in the list is down, the overall state run time increases even more.
Setup
A minion configured with a list of masters and master_type set to failover, plus a state file that contains some number of items. The more items in the state, the longer the state will take...
Please be as specific as possible and give set-up details.
[x] on-prem machine
[x] VM (Virtualbox, KVM, etc. please specify)
[x] VM running on a cloud service, please be explicit and add details
[ ] container (Kubernetes, Docker, containerd, etc. please specify)
[x] or a combination, please be explicit
[ ] jails if it is FreeBSD
on-prem physical, virtual, and cloud minions and masters
Steps to Reproduce the behavior
salt-call state.apply <some_state> -l debug
Each completed state item is followed by debug entries like:
[INFO ] Completed state [my_state] at time 17:20:59.541977 (duration_in_ms=17.195)
[DEBUG ] Initializing new SAuth for ('/etc/salt/pki/minion', 'my_minion_id', 'tcp://10.10.10.2:4506')
[DEBUG ] salt.crypt.get_rsa_key: Loading private key
[DEBUG ] Loaded minion key: /etc/salt/pki/minion/minion.pem
[DEBUG ] Initializing new AsyncZeroMQReqChannel for ('/etc/salt/pki/minion', 'my_minion_id', 'tcp://10.10.10.100:4506', 'aes')
[DEBUG ] Initializing new AsyncAuth for ('/etc/salt/pki/minion', 'my_minion_id', 'tcp://10.10.10.100:4506')
[DEBUG ] Connecting the Minion to the Master URI (for the return server): tcp://10.10.10.100:4506
[DEBUG ] Trying to connect to: tcp://10.10.10.100:4506
[DEBUG ] Closing AsyncZeroMQReqChannel instance
These lines will repeat for every master in the master list, such as:
[DEBUG ] Initializing new AsyncZeroMQReqChannel for ('/etc/salt/pki/minion', 'my_minion_id', 'tcp://<master2_ip>:4506', 'aes')
[DEBUG ] Initializing new AsyncAuth for ('/etc/salt/pki/minion', 'my_minion_id', 'tcp://<master2_ip>:4506')
[DEBUG ] Connecting the Minion to the Master URI (for the return server): tcp://<master2_ip>:4506
[DEBUG ] Trying to connect to: tcp://<master2_ip>:4506
[DEBUG ] Closing AsyncZeroMQReqChannel instance
[DEBUG ] Initializing new AsyncZeroMQReqChannel for ('/etc/salt/pki/minion', 'my_minion_id', 'tcp://<master3_ip>:4506', 'aes')
[DEBUG ] Initializing new AsyncAuth for ('/etc/salt/pki/minion', 'my_minion_id', 'tcp://<master3_ip>:4506')
[DEBUG ] Connecting the Minion to the Master URI (for the return server): tcp://<master3_ip>:4506
[DEBUG ] Trying to connect to: tcp://<master3_ip>:4506
[DEBUG ] Closing AsyncZeroMQReqChannel instance
The reported "total run time" does not account for the delay incurred by connecting to each master.
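To put a number on the repeated connections in the debug output above, the log can be scanned for the two message types quoted in this report. The following is a minimal sketch, not part of Salt: the function name `count_connections` and the `SAMPLE` excerpt are illustrative assumptions, and the patterns rely only on the "Completed state" and "Trying to connect to" lines shown above.

```python
import re
from collections import Counter

# Patterns based on the debug lines quoted in this report.
COMPLETED = re.compile(r"Completed state \[")
CONNECT = re.compile(r"Trying to connect to: (tcp://\S+)")


def count_connections(log_text):
    """Return (completed state count, Counter of connect attempts per master URI)."""
    completed = 0
    attempts = Counter()
    for line in log_text.splitlines():
        if COMPLETED.search(line):
            completed += 1
        match = CONNECT.search(line)
        if match:
            attempts[match.group(1)] += 1
    return completed, attempts


# Hypothetical excerpt assembled from the lines quoted above.
SAMPLE = """\
[INFO ] Completed state [my_state] at time 17:20:59.541977 (duration_in_ms=17.195)
[DEBUG ] Trying to connect to: tcp://10.10.10.100:4506
[DEBUG ] Closing AsyncZeroMQReqChannel instance
[DEBUG ] Trying to connect to: tcp://<master2_ip>:4506
[DEBUG ] Closing AsyncZeroMQReqChannel instance
[DEBUG ] Trying to connect to: tcp://<master3_ip>:4506
"""

if __name__ == "__main__":
    completed, attempts = count_connections(SAMPLE)
    print(completed, dict(attempts))
```

If the behavior described here is present, running this over a saved debug log should show roughly one connect attempt per configured master for every completed state item.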
Expected behavior
The minion should communicate only with the initial publish master for returns (unless it needs to switch mid-run due to a master failure).
Versions Report
Tested with salt-minion 2018.3, 2019.2, 3001, and 3003.
Salt masters are at 3001.
A salt-minion at version 2018.3 does not exhibit this behavior (it only returns to a single master).
Additional context
Anecdotally, a test state with 77 state items:
1) with "master_type" not explicitly set, runs in ~16 seconds of "real" time and reports a total run time of 2.2s-2.3s
2) with master_type set to failover, runs in ~1 minute 30 seconds of "real" time and reports a run time of 2.2s-2.3s
3) with master_type set to failover and one master in the list "offline", runs in ~8 minutes of "real" time and still reports a run time of 2.2s-2.3s
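The three timings above imply a rough per-item cost for the extra master connections. The sketch below works through that arithmetic. It assumes the approximate "real" times quoted above (16 s, 90 s, and 8 minutes) and that the extra time is spread evenly across the 77 state items, which is a simplification; the constant and function names are illustrative.

```python
STATE_ITEMS = 77

# Approximate "real" times from the three runs above, in seconds.
BASELINE_S = 16               # master_type not explicitly set
FAILOVER_S = 90               # master_type set to failover, all masters up
FAILOVER_ONE_DOWN_S = 8 * 60  # master_type set to failover, one master offline


def overhead_per_item(real_s, baseline_s=BASELINE_S, items=STATE_ITEMS):
    """Extra wall-clock seconds per state item relative to the baseline run."""
    return (real_s - baseline_s) / items


if __name__ == "__main__":
    print(round(overhead_per_item(FAILOVER_S), 2))           # 0.96
    print(round(overhead_per_item(FAILOVER_ONE_DOWN_S), 2))  # 6.03
```

Under these assumptions, failover adds roughly one second per state item, and about six seconds per item when one master is unreachable, while the reported run time stays at 2.2s-2.3s.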
The master return connections are caused by state_events: True being set on the master. This causes event.fire_master to be called, which then fires the event on every master in the list.
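As a rough illustration of why this scales with both the number of state items and the number of masters, here is a toy model of the connection count. It is not Salt's code: `connections_per_run` and the three-entry master list are hypothetical, and it assumes exactly one event per completed state item.

```python
def connections_per_run(state_items, masters, fire_to_all=True):
    """Count master connections made for per-item state events in one run.

    fire_to_all=True models the reported behavior (every configured master
    is contacted for each completed state item). fire_to_all=False models
    the expected behavior (only the currently connected master is contacted).
    """
    targets = masters if fire_to_all else masters[:1]
    return state_items * len(targets)


# Hypothetical master list; the report does not state how many masters were configured.
MASTERS = ["master1", "master2", "master3"]

if __name__ == "__main__":
    print(connections_per_run(77, MASTERS))                     # 231
    print(connections_per_run(77, MASTERS, fire_to_all=False))  # 77
```

In this model the number of connections grows with the product of items and masters, which would match the observation that one slow or offline master is paid for on every state item.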