fzk-rec closed this issue 5 years ago.
What salt transport are you using? ZeroMQ? What is the Failover setting on the ELB?
@damon-atkins I am using ZeroMQ.
What do you mean by Failover setting? I assume the ELB splits incoming traffic 50/50, or maybe based on CPU usage.
Sorry, I should have said general settings, e.g. timeout, health check, etc.?
Salt has two ports; I assume both connections from a single client would need to go to the same Salt master?
I cannot tell you whether what you are trying will work or not.
Have you read https://docs.saltstack.com/en/latest/topics/highavailability/index.html
https://docs.saltstack.com/en/latest/topics/tutorials/multimaster.html
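(For reference, the multi-master setup those docs describe is just a list of masters in the minion config; the hostnames below are placeholders.)

master:
  - salt-master-1.example.com
  - salt-master-2.example.com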
Yes, as @damon-atkins pointed out, we already have the ability to load balance masters without a load balancer in front of them; see the docs he pointed you to.
I am not sure whether we support master failover the way you have it set up, in front of a load balancer, though. Ping @saltstack/team-core: do you know if this is currently possible within Salt?
Thanks for your answers, guys. @damon-atkins: yes, I have read all the HA/multi-master docs I could find :) I think, however, you might be onto something. The AWS ELB is unable to provide session stickiness for TCP ports, so maybe you're right and the minions get sent to Master1 on port 4505 and to Master2 on port 4506. I guess that would cause problems.
@Ch3LL: The problem is that your view on 'load balancing' the masters is not sexy for cloud environments, since you always have to specify IPs or names, and that's stuff that can change in a cloud environment all the time. Which is why we want to run the salt-masters behind an ELB, so that no minion ever needs to know the actual IP of a salt-master. We basically want to run 'Salt as a service' and provide the salt-api and the salt-masters to our internal services and consumers through one simple ELB DNS entry. This would give us massive benefits as we could:
I'm afraid that our setup won't work because of the missing session stickiness :( I assume there is no way to have the salt-masters share one ZeroMQ connection?
I would be very surprised if the zeromq transport worked behind an ELB.
You might try the tcp transport layer, and see if that one works?
The better solution would possibly be to use master_type: func,
with which you could have a custom execution module that looks up your masters in the AWS API and returns a list of all the masters; the minion then opens an active-active connection to each of the masters instead of using failover.
https://docs.saltstack.com/en/latest/ref/configuration/minion.html#list-of-masters-syntax
Then the minions don't have to store the master information statically, and you don't need a load balancer in front of the masters.
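Very roughly, such a module could look like the sketch below (untested; the module name aws_masters, the region, and the tag filter are just placeholders for your environment, and it assumes boto3 and instance-profile credentials are available on the minion):

# aws_masters.py - hypothetical custom module for the minion's extension modules (sketch)
import boto3

def get_masters():
    '''
    Return the private IPs of all EC2 instances tagged Function=Saltmaster,
    so the minion can open an active-active connection to each of them.
    '''
    client = boto3.client('ec2', region_name='eu-west-1')  # region is an assumption
    response = client.describe_instances(
        Filters=[{'Name': 'tag:Function', 'Values': ['Saltmaster', 'saltmaster']}]
    )
    masters = []
    for reservation in response['Reservations']:
        for instance in reservation['Instances']:
            if 'PrivateIpAddress' in instance:
                masters.append(instance['PrivateIpAddress'])
    return masters

The minion config would then point at that function (per the master_type documentation):

master_type: func
master: aws_masters.get_masters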
@gtmanfred thanks for the tip about master_type: func! I didn't know about that, but I believe this could solve our problem. We could 'autodiscover' the salt-masters via an EC2 tag, simply add their IPs as a list to the minions, and have the salt-minion restart on a regular basis.
Just two questions:
I don't understand what you mean by 'try the transport layer'. Where would I configure that? Is that a setting in the minion config?
1) You would need to provide it to the minion, in the modules directory under the minion's extension modules directory.
https://docs.saltstack.com/en/latest/ref/file_server/dynamic-modules.html
One problem is that you will need to sync this module before you can run the function as the startup type, so you might need to have it in the fileserver for a masterless minion and call salt-call --local saltutil.sync_modules first, and then start the minion.
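Concretely, that bootstrap might look something like this (paths are assumptions, and aws_masters refers to the sketch above):

# with the custom module placed in the minion's local file_roots, e.g. <file_roots>/_modules/aws_masters.py:
salt-call --local saltutil.sync_modules
# then set master_type: func / master: aws_masters.get_masters in the minion config and (re)start the minion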
And as a lesser-known feature, you could write a pip-installable package that declares salt.loader as part of its entry points, like this:
https://github.com/saltstack/salt/pull/31218
but pointing to module_dirs, and you should be able to pip install that on the minion. (There isn't good documentation on this, other than what is in the PR above.)
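Going purely off that PR (so treat the exact entry-point wiring as an assumption), the setup.py of such a pip package would look roughly like:

# setup.py of a hypothetical pip-installable package shipping custom Salt modules (sketch based on the PR above)
from setuptools import setup

setup(
    name='my-salt-extension',
    version='0.1.0',
    packages=['my_salt_extension'],
    entry_points='''
    [salt.loader]
    module_dirs = my_salt_extension.loader:module_dirs
    ''',
)

Here my_salt_extension/loader.py would expose a module_dirs() function returning the directories that contain the custom modules.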
2) I believe the module would have to handle errors itself; otherwise the minion could end up with an empty list without any masters.
And for the transport layer, you would need to configure both the master and the minion to use the tcp transport: https://docs.saltstack.com/en/latest/topics/transports/tcp.html
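That is a single setting on both sides:

# in both the master config and the minion config
transport: tcp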
Cool beans! I will definitely look into that tomorrow. Thanks a lot!
Not sure if I want to use that experimental TCP feature :)
I would suggest you will need scripts, as part of installing Salt, that work out where the master is and call salt-call --master=abc --local saltutil.sync_modules as suggested by gtmanfred, before swapping to master_type: func.
also look at https://docs.saltstack.com/en/latest/topics/cloud/aws.html
Update this issue when you work out the best solution.
OK, so I will have to work on some other topics and put this 'master behind ELB' effort on hold for a while, but this is the solution we will most likely use going forward:
I wrote a script that we will put on the salt-minion hosts and that will be triggered by a scheduled task, e.g. once per hour (maybe more frequently, we'll see).
from __future__ import print_function
import boto3
import urllib2
import logging
import sys
import yaml
import fileinput
import re
from subprocess import call

# Log to a file so the scheduled task runs can be reviewed later
log = logging.getLogger('saltmaster-discovery')
hdlr = logging.FileHandler(r'C:\Deployment\saltmaster-discovery.log')
formatter = logging.Formatter('%(asctime)s %(levelname)s %(message)s')
hdlr.setFormatter(formatter)
log.addHandler(hdlr)
log.setLevel(logging.DEBUG)

# Get the region from the instance metadata (strip the trailing availability-zone letter)
region = urllib2.urlopen('http://169.254.169.254/latest/meta-data/placement/availability-zone').read()[:-1]
client = boto3.client('ec2', region)

log.info('Initializing saltmaster discovery...')
# Look up all instances tagged with Function: Saltmaster / saltmaster
try:
    response = client.describe_instances(
        Filters=[
            {
                'Name': 'tag-key',
                'Values': [
                    'Function',
                ]
            },
            {
                'Name': 'tag-value',
                'Values': [
                    'Saltmaster',
                    'saltmaster'
                ]
            }
        ]
    )
except Exception:
    log.error('AWS API call failed!')
    sys.exit(1)

# Collect the private IPs of all salt-masters returned by the API
api_private_ips = []
for res in response["Reservations"]:
    try:
        for inst in res["Instances"]:
            try:
                api_private_ips.append(inst["PrivateIpAddress"])
            except KeyError:
                # e.g. an instance without a private IP assigned
                continue
    except KeyError:
        continue
log.debug('List of private ips from the salt-masters: %s' % api_private_ips)

# Read the list of masters currently configured on the minion
with open(r'C:\salt\conf\minion', 'r') as fp_:
    report = yaml.safe_load(fp_.read())
minion_private_ips = report['master']
log.debug('Current master value in the minion config: %s' % report['master'])

# Compare the items from the minion config with the list of available masters. If they are not equal, update the
# values in the minion config with the values from the API call and restart the minion
regex = re.compile(r'master: \[.*\]', re.IGNORECASE)
if set(minion_private_ips) == set(api_private_ips):
    log.info('The list of masters in the minion config is up to date')
else:
    log.info('The list of masters in the minion config will be updated')
    # Build the replacement line explicitly so the IPs are written as plain quoted strings
    new_master_line = 'master: [%s]' % ', '.join("'%s'" % ip for ip in api_private_ips)
    f = fileinput.FileInput(r'C:\salt\conf\minion', inplace=True, backup='.bak')
    for line in f:
        line = regex.sub(new_master_line, line)
        print(line, end='')
    f.close()
    # Trigger the salt-minion restart
    log.info('Restarting the salt-minion service')
    call(r"C:\salt\salt-call.bat --local service.restart salt-minion", shell=True)
This script calls the AWS API to find all EC2 instances with the tag 'Function: Saltmaster/saltmaster'. Then it checks whether the values in the salt-minion config (it searches for the line master: ['some ip', 'another ip']) match the list of private IPs that we got back from the API call. If the lists don't match, we update the minion config with the list from the API call and then restart the minion.
The script is most certainly not perfect and is not final (e.g. I want to implement a check to see whether the minion is currently working on a salt job before restarting the service), but I just wanted to share what I've got so far.
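For the hourly trigger on the Windows hosts, a scheduled task along these lines should do (the task name, Python interpreter path, and script location below are just placeholders):

schtasks /Create /SC HOURLY /TN "saltmaster-discovery" /TR "C:\salt\bin\python.exe C:\Deployment\saltmaster-discovery.py"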
Additional info: Amazon just introduced Network Load Balancers. I tried them out quickly, but couldn't get it to work :(
This looks awesome, thanks for sharing!
Daniel
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
If this issue is closed prematurely, please leave a comment and we will gladly reopen the issue.
Hello team,
I hope everyone is doing great.
I came across this thread while researching ways of implementing the very same architecture in our environment. Has this been addressed?
Our problem statement is: because our minions are actually our product customers' instances, we need the salt masters to be publicly available, since they live in different networks/VPCs and VPC peering doesn't work here due to its connection limitations. That said, for security purposes we need to find an alternative that makes the salt masters private but still reachable from the minions. Two things come to mind: a) iptables (not scalable); and b) AWS PrivateLink (requires an NLB to front the salt masters).
Since this was not available back in 2017, I thought of having a syndic master of masters behind the NLB alongside two other masters, but I am not sure if this would be possible, or if load balancing is even possible with a multi-master architecture. Can anyone shed some light here?
I really appreciate any help.
Thanks, Gabriel
Check out the new master cluster feature that is part of 3007rc1: https://github.com/saltstack/salt/blob/master/doc/topics/tutorials/master-cluster.rst
For more details see https://github.com/saltstack/salt-enhancement-proposals/blob/1433501a1417f78c895345a675e21a8b6382bb61/0000-master-cluster.md
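Based on the linked tutorial and SEP (the option names and values below should be verified against them, and the cluster requires shared storage for the PKI directory), a cluster member's master config looks roughly like:

# /etc/salt/master on each cluster member (sketch, verify against the linked tutorial)
cluster_id: salt_cluster
cluster_peers:
  - 10.0.0.2
  - 10.0.0.3
cluster_pki_dir: /shared/pki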
Description of Issue/Question
The connection between the salt-masters running behind an AWS ELB and the salt-minions is flaky. Sometimes it works, most of the time it doesn't. I would like to know whether there is some flaw in my setup that I am not seeing, or whether Salt only works with HAProxy as the load balancer.
Or maybe Salt doesn't work at all behind an ELB?
Setup
I am running the following setup at AWS:
The minion config looks as follows:
In the master log files, I can see on both masters:
2017-09-05 10:06:18,118 [salt.utils.verify][DEBUG ][35] This salt-master instance has accepted 2 minion keys.
A salt-key -L on both masters yields the same result:
So it looks like all is fine and everything should work. However, a test.ping is extremely flaky. Sometimes it works, but most of the time it doesn't. Most of the time neither master gets any return from the minion, and on the minion side I can see in the log that the minion never receives the message to execute 'test.ping' from the master. Example 1: test.ping from Master1:
I am aware that the redis error will be fixed soon https://github.com/saltstack/salt/issues/43295
Example 2: test.ping from Master1, ~1 minute after Example 1:
Also during my tests, a test.ping from Master2 never succeeded.
Steps to Reproduce Issue
Versions Report