prometheus / alertmanager

Prometheus Alertmanager
https://prometheus.io
Apache License 2.0

Add service discovery for -mesh.peers #645

Open iksaif opened 7 years ago

iksaif commented 7 years ago

Currently -mesh.peers takes a static list of host:port. If you want to use service discovery to discover your peers you need to do that before the command line invocation.

It more or less works with Consul because you can use something like dig +noall +answer foo.service.dc.consul SRV, but that is far from optimal.

Do you think it would make sense to add service discovery support for this in the Alertmanager?

stuartnelson3 commented 7 years ago

I think this sounds like a good addition, but it is worth noting that the current gossip model will discover additional peers. The list you provide at the start is not the "set in stone" list of your peers. For example, if you start peerA, and then start the following peers B, C, and D, all configured with peerA as their initial peer list, they will discover each other eventually.

As far as finding an additional list of peers, though, I do think supporting service discovery makes sense.
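To make the gossip behavior above concrete, a minimal (hypothetical) bootstrap could look like this; the host name and port are invented, and -mesh.peer is the repeated per-peer form of the flag seen elsewhere in this thread:

```
# peerA starts first with no peer list at all; B, C and D each point
# only at peerA, and gossip then lets all four discover each other.
alertmanager ...                                    # on peerA
alertmanager -mesh.peer=peerA.example.com:6783 ...  # on peerB
alertmanager -mesh.peer=peerA.example.com:6783 ...  # on peerC
alertmanager -mesh.peer=peerA.example.com:6783 ...  # on peerD
```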

brancz commented 7 years ago

I've spoken to @fabxc about this and we also came up with something in the direction of what @stuartnelson3 wrote. It is mainly interesting for the initial discovery, and for that reason we didn't think taking on the huge dependency would be worth it. Dynamically discovering and changing peers could actually disrupt the mesh network, so it may harm more than help, or seem like it doesn't work, because it would in fact not dynamically reconfigure the way Prometheus does.

The way we handle mesh network initialization in the Prometheus Operator is that we don't connect the Alertmanagers through an IP address but through a stable DNS name. All instances deployed are given all other instances as arguments. While this may look odd when scaling up and down, it always works out: all initial instances are part of the mesh, and because the mesh shares all connections, all peers are eventually connected to each other.
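For illustration, the "every instance lists every instance" pattern via stable DNS names could look roughly like this (the service, namespace, and port here are invented, not what the Operator actually generates); each of the three replicas is started with the same full list, including itself:

```
alertmanager \
  -mesh.peer=alertmanager-0.alertmanager.monitoring.svc:6783 \
  -mesh.peer=alertmanager-1.alertmanager.monitoring.svc:6783 \
  -mesh.peer=alertmanager-2.alertmanager.monitoring.svc:6783
```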

matthiasr commented 7 years ago

Since it's only used at startup, I think relying on scripting around it for cases where DNS is not available is good enough, compared to the complexity of having full-blown SD but then using it very differently. Keep in mind that many Prometheus SDs (all but DNS and file?) rely on relabeling to actually select targets from the global list, so we'd have to pull that in too.

I would reconsider if someone can demonstrate a case that is trivial with SD and impossible to solve otherwise, but for now I'd say no.

brian-brazil commented 7 years ago

I think we'll ultimately have to add it, but let's not do it earlier than we need to.

fabxc commented 7 years ago

I don't think we'll ever have to add it. There is really no point, as it is only needed during bootstrapping.

Agreeing with @matthiasr on all ends.

Any script that populates the startup flags with results from querying service discovery serves the purpose just fine. All the complexity of relabeling makes the cost/benefit of adding this underwhelming. As the AM is not a stateless application and has a reachable UI, all instances will likely have a stable DNS name anyway, and in general not even a complex script querying an actual SD system is required.

I'd like to close this as I consider it a non-goal for the time being and don't want long-standing "maybe but probably not" issues idling in the tracker.

iksaif commented 7 years ago

I guess the idea was that it would be more user-friendly not to have to script it, but it certainly is not a priority.

fabxc commented 7 years ago

I could see it as part of amtool #636 possibly but it's probably not trivial to do without still having a config file for it.

stuartnelson3 commented 7 years ago

Closing issue as it is currently not planned to be implemented

discordianfish commented 7 years ago

Could we at least support DNS RR-based discovery for the initial peers? That seems straightforward, and it's what Consul does too, for example. Basically, just resolve the DNS name and use all returned IPs as the initial peer list. Without that, you need a rather complex Kubernetes setup to make it work, and even then you need to generate the manifest because 'replicas' needs to match the parameters. With "DNS SD" all you need is a Deployment and a headless service.
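A sketch of what that A-record bootstrapping could look like in Go; the port and the idea of exposing this as a flag are assumptions, and the resolution step uses the standard library's net.LookupHost:

```go
package main

import (
	"fmt"
	"net"
)

// formatPeers turns a list of resolved IPs into host:port peer entries.
func formatPeers(ips []string, port int) []string {
	peers := make([]string, 0, len(ips))
	for _, ip := range ips {
		peers = append(peers, fmt.Sprintf("%s:%d", ip, port))
	}
	return peers
}

// initialPeers resolves a DNS name (e.g. a Kubernetes headless service)
// once at startup and returns one peer per A record. After this, the
// mesh's own gossip takes over, as discussed above.
func initialPeers(name string, port int) ([]string, error) {
	ips, err := net.LookupHost(name)
	if err != nil {
		return nil, err
	}
	return formatPeers(ips, port), nil
}

func main() {
	// No live DNS here; just demonstrate the formatting step.
	fmt.Println(formatPeers([]string{"10.0.0.1", "10.0.0.2"}, 6783))
	// prints [10.0.0.1:6783 10.0.0.2:6783]
}
```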

fabxc commented 7 years ago

I think DNS is a reasonable request and makes a lot of deployments significantly simpler. I'd only agree to this, though, if we can make a firm decision to leave it at that. I've no interest in supporting the full set of SD we do in Prometheus, especially as all the metadata that non-DNS SD gives us is not required here.


iksaif commented 7 years ago

Maybe full DNS-SD support isn't required; just supporting SRV records would do (to be sure to get the right port, which is important if you run in Marathon/Kubernetes/whatever).

Currently we need to do that:

#!/bin/bash

set -x
service=$1
peers=
for dc in dc1 dc2 dc3 dc4 dc5 dc6; do
  peer=$(dig +noall +answer ${service}.service.${dc}.consul SRV | perl -pe 's/.+\s+(\d+)\s+(.+)\.$/-mesh.peer=$2:$1/g')
  if [[ -n $peer ]]; then
    peers="$peers $peer"
  fi
done
if [[ -z $peers ]]; then
  export ALERTMANAGER_MESH_PEERS_OPTION=
else
  export ALERTMANAGER_MESH_PEERS_OPTION=$peers
fi
echo $ALERTMANAGER_MESH_PEERS_OPTION

And then alertmanager -mesh.peers $(./mesh_peers.sh), which is hacky :)

discordianfish commented 7 years ago

@fabxc Right, DNS-SD and nothing else. I'd say we should support A records and SRV records. All other SD systems usually support some form of pushing their state to DNS, so it would help those users too. Or to put it this way: at every job I've ever had, if I wanted to introduce the AM, DNS-SD would have helped a lot.

@iksaif Agreed, something like that works too but I'd like to avoid "glue code" where possible. It's not possible everywhere but I think here we could.

brancz commented 7 years ago

What needs to be clear is that a "discovery" mechanism like this is only for discovery at startup time, after that the mesh library takes over and makes the peers discover each other. Updates from another discovery mechanism would break the mesh.

I'm torn on whether this "glue code" might actually be the right solution; I think it might be. Is it really necessary that we build these initial discovery mechanisms into all distributed systems? It seems error-prone and duplicative. What speaks against it in this case is that I don't think the Alertmanager container image contains dig (I haven't actually checked).

discordianfish commented 7 years ago

@brancz I imagine the flag staying the same; that should convey it's only used for initial peers. That's the same thing etcd or Consul are doing.

iksaif commented 7 years ago

@brancz it's honestly a pain to have to maintain and deploy this script. I'm sure that in "some way" the service discovery code could be put into a lib, and then programs like the AM would just do "mesh = discover_once(mesh_target)".

Short term, simply supporting DNS would be nice :) Note for others: if you use Consul, you can just write a prepared query that returns any working instance and use that.

brian-brazil commented 7 years ago

What needs to be clear is that a "discovery" mechanism like this is only for discovery at startup time, after that the mesh library takes over and makes the peers discover each other. Updates from another discovery mechanism would break the mesh.

I'm not too sure about that, what if all peers one AM knows about die?

brancz commented 7 years ago

My point was that after initial discovery (which each peer performs, including new peers), the mesh library performs discovery; mixing in an additional mechanism that performs discovery at runtime is likely to break the discovery that the mesh library performs at runtime.

discordianfish commented 7 years ago

As a workaround, this is the glue script I came up with (which works in the existing AM image):

nslookup prometheus-am-mesh \
  | awk -v ORS=' ' '/^Name: / { found=1 }; found && /^Address / { print "-mesh.peer="$3 }'

Which can be used directly in a kubernetes manifest like this:

          command:
            - /bin/sh
            - -c
            - |
              exec /bin/alertmanager \
                -config.file=/etc/prometheus/alertmanager.yml \
                -storage.path=/alertmanager \
                $(nslookup prometheus-am-mesh \
                  | awk -v ORS=' ' '/^Name: / { found=1 }; found && /^Address / { print "-mesh.peer="$3 }')

(Where prometheus-am-mesh is a headless service, thanks to @brancz for that suggestion).

That seems to work for me, just would like to avoid this ugly and fragile glue. :)

brancz commented 7 years ago

@discordianfish in the case of Kubernetes I would still encourage using a StatefulSet, as it gives you stable IPs, DNS names (although that's mainly thanks to the governing/headless service), and most importantly stable storage, so that when a node fails it can load from a previous snapshot and only gossip and merge the delta through the mesh, assuming there is at least one healthy remaining member.

The only remaining downside was that StatefulSets did not have rolling updates; however, those are added in 1.7, which is being released today :slightly_smiling_face: .

discordianfish commented 7 years ago

@brancz StatefulSets are much more complex than Deployments in every regard. Stable storage is something I hope I don't need for the Alertmanager..? I assumed that syncing a tiny mesh like this should take <1s. Storage isn't free, and it's hard and not very reliable outside mature clouds. Even on Azure I wouldn't want to depend on storage for something like the AM.

brancz commented 7 years ago

The mesh is very fast, yes. I just played through some rolling-deploy scenarios that showed that what you mentioned does work in the happy case (in fact I haven't encountered a failure so far), but in the non-happy case, which is still possible, you still want a backup/snapshot to restore from. Otherwise you'll get notifications you don't want and, more importantly, you'll lose all silences and possibly get notifications that are supposed to be silenced.

discordianfish commented 7 years ago

@brancz I don't know what non-happy cases you have in mind, but nothing prevents you from losing your disks too. On the cloud you can somewhat expect that your volumes are persisted, but you can't depend on it either. If you need to keep your silences, you should do a real backup. Then again, I'm probably fine with losing silences in these rare cases.

brancz commented 7 years ago

Having another layer of confidence definitely can only be good for your distributed system. I also don't agree with

StatefulSets are much more complex than deployments in every regard

But I think that's another discussion that shouldn't happen on this thread.

fabxc commented 7 years ago

If someone cares about examples, oklog supports some DNS flavors for discovery that seem reasonably simple: https://github.com/oklog/oklog/blob/master/cmd/oklog/forward.go#L144-L186
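Loosely modeled on that idea, initial peer flags could carry an optional prefix selecting the resolution mode. The "dns+"/"dnssrv+" prefixes below are invented for this sketch, not an existing Alertmanager or oklog flag syntax:

```go
package main

import (
	"fmt"
	"strings"
)

// peerSpec describes how a single peer argument should be resolved
// once at startup: taken as-is, via A records, or via an SRV lookup.
type peerSpec struct {
	kind string // "static", "a", or "srv"
	addr string
}

// parsePeer classifies a peer flag value by its (hypothetical) prefix.
func parsePeer(s string) peerSpec {
	switch {
	case strings.HasPrefix(s, "dnssrv+"):
		return peerSpec{kind: "srv", addr: strings.TrimPrefix(s, "dnssrv+")}
	case strings.HasPrefix(s, "dns+"):
		return peerSpec{kind: "a", addr: strings.TrimPrefix(s, "dns+")}
	default:
		return peerSpec{kind: "static", addr: s}
	}
}

func main() {
	for _, s := range []string{
		"dns+alertmanager.monitoring.svc:6783",
		"dnssrv+_mesh._tcp.alertmanager.service.consul",
		"10.0.0.1:6783",
	} {
		fmt.Printf("%+v\n", parsePeer(s))
	}
}
```

SRV records carry their own port, which addresses the concern raised earlier about running under Marathon/Kubernetes; A records would need a fixed default port.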

discordianfish commented 7 years ago

@fabxc Yes, I was thinking along these lines. All I really care about is A-record discovery. SRV is probably useful for others too, but we don't need reverse lookups and can keep it even simpler. I'd expect 90% of all Prometheus/AM users would use that to set up their cluster.

So can we open the issue again?

discordianfish commented 7 years ago

Since this also came up on the ml, I'm going to re-open. Hope that's okay.

AndreaGiardini commented 6 years ago

Just adding my comment to say that the workaround from @discordianfish works great with a headless service :+1: thanks!

discordianfish commented 6 years ago

@AndreaGiardini Full disclosure: in the end I went with @brancz's original suggestion of using StatefulSets, since I got into split-brain situations otherwise. But yeah, I'm not happy with it, since it's much more complicated than it should be for a 'cloud native' project. I think we agree on that now and just need somebody to add a lightweight DNS SD like the one in Consul or oklog.

brancz commented 6 years ago

Basically my point from above stands, I think for initial peer discovery a simple DNS discovery is fine, but this should only be for initial discovery, after that the mesh should take over for adding/removing peers.

damomurf commented 6 years ago

@brancz Apologies for referencing a comment you made quite a while ago, but is a StatefulSet still the recommended approach for a Kubernetes setup with the Alertmanager mesh configuration? A StatefulSet is the only way I can get a functioning mesh, but it offers no ability to expose each Pod within the mesh via its own Service so that Prometheus can ensure each member receives the alerts from Prometheus.

(I assumed this wouldn't be a deal-breaker and that the mesh would replicate the alerts, but in the configuration I currently have, I'm not seeing the alert itself replicate.)

Is there a go-to example of running a Mesh setup in Kubernetes that's current?

brancz commented 6 years ago

@damomurf alerts are never replicated, only notifications that have been successfully sent.

I recommend using the Prometheus Operator to orchestrate Alertmanager deployments. It uses StatefulSets under the hood. You don't actually need a Service per Pod with a StatefulSet, as you can make use of what's called a "governing service" of a StatefulSet (also automatically managed by the Prometheus Operator). When set up correctly, a governing Service sets up DNS records for each Pod of the StatefulSet in the form of pod-name.governing-service-name.namespace.svc. Beyond that, the Prometheus Operator is thoroughly tested for performing zero-downtime rolling updates and automatically handles all breaking changes of Alertmanager.
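The per-Pod names a governing service provides are predictable from the replica count, which is what makes the "list every instance" bootstrap possible. A small sketch (the set/service/namespace names and the port are illustrative, not what the Operator generates):

```go
package main

import "fmt"

// statefulSetPeers builds the per-Pod DNS names a governing service
// provides, in the form pod-name.governing-service-name.namespace.svc,
// one per replica of the StatefulSet.
func statefulSetPeers(set, svc, ns string, replicas, port int) []string {
	peers := make([]string, 0, replicas)
	for i := 0; i < replicas; i++ {
		peers = append(peers, fmt.Sprintf("%s-%d.%s.%s.svc:%d", set, i, svc, ns, port))
	}
	return peers
}

func main() {
	// Emit one -mesh.peer flag per replica for a 3-replica StatefulSet.
	for _, p := range statefulSetPeers("alertmanager", "alertmanager", "monitoring", 3, 6783) {
		fmt.Println("-mesh.peer=" + p)
	}
}
```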

If you do not want to make use of the Prometheus Operator, then I do recommend using a StatefulSet anyway.

disclaimer: I am one of the maintainers of the Prometheus Operator.