haproxy / haproxy

HAProxy Load Balancer's development branch (mirror of git.haproxy.org)
https://git.haproxy.org/

Trigger script or webhook alert #696

Open bluepuma77 opened 4 years ago

bluepuma77 commented 4 years ago

Currently haproxy offers an email-alert to notify in case of failure. It would be great to have an option to create alerts using scripts/commands or web-hooks.
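
(For reference, that existing mechanism is configured roughly like this; names and addresses are placeholders.)

mailers alertmailers
    mailer smtp1 192.0.2.10:25

backend app
    email-alert mailers alertmailers
    email-alert from haproxy@example.com
    email-alert to ops@example.com
    email-alert level alert
    server srv1 192.0.2.20:80 check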

At the moment people are improvising, using SMTP-to-HTTP gateways to trigger webhooks or abusing external checks to run their own scripts.
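
(The external-check workaround looks roughly like this; the script path is a placeholder. The script does the health-check duty and can fire any alert it likes as a side effect, which is exactly the abuse described above.)

global
    external-check   # required on recent versions to allow external checks

backend app
    option external-check
    external-check command /usr/local/bin/check-and-alert.sh
    server srv1 192.0.2.20:80 check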

It would be great to have alerts similar to the email-alert: running scripts (or commands) and passing some info as command-line parameters, or triggering webhooks with an HTTP(S) POST JSON payload (like GitHub, or just form data). It should include whether a service is going up or down.

script-alert </path/to/script> 
script-alert level alert

webhook-alert <http(s)://user:pass@server.tld/path> 
webhook-alert level notice

I found the mailer alert sources and how you run external scripts or send http requests, but C is just not my language.

bluepuma77 commented 4 years ago

I put some more thoughts into the whole topic.

Use cases:

  1. Trigger phone calls with Twilio
  2. Send push messages to Threema
  3. Share with chat groups like Slack
  4. Post to logging tools like Elastic
  5. Trigger scripts to restart a server/service
  6. Move a floating IP; haproxy monitors itself 😃

Specifics

It would be great to have the alert as a global setting but also per backend section, especially for use cases 5 and 6.

Settings

script-alert </path/to/script> 
script-alert level alert
script-alert params val | keyval | formdata | json

call script with individual values: script service1 down
call script with key/value pairs: script "service=service1" "status=down"
call script with formdata as single parameter: script "multi-line key=val"
call script with json as single parameter: script '{"key":"val", "some":"more"}'

webhook-alert <http(s)://user:pass@server.tld:port/path?key=val> 
webhook-alert level notice
webhook-alert cookie key val
webhook-alert params formdata | json | query

Set cookies (for example for authorisation) and enable different parameter styles.

Conclusion

This will create a wealth of options for the admins working with and monitoring haproxy services. If I have to choose, I would pick the script-alert / command-alert / external-alert, as it can easily serve as a webhook by simply using curl.

wtarreau commented 4 years ago

It must absolutely be done in a separate process. Whether it would be a manager running as a sidecar, or maybe just the master, I don't know, but you NEVER EVER want your internet-facing proxy, which takes all the dirt in the face, to have any way to execute anything on your local system! Or maybe you're suicidal, but in that case there are other, less painful ways.

bluepuma77 commented 4 years ago

So I should not let haproxy call curl to trigger a phone call to wake me up at night to fix things?

You recommend another additional system to monitor and notify on such events?

bluepuma77 commented 4 years ago

There seems to be a need for this kind of trigger. Would you rather have people use SMTP-HTTP gateways or abuse external checks to trigger actions?

It really depends on your situation, some have a rather limited (time) budget for this whole topic and don't want yet another application to be installed, set up, maintained and monitored for this kind of job.

chipitsine commented 4 years ago

I agree that such a usage might be abusive (for any kind of trigger).

anyway, can Lua be used for the implementation?

wtarreau commented 4 years ago

So I should not let haproxy call curl to trigger a phone call to wake me up at night to fix things?

Definitely! I even find it amazingly shocking that you ask such a question!

You recommend another additional system to monitor and notify on such events?

Yep. And doing so in a separate process gives you even more classes of events, such as cluster-wide conditions, and can even detect an haproxy crash. Because I think you'd like to know if your proxy crashes, and it's not by asking it to tell you once it's about to die that you'll protect yourself.

wtarreau commented 4 years ago

There seems to be a need for this kind of trigger. Would you rather have people use SMTP-HTTP gateways or abuse external checks to trigger actions?

Not at all; use anything based on logs instead, as everybody does. It has always worked well. I have never used the SMTP stuff.

It really depends on your situation, some have a rather limited (time) budget for this whole topic and don't want yet another application to be installed, set up, maintained and monitored for this kind of job.

Those who explain that they have a limited time/budget are often the same ones who explain that they got hacked due to a limited time/budget to do things properly. The fact is that the only ones who succeed and grow seamlessly are those who invest the time to do things correctly initially, rather than trying to plug holes after the problems start to appear.

By monitoring from a centralized system you'll be able to trivially manage tens of LBs if you need and it will even work if you deploy in multiple clouds. Don't waste your time doing it the wrong way, it's not worth it and you're only attacking the visible part of the iceberg which is the smallest and least useful one.
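
(A minimal hook for that kind of external monitoring of haproxy itself already exists: a monitor-uri endpoint an outside checker can poll, which also reveals when haproxy itself is dead. Port and path are arbitrary.)

frontend health
    bind :8404
    mode http
    monitor-uri /healthz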

TimWolla commented 4 years ago

Definitely! I even find it amazingly shocking that you ask such a question!

I mean, it's clear to me that starting external processes is an issue. But I understood “call curl” not as “literally curl”, but rather as “send an HTTP request”. HAProxy does that all the time with checks. So would an HTTP request on a status change be an issue? In fact we already have that for SMTP, as @bluepuma77 reported. I believe it is pretty fragile, but the main reason for that, as I understand it, is that the mailers feature just is / was not well-maintained. But HAProxy has a pretty solid HTTP implementation and, with the rewritten httpchk system, a good way to send “arbitrary” requests based on the configuration (see the sketch below). Extending that to fire off a request when a server status changes should not be too bad. At least not worse than the mailers.

I mean, yeah. By not externally monitoring anything you lose the ability to detect if HAProxy fails, but I'd wager the guess that the backend services fail way more often than HAProxy itself.
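
(For context, the rewritten http-check directives referenced above, available since haproxy 2.2, already let the configuration shape a fairly arbitrary request; host and path are placeholders.)

backend app
    option httpchk
    http-check send meth GET uri /health ver HTTP/1.1 hdr Host app.example.com
    http-check expect status 200
    server srv1 192.0.2.20:80 check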

wtarreau commented 4 years ago

You're right, using HTTP natively to perform outgoing requests isn't an issue at all from this perspective! And as Ilya mentioned, Lua could already be used for this. We may have to define certain triggers, sort of internal actions, to make this easier, but I'm totally open to this. What I'd stay away from is anything specific to "the-standard-of-the-day". You know, a new monitoring solution that everyone jumps onto, then forgets in 6 months to the benefit of another one. But if it's a programmable action onto which we can plug 3 lines of Lua it could trivially be addressed. And if it's just defining a URL to send a JSON request with various elements, it could be easy as well.

But indeed it will not detect haproxy's death if that happens.
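
(To make the Lua idea concrete, a minimal sketch of such a programmable action: a background task that polls server states and POSTs a JSON payload when one changes. It assumes the core.httpclient() API from haproxy 2.5+, the webhook URL is a placeholder, and polling is only a stand-in for the trigger hooks discussed here; the file would be loaded with lua-load in the global section.)

-- track the last seen status of every server, keyed by "backend/server"
local last = {}

core.register_task(function()
    while true do
        for bname, backend in pairs(core.backends) do
            for sname, srv in pairs(backend.servers) do
                local status = srv:get_stats()["status"]
                local key = bname .. "/" .. sname
                if last[key] ~= nil and last[key] ~= status then
                    -- status changed: fire the webhook (placeholder URL)
                    local hc = core.httpclient()
                    hc:post{
                        url  = "https://hooks.example.com/haproxy",
                        body = string.format('{"server":"%s","status":"%s"}',
                                             key, status),
                    }
                end
                last[key] = status
            end
        end
        core.msleep(2000) -- poll every two seconds
    end
end)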

bluepuma77 commented 4 years ago

More details on what this could look like:

webhook-alert http(s)://user:pass@server.tld:port/path?key1=val1&key2=val2
webhook-alert method GET | POST // select method for request
webhook-alert auth user pass // alternative if you don't want to include it in URL (needed?)
webhook-alert level notice // when to trigger, taken from smtp-alert (makes sense?)
webhook-alert cookie key3 val3 // enable setting multiple cookies
webhook-alert cookie key4=val4 // alternative with = (needed?)
webhook-alert formdata key5 val5 // enable setting multiple POST form parameters
webhook-alert formdata key6=val6 // alternative with = (needed?)
webhook-alert params formdata | json | query // select how haproxy details will be transmitted

bluepuma77 commented 4 years ago

And my seemingly controversial idea to monitor haproxy with haproxy explained:

haproxy2 http-checks haproxy1. If haproxy1 is unavailable, it just moves the floating IP to itself; in my case this is done with an HTTP request to the provider's backend. And vice versa.

chipitsine commented 4 years ago

@bluepuma77, what you are talking about is very similar to ExaBGP:

https://github.com/Exa-Networks/exabgp

it is a health check plus a BGP announcement based on a successful health check. In BGP terms there is no "floating IP move"; the cluster IP is announced from several servers at a time (depending on the health checks).

capflam commented 4 years ago

Sending HTTP alerts to report health-check failures is a good idea, but the syntax will probably not be so custom. I haven't evaluated the feature yet, but my very first idea would be to reference the backend to use to send the report. This way, it would be possible to rely on http-request rules to set some extra info.
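
(A purely hypothetical sketch of that idea, no such alert directive exists today: the alert would reference an ordinary backend, and that backend's http-request rules decorate the outgoing report.)

backend alert_hook
    mode http
    http-request set-header X-Cluster lb-cluster-1
    server hook hooks.example.com:443 ssl verify none

# hypothetical directive: route health-check alerts through the backend above
http-alert backend alert_hook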

chipitsine commented 4 years ago

sending HTTP alerts is a bad idea.

I would say there are many time series databases (Prometheus, Graphite, TICK, ...) and alerting on top of their data.

Generally, if we can expose health check failures as Prometheus metrics (or any other time series), we are cool.
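
(For reference, that part already works: haproxy ships a Prometheus exporter, assuming it is built with that component, which reports per-server status and check failures.)

frontend metrics
    bind :8405
    mode http
    http-request use-service prometheus-exporter if { path /metrics }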

lukastribus commented 4 years ago

I disagree that sending HTTP alerts is a bad idea, I think it's very useful. We don't have to be cool as much as we have to be useful for a wide variety of deployments.

Just because you can use timeseries databases with alerting on top of it, doesn't mean that it's useful, possible or necessary in every single deployment out there.

Regarding the usefulness of haproxy health check results in general: I would like to emphasize that not every external health check will necessarily have the exact same result as the haproxy-internal health checks, and only the latter actually impact production traffic:

There are a lot of reasons why haproxy health checks can consider a backend (or all of them) down while external monitoring solutions see the backend up and running; I assume this is why Simon Horman implemented the SMTP notifications in the first place.

So I disagree with the notion that external monitoring is "the right and only thing to do and ticks all the boxes". Sure, external monitoring with automated log analysis subsequently triggering alerts for backend state changes will do the job just fine. But I think HTTP alerting can bridge this, in my opinion important, gap between full-blown log analysis and archaic SMTP alerting or not knowing haproxy's point of view at all.

chipitsine commented 4 years ago

naive approach "if healthcheck failed, I wish to send an email" has caveats. I'm not sure people take all that into consideration (I recall myself 10 years ago). If you perform outgoing smtp/http query, you depend on how remote server responds. What if it times out ? What if it is extremely slow ?

I started with a similar approach and soon ended up with several outages.

I doubt that such features are used (even email alerting), but I've no idea how to confirm that.

Also, I'd say that we need some documented best practices. For example, if ExaBGP were described in detail, I think people would consider it instead of "I run a couple of instances and move an IP between them based on cross health checks".

lukastribus commented 4 years ago

SMTP/HTTP requests can fail just as syslog messages can vanish.

I'm not saying SMTP/HTTP alerting covers everything. I am saying SMTP/HTTP alerting is covering an important part that is often overlooked.

There is no one-size-fits all.

chipitsine commented 4 years ago

that's a different story. syslog is a "best effort" approach because of UDP: a message can be lost, but you are not blocked on it. If you want to be reliable and not lose any messages, the best approach is to keep them as counters (until someone, e.g. a Prometheus exporter, reads them).

lukastribus commented 4 years ago

We don't block on TCP either. Can we stop this off topic bikeshedding now?

wtarreau commented 4 years ago

I agree with Lukas above. Note that when I'm talking about external monitoring, I don't mean "external monitoring of servers", which would be pointless for the reasons Lukas explained. I mean "external monitoring of haproxy", which is the only way to reliably know that haproxy died. But of course, as long as haproxy is trusted, logs/smtp/http are all equivalent.

I don't know if there are standards for alerting over HTTP. The thing I fear the most is "the standard of the day", which is very common over HTTP... A new tool appears and becomes the new de-facto standard, everyone jumps on it, and one year later nobody knows about it anymore because a new, totally incompatible one has replaced it, but we still have to support the previous one for existing deployments and possibly backport support for the new one. That's why we need to be careful to make this generic and extensible enough to support various solutions. And I really think that being able to trigger Lua code on alerts ought to be a first step, because it would allow one to implement support for any new solution even in LTS versions.

Thus I think that we should not think in terms of HTTP alerting but of alerting in general. We could define various ways to send an alert (e.g. Lua, HTTP, e-mail). Or maybe we should allow code to register for alert notifications (which is more or less what SMTP does today).
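
(To close the loop, a purely hypothetical sketch of that generic direction, none of these directives exist: one alert hook with pluggable transports, the Lua one being the extensible first step.)

alert lua my_alert_handler        # hypothetical: call a registered Lua function
alert http backend alert_hook     # hypothetical: send a request through a regular backend
alert email mailers alertmailers  # hypothetical: today's email-alert, unified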