warden-stack / Warden

Define "health checks" for your applications, resources and infrastructure. Keep your Warden on the watch.
https://getwarden.net
MIT License

Cachet Integration #104

Closed jbrooksuk closed 8 years ago

jbrooksuk commented 8 years ago

Integrate with Cachet to report downtime.

spetz commented 8 years ago

Queued as my next task after finishing the MS SQL integration.

spetz commented 8 years ago

So I was looking at the docs; could you point out what I should mostly focus on? I was thinking about creating an integration that would expose methods for sending HTTP requests to the Cachet API. However, I'm not sure which endpoints are the most important, whether I should include all of them, or whether it's somehow possible to send batch/bulk requests.

jbrooksuk commented 8 years ago

Components & Incidents.

Cachet doesn't support bulk requests though, so you'll need to update one thing at a time.

spetz commented 8 years ago

OK, so basically POST/PUT/DELETE requests for these two types, filled with useful data such as the name of the watcher, a description of the performed check, etc.?

jbrooksuk commented 8 years ago

Yep :)
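
For illustration, the two request types agreed on above could be built roughly as follows. This is a minimal sketch, not the code that ended up in the integration: the class and method names are hypothetical, the URL and token are placeholders, and the field names (name, description, status, message, visible, component_id) and the X-Cachet-Token header are taken from the Cachet v1 API as commonly documented, so check them against your Cachet version. The requests are only built here, never sent.

```csharp
using System;
using System.Net.Http;
using System.Text;
using System.Text.Json;

// Hypothetical helper: builds Cachet API requests without sending them.
public static class CachetRequestSketch
{
    // POST {apiRoot}components. Status 1 is assumed to mean "Operational".
    public static HttpRequestMessage CreateComponent(
        Uri apiRoot, string token, string name, string description, int status = 1)
    {
        return Build(HttpMethod.Post, new Uri(apiRoot, "components"), token,
            new { name, description, status });
    }

    // POST {apiRoot}incidents, attached to an existing component.
    public static HttpRequestMessage CreateIncident(
        Uri apiRoot, string token, int componentId, string name, string message, int status)
    {
        return Build(HttpMethod.Post, new Uri(apiRoot, "incidents"), token,
            new { name, message, status, visible = 1, component_id = componentId });
    }

    private static HttpRequestMessage Build(HttpMethod method, Uri url, string token, object payload)
    {
        // Serialize using the runtime type so the anonymous object's properties are written.
        var json = JsonSerializer.Serialize(payload, payload.GetType());
        var request = new HttpRequestMessage(method, url)
        {
            Content = new StringContent(json, Encoding.UTF8, "application/json")
        };
        request.Headers.Add("X-Cachet-Token", token);
        return request;
    }
}

public static class Program
{
    public static void Main()
    {
        // Placeholder URL; the trailing slash matters for relative Uri resolution.
        var apiRoot = new Uri("http://localhost:8000/api/v1/");
        var request = CachetRequestSketch.CreateComponent(
            apiRoot, "my-token", "Web watcher", "HTTP availability check");
        Console.WriteLine($"{request.Method} {request.RequestUri}");
    }
}
```

Since Cachet has no bulk endpoint, each component or incident change is one such request.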

spetz commented 8 years ago

Alright, doesn't seem too difficult; hopefully I'll have something working next week :).

jbrooksuk commented 8 years ago

Awesome! Once it's done, we can do a blog post about the integration! :)

spetz commented 8 years ago

That would be great, I'll let you know as soon as I have something working :).

jbrooksuk commented 8 years ago

👍

spetz commented 8 years ago

Just a quick update: the integration is going well so far. I have one question: it doesn't really matter if I send a specific endpoint an object that has more properties than are listed in the docs (so I could reuse them), right?

jbrooksuk commented 8 years ago

Nope, you can send whatever.

spetz commented 8 years ago

Ok, cool, thanks for the confirmation.

spetz commented 8 years ago

I've completed most of the integration and have one remaining piece: a set of methods that can accept, e.g., an iteration or a check result object as an argument and send a request to the API. If I understand correctly, a component would be a watcher type (e.g. Web or MongoDB) and an incident would be the result of the monitoring operation, for example the information that something has failed or that it works just fine?

jbrooksuk commented 8 years ago

Correct :)

spetz commented 8 years ago

I have great news: the integration is pretty much complete. I've also set up a temporary VM to test the API; you can see the results here: http://52.166.244.180 :).

spetz commented 8 years ago

The Cachet integration has been officially released! :) There's also a short post about it on the blog.

jbrooksuk commented 8 years ago

Yay! Thanks!

spetz commented 8 years ago

You're welcome, any feedback will be greatly appreciated :). I had to design a basic flow so that the same components and incidents wouldn't be created all the time; instead, I just update them based on the name and the actual date.
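
A minimal sketch of that "create once, then update" flow, assuming an incident is identified by its name plus the calendar day it occurred on. The in-memory dictionary stands in for the lookups against Cachet, and every name here is hypothetical rather than taken from the integration's source.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory stand-in for incidents already stored in Cachet.
public sealed class IncidentUpsertSketch
{
    private readonly Dictionary<(string Name, DateTime Day), string> _incidents =
        new Dictionary<(string Name, DateTime Day), string>();

    // Returns "created" the first time a (name, day) pair is seen, "updated" afterwards.
    public string Save(string name, DateTime occurredAt, string message)
    {
        var key = (name, occurredAt.Date); // the time of day is ignored on purpose
        var action = _incidents.ContainsKey(key) ? "updated" : "created";
        _incidents[key] = message;
        return action;
    }
}

public static class Program
{
    public static void Main()
    {
        var store = new IncidentUpsertSketch();
        var morning = new DateTime(2016, 9, 1, 8, 0, 0);
        var evening = new DateTime(2016, 9, 1, 20, 0, 0);

        Console.WriteLine(store.Save("Web watcher", morning, "Check failed"));            // created
        Console.WriteLine(store.Save("Web watcher", evening, "Check failed again"));      // updated
        Console.WriteLine(store.Save("Web watcher", evening.AddDays(1), "Check failed")); // created
    }
}
```

Keying on name and day keeps a status page from filling up with one incident per check iteration.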

janpieterz commented 8 years ago

I'm trying to combine both Warden and Cachet and am noticing a couple of things:

spetz commented 8 years ago

As for incidents, the thing is that the Cachet integration has to know in the first place whether there were any failures in the past; that's why it creates an incident with the status "Fixed" even when the monitored service works all the time. I'll need to think about how to resolve this issue.

As for grouping, there's a "groupId" parameter, but it's a global one for the whole integration configuration, so this will probably need to be split up so that watchers can have their own, separate groups.

janpieterz commented 8 years ago

Yeah, the incidents part is interesting. Maybe it shouldn't create them automatically? I commented it out; if I want to create an incident I'll probably want to make it a manual action, but that's of course just me.

The grouping I solved by using the same name and pushing to Cachet a bit more manually. It's a bit ugly, but to be honest I think this is actually good enough; there will be so many different configurations for this that a little bit of config like this should be fine:

// Collect the results that belong to the "ArkeWebsite" watcher group.
var webIterations = iteration.Results.Where(x => x.WatcherCheckResult.WatcherGroup == "ArkeWebsite").ToList();
// Report the first failing result for the group, or any result if all of them passed.
var webIteration = webIterations.FirstOrDefault(x => !x.IsValid);
if (webIteration == null)
{
    webIteration = webIterations.FirstOrDefault();
}
await cachet.SaveCheckResultAsync(webIteration, true);

// Push every remaining result to Cachet individually.
var otherIterations = iteration.Results.Where(x => !webIterations.Contains(x));
foreach (IWardenCheckResult checkResult in otherIterations)
{
    await cachet.SaveCheckResultAsync(checkResult, true);
}

spetz commented 8 years ago

Good to know that you've found a temporary workaround. I think I could set the "groupId" based on the "WatcherGroup" name (I just need an additional configuration for such a mapping). As for the incidents, I'll need to think about this :).

spetz commented 8 years ago

@janpieterz I've just pushed the fixed code that will hopefully resolve your two issues. To configure the groups, you can use WithWatcherGroups(), where you provide the mappings between the watcher group name and the id used in Cachet. For the incidents, there's a new parameter, "saveValidIncidents" (false by default); when it is false, no "Fixed" incident is created if there were no errors before. Please let me know if it's good enough.
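
The two rules described above can be sketched like this. It is only a model of the described behavior under stated assumptions, not the integration's actual code: the class, the method names, and the "hasOpenIncident" flag are hypothetical, and WithWatcherGroups() itself is not called because its exact signature isn't shown in this thread.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical model of the group mapping and the saveValidIncidents rule.
public static class CachetRulesSketch
{
    // Uses the Cachet group id mapped to the watcher group name, else the global groupId.
    public static int ResolveGroupId(
        IReadOnlyDictionary<string, int> watcherGroups, string watcherGroup, int defaultGroupId)
    {
        if (watcherGroup != null && watcherGroups.TryGetValue(watcherGroup, out var groupId))
        {
            return groupId;
        }
        return defaultGroupId;
    }

    // A failed check is always reported. A passing check is reported only to close an
    // earlier incident, or when saveValidIncidents is explicitly enabled.
    public static bool ShouldSaveIncident(bool checkIsValid, bool hasOpenIncident, bool saveValidIncidents)
    {
        return !checkIsValid || hasOpenIncident || saveValidIncidents;
    }
}

public static class Program
{
    public static void Main()
    {
        var groups = new Dictionary<string, int> { ["ArkeWebsite"] = 2 };

        Console.WriteLine(CachetRulesSketch.ResolveGroupId(groups, "ArkeWebsite", 1)); // 2
        Console.WriteLine(CachetRulesSketch.ResolveGroupId(groups, "Database", 1));    // 1
        // Healthy service, no earlier failure, default setting: nothing is saved.
        Console.WriteLine(CachetRulesSketch.ShouldSaveIncident(true, false, false));   // False
    }
}
```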

janpieterz commented 8 years ago

@spetz super, cheers for that, that does make it a lot easier. I also stumbled upon some other interesting things:

spetz commented 8 years ago

Cool, I've just pushed some changes that might help to resolve your latest issues: