warden-stack / Warden

Define "health checks" for your applications, resources and infrastructure. Keep your Warden on the watch.
https://getwarden.net
MIT License

Logic Issue around Aggregate Hooks & completion of an ExecuteAsync Iteration #146

Open eoincampbell opened 7 years ago

eoincampbell commented 7 years ago

Guys,

This is a bit of a discussion item, but I've noticed 2 separate but related issues with the way the ExecuteAsync operation occurs, what it considers an iteration, and how it subsequently executes the aggregate hooks.

  1. Aggregate Hooks don't execute until after the interval delay of the current iteration.

Let's say you have 2 web tests, both configured for a 15-minute interval. The logic currently runs as

The net effect is that the aggregate hooks (useful for sending combined notifications for all results, e.g. a summary via email of all failures if any) don't get executed until 15 minutes after the watchers executed.

I'd propose moving the Task.Delay statement to before the attempt to execute. There are pros and cons to this.
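The ordering problem described above can be sketched with a virtual clock. This is only a model of the behaviour reported in this issue, not Warden's actual code: `run_iteration` and `hook_lag` are hypothetical names, and the delay is simulated by adding to a counter rather than by awaiting a real `Task.Delay`.

```python
def run_iteration(interval_s, delay_first):
    """Return (event, virtual_time_s) pairs for a single iteration.

    delay_first=False models the order described in the issue:
    run the watchers, wait for the interval, then run the aggregate hooks.
    delay_first=True models the proposal: wait first, then run both.
    """
    clock = 0
    events = []
    if delay_first:
        clock += interval_s  # proposed order: the delay comes before the checks
    events.append(("watchers_executed", clock))
    if not delay_first:
        clock += interval_s  # reported order: the delay sits before the hooks
    events.append(("aggregate_hooks", clock))
    return events


def hook_lag(events):
    """Seconds between the watchers running and the aggregate hooks firing."""
    times = dict(events)
    return times["aggregate_hooks"] - times["watchers_executed"]


FIFTEEN_MINUTES = 15 * 60
current = run_iteration(FIFTEEN_MINUTES, delay_first=False)
proposed = run_iteration(FIFTEEN_MINUTES, delay_first=True)
print(hook_lag(current))   # 900: the summary arrives 15 minutes late
print(hook_lag(proposed))  # 0: the summary follows the checks immediately
print(proposed[0])         # ('watchers_executed', 900)
```

In this model the trade-off is visible in the last line: delaying first removes the lag between checks and hooks, but the very first check no longer runs at time 0.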

  2. Watchers that don't have equally divisible Intervals knock the sequence out of order for other watchers.

Consider 3 WebWatchers with 3 different intervals:

Web 1 - 20sec
Web 2 - 30sec
Web 3 - 40sec

I'd expect them to execute as follows

Web 1: 0,     20,     40,     60,     80,     100,     120
Web 2: 0,         30,         60,         90,         120
Web 3: 0,             40,             80,             120
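The expected timeline above amounts to each watcher firing at multiples of its own interval, independently of the others. A minimal sketch, where `expected_schedule` is a hypothetical helper and not part of Warden:

```python
def expected_schedule(interval_s, horizon_s):
    """Times (in seconds) at which a watcher would fire if it were
    scheduled purely on its own interval, up to and including horizon_s."""
    return list(range(0, horizon_s + 1, interval_s))


for name, interval in [("Web 1", 20), ("Web 2", 30), ("Web 3", 40)]:
    print(name, expected_schedule(interval, 120))
# Web 1 [0, 20, 40, 60, 80, 100, 120]
# Web 2 [0, 30, 60, 90, 120]
# Web 3 [0, 40, 80, 120]
```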

But that's not what happens, because of the way executions per iteration are calculated against the max interval across all configured watchers.

The max interval is 40 secs. The system will establish that Web 1 can occur twice in 40 seconds (@0s and @20s), and that Web 3 can occur once in 40 seconds (@0s). But Web 2 also fits into 40s twice, so it runs twice and pushes the full iteration window out to 60s.

To be honest, I don't have a good solution for this. The upshot is that if you have a mixture of jobs of completely different types you want to run, this is going to cause issues. For example, the following would result in iterations that last 90 minutes, with the aggregate hooks only firing at the end of each 90-minute iteration:

Web Watcher - (1 Minute)
SQL Query - (45 Minute)
VM Disk Checks - (1 Hour)
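Both the 60-second and the 90-minute figures above follow from the same arithmetic. The sketch below models it under an assumption inferred from this description rather than taken from Warden's source: each watcher runs as many times as it can start within the longest configured interval, and the iteration lasts until the slowest of those windows has elapsed. `iteration_plan` and `iteration_length` are hypothetical names.

```python
import math


def iteration_plan(intervals):
    """Map each watcher to (runs_per_iteration, window_length).

    Assumed model: a watcher runs ceil(longest / interval) times,
    and each run occupies one full interval of its own.
    """
    longest = max(intervals.values())
    plan = {}
    for name, interval in intervals.items():
        runs = math.ceil(longest / interval)
        plan[name] = (runs, runs * interval)
    return plan


def iteration_length(intervals):
    """The iteration ends when the longest per-watcher window ends."""
    return max(window for _, window in iteration_plan(intervals).values())


web = {"Web 1": 20, "Web 2": 30, "Web 3": 40}  # seconds
print(iteration_plan(web))
# {'Web 1': (2, 40), 'Web 2': (2, 60), 'Web 3': (1, 40)}
print(iteration_length(web))  # 60

mixed = {"Web Watcher": 1, "SQL Query": 45, "VM Disk Checks": 60}  # minutes
print(iteration_length(mixed))  # 90
```

In this model the stretch comes from any watcher whose interval does not divide the longest interval evenly, which is why Web 2 and the SQL query set the iteration length in their respective examples.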

spetz commented 7 years ago

Hi, thank you for the detailed description. I haven't had time to work on Warden recently because of other open source projects I've been developing, but I'll try to work on resolving this issue. In the meantime, if you have an idea of your own, please feel free to submit a PR.