shaikatz opened 6 years ago
I'll add that with the current codebase there is no way to know that we've "entered" a chaos period (a period with no exclusions) or "exited" one (a period that is excluded). The only way to tell, as I see it, is to check whether the previous interval was in exclusion time.
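Checking the previous interval amounts to simple edge detection, sketched below in Go (the language chaoskube is written in). This is a minimal sketch under assumptions: `PeriodTracker`, `Transition`, and `excludedOnWeekends` are hypothetical names, not chaoskube's API, and the weekend rule only stands in for whatever exclusion settings are actually configured. The very first active interval is treated as an entry, so a freshly started process still reports that chaos is running.

```go
package main

import (
	"fmt"
	"time"
)

// Transition describes how the current interval relates to the previous one.
type Transition int

const (
	NoChange Transition = iota
	Entered             // previous interval was excluded (or there was none), current one is active
	Exited              // previous interval was active, current one is excluded
)

func (t Transition) String() string {
	switch t {
	case Entered:
		return "entered chaos period"
	case Exited:
		return "exited chaos period"
	default:
		return "no change"
	}
}

// PeriodTracker is a hypothetical helper that remembers whether the previous
// interval fell in exclusion time and reports edges between the two states.
type PeriodTracker struct {
	seen        bool
	wasExcluded bool
}

// Observe records the exclusion state of the current interval and returns the
// transition relative to the previous one. Before the first observation the
// "previous" state counts as excluded.
func (p *PeriodTracker) Observe(excluded bool) Transition {
	prevExcluded := !p.seen || p.wasExcluded
	p.seen = true
	p.wasExcluded = excluded
	switch {
	case prevExcluded && !excluded:
		return Entered
	case !prevExcluded && excluded:
		return Exited
	default:
		return NoChange
	}
}

// excludedOnWeekends is a hypothetical exclusion rule used only for this example.
func excludedOnWeekends(t time.Time) bool {
	wd := t.Weekday()
	return wd == time.Saturday || wd == time.Sunday
}

func main() {
	var tracker PeriodTracker
	// One interval per day, Friday 2024-01-05 through Monday 2024-01-08.
	start := time.Date(2024, time.January, 5, 12, 0, 0, 0, time.UTC)
	for i := 0; i < 4; i++ {
		now := start.AddDate(0, 0, i)
		fmt.Println(now.Weekday(), tracker.Observe(excludedOnWeekends(now)))
	}
}
```

Running it prints `Friday entered chaos period`, `Saturday exited chaos period`, `Sunday no change`, and `Monday entered chaos period`. The returned transition is a natural place to hook both a metric update and a notification.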
Could we add a Prometheus metric for that?
I've often seen "binary" metrics that you query via labels and that return either 0 or 1, e.g.
up{application="dnsmasq-node", ...} 1
up{instance="ip-172-31-8-124.eu-central-1.compute.internal", ...} 1
...
We could expose something similar for chaoskube, e.g.
chaoskube_active 1 # or 0
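A gauge like `chaoskube_active` could look roughly like the sketch below. To keep it dependency-free it writes the Prometheus text exposition format by hand with the standard library; in practice the official Go client (`github.com/prometheus/client_golang`, e.g. a `Gauge` set to 0 or 1) would be the usual choice. The metric name comes from the suggestion above, while `SetActive`, `metricsText`, `metricsHandler`, and the HELP text are hypothetical.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// chaosActive holds 1 while chaoskube is in an active (non-excluded) period, else 0.
var chaosActive int32

// SetActive records whether the current interval is an active chaos period.
func SetActive(active bool) {
	var v int32
	if active {
		v = 1
	}
	atomic.StoreInt32(&chaosActive, v)
}

// metricsText renders the gauge in the Prometheus text exposition format.
func metricsText() string {
	return fmt.Sprintf(
		"# HELP chaoskube_active Whether chaoskube is in an active chaos period (1) or not (0).\n"+
			"# TYPE chaoskube_active gauge\n"+
			"chaoskube_active %d\n",
		atomic.LoadInt32(&chaosActive))
}

// metricsHandler would be registered on /metrics, e.g.
// http.Handle("/metrics", http.HandlerFunc(metricsHandler)).
func metricsHandler(w http.ResponseWriter, _ *http.Request) {
	w.Header().Set("Content-Type", "text/plain; version=0.0.4")
	_, _ = io.WriteString(w, metricsText())
}

func main() {
	SetActive(true)
	// Use a recorder instead of a real listener so the example terminates.
	rec := httptest.NewRecorder()
	metricsHandler(rec, httptest.NewRequest(http.MethodGet, "/metrics", nil))
	fmt.Print(rec.Body.String())
}
```

Calling `SetActive` whenever the period changes is enough for an alerting rule or dashboard to react to the value flipping between 0 and 1.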
It's not the same as a notification, but at least it would give you a structured way to find out whether chaoskube is currently in an active period.
We can definitely add a Prometheus metric for that, but it would serve alerting and monitoring.
I would still like to send a Slack notification for it, so the team is actively aware when chaos is running and is prepared if action is required.
Would you prefer to avoid adding Slack capabilities to this tool?
Not at all. I just wanted to provide an inferior alternative that might save us some effort.
I'm fine with adding Slack support. How about putting it in a separate package and hiding it behind a nice Notifier interface?
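A rough sketch of that idea, assuming a hypothetical `Notifier` interface with a single `Notify(ctx, message)` method; the names, the method signature, and the `NoopNotifier` fallback are illustrative, not chaoskube's actual code. The Slack side relies only on Slack incoming webhooks, which accept an HTTP POST with a JSON body such as `{"text": "..."}`. No real webhook URL is used: `demo` points the notifier at a local `httptest` server instead.

```go
package main

import (
	"bytes"
	"context"
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// Notifier is a hypothetical interface; Slack is just one possible backend.
type Notifier interface {
	Notify(ctx context.Context, message string) error
}

// NoopNotifier could be used when notifications are disabled.
type NoopNotifier struct{}

func (NoopNotifier) Notify(context.Context, string) error { return nil }

// slackPayload is the minimal JSON body accepted by Slack incoming webhooks.
type slackPayload struct {
	Text string `json:"text"`
}

func buildPayload(message string) ([]byte, error) {
	return json.Marshal(slackPayload{Text: message})
}

// SlackNotifier posts messages to a Slack incoming-webhook URL.
type SlackNotifier struct {
	WebhookURL string
	Client     *http.Client // nil means http.DefaultClient
}

func (s SlackNotifier) Notify(ctx context.Context, message string) error {
	body, err := buildPayload(message)
	if err != nil {
		return err
	}
	req, err := http.NewRequestWithContext(ctx, http.MethodPost, s.WebhookURL, bytes.NewReader(body))
	if err != nil {
		return err
	}
	req.Header.Set("Content-Type", "application/json")
	client := s.Client
	if client == nil {
		client = http.DefaultClient
	}
	resp, err := client.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode < 200 || resp.StatusCode >= 300 {
		return fmt.Errorf("slack webhook returned %s", resp.Status)
	}
	return nil
}

// demo posts one message to a local fake webhook and returns what it received.
func demo(message string) (string, error) {
	received := make(chan string, 1)
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		var p slackPayload
		if err := json.NewDecoder(r.Body).Decode(&p); err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return
		}
		received <- p.Text
		_, _ = w.Write([]byte("ok"))
	}))
	defer srv.Close()

	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()
	var n Notifier = SlackNotifier{WebhookURL: srv.URL}
	if err := n.Notify(ctx, message); err != nil {
		return "", err
	}
	return <-received, nil
}

func main() {
	got, err := demo("chaoskube: chaos period started")
	if err != nil {
		fmt.Println("notify failed:", err)
		return
	}
	fmt.Println("fake webhook received:", got)
}
```

Keeping Slack behind the interface means other backends, or a no-op when notifications are turned off, can be swapped in without touching the scheduling code.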
Let me know how I can help. I'm looking forward to seeing what you come up with.
Great, I'll try to work on that soon and I hope to return with a PR 👍
I often use Alertmanager to send notifications to different channels, including Slack. With this approach, a Prometheus metric is enough and there is a clear separation of responsibilities.
Hi,
I'm looking for a way to notify my team every time the chaos bot starts performing actions. Since Slack is widely used, that would be my preference.
I'd like to implement that capability for chaoskube.
Any thoughts?