Open ritazh opened 4 years ago
Could you add a link or something so we know what the goal of this bug would be?
This possibly sounds like an enforcement action. I also wonder how this would interact with alerts sourced from Prometheus.
One danger to watch out for: API request volume to the admin server can be extremely high for some kinds, so we risk spamming the alert pipeline without some volume-reducing solution.
Hello,
I think I am waiting for the same. I have deployed Gatekeeper with dryrun enabled and can see violations in the status field. However, I am not sure how to set up alerts for these violations in Datadog / Prometheus / Slack.
Could you please help?
What kind of alerts are you looking to have?
Alerts as in sending a Slack message, or surfacing it in logs / metrics in Datadog saying violations were found, with details.
We can do a write-up about integrating Gatekeeper metrics with Prometheus and Alertmanager (which includes integrations with Slack, Datadog, and many others).
Other than violations over a certain threshold, is there anything else you would like alerts on?
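For reference, a minimal Alertmanager sketch that routes such alerts to Slack could look like the following (the webhook URL and channel are placeholders, and the alert itself would come from your own Prometheus rules):

```yaml
# alertmanager.yml sketch: send all alerts to a Slack channel via an incoming webhook.
route:
  receiver: slack-gatekeeper
receivers:
  - name: slack-gatekeeper
    slack_configs:
      - api_url: https://hooks.slack.com/services/XXX/YYY/ZZZ   # placeholder webhook URL
        channel: '#gatekeeper-violations'
        title: 'Gatekeeper violations detected'
        text: '{{ .CommonAnnotations.summary }}'
```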
That would be great :)
Also, the logs can be parsed for more detailed data about rejections to alert on.
Hello there 👋 One of the teams I'm working with has deployed OPA Gatekeeper, and we would like to do the same: monitor every policy/compliance violation without blocking deployments yet (or the devs would kill us).
Ideally, we would need alerts sent over webhooks in JSON, or via syslog, containing all the info about the violations.
Is this possible/configurable at this moment, or planned? I would gladly help if needed.
Thanks
We are emitting audit violations via stderr/stdout logs. Are you able to pipe those into syslog/ELK/other log aggregator and use those to drive alerts?
That would probably give you the most detailed violation information.
What I have done is:
- set `enforcementAction: dryrun` on the constraints
- enable the `--log-denies` flag
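For context, here is a rough sketch of what that looks like on a constraint (the kind, name, and match are just the example from the Gatekeeper docs; `--log-denies` is an argument passed to the Gatekeeper controller-manager container):

```yaml
# Illustrative constraint only; the kind comes from whatever ConstraintTemplate you use.
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequiredLabels
metadata:
  name: ns-must-have-owner
spec:
  enforcementAction: dryrun   # record violations in status/audit instead of denying requests
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Namespace"]
  parameters:
    labels: ["owner"]
```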
After this I could see violations in the logs, which I am streaming to Datadog.
In Datadog, I have created charts and added monitors by matching on those unique log messages. The things I can do with this approach are pretty limited.
If I could get something like `dryrun_violation_count` in the metrics, things would become much easier.
@swapnild2111 you can get the violation counts, like:
gatekeeper_violations{enforcement_action="deny"} 19
gatekeeper_violations{enforcement_action="dryrun"} 7
See https://github.com/open-policy-agent/gatekeeper/blob/master/docs/Metrics.md for a list of all metrics.
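With that metric in place, a minimal Prometheus alerting rule sketch could look like this (the alert name, threshold, and `for` duration are just examples):

```yaml
groups:
  - name: gatekeeper
    rules:
      - alert: GatekeeperDryrunViolations
        # Fires if the audit keeps reporting dryrun violations for 15 minutes.
        expr: gatekeeper_violations{enforcement_action="dryrun"} > 0
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "Gatekeeper reports {{ $value }} dryrun violations"
```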
thank you, it worked perfectly for me :)
It would be useful to have the constraint details as metric tags, e.g. having the constraint name and type as tags on the violation metrics would be enough to set up alerts in Prometheus.
@swapnild2111 how did you leverage those metrics via Datadog? While I plan on parsing the logs ingested to DD for violations, ideally I'd like to be able to use the metrics in DD.
@morganwalker sorry for the very late reply.
I have the annotations below on my deployment to send Prometheus metrics to Datadog:
ad.datadoghq.com/manager.check_names: '["prometheus"]'
ad.datadoghq.com/manager.init_configs: '[{}]'
ad.datadoghq.com/manager.instances: '[{"prometheus_url":"http://%%host%%:8888/metrics", "namespace": "gatekeeper-system", "metrics":["*"]}]'
prometheus.io/port: "8888"
prometheus.io/scrape: "true"
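To be explicit about placement: these annotations go on the pod template of the Gatekeeper deployment, and the `manager` part of the `ad.datadoghq.com/manager.*` keys has to match the container name. A rough sketch, assuming the standard controller-manager container name:

```yaml
# Pod template of the Gatekeeper controller-manager Deployment (sketch).
spec:
  template:
    metadata:
      annotations:
        ad.datadoghq.com/manager.check_names: '["prometheus"]'
        ad.datadoghq.com/manager.init_configs: '[{}]'
        ad.datadoghq.com/manager.instances: '[{"prometheus_url":"http://%%host%%:8888/metrics", "namespace": "gatekeeper-system", "metrics":["*"]}]'
        prometheus.io/port: "8888"
        prometheus.io/scrape: "true"
```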
Would that be helpful for you?
Anyone working on slack yet?
Allow an optional "enable Slack" feature; then you usually just need two inputs: the Slack webhook URL and which channel to send to. Also, since a webhook is used, a single HTTP POST is enough to make it work, so there is not much dependency.
My suggestion is to have two kinds of Slack messages.
As in many other frameworks, the Slack webhook URL could be created as a k8s secret.
helm example:
slack:
  enable: true              ## default false
  slack_channel: ""
  slack_title: ""           ## if we monitor all security audits in one room, a title helps tell where a message came from
  slack_text_prefix: ""     ## a prefix to tell which cluster this message is from, e.g. staging/production
  slack_text_suffix: ""
  slack_webhook_url: ""     ## if we don't want to create the k8s secret, just give the URL
  slack_secret_name: ""     ## where we define the Slack webhook URL
  slack_report:
    slack_cron: "* 2 * * *" ## minute, hour, day (month), month, day (week)
@sozercan did we ever document alert manager integrations? That seems like it would address use case #2.
As for use case #1, that sounds similar to:
- The push-based pipeline referenced in #897
- IIRC, we were also thinking about generic webhook-based reporting at some point
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.
Is this feature available now?
@swapnild2111 @maxsmythe
I am trying to implement violation alerting with MS Teams for the Gatekeeper Operator installed on OpenShift v4.13, but I am not able to achieve it, as everything comes out of the operator. Any idea how I can integrate here?
e.g. send violations to Slack
This is from the CNCF webinar. To summarize the ask from the webinar: when a violation is detected, it would be good to get an alert for this event into systems like Slack, Datadog, or Prometheus.