open-policy-agent / gatekeeper

🐊 Gatekeeper - Policy Controller for Kubernetes
https://open-policy-agent.github.io/gatekeeper/
Apache License 2.0

Alerting integration for violations #580

Open ritazh opened 4 years ago

ritazh commented 4 years ago

e.g. send violations to slack

This is from the CNCF webinar. To summarize the ask from the webinar: when a violation is detected, it would be good to get an alert for this event in systems like Slack, Datadog, or Prometheus.

maxsmythe commented 4 years ago

Could you add a link or something so we know what the goal of this bug would be?

This possibly sounds like an enforcement action. I also wonder how this would interact with alerts sourced from Prometheus.

One danger to watch out for: API request volume to the admin server can be extremely high for some kinds, so we risk spamming the alert pipeline without some volume-reducing solution.
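
One possible volume-reducing approach, as a rough sketch only: suppress repeat alerts for the same key within a time window. The `AlertThrottle` class and the idea of keying on constraint plus resource are hypothetical, not something Gatekeeper provides.

```python
class AlertThrottle:
    """Allow at most one alert per key within a sliding time window."""

    def __init__(self, window_seconds: float) -> None:
        self.window_seconds = window_seconds
        self._last_sent = {}  # key -> timestamp of the last alert that was let through

    def should_send(self, key: str, now: float) -> bool:
        """Return True if an alert for `key` may be sent at time `now` (in seconds)."""
        last = self._last_sent.get(key)
        if last is not None and now - last < self.window_seconds:
            return False  # same key alerted too recently, drop it
        self._last_sent[key] = now
        return True


# Hypothetical key format: "<constraint kind>/<constraint name>/<resource>".
throttle = AlertThrottle(window_seconds=60)
first = throttle.should_send("K8sRequiredLabels/must-have-owner/default/nginx", now=0.0)
repeat = throttle.should_send("K8sRequiredLabels/must-have-owner/default/nginx", now=30.0)
```

Here `first` is let through and `repeat` is suppressed because it arrives within the 60 second window. A real pipeline would probably also want to count the suppressed events so they can be summarized later.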

swapnild2111 commented 4 years ago

Hello,

I think I am waiting for the same. I have deployed Gatekeeper with dryrun enabled. I can see violations in the status field. However, I am not sure how to set up alerts for these violations in Datadog / Prometheus / Slack.

Could you please help?

maxsmythe commented 4 years ago

What kind of alerts are you looking to have?

swapnild2111 commented 4 years ago

Alerts as in sending a Slack message, or showing in Datadog logs / metrics that violations were found, with details.

sozercan commented 4 years ago

We can do a write-up about integrating gatekeeper metrics with prometheus and alertmanager (which includes integrations with slack, datadog and many others)

Other than violations over a certain threshold, is there anything else you would like alerts on?

swapnild2111 commented 4 years ago

That would be great :)

maxsmythe commented 4 years ago

Also, the logs can be parsed for more detailed data about rejections to alert on.

bytemare commented 4 years ago

Hello there 👋 One of the teams I'm working with has deployed OPA Gatekeeper, and we would like to do the same: monitor every policy/compliance violation, but not yet block deployments (or the devs would kill us).

Ideally, we would need alerts sent over webhooks in json or syslog, containing all the info about the violations.

Is this possible/configurable at this moment, or planned? I would gladly help if needed.

Thanks

maxsmythe commented 4 years ago

We are emitting audit violations via stderr/stdout logs. Are you able to pipe those into syslog/ELK/other log aggregator and use those to drive alerts?

That would probably give you the most detailed violation information.
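
As a minimal sketch of what driving alerts from those logs could look like: the snippet below filters JSON log lines for audit violations. It assumes violations are logged as JSON objects with an `event_type` of `violation_audited` and fields such as `constraint_kind`, `constraint_name`, `constraint_action`, `resource_namespace` and `resource_name`. Those field names are assumptions to verify against your own Gatekeeper log output, and the sample lines are made up.

```python
import json
from typing import Iterable, Iterator

# Assumption: audit violations are logged as JSON objects carrying this event_type.
VIOLATION_EVENT = "violation_audited"


def iter_violations(lines: Iterable[str]) -> Iterator[dict]:
    """Yield a small summary dict for each log line that looks like an audit violation."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # not a JSON log line, ignore it
        if not isinstance(record, dict) or record.get("event_type") != VIOLATION_EVENT:
            continue
        yield {
            "constraint": f'{record.get("constraint_kind", "?")}/{record.get("constraint_name", "?")}',
            "resource": f'{record.get("resource_namespace", "")}/{record.get("resource_name", "?")}',
            "action": record.get("constraint_action", "?"),
            "message": record.get("msg", ""),
        }


# Made-up sample lines, for illustration only.
SAMPLE_LOG = [
    '{"level":"info","msg":"starting audit"}',
    "plain text line that is not JSON",
    '{"level":"info","msg":"container has no limits","event_type":"violation_audited",'
    '"constraint_kind":"K8sContainerLimits","constraint_name":"must-have-limits",'
    '"constraint_action":"dryrun","resource_namespace":"default","resource_name":"nginx"}',
]

violations = list(iter_violations(SAMPLE_LOG))
```

Each item in `violations` could then be forwarded to syslog, a webhook, or whatever the log aggregator alerts on.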

swapnild2111 commented 4 years ago

What I have done is:

  1. Enabled enforcementAction: dryrun
  2. Added --log-denies
  3. Added unique log message for violations.

After this I could see violations in logs, which I am streaming to Datadog.

In Datadog, I have created charts & added monitors by tracing those unique log messages. The things I can do with this approach are pretty limited.

If I could get something like dryrun_violation_count in the metrics, things would become much easier.

sozercan commented 4 years ago

@swapnild2111 you can get violations count, like:

gatekeeper_violations{enforcement_action="deny"} 19
gatekeeper_violations{enforcement_action="dryrun"} 7

See https://github.com/open-policy-agent/gatekeeper/blob/master/docs/Metrics.md for list of all metrics
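
For a quick check without a full Prometheus/Alertmanager setup, here is a small sketch that reads the two samples above from the metrics text and flags any enforcement action over a threshold. The thresholds and function names are hypothetical, and the regex only handles the single-label form shown in this comment; real output may carry more labels.

```python
import re

# Matches lines such as: gatekeeper_violations{enforcement_action="deny"} 19
SAMPLE_RE = re.compile(
    r'^gatekeeper_violations\{enforcement_action="(?P<action>[^"]+)"\}\s+(?P<value>\S+)$'
)


def violation_counts(metrics_text: str) -> dict:
    """Map each enforcement action to its violation count found in the metrics text."""
    counts = {}
    for line in metrics_text.splitlines():
        match = SAMPLE_RE.match(line.strip())
        if match:
            counts[match.group("action")] = float(match.group("value"))
    return counts


def over_threshold(counts: dict, thresholds: dict) -> list:
    """Return the enforcement actions whose count exceeds the given threshold."""
    return [action for action, limit in thresholds.items() if counts.get(action, 0.0) > limit]


# The two sample values quoted in this thread, plus a comment line that is skipped.
SAMPLE_METRICS = """\
# TYPE gatekeeper_violations gauge
gatekeeper_violations{enforcement_action="deny"} 19
gatekeeper_violations{enforcement_action="dryrun"} 7
"""

counts = violation_counts(SAMPLE_METRICS)
alerts = over_threshold(counts, {"deny": 10, "dryrun": 10})  # hypothetical thresholds
```

With the sample values, only the `deny` action exceeds its threshold. In practice the same condition is probably better expressed as a Prometheus alerting rule and routed through Alertmanager.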

swapnild2111 commented 4 years ago

thank you, it worked perfectly for me :)

lechuk47 commented 4 years ago

It would be useful to have the constraint details as metric tags. For example, having the constraint name and type as tags on the violation metrics would be enough to set up alerts in Prometheus.

morganwalker commented 4 years ago

@swapnild2111 how did you leverage those metrics via Datadog? While I plan on parsing the logs ingested to DD for violations, ideally I'd like to be able to use the metrics in DD.

swapnild2111 commented 3 years ago

@morganwalker sorry for the very late reply.

I have the annotations below on my deployment to send Prometheus metrics to Datadog:

ad.datadoghq.com/manager.check_names: '["prometheus"]'
ad.datadoghq.com/manager.init_configs: '[{}]'
ad.datadoghq.com/manager.instances: '[{"prometheus_url":"http://%%host%%:8888/metrics", "namespace": "gatekeeper-system", "metrics":["*"]}]'
prometheus.io/port: "8888"
prometheus.io/scrape: "true"

Would that be helpful for you?

teochenglim commented 3 years ago

Anyone working on slack yet?

Allow an optional "enable Slack" feature. You usually need just 2 inputs: the Slack webhook URL and the channel to send to. Since a webhook is used, an HTTP POST is all it takes to make it work, so there is not much dependency.

My suggestion is to have 2 kinds of Slack message:

  1. Real time: one message per violation.
  2. A report every hour/day/week summarizing how many violations occurred, by type. This is a bit more complex because you need to hold state, but maybe it can reuse the existing Prometheus metrics?

As in many other frameworks, the Slack webhook URL could be stored in a k8s secret.

Helm example:

slack:
  enable: true ### default false
  slack_channel: ""
  slack_title: ""  ## if all security audits are monitored in a single room, a title helps identify the messages
  slack_text_prefix: ""   ## prefix to tell which cluster the message is from, e.g. staging/production
  slack_text_subfix: ""
  slack_webhook_url: ""  ## use this if we don't want to create the k8s secret and just give the URL
  slack_secret_name: ""  ## name of the secret where the Slack webhook URL is defined
  slack_report:
    slack_cron: "* 2 * * *" ## minute, hour, day (month), month, day (week)
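
To illustrate the "just an HTTP POST" point, here is a minimal sketch using only the Python standard library. It relies on a Slack incoming webhook accepting a JSON body with a `text` field. The function names, the prefix argument (mirroring the proposed `slack_text_prefix`), and the placeholder URL are all hypothetical; none of this is an existing Gatekeeper feature.

```python
import json
import urllib.request


def build_slack_request(webhook_url: str, text: str, prefix: str = "") -> urllib.request.Request:
    """Build (but do not send) the POST request for a Slack incoming webhook."""
    payload = {"text": f"{prefix}{text}"}
    return urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 5.0) -> int:
    """Send the request and return the HTTP status code."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


# Placeholder URL; a real one would come from the k8s secret or the Helm value.
request = build_slack_request(
    "https://hooks.slack.com/services/EXAMPLE/PLACEHOLDER",
    "1 violation of K8sRequiredLabels/must-have-owner",
    prefix="[staging] ",
)
# send(request)  # left commented out so the sketch runs without network access
```

Keeping request construction separate from sending makes the message format easy to test without touching Slack.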

maxsmythe commented 3 years ago

@sozercan did we ever document Alertmanager integrations? That seems like it would address use case #2.

As for use case #1, that sounds similar to:

#1037

#898

The push-based pipeline referenced in #897

IIRC we were also thinking about generic webhook-based reporting at some point

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.

debu99 commented 1 year ago

Is this feature available now?

a-thorat commented 1 year ago

@swapnild2111 @maxsmythe
I am trying to implement violation alerting with MS Teams for the Gatekeeper Operator installed on OpenShift 4.13, but I am not able to achieve it because everything is managed by the operator. Any idea how I can integrate it here?