rancher / fleet

Deploy workloads from Git to large fleets of Kubernetes clusters
https://fleet.rancher.io/
Apache License 2.0

Spike: Notifications #2876

Closed: manno closed this issue 2 weeks ago

manno commented 1 month ago

We want to notify external services about events in Fleet, so users can build workflows around Fleet as a deployer.

For now we are focusing on one type of outgoing notification: an HTTP request, e.g. one targeted at a webhook endpoint.

Notifications are configured by creating a new custom resource, with fields such as the URL, a template for the request body, credential references, and the events to react to. A new controller will watch for notification configs and generate requests from that config when an event happens.
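To make the proposed resource more concrete, here is a minimal sketch of what such a spec and the request it produces might look like. Everything in it is an assumption for illustration: `NotificationSpec`, `Event`, the field names and the event type strings are hypothetical, not Fleet's API, and credential lookup via the secret reference is left out.

```go
package main

import (
	"bytes"
	"net/http"
	"text/template"
)

// NotificationSpec is a hypothetical spec for the proposed custom resource.
// Field names are illustrative only; they are not Fleet's API.
type NotificationSpec struct {
	URL          string   // webhook endpoint that receives the request
	BodyTemplate string   // Go text/template rendered with the Event
	SecretRef    string   // name of a Secret with credentials (not resolved in this sketch)
	Events       []string // event types that trigger this notification
}

// Event is a hypothetical, simplified Fleet event.
type Event struct {
	Type   string
	Bundle string
	Commit string
}

// Matches reports whether the spec subscribes to the event's type.
func (s NotificationSpec) Matches(e Event) bool {
	for _, t := range s.Events {
		if t == e.Type {
			return true
		}
	}
	return false
}

// RenderBody executes the body template against the event. Note that
// text/template does not JSON-escape values; a real implementation would
// need to handle that.
func (s NotificationSpec) RenderBody(e Event) (string, error) {
	tmpl, err := template.New("body").Parse(s.BodyTemplate)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	if err := tmpl.Execute(&buf, e); err != nil {
		return "", err
	}
	return buf.String(), nil
}

// BuildRequest wraps the rendered body in a POST request to the spec's URL.
// Adding credentials from SecretRef is left to the caller.
func (s NotificationSpec) BuildRequest(e Event) (*http.Request, error) {
	body, err := s.RenderBody(e)
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, s.URL, bytes.NewBufferString(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}
```

With a body template such as `{"bundle":"{{.Bundle}}","commit":"{{.Commit}}"}`, a matching event for bundle `my-app` at commit `468c517` renders to `{"bundle":"my-app","commit":"468c517"}`. Keeping rendering separate from request building makes the template easy to validate when the resource is created.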

Research

What events should generate a notification?

Do we need logic to combine events, e.g. only notify if both A and B happened? How do other projects handle notifications? Are there "generic" notifiers we can use besides HTTP requests?

bigkevmcd commented 1 month ago

@manno https://pkg.go.dev/k8s.io/client-go/tools/record ?

weyfonk commented 2 weeks ago

Findings

Scope

An MVP will need to include:

Out of scope

Design

Notifications should be sent:

Each notification should store the date/time when it was last sent, to ease tracking (for users now, and for automated retries later on) and troubleshooting.

Configuration

Notifications must be configurable at 3 distinct levels:

Ideas on reconciler implementation

Links (possibly of interest)

bigkevmcd commented 2 weeks ago

I don't see any indication of the desired delivery guarantees for notifications: "at least once", "at most once", or "exactly once"?

For example, failed GitHub webhook deliveries are not retried automatically (but you can redeliver them manually).

Care must be taken not to send notifications out of order.

Imagine reporting the deployment of 2dd90c5 before 468c517, even though 468c517 comes first on main, simply because the first attempt to deliver the notification for 468c517 failed.

Decisions are also needed on how events are translated into hook notifications for upstream services (e.g. Slack, GitHub, etc.), assuming that in the longer term all Rancher components will be sending notifications.
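One possible shape for that translation layer is a small per-service interface that turns a Fleet event into the request body a given service expects. The sketch below is hypothetical (`PayloadBuilder`, `GenericWebhook`, `SlackWebhook` and the `Event` fields are illustrative names); the only service-specific assumption is that a Slack incoming webhook accepts a JSON object with a `text` field.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Event is a hypothetical, simplified Fleet event.
type Event struct {
	Type   string `json:"type"`
	Bundle string `json:"bundle"`
	Commit string `json:"commit"`
}

// PayloadBuilder translates an event into the body expected by one kind of
// upstream service.
type PayloadBuilder interface {
	Payload(e Event) ([]byte, error)
}

// GenericWebhook sends the event unchanged, serialized as JSON.
type GenericWebhook struct{}

func (GenericWebhook) Payload(e Event) ([]byte, error) { return json.Marshal(e) }

// SlackWebhook targets a Slack incoming webhook, which accepts a JSON
// object with a "text" field.
type SlackWebhook struct{}

func (SlackWebhook) Payload(e Event) ([]byte, error) {
	msg := struct {
		Text string `json:"text"`
	}{Text: fmt.Sprintf("Fleet: %s for bundle %s at commit %s", e.Type, e.Bundle, e.Commit)}
	return json.Marshal(msg)
}
```

Keeping the service-specific formatting behind one interface means the controller's delivery logic (ordering, retries, status) stays the same regardless of which upstream service is targeted.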