resmoio / kubernetes-event-exporter

Export Kubernetes events to multiple destinations with routing and filtering
Apache License 2.0
753 stars 149 forks source link

Improves leader election so that we don't lose events during leadership changes #153

Closed ronaknnathani closed 6 months ago

ronaknnathani commented 7 months ago

Details

Currently, when a replica loses its leadership, a new leader isn't elected until leaseDuration seconds. Here, that is 15s. The max time till we get a new leader is leaseDuration (15s) + retryPeriod (2s) = 17s.

This commit updates the shutdown process such that if the leader replica is sent a shutdown signal, it sleeps for leaseDuration seconds. This allows the leader replica to continue to export events until a new leader is elected. And a new leader is elected only if lease hasn't been renewed and leaseDuration expires.

In addition to this, other smaller changes include:

For use cases where no event loss is tolerable, users should use maxEventAgeSeconds to > 1.

Issues addressed

https://github.com/resmoio/kubernetes-event-exporter/pull/34 is abandoned and this change takes care of what it was trying to do. Once this PR is shipped, https://github.com/resmoio/kubernetes-event-exporter/pull/34 can be closed. I already discussed it with @xmcqueen and he's okay with me taking this forward.

Testing done

Tests ``` go test -cover -mod=mod -v ./... ? github.com/resmoio/kubernetes-event-exporter [no test files] === RUN TestSimpleWriter --- PASS: TestSimpleWriter (0.00s) === RUN TestCorrectnessManyTimes --- PASS: TestCorrectnessManyTimes (0.07s) === RUN TestLargerThanBatchSize --- PASS: TestLargerThanBatchSize (0.00s) === RUN TestSimpleInterval --- PASS: TestSimpleInterval (0.06s) === RUN TestIntervalComplex --- PASS: TestIntervalComplex (0.06s) === RUN TestIntervalComplexAfterFlush --- PASS: TestIntervalComplexAfterFlush (0.06s) === RUN TestRetry --- PASS: TestRetry (0.20s) PASS github.com/resmoio/kubernetes-event-exporter/pkg/batch coverage: 100.0% of statements ok github.com/resmoio/kubernetes-event-exporter/pkg/batch 0.740s coverage: 100.0% of statements ? github.com/resmoio/kubernetes-event-exporter/pkg/metrics [no test files] ? github.com/resmoio/kubernetes-event-exporter/pkg/version [no test files] === RUN Test_ParseConfig --- PASS: Test_ParseConfig (0.00s) === RUN TestValidate_IsCheckingMaxEventAgeSeconds_WhenNotSet {"level":"info","time":"2024-01-17T15:48:01-05:00","message":"setting config.maxEventAgeSeconds=5 (default)"} {"level":"warn","time":"2024-01-17T15:48:01-05:00","message":"metrics name prefix is empty, setting config.metricsNamePrefix='event_exporter_' is recommended"} --- PASS: TestValidate_IsCheckingMaxEventAgeSeconds_WhenNotSet (0.00s) === RUN TestValidate_IsCheckingMaxEventAgeSeconds_WhenThrottledPeriodSet --- PASS: TestValidate_IsCheckingMaxEventAgeSeconds_WhenThrottledPeriodSet (0.00s) === RUN TestValidate_IsCheckingMaxEventAgeSeconds_WhenMaxEventAgeSecondsSet --- PASS: TestValidate_IsCheckingMaxEventAgeSeconds_WhenMaxEventAgeSecondsSet (0.00s) === RUN TestValidate_IsCheckingMaxEventAgeSeconds_WhenMaxEventAgeSecondsAndThrottledPeriodSet --- PASS: TestValidate_IsCheckingMaxEventAgeSeconds_WhenMaxEventAgeSecondsAndThrottledPeriodSet (0.00s) === RUN TestValidate_MetricsNamePrefix_WhenEmpty --- PASS: TestValidate_MetricsNamePrefix_WhenEmpty (0.00s) === RUN TestValidate_MetricsNamePrefix_WhenValid --- PASS: TestValidate_MetricsNamePrefix_WhenValid (0.00s) === RUN TestValidate_MetricsNamePrefix_WhenInvalid --- PASS: TestValidate_MetricsNamePrefix_WhenInvalid (0.00s) === RUN TestSetDefaults --- PASS: TestSetDefaults (0.00s) === RUN TestEngineNoRoutes --- PASS: TestEngineNoRoutes (0.00s) === RUN TestEngineSimple --- PASS: TestEngineSimple (0.00s) === RUN TestEngineDropSimple --- PASS: TestEngineDropSimple (0.00s) === RUN TestEmptyRoute --- PASS: TestEmptyRoute (0.00s) === RUN TestBasicRoute --- PASS: TestBasicRoute (0.00s) === RUN TestDropRule --- PASS: TestDropRule (0.00s) === RUN TestSingleLevelMultipleMatchRoute --- PASS: TestSingleLevelMultipleMatchRoute (0.00s) === RUN TestSubRoute --- PASS: TestSubRoute (0.00s) === RUN TestSubSubRoute --- PASS: TestSubSubRoute (0.00s) === RUN TestSubSubRouteWithDrop --- PASS: TestSubSubRouteWithDrop (0.00s) === RUN Test_GHIssue51 --- PASS: Test_GHIssue51 (0.00s) === RUN TestEmptyRule --- PASS: TestEmptyRule (0.00s) === RUN TestBasicRule --- PASS: TestBasicRule (0.00s) === RUN TestBasicNoMatchRule --- PASS: TestBasicNoMatchRule (0.00s) === RUN TestBasicRegexRule --- PASS: TestBasicRegexRule (0.00s) === RUN TestLabelRegexRule --- PASS: TestLabelRegexRule (0.00s) === RUN TestOneLabelMatchesRule --- PASS: TestOneLabelMatchesRule (0.00s) === RUN TestOneLabelDoesNotMatchRule --- PASS: TestOneLabelDoesNotMatchRule (0.00s) === RUN TestTwoLabelMatchesRule --- PASS: TestTwoLabelMatchesRule (0.00s) === RUN TestTwoLabelRequiredRule --- PASS: TestTwoLabelRequiredRule (0.00s) === RUN TestTwoLabelRequiredOneMissingRule --- PASS: TestTwoLabelRequiredOneMissingRule (0.00s) === RUN TestOneAnnotationMatchesRule --- PASS: TestOneAnnotationMatchesRule (0.00s) === RUN TestOneAnnotationDoesNotMatchRule --- PASS: TestOneAnnotationDoesNotMatchRule (0.00s) === RUN TestTwoAnnotationsMatchesRule --- PASS: TestTwoAnnotationsMatchesRule (0.00s) === RUN TestTwoAnnotationsRequiredOneMissingRule --- PASS: TestTwoAnnotationsRequiredOneMissingRule (0.00s) === RUN TestComplexRuleNoMatch --- PASS: TestComplexRuleNoMatch (0.00s) === RUN TestComplexRuleMatches --- PASS: TestComplexRuleMatches (0.00s) === RUN TestComplexRuleAnnotationsNoMatch --- PASS: TestComplexRuleAnnotationsNoMatch (0.00s) === RUN TestComplexRuleMatchesRegexp --- PASS: TestComplexRuleMatchesRegexp (0.00s) === RUN TestComplexRuleNoMatchRegexp --- PASS: TestComplexRuleNoMatchRegexp (0.00s) === RUN TestMessageRegexp --- PASS: TestMessageRegexp (0.00s) === RUN TestCount --- PASS: TestCount (0.00s) PASS github.com/resmoio/kubernetes-event-exporter/pkg/exporter coverage: 68.9% of statements ok github.com/resmoio/kubernetes-event-exporter/pkg/exporter 1.635s coverage: 68.9% of statements === RUN TestEnhancedEvent_DeDot === RUN TestEnhancedEvent_DeDot/nothing === RUN TestEnhancedEvent_DeDot/dedot --- PASS: TestEnhancedEvent_DeDot (0.00s) --- PASS: TestEnhancedEvent_DeDot/nothing (0.00s) --- PASS: TestEnhancedEvent_DeDot/dedot (0.00s) === RUN TestEnhancedEvent_DeDot_MustNotAlternateOriginal --- PASS: TestEnhancedEvent_DeDot_MustNotAlternateOriginal (0.00s) === RUN TestEventWatcher_EventAge_whenEventCreatedBeforeStartup --- PASS: TestEventWatcher_EventAge_whenEventCreatedBeforeStartup (0.00s) === RUN TestEventWatcher_EventAge_whenEventCreatedAfterStartupAndBeforeMaxAge --- PASS: TestEventWatcher_EventAge_whenEventCreatedAfterStartupAndBeforeMaxAge (0.00s) === RUN TestEventWatcher_EventAge_whenEventCreatedAfterStartupAndAfterMaxAge --- PASS: TestEventWatcher_EventAge_whenEventCreatedAfterStartupAndAfterMaxAge (0.00s) === RUN TestOnEvent_WithObjectMetadata --- PASS: TestOnEvent_WithObjectMetadata (0.00s) === RUN TestOnEvent_DeletedObjects --- PASS: TestOnEvent_DeletedObjects (0.00s) PASS github.com/resmoio/kubernetes-event-exporter/pkg/kube coverage: 29.7% of statements ok github.com/resmoio/kubernetes-event-exporter/pkg/kube 0.614s coverage: 29.7% of statements === RUN Test_ParseConfigFromBytes_ExampleConfigIsCorrect --- PASS: Test_ParseConfigFromBytes_ExampleConfigIsCorrect (0.00s) === RUN Test_ParseConfigFromBytes_NoErrors --- PASS: Test_ParseConfigFromBytes_NoErrors (0.00s) === RUN Test_ParseConfigFromBytes_ErrorWhenCurlyBracesNotEscaped --- PASS: Test_ParseConfigFromBytes_ErrorWhenCurlyBracesNotEscaped (0.00s) === RUN Test_ParseConfigFromBytes_OkWhenCurlyBracesEscaped --- PASS: Test_ParseConfigFromBytes_OkWhenCurlyBracesEscaped (0.00s) === RUN Test_ParseConfigFromBytes_ErrorErrorNotWithCurlyBraces --- PASS: Test_ParseConfigFromBytes_ErrorErrorNotWithCurlyBraces (0.00s) PASS github.com/resmoio/kubernetes-event-exporter/pkg/setup coverage: 100.0% of statements ok github.com/resmoio/kubernetes-event-exporter/pkg/setup 2.660s coverage: 100.0% of statements === RUN TestOpsCenterSink_Send === RUN TestOpsCenterSink_Send/Simple_Create === RUN TestOpsCenterSink_Send/Invalid_Priority:_Want_err --- PASS: TestOpsCenterSink_Send (0.00s) --- PASS: TestOpsCenterSink_Send/Simple_Create (0.00s) --- PASS: TestOpsCenterSink_Send/Invalid_Priority:_Want_err (0.00s) === RUN TestTeams_Send --- PASS: TestTeams_Send (0.00s) === RUN TestTeams_Send_WhenTeamsReturnsRateLimited --- PASS: TestTeams_Send_WhenTeamsReturnsRateLimited (0.00s) === RUN TestLayoutConvert --- PASS: TestLayoutConvert (0.00s) PASS github.com/resmoio/kubernetes-event-exporter/pkg/sinks coverage: 13.4% of statements ok github.com/resmoio/kubernetes-event-exporter/pkg/sinks 3.337s coverage: 13.4% of statements ```
When leader election is disabled **No sleeping during shutdown** ``` ➜ ~ kube logs event-exporter-7ddc6ff9b-rbs6w -f {"level":"info","time":"2024-01-17T20:53:00Z","message":"Reading config file /data/config.yaml"} {"level":"info","time":"2024-01-17T20:53:00Z","message":"Starting with config: {LogLevel:info LogFormat:json ThrottlePeriod:0 MaxEventAgeSeconds:0 ClusterName: Namespace: LeaderElection:{Enabled:false LeaderElectionID:} Route:{Drop:[] Match:[] Routes:[{Drop:[] Match:[{Labels:map[] Annotations:map[] Message: APIVersion: Kind: Namespace: Reason: Type: MinCount:0 Component: Host: Receiver:dump}] Routes:[]}]} Receivers:[{Name:dump InMemory: Webhook: File: Syslog: Stdout:0x40004023c0 Elasticsearch: Kinesis: Firehose: OpenSearch: Opsgenie: Loki: SQS: SNS: Slack: Kafka: Pubsub: Opscenter: Teams: BigQuery: EventBridge: Pipe:}] KubeQPS:5 KubeBurst:10 MetricsNamePrefix:event_exporter_ OmitLookup:false CacheSize:1024}"} {"level":"info","time":"2024-01-17T20:53:00Z","message":"setting config.maxEventAgeSeconds=5 (default)"} {"level":"info","time":"2024-01-17T20:53:00Z","message":"config.metricsNamePrefix='event_exporter_'"} {"level":"info","name":"dump","type":"*sinks.Stdout","time":"2024-01-17T20:53:00Z","message":"Registering sink"} {"level":"info","time":"2024-01-17T20:53:00Z","message":"levelinfomsgListening onaddress[::]:2112"} {"level":"info","time":"2024-01-17T20:53:00Z","message":"levelinfomsgTLS is disabled.http2falseaddress[::]:2112"} {"level":"info","time":"2024-01-17T20:53:00Z","message":"leader election disabled"} {"metadata":{"name":"event-exporter-7ddc6ff9b-rbs6w.17ab3dc027e84cbd","namespace":"monitoring","uid":"63cea172-2877-4834-91d0-08a25b49bc5b","resourceVersion":"4094494","creationTimestamp":"2024-01-17T20:52:59Z"},"reason":"Scheduled","message":"Successfully assigned monitoring/event-exporter-7ddc6ff9b-rbs6w to kind-control-plane","source":{},"firstTimestamp":null,"lastTimestamp":null,"type":"Normal","eventTime":"2024-01-17T20:52:59.402379Z","action":"Binding","reportingComponent":"default-scheduler","reportingInstance":"default-scheduler-kind-control-plane","clusterName":"","involvedObject":{"kind":"Pod","namespace":"monitoring","name":"event-exporter-7ddc6ff9b-rbs6w","uid":"6ab948ac-9da6-42bc-9fa7-acebf96233e7","apiVersion":"v1","resourceVersion":"4094488","labels":{"app":"event-exporter","pod-template-hash":"7ddc6ff9b","version":"v1"},"annotations":{"prometheus.io/path":"/metrics","prometheus.io/port":"2112","prometheus.io/scrape":"true"},"ownerReferences":[{"apiVersion":"apps/v1","kind":"ReplicaSet","name":"event-exporter-7ddc6ff9b","uid":"ef0e28c8-f041-4757-9b60-d1ee0ed8844a","controller":true,"blockOwnerDeletion":true}],"deleted":false}} ... {"level":"info","time":"2024-01-17T20:53:44Z","message":"Received signal to exit. Stopping."} {"level":"info","time":"2024-01-17T20:53:44Z","message":"Closing sinks"} ... {"level":"info","sink":"dump","time":"2024-01-17T20:53:44Z","message":"Closing the sink"} {"level":"info","sink":"dump","time":"2024-01-17T20:53:44Z","message":"Closed"} {"level":"info","time":"2024-01-17T20:53:44Z","message":"All sinks closed"} ```
When leader election is enabled **We have 2 replicas** ``` ➜ ~ kube get po NAME READY STATUS RESTARTS AGE event-exporter-7ddc6ff9b-fhcvq 1/1 Running 0 101s <------- LEADER event-exporter-7ddc6ff9b-hkzcw 1/1 Running 0 27s ``` **Non-leader instance stops right awat and doesn't wait for leaseDuration before stopping** ``` ✗ kube logs event-exporter-7ddc6ff9b-hkzcw -f {"level":"info","time":"2024-01-17T20:58:03Z","message":"Reading config file /data/config.yaml"} {"level":"info","time":"2024-01-17T20:58:03Z","message":"Starting with config: {LogLevel:info LogFormat:json ThrottlePeriod:0 MaxEventAgeSeconds:0 ClusterName: Namespace: LeaderElection:{Enabled:true LeaderElectionID:} Route:{Drop:[] Match:[] Routes:[{Drop:[] Match:[{Labels:map[] Annotations:map[] Message: APIVersion: Kind: Namespace: Reason: Type: MinCount:0 Component: Host: Receiver:dump}] Routes:[]}]} Receivers:[{Name:dump InMemory: Webhook: File: Syslog: Stdout:0x4000417900 Elasticsearch: Kinesis: Firehose: OpenSearch: Opsgenie: Loki: SQS: SNS: Slack: Kafka: Pubsub: Opscenter: Teams: BigQuery: EventBridge: Pipe:}] KubeQPS:5 KubeBurst:10 MetricsNamePrefix:event_exporter_ OmitLookup:false CacheSize:1024}"} {"level":"info","time":"2024-01-17T20:58:03Z","message":"setting config.maxEventAgeSeconds=5 (default)"} {"level":"info","time":"2024-01-17T20:58:03Z","message":"config.metricsNamePrefix='event_exporter_'"} {"level":"info","name":"dump","type":"*sinks.Stdout","time":"2024-01-17T20:58:03Z","message":"Registering sink"} {"level":"info","time":"2024-01-17T20:58:03Z","message":"levelinfomsgListening onaddress[::]:2112"} {"level":"info","time":"2024-01-17T20:58:03Z","message":"levelinfomsgTLS is disabled.http2falseaddress[::]:2112"} {"level":"info","time":"2024-01-17T20:58:03Z","message":"leader election enabled"} I0117 20:58:03.963426 1 leaderelection.go:248] attempting to acquire leader lease monitoring/kubernetes-event-exporter... {"level":"info","time":"2024-01-17T20:58:03Z","message":"new leader observed: event-exporter-7ddc6ff9b-fhcvq_b624ce71-c9a4-4e01-9795-3b833c475ae3"} {"level":"info","time":"2024-01-17T20:58:49Z","message":"Context was cancelled, stopping leader election loop"} {"level":"info","time":"2024-01-17T20:58:49Z","message":"Received signal to exit. Stopping."} {"level":"info","time":"2024-01-17T20:58:49Z","message":"Closing sinks"} {"level":"info","sink":"dump","time":"2024-01-17T20:58:49Z","message":"Closing the sink"} {"level":"info","sink":"dump","time":"2024-01-17T20:58:49Z","message":"Closed"} {"level":"info","time":"2024-01-17T20:58:49Z","message":"All sinks closed"} ``` When the non-leader instance is deleted, we get another replica ``` ➜ ~ kube get po NAME READY STATUS RESTARTS AGE event-exporter-7ddc6ff9b-6vvgj 1/1 Running 0 93s event-exporter-7ddc6ff9b-fhcvq 1/1 Running 0 3m33s <----- STILL THE LEADER ``` **Leader instance waits for leaseDuration before stopping** ``` ➜ ~ kube logs event-exporter-7ddc6ff9b-fhcvq -f {"level":"info","time":"2024-01-17T20:56:50Z","message":"Reading config file /data/config.yaml"} {"level":"info","time":"2024-01-17T20:56:50Z","message":"Starting with config: {LogLevel:info LogFormat:json ThrottlePeriod:0 MaxEventAgeSeconds:0 ClusterName: Namespace: LeaderElection:{Enabled:true LeaderElectionID:} Route:{Drop:[] Match:[] Routes:[{Drop:[] Match:[{Labels:map[] Annotations:map[] Message: APIVersion: Kind: Namespace: Reason: Type: MinCount:0 Component: Host: Receiver:dump}] Routes:[]}]} Receivers:[{Name:dump InMemory: Webhook: File: Syslog: Stdout:0x4000497ce0 Elasticsearch: Kinesis: Firehose: OpenSearch: Opsgenie: Loki: SQS: SNS: Slack: Kafka: Pubsub: Opscenter: Teams: BigQuery: EventBridge: Pipe:}] KubeQPS:5 KubeBurst:10 MetricsNamePrefix:event_exporter_ OmitLookup:false CacheSize:1024}"} {"level":"info","time":"2024-01-17T20:56:50Z","message":"setting config.maxEventAgeSeconds=5 (default)"} {"level":"info","time":"2024-01-17T20:56:50Z","message":"config.metricsNamePrefix='event_exporter_'"} {"level":"info","name":"dump","type":"*sinks.Stdout","time":"2024-01-17T20:56:50Z","message":"Registering sink"} {"level":"info","time":"2024-01-17T20:56:50Z","message":"levelinfomsgListening onaddress[::]:2112"} {"level":"info","time":"2024-01-17T20:56:50Z","message":"levelinfomsgTLS is disabled.http2falseaddress[::]:2112"} {"level":"info","time":"2024-01-17T20:56:50Z","message":"leader election enabled"} I0117 20:56:50.138717 1 leaderelection.go:248] attempting to acquire leader lease monitoring/kubernetes-event-exporter... {"level":"info","time":"2024-01-17T20:56:50Z","message":"new leader observed: event-exporter-7ddc6ff9b-qdfqk_c4e05734-4456-4ab0-b310-3551e4557e9d"} I0117 20:57:05.385050 1 leaderelection.go:258] successfully acquired lease monitoring/kubernetes-event-exporter {"level":"info","time":"2024-01-17T20:57:05Z","message":"new leader observed: event-exporter-7ddc6ff9b-fhcvq_b624ce71-c9a4-4e01-9795-3b833c475ae3"} {"level":"info","time":"2024-01-17T20:57:05Z","message":"leader election won"} {"metadata":{"name":"event-exporter.17ab3e06eafa3243","namespace":"monitoring","uid":"16ee7e87-d582-4454-943c-e0657c90786f","resourceVersion":"4094980","creationTimestamp":"2024-01-17T20:58:03Z"},"reason":"ScalingReplicaSet","message":"Scaled up replica set event-exporter-7ddc6ff9b to 2 from 1","source":{"component":"deployment-controller"},"firstTimestamp":"2024-01-17T20:58:03Z","lastTimestamp":"2024-01-17T20:58:03Z","count":1,"type":"Normal","eventTime":null,"reportingComponent":"deployment-controller","reportingInstance":"","clusterName":"","involvedObject":{"kind":"Deployment","namespace":"monitoring","name":"event-exporter","uid":"6357931f-4225-47cd-904f-c22f4ce524bf","apiVersion":"apps/v1","resourceVersion":"4094977","annotations":{"deployment.kubernetes.io/revision":"1","kubectl.kubernetes.io/last-applied-configuration":"{\"apiVersion\":\"apps/v1\",\"kind\":\"Deployment\",\"metadata\":{\"annotations\":{},\"name\":\"event-exporter\",\"namespace\":\"monitoring\"},\"spec\":{\"replicas\":1,\"selector\":{\"matchLabels\":{\"app\":\"event-exporter\",\"version\":\"v1\"}},\"strategy\":{\"rollingUpdate\":{\"maxSurge\":0,\"maxUnavailable\":1},\"type\":\"RollingUpdate\"},\"template\":{\"metadata\":{\"annotations\":{\"prometheus.io/path\":\"/metrics\",\"prometheus.io/port\":\"2112\",\"prometheus.io/scrape\":\"true\"},\"labels\":{\"app\":\"event-exporter\",\"version\":\"v1\"}},\"spec\":{\"containers\":[{\"args\":[\"-conf=/data/config.yaml\"],\"image\":\"kubernetes-event-exporter:local\",\"imagePullPolicy\":\"IfNotPresent\",\"name\":\"event-exporter\",\"securityContext\":{\"allowPrivilegeEscalation\":false,\"capabilities\":{\"drop\":[\"ALL\"]}},\"volumeMounts\":[{\"mountPath\":\"/data\",\"name\":\"cfg\"}]}],\"securityContext\":{\"runAsNonRoot\":true,\"seccompProfile\":{\"type\":\"RuntimeDefault\"}},\"serviceAccountName\":\"event-exporter\",\"volumes\":[{\"configMap\":{\"name\":\"event-exporter-cfg\"},\"name\":\"cfg\"}]}}}}\n"},"deleted":false}} ... {"level":"info","time":"2024-01-17T21:00:37Z","message":"Context was cancelled, stopping leader election loop"} {"level":"info","time":"2024-01-17T21:00:37Z","message":"waiting leaseDuration seconds before stopping: 15s"} ... {"metadata":{"name":"event-exporter-7ddc6ff9b-s6zfg.17ab3e2b101bc05a","namespace":"monitoring","uid":"e054b91c-92ca-4331-8b61-27f26ea503c4","resourceVersion":"4095306","creationTimestamp":"2024-01-17T21:00:38Z"},"reason":"Started","message":"Started container event-exporter","source":{"component":"kubelet","host":"kind-control-plane"},"firstTimestamp":"2024-01-17T21:00:38Z","lastTimestamp":"2024-01-17T21:00:38Z","count":1,"type":"Normal","eventTime":null,"reportingComponent":"kubelet","reportingInstance":"kind-control-plane","clusterName":"","involvedObject":{"kind":"Pod","namespace":"monitoring","name":"event-exporter-7ddc6ff9b-s6zfg","uid":"fe1e9f75-3118-43cf-b48c-1ac347152a6b","apiVersion":"v1","resourceVersion":"4095296","fieldPath":"spec.containers{event-exporter}","labels":{"app":"event-exporter","pod-template-hash":"7ddc6ff9b","version":"v1"},"annotations":{"prometheus.io/path":"/metrics","prometheus.io/port":"2112","prometheus.io/scrape":"true"},"ownerReferences":[{"apiVersion":"apps/v1","kind":"ReplicaSet","name":"event-exporter-7ddc6ff9b","uid":"dd51869d-b43a-43ad-b1ac-8821572496b6","controller":true,"blockOwnerDeletion":true}],"deleted":false}} ... {"level":"info","time":"2024-01-17T21:00:52Z","message":"Received signal to exit. Stopping."} {"level":"info","time":"2024-01-17T21:00:52Z","message":"Closing sinks"} {"level":"info","sink":"dump","time":"2024-01-17T21:00:52Z","message":"Closing the sink"} {"level":"info","sink":"dump","time":"2024-01-17T21:00:52Z","message":"Closed"} {"level":"info","time":"2024-01-17T21:00:52Z","message":"All sinks closed"} ``` **The other pod becomes the new leader** When the leader replica, `event-exporter-7ddc6ff9b-fhcvq`, is deleted, the other pod becomes the leader. ``` ✗ kube logs event-exporter-7ddc6ff9b-6vvgj -f {"level":"info","time":"2024-01-17T20:58:50Z","message":"Reading config file /data/config.yaml"} {"level":"info","time":"2024-01-17T20:58:50Z","message":"Starting with config: {LogLevel:info LogFormat:json ThrottlePeriod:0 MaxEventAgeSeconds:0 ClusterName: Namespace: LeaderElection:{Enabled:true LeaderElectionID:} Route:{Drop:[] Match:[] Routes:[{Drop:[] Match:[{Labels:map[] Annotations:map[] Message: APIVersion: Kind: Namespace: Reason: Type: MinCount:0 Component: Host: Receiver:dump}] Routes:[]}]} Receivers:[{Name:dump InMemory: Webhook: File: Syslog: Stdout:0x4000368920 Elasticsearch: Kinesis: Firehose: OpenSearch: Opsgenie: Loki: SQS: SNS: Slack: Kafka: Pubsub: Opscenter: Teams: BigQuery: EventBridge: Pipe:}] KubeQPS:5 KubeBurst:10 MetricsNamePrefix:event_exporter_ OmitLookup:false CacheSize:1024}"} {"level":"info","time":"2024-01-17T20:58:50Z","message":"setting config.maxEventAgeSeconds=5 (default)"} {"level":"info","time":"2024-01-17T20:58:50Z","message":"config.metricsNamePrefix='event_exporter_'"} {"level":"info","name":"dump","type":"*sinks.Stdout","time":"2024-01-17T20:58:50Z","message":"Registering sink"} {"level":"info","time":"2024-01-17T20:58:50Z","message":"levelinfomsgListening onaddress[::]:2112"} {"level":"info","time":"2024-01-17T20:58:50Z","message":"levelinfomsgTLS is disabled.http2falseaddress[::]:2112"} {"level":"info","time":"2024-01-17T20:58:50Z","message":"leader election enabled"} I0117 20:58:50.372635 1 leaderelection.go:248] attempting to acquire leader lease monitoring/kubernetes-event-exporter... {"level":"info","time":"2024-01-17T20:58:50Z","message":"new leader observed: event-exporter-7ddc6ff9b-fhcvq_b624ce71-c9a4-4e01-9795-3b833c475ae3"} I0117 21:00:53.972282 1 leaderelection.go:258] successfully acquired lease monitoring/kubernetes-event-exporter {"level":"info","time":"2024-01-17T21:00:53Z","message":"new leader observed: event-exporter-7ddc6ff9b-6vvgj_e3d6ae20-1980-4218-b797-537ac942e5ea"} {"level":"info","time":"2024-01-17T21:00:53Z","message":"leader election won"} {"level":"warn","event age":"16.978700554s","event namespace":"monitoring","event name":"event-exporter-7ddc6ff9b-fhcvq.17ab3e2ae97a4b57","time":"2024-01-17T21:00:53Z","message":"Event discarded as being older then maxEventAgeSeconds"} ``` **Leadership transition** It can be seen in the logs that the old leader pod shuts down at `2024-01-17T21:00:52Z` and the other pod becomes leader at `2024-01-17T21:00:53Z`.
ronaknnathani commented 7 months ago

@mustafaakin Can I get a review for this?

https://github.com/resmoio/kubernetes-event-exporter/pull/34 is abandoned and this change takes care of what it was trying to do. Once this PR is shipped, https://github.com/resmoio/kubernetes-event-exporter/pull/34 can be closed. I already discussed it with @xmcqueen and he's okay with me taking this forward.

Let me know if you have any questions.

Thanks!

ronaknnathani commented 7 months ago

@mustafaakin gentle ping. any thoughts on this? if you're busy, is there someone else on the team who can review this?

mustafaakin commented 6 months ago

Thanks a lot for a detailed PR. I wanted to switch to Lease at some point and there are issues where events are missed so I think it would resolve at least a few of them just due to lag in leader election.