kubeshop / botkube

An app that helps you monitor your Kubernetes cluster, debug critical deployments & gives recommendations for standard practices
https://botkube.io
MIT License
2.14k stars, 289 forks

Segmentation Fault in botkube/kubernetes Plugin with autoscaling/v2 HPA Events Monitoring #1474

Closed: washswat-west closed this 2 days ago

washswat-west commented 1 week ago

Description

The botkube/kubernetes plugin encounters repeated memory access errors, specifically segmentation faults and nil pointer dereference issues, when configured with certain sources such as k8s-hpa-events. The issue consistently arises when attempting to monitor the autoscaling/v2 API group for horizontalpodautoscalers events related to HPA scaling. This results in the plugin crashing, and although the Plugin Health Monitor attempts to restart it several times, the plugin ultimately becomes deactivated.

I’m using BotKube v1.13.0, and the issue seems to occur during the plugin’s interaction with the Kubernetes API for these particular resources.

Expected behavior

With the k8s-hpa-events source enabled, BotKube should monitor horizontalpodautoscalers resources and successfully capture SuccessfulRescale events without crashing. These events should then be forwarded as notifications to Slack.

Actual behavior

When the k8s-hpa-events source is enabled, the following issues occur:

1.  The kubernetes plugin crashes with a plugin process exited error.
2.  The Plugin Health Monitor retries the plugin multiple times, but after several failures, it deactivates the plugin.
3.  Repeated segmentation faults and memory access errors such as invalid memory address or nil pointer dereference are logged.
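The restart-then-deactivate behavior in points 1 and 2 can be pictured as a bounded retry loop. The following is only a minimal sketch under stated assumptions, not Botkube's Plugin Health Monitor code: `superviseWithLimit`, `startFn`, and `alwaysFail` are hypothetical names, and the limit of 10 is taken from the "attempt 10/10" line in the logs.

```go
package main

import (
	"errors"
	"fmt"
)

// maxRestarts mirrors the "attempt 10/10" message in the logs (assumed limit).
const maxRestarts = 10

// startFn is a hypothetical stand-in for launching the plugin process.
type startFn func() error

// alwaysFail simulates a plugin that crashes on every start.
func alwaysFail() error {
	return errors.New("exit status 2")
}

// superviseWithLimit calls start up to limit times. It reports whether the
// plugin ended up deactivated and how many attempts were made.
func superviseWithLimit(start startFn, limit int) (deactivated bool, attempts int) {
	for attempts = 1; attempts <= limit; attempts++ {
		if err := start(); err == nil {
			return false, attempts // plugin came up; stop retrying
		}
	}
	return true, limit // every attempt failed; give up
}

func main() {
	deactivated, attempts := superviseWithLimit(alwaysFail, maxRestarts)
	fmt.Printf("deactivated=%v after %d attempts\n", deactivated, attempts)
}
```

Because the crash here is deterministic (it is triggered by the configuration, not by a transient condition), every restart fails the same way and the loop always runs to its limit.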

Steps to reproduce

1. Install BotKube v1.13.0 via Helm and configure it to monitor the autoscaling/v2 API group and its horizontalpodautoscalers resources.

2. Apply the following configuration for k8s-hpa-events:
sources:
  'k8s-hpa-events':
    displayName: "HPA Scaling Events"
    botkube/kubernetes:
      context: &default-plugin-context
        rbac:
          group:
            type: Static
            prefix: ""
            static:
              values: ["botkube-plugins-default"]
      enabled: true
      config:
        namespaces:
          include:
            - ".*"
        resources:
          - type: autoscaling/v2/horizontalpodautoscalers
            event:
              types:
                - Normal
              reason:
                include:
                  - "SuccessfulRescale"
3. Observe that, with the k8s-hpa-events source enabled, the plugin repeatedly crashes with segmentation fault errors and is eventually deactivated.

Relevant logs:

{"level":"debug","msg":"github.com/kubeshop/botkube/internal/source/kubernetes.(*backgroundProcessor).Run.func1()","plugin":"botkube/kubernetes","subsystem_name":"botkube/kubernetes.source_v1.13.0_kubernetes","time":"2024-11-01T06:56:53Z"}
{"level":"debug","msg":"\t/home/runner/work/botkube/botkube/internal/source/kubernetes/bg_processor.go:48 +0x30","plugin":"botkube/kubernetes","subsystem_name":"botkube/kubernetes.source_v1.13.0_kubernetes","time":"2024-11-01T06:56:53Z"}
{"level":"debug","msg":"golang.org/x/sync/errgroup.(*Group).Go.func1()","plugin":"botkube/kubernetes","subsystem_name":"botkube/kubernetes.source_v1.13.0_kubernetes","time":"2024-11-01T06:56:53Z"}
{"level":"debug","msg":"\t/home/runner/go/pkg/mod/golang.org/x/sync@v0.5.0/errgroup/errgroup.go:75 +0x58","plugin":"botkube/kubernetes","subsystem_name":"botkube/kubernetes.source_v1.13.0_kubernetes","time":"2024-11-01T06:56:53Z"}
{"level":"debug","msg":"created by golang.org/x/sync/errgroup.(*Group).Go in goroutine 20","plugin":"botkube/kubernetes","subsystem_name":"botkube/kubernetes.source_v1.13.0_kubernetes","time":"2024-11-01T06:56:53Z"}
{"level":"debug","msg":"\t/home/runner/go/pkg/mod/golang.org/x/sync@v0.5.0/errgroup/errgroup.go:72 +0x98","plugin":"botkube/kubernetes","subsystem_name":"botkube/kubernetes.source_v1.13.0_kubernetes","time":"2024-11-01T06:56:53Z"}
time="2024-11-01T06:56:53Z" level=error msg="canceling streaming: error reading from server: EOF"
time="2024-11-01T06:56:53Z" level=error msg="canceling streaming: error reading from server: EOF"
{"err":"rpc error: code = Unavailable desc = error reading from server: EOF","level":"debug","msg":"received EOF, stopping recv loop","plugin":"botkube/kubernetes","subsystem_name":"botkube/kubernetes.stdio","time":"2024-11-01T06:56:53Z"}
{"error":"exit status 2","level":"error","msg":"plugin process exited","path":"/tmp/botkube/source_v1.13.0_kubernetes","pid":65,"plugin":"botkube/kubernetes","time":"2024-11-01T06:56:53Z"}
{"component":"Plugin Health Monitor","error":"rpc error: code = Unavailable desc = connection error: desc = \"transport: error while dialing: dial unix /tmp/plugin1232016362: connect: connection refused\"","level":"error","msg":"Plugin \"botkube/kubernetes\" is not responding.","time":"2024-11-01T06:57:03Z"}
{"component":"Plugin Health Monitor","level":"info","msg":"Restarting source plugin \"botkube/kubernetes\", attempt 10/10...","time":"2024-11-01T06:57:03Z"}
{"level":"debug","msg":"plugin exited","plugin":"botkube/kubernetes","time":"2024-11-01T06:57:03Z"}
{"args":["/tmp/botkube/source_v1.13.0_kubernetes"],"level":"debug","msg":"starting plugin","path":"/tmp/botkube/source_v1.13.0_kubernetes","plugin":"botkube/kubernetes","time":"2024-11-01T06:57:03Z"}
{"level":"debug","msg":"plugin started","path":"/tmp/botkube/source_v1.13.0_kubernetes","pid":70,"plugin":"botkube/kubernetes","time":"2024-11-01T06:57:03Z"}
{"level":"debug","msg":"waiting for RPC address","path":"/tmp/botkube/source_v1.13.0_kubernetes","plugin":"botkube/kubernetes","time":"2024-11-01T06:57:03Z"}
{"address":"/tmp/plugin4262477382","level":"debug","msg":"plugin address","network":"unix","plugin":"botkube/kubernetes","subsystem_name":"botkube/kubernetes.source_v1.13.0_kubernetes","time":"2024-11-01T06:57:04Z","timestamp":"2024-11-01T06:57:04.020Z"}
{"level":"debug","msg":"using plugin","plugin":"botkube/kubernetes","time":"2024-11-01T06:57:04Z","version":3}
{"component":"Plugin Health Monitor","level":"info","msg":"Starting plugin \"botkube/kubernetes\" health watcher...","time":"2024-11-01T06:57:04Z"}
{"level":"info","msg":"Starting a new stream for plugin \"botkube/kubernetes\"","time":"2024-11-01T06:57:04Z"}
{"level":"info","msg":"Start source streaming...","pluginName":"botkube/kubernetes","sourceName":"k8s-hpa-events","time":"2024-11-01T06:57:04Z"}
{"level":"info","msg":"Starting a new stream for plugin \"botkube/kubernetes\"","time":"2024-11-01T06:57:04Z"}
{"level":"info","msg":"Start source streaming...","pluginName":"botkube/kubernetes","sourceName":"k8s-recommendation-events","time":"2024-11-01T06:57:04Z"}
{"level":"debug","msg":"panic: runtime error: invalid memory address or nil pointer dereference","plugin":"botkube/kubernetes","subsystem_name":"botkube/kubernetes.source_v1.13.0_kubernetes","time":"2024-11-01T06:57:04Z"}
{"level":"debug","msg":"[signal SIGSEGV: segmentation violation code=0x1 addr=0x60 pc=0x1462eb8]","plugin":"botkube/kubernetes","subsystem_name":"botkube/kubernetes.source_v1.13.0_kubernetes","time":"2024-11-01T06:57:04Z"}
{"level":"debug","msg":"","plugin":"botkube/kubernetes","subsystem_name":"botkube/kubernetes.source_v1.13.0_kubernetes","time":"2024-11-01T06:57:04Z"}
{"level":"debug","msg":"goroutine 24 [running]:","plugin":"botkube/kubernetes","subsystem_name":"botkube/kubernetes.source_v1.13.0_kubernetes","time":"2024-11-01T06:57:04Z"}
{"level":"debug","msg":"github.com/kubeshop/botkube/internal/source/kubernetes.mergeResourceEvents(0x4000cab228?)","plugin":"botkube/kubernetes","subsystem_name":"botkube/kubernetes.source_v1.13.0_kubernetes","time":"2024-11-01T06:57:04Z"}
{"level":"debug","msg":"\t/home/runner/work/botkube/botkube/internal/source/kubernetes/router.go:159 +0x1c8","plugin":"botkube/kubernetes","subsystem_name":"botkube/kubernetes.source_v1.13.0_kubernetes","time":"2024-11-01T06:57:04Z"}
{"level":"debug","msg":"github.com/kubeshop/botkube/internal/source/kubernetes.(*Router).BuildTable(0x4000cab7d8, 0xe?)","plugin":"botkube/kubernetes","subsystem_name":"botkube/kubernetes.source_v1.13.0_kubernetes","time":"2024-11-01T06:57:04Z"}
{"level":"debug","msg":"\t/home/runner/work/botkube/botkube/internal/source/kubernetes/router.go:67 +0x30","plugin":"botkube/kubernetes","subsystem_name":"botkube/kubernetes.source_v1.13.0_kubernetes","time":"2024-11-01T06:57:04Z"}
{"level":"debug","msg":"github.com/kubeshop/botkube/internal/source/kubernetes.(*Source).configureProcessForSources(0x400057c630, {0x1f5a430, 0x4000427770}, 0x1?, {0x4000773800?, 0x78d?, 0x800?}, {0x1f6de68?, 0x400023e9a0}, 0x1a3185c5000, ...)","plugin":"botkube/kubernetes","subsystem_name":"botkube/kubernetes.source_v1.13.0_kubernetes","time":"2024-11-01T06:57:04Z"}
{"level":"debug","msg":"\t/home/runner/work/botkube/botkube/internal/source/kubernetes/source.go:192 +0x7fc","plugin":"botkube/kubernetes","subsystem_name":"botkube/kubernetes.source_v1.13.0_kubernetes","time":"2024-11-01T06:57:04Z"}
{"level":"debug","msg":"github.com/kubeshop/botkube/internal/source/kubernetes.(*Source).Stream.(*Source).genFnForKubeconfig.func1({0x1f5a430?, 0x4000427770?})","plugin":"botkube/kubernetes","subsystem_name":"botkube/kubernetes.source_v1.13.0_kubernetes","time":"2024-11-01T06:57:04Z"}
{"level":"debug","msg":"\t/home/runner/work/botkube/botkube/internal/source/kubernetes/source.go:344 +0x64","plugin":"botkube/kubernetes","subsystem_name":"botkube/kubernetes.source_v1.13.0_kubernetes","time":"2024-11-01T06:57:04Z"}
{"level":"debug","msg":"github.com/kubeshop/botkube/internal/source/kubernetes.(*backgroundProcessor).Run.func1()","plugin":"botkube/kubernetes","subsystem_name":"botkube/kubernetes.source_v1.13.0_kubernetes","time":"2024-11-01T06:57:04Z"}
{"level":"debug","msg":"\t/home/runner/work/botkube/botkube/internal/source/kubernetes/bg_processor.go:48 +0x30","plugin":"botkub

BotKube version: v1.13.0
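
The trace places the panic in mergeResourceEvents (router.go:159). As an illustration of this failure class only, and not the plugin's actual code, the sketch below assumes hypothetical, simplified types (`EventConfig`, `ResourceConfig`) in which an optional config block is held as a pointer and read without a nil check. `unsafeMerge` and `tryMerge` are likewise invented names.

```go
package main

import "fmt"

// EventConfig and ResourceConfig are hypothetical, simplified types;
// the real plugin's structs differ.
type EventConfig struct {
	Types []string
}

type ResourceConfig struct {
	Type  string
	Event *EventConfig
}

// unsafeMerge reads the top-level event block without checking it for nil.
func unsafeMerge(topLevel *EventConfig, res ResourceConfig) []string {
	merged := append([]string{}, topLevel.Types...) // panics when topLevel is nil
	if res.Event != nil {
		merged = append(merged, res.Event.Types...)
	}
	return merged
}

// tryMerge runs unsafeMerge and converts a panic into an error so the
// runtime message can be inspected instead of killing the process.
func tryMerge(topLevel *EventConfig, res ResourceConfig) (types []string, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered: %v", r)
		}
	}()
	return unsafeMerge(topLevel, res), nil
}

func main() {
	res := ResourceConfig{
		Type:  "autoscaling/v2/horizontalpodautoscalers",
		Event: &EventConfig{Types: []string{"Normal"}},
	}
	// No top-level event block is configured, so nil is passed.
	_, err := tryMerge(nil, res)
	fmt.Println(err)
}
```

Without the `recover`, the Go runtime would terminate the process with the same "invalid memory address or nil pointer dereference" message and exit status 2 that appear in the logs.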

mszostok commented 2 days ago

Hi @washswat-west,

Thanks a lot for reporting this issue! I've already created a PR to address that bug.

However, you can already work around it on v1.13.0, the version you are currently using, by defining the top-level event types:

sources:
  'k8s-hpa-events':
    displayName: "HPA Scaling Events"
    botkube/kubernetes:
      context: &default-plugin-context
        rbac:
          group:
            type: Static
            prefix: ""
            static:
              values: ["botkube-plugins-default"]
      enabled: true
      config:
        namespaces:
          include:
            - ".*"
        event:
          types:   # <------ required in v1.13.0; omitting this block causes the panic
            - create
            - delete
            - update
            - error

        resources:
          - type: autoscaling/v2/horizontalpodautoscalers
            event:
              types:
                - create
                - delete
                - error
                - update
              reason:
                include:
                  - "SuccessfulRescale"
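
The workaround sidesteps the panic by making sure the top-level event block exists. On the code side, one defensive shape is to tolerate a missing block instead. The sketch below is an assumption about what such a guard could look like, not the content of the PR mentioned above: the types and `effectiveTypes` are hypothetical, and the rule that resource-level types take precedence over top-level ones is assumed here for illustration.

```go
package main

import "fmt"

// EventConfig and ResourceConfig are hypothetical, simplified types.
type EventConfig struct {
	Types []string
}

type ResourceConfig struct {
	Type  string
	Event *EventConfig
}

// effectiveTypes returns the resource-level event types when present,
// otherwise the top-level ones. Either block may be nil.
func effectiveTypes(topLevel *EventConfig, res ResourceConfig) []string {
	if res.Event != nil && len(res.Event.Types) > 0 {
		return res.Event.Types
	}
	if topLevel != nil {
		return topLevel.Types
	}
	return nil // nothing configured at either level
}

func main() {
	res := ResourceConfig{
		Type:  "autoscaling/v2/horizontalpodautoscalers",
		Event: &EventConfig{Types: []string{"create", "delete", "error", "update"}},
	}
	// A nil top-level block no longer panics.
	fmt.Println(effectiveTypes(nil, res))
}
```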