grafana / beyla

eBPF-based autoinstrumentation of web applications and network metrics
https://grafana.com/oss/beyla-ebpf/
Apache License 2.0

Beyla crashing in daemonset mode #536

Closed: uptickmetachu closed this issue 4 months ago

uptickmetachu commented 5 months ago

Running beyla:latest (1.0.2?) as a DaemonSet on EKS (1.25), I get this error when I enable kube metadata parsing.

9e18da67bee480acaf59ba5adc4e0ed6220e71fe7]
E0110 05:28:05.067152   16713 runtime.go:79] Observed a panic: &runtime.TypeAssertionError{_interface:(*abi.Type)(0x19bd600), concrete:(*abi.Type)(0x1ca9380), asserted:(*abi.Type)(0x1cad8c0), missingMethod:""} (interface conversion: interface {} is *kube.PodInfo, not *v1.Pod)
goroutine 66 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic({0x1a2f1a0?, 0xc0025e3ef0})
        /opt/app-root/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:75 +0x85
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0x0?})
        /opt/app-root/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:49 +0x6b
panic({0x1a2f1a0?, 0xc0025e3ef0?})
        /usr/local/go/src/runtime/panic.go:914 +0x21f
github.com/grafana/beyla/pkg/internal/transform/kube.(*Metadata).initContainerDeletionListeners.func1({0x1ca9380?, 0xc001696f00?})
        /opt/app-root/pkg/internal/transform/kube/informer.go:177 +0x3bf
k8s.io/client-go/tools/cache.ResourceEventHandlerFuncs.OnDelete(...)
        /opt/app-root/vendor/k8s.io/client-go/tools/cache/controller.go:257
k8s.io/client-go/tools/cache.(*processorListener).run.func1()
        /opt/app-root/vendor/k8s.io/client-go/tools/cache/shared_informer.go:978 +0x9f
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x30?)
        /opt/app-root/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:226 +0x33
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc00052c738?, {0x1f91000, 0xc000392000}, 0x1, 0xc000390000)
        /opt/app-root/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:227 +0xaf
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0x0?, 0x3b9aca00, 0x0, 0x0?, 0xc00052c788?)
        /opt/app-root/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:204 +0x7f
k8s.io/apimachinery/pkg/util/wait.Until(...)
        /opt/app-root/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:161
k8s.io/client-go/tools/cache.(*processorListener).run(0xc000280ea0)
        /opt/app-root/vendor/k8s.io/client-go/tools/cache/shared_informer.go:967 +0x69
k8s.io/apimachinery/pkg/util/wait.(*Group).Start.func1()
        /opt/app-root/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:72 +0x4f
created by k8s.io/apimachinery/pkg/util/wait.(*Group).Start in goroutine 15
        /opt/app-root/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:70 +0x73
panic: interface conversion: interface {} is *kube.PodInfo, not *v1.Pod [recovered]
        panic: interface conversion: interface {} is *kube.PodInfo, not *v1.Pod

goroutine 66 [running]:
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0x0?})
        /opt/app-root/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:56 +0xcd
panic({0x1a2f1a0?, 0xc0025e3ef0?})
        /usr/local/go/src/runtime/panic.go:914 +0x21f
github.com/grafana/beyla/pkg/internal/transform/kube.(*Metadata).initContainerDeletionListeners.func1({0x1ca9380?, 0xc001696f00?})
        /opt/app-root/pkg/internal/transform/kube/informer.go:177 +0x3bf
k8s.io/client-go/tools/cache.ResourceEventHandlerFuncs.OnDelete(...)
        /opt/app-root/vendor/k8s.io/client-go/tools/cache/controller.go:257
k8s.io/client-go/tools/cache.(*processorListener).run.func1()
        /opt/app-root/vendor/k8s.io/client-go/tools/cache/shared_informer.go:978 +0x9f
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x30?)
        /opt/app-root/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:226 +0x33
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc00052c738?, {0x1f91000, 0xc000392000}, 0x1, 0xc000390000)
        /opt/app-root/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:227 +0xaf
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0x0?, 0x3b9aca00, 0x0, 0x0?, 0xc00052c788?)
        /opt/app-root/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:204 +0x7f
k8s.io/apimachinery/pkg/util/wait.Until(...)
        /opt/app-root/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:161
k8s.io/client-go/tools/cache.(*processorListener).run(0xc000280ea0)
        /opt/app-root/vendor/k8s.io/client-go/tools/cache/shared_informer.go:967 +0x69
k8s.io/apimachinery/pkg/util/wait.(*Group).Start.func1()
        /opt/app-root/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:72 +0x4f
created by k8s.io/apimachinery/pkg/util/wait.(*Group).Start in goroutine 15
        /opt/app-root/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:70 +0x73

grcevski commented 5 months ago

Hi @uptickmetachu, there are a few known issues with the Kubernetes support in our 1.0 release, which is why we never documented it: it simply wasn't ready yet. We've completely reimplemented it in beyla:main, and we believe it works well now. We haven't released 1.1 yet, but it should happen soon.

If you'd like, you can try our main branch. Our tests include examples with k8s support, as well as examples of choosing what to instrument by pod name. Keep in mind that it's still an unreleased version.

uptickmetachu commented 5 months ago

Sorry, this is crashing with 1.1.0.

The image is the one below.

    Image:         grafana/beyla:1.1.0
    Image ID:      docker.io/grafana/beyla@sha256:ae7997624fd99b250a1a533be34d339fde3328a24688c373dc4b2d3862941f4b

I also tried 1.0.0 and 1.0.2. Despite having cluster permissions to watch services, both fail with:

    E0110 21:47:11.489684 19250 reflector.go:148] k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.Service: unable to sync list result: couldn't enqueue object: not indexing service without ClusterIP

grcevski commented 5 months ago

I see, do you mind trying with grafana/beyla:main please? I believe we cut some early pre-release 1.1 before the kubernetes work was finished.

grcevski commented 5 months ago

Based on what I see in the error it may not help, but I just want to make sure we are not working on something that's already fixed. Thanks a ton in advance!

grcevski commented 5 months ago

@mariomac this looks like a bug in the k8s library we use, or maybe it simply doesn't support EKS?

panic: interface conversion: interface {} is *kube.PodInfo, not *v1.Pod
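
For readers following along, the panic above is a failed Go type assertion inside an informer delete handler: the handler asserted `*v1.Pod`, but the object it received was a `*kube.PodInfo`. The sketch below reproduces that pattern in isolation and shows one way to guard against it. It is not Beyla's actual code or its actual fix; `Pod`, `PodInfo`, `unsafeOnDelete`, and `safeOnDelete` are hypothetical stand-ins so the example runs without the Kubernetes libraries.

```go
package main

import "fmt"

// Pod is a hypothetical, simplified stand-in for the Kubernetes *v1.Pod type.
type Pod struct{ Name string }

// PodInfo is a hypothetical, simplified stand-in for Beyla's *kube.PodInfo.
type PodInfo struct{ Name string }

// unsafeOnDelete mirrors the failing pattern: a single-value type assertion
// panics when the handler receives a different concrete type.
func unsafeOnDelete(obj interface{}) string {
	pod := obj.(*Pod) // panics if obj is, for example, *PodInfo
	return pod.Name
}

// safeOnDelete inspects the concrete type first and reports unknown types
// as an error instead of panicking.
func safeOnDelete(obj interface{}) (string, error) {
	switch o := obj.(type) {
	case *Pod:
		return o.Name, nil
	case *PodInfo:
		return o.Name, nil
	default:
		return "", fmt.Errorf("unexpected type %T in delete handler", obj)
	}
}

func main() {
	// The guarded handler accepts the type that triggered the panic.
	name, err := safeOnDelete(&PodInfo{Name: "frontend-abc"})
	fmt.Println(name, err)

	// Anything else is reported as an error rather than crashing the process.
	_, err = safeOnDelete("not a pod")
	fmt.Println(err)
}
```

A type switch (or the two-value form `pod, ok := obj.(*Pod)`) keeps a mismatched object from taking down the whole process, which matters in a DaemonSet where every node runs the same handler.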

uptickmetachu commented 5 months ago

> I see, do you mind trying with grafana/beyla:main please? I believe we cut some early pre-release 1.1 before the kubernetes work was finished.

That did the trick :). I'll pin it for now and continue playing with it. I'm eagerly awaiting the proper 1.1 release!

grcevski commented 4 months ago

Just FYI, we released 1.2, which has these changes, so you can pin to that rather than main, which might bring surprises :).