open-telemetry / community

[Donation Proposal]: Beyla, eBPF auto-instrumentation tool for metrics and traces #2406

Open grcevski opened 2 days ago

grcevski commented 2 days ago

Description

Grafana Labs would like to offer the donation of Beyla to the OpenTelemetry project.

Beyla is a mature eBPF-based auto-instrumentation tool for OpenTelemetry metrics and traces, for multiple languages and protocols. It enables cluster-wide/system-wide auto-instrumentation of applications without the need for application code/configuration changes or application restarts. To achieve this, Beyla uses a combination of protocol-level instrumentation based on network events and language/runtime-level instrumentation where needed. While Beyla works on bare metal installations, virtual machines, etc., the tool is also fully Kubernetes-aware and can be deployed as a daemonset or as a sidecar. Beyla is used by a number of customers in production, including Grafana Labs itself for the Grafana Cloud hosted offering.

Some of the main uses of Beyla are:

Some of the core features of Beyla include:

Benefits to the OpenTelemetry community

Donating Beyla will fill a gap in the overall OpenTelemetry application-level instrumentation ecosystem for applications that use programming languages not supported by the OpenTelemetry SDKs, that use proprietary frameworks, or that use older technologies. We also believe that it will fill a gap in network-level monitoring for the purpose of building solutions for service graphs and connectivity tracking.

This donation has a lot of synergy with the OpenTelemetry Profiling Agent, and we believe that in the future we can create non-intrusive, generic profile-to-TraceID correlation by leveraging the two projects.

Reasons for donation

We at Grafana Labs prefer that customers use the upstream OpenTelemetry SDKs for application-level instrumentation; however, we often find that certain customers are unable to use the recommended approach because of their current technology use. We built Beyla as an easy way for our customers to get started with OpenTelemetry while they are in the process of upgrading their software, which sometimes takes years. Customers also often use binary distributions of software and, depending on the technology the binaries are built with, are unable to instrument these applications.

We believe that we are not alone in this need to move customers to OpenTelemetry quicker, where they can’t currently leverage the existing OpenTelemetry ecosystem. This is why we’d like to make this project a community project, where multiple companies can be stakeholders and we can build a better community around it, compared to what Grafana Labs can do alone.

Relation with Other OpenTelemetry Projects

We also see this donation as an opportunity to combine the eBPF-based auto-instrumentation efforts within OpenTelemetry. Our project borrows parts of the OpenTelemetry Go Auto-Instrumentation project, and some of our Beyla maintainers participate in that project too. Beyla’s support for auto-instrumentation goes well beyond Go, which is why we are proposing a new project donation. We’d like to fully merge all of our work on Go with the OpenTelemetry Go Auto-Instrumentation project, to avoid the double contribution we do at the moment, and vendor it in Beyla as an import once the merge is complete. We are also open to combining the Go Auto-Instrumentation project into a new project for out-of-process auto-instrumentation with our donation.

We also see this donation as an opportunity to re-invigorate the OpenTelemetry eBPF Networking project. Beyla includes support for the majority of the functionality of that project, but it’s built with eBPF-Go (libbpf), which means it uses CO-RE and can be deployed on any kernel without specific kernel builds or a compilation toolchain on the target system.

Our development stack is identical to what’s used by OpenTelemetry Go Auto-Instrumentation and the OpenTelemetry eBPF Profiler. Developers on those projects will easily be able to contribute to this project, and it will bring all of the OpenTelemetry eBPF tooling to the same level.

Repository

https://github.com/grafana/beyla

Existing usage

Beyla is used by hundreds of users in production, including Grafana Cloud itself. We have strong open-source community usage: the number of pulls of our Docker image is around 100,000 a month and has been growing steadily since the inception of the project. For comparison, our Docker image pulls in April of 2024 were around 30,000 a month.

Maintenance

We have 4 full-time maintainers on the project who will move to work full-time on the OpenTelemetry project if accepted. We have over 40 contributors on the project, most of whom are not Grafana Labs employees or affiliated in any way with Grafana Labs.

Licenses

Apache 2.0 License.

Our eBPF probe source is dual-licensed with GPL/MIT as per the requirements of the Linux Kernel. This is identical to the approach used by OpenTelemetry Go Auto-Instrumentation and OpenTelemetry Profiler.

Trademarks

The name Beyla currently appears in a number of places in the codebase and is a Grafana Labs trademark. We are happy to donate the name too; however, we understand that it’s not compatible with how OpenTelemetry projects are typically named. If the name donation is not acceptable, we are happy to remove these name references when the project is donated.

Other notes

This proposal has been socialized with @MrAlias (maintainer of OpenTelemetry Go Auto Instrumentation) and @atoulme (maintainer of OpenTelemetry eBPF Networking)

edeNFed commented 2 days ago

I’m looking forward to Beyla's potential donation to the OpenTelemetry project, as it helps cover important gaps in auto-instrumentation for unsupported languages and environments.

That said, this donation comes with some challenges since a lot of Beyla’s work overlaps with existing OpenTelemetry projects like Go auto-instrumentation, eBPF Profiler, eBPF networking and OpenTelemetry Operator. The community already has efforts addressing these areas, so it’s important to understand how Beyla will fit in and integrate with these projects.

As part of the donation, it’s crucial to ensure the current core OpenTelemetry repositories remain the main source of truth, and that we avoid duplicating code or functionality. It would be helpful to see how Beyla and existing projects can come together without redundancy.

I’m also interested in how Beyla will eventually be integrated as a collector receiver in the OpenTelemetry architecture. To make this work smoothly, Beyla should be able to use existing components as dependencies rather than duplicating what’s already there.

grcevski commented 2 days ago

> That said, this donation comes with some challenges since a lot of Beyla’s work overlaps with existing OpenTelemetry projects like Go auto-instrumentation, eBPF Profiler, eBPF networking and OpenTelemetry Operator. The community already has efforts addressing these areas, so it’s important to understand how Beyla will fit in and integrate with these projects.

Thanks for the comments, Eden. The main overlap in functionality is related to Go Auto Instrumentation, for which we propose to merge our functionality there and vendor it in the new project. The main challenge I see is the multi-process support, which we need for fleet-wide monitoring; however, I'm sure we can overcome these challenges. For eBPF Networking, I think we can use this as an opportunity to bring the functionality to the same level as Go Auto, using a similar development stack and a libbpf CO-RE based approach.

I don't think the donation overlaps in any way with the OpenTelemetry Operator or the OpenTelemetry eBPF Profiler. I think providing a generic way to extract trace/span information for the eBPF Profiler would be a great way to correlate traces with profiles.
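As a rough illustration of the trace-to-profile correlation idea, here is a minimal sketch that tags profile samples with the trace ID of the span that was active on the same thread at sample time. The `Span` and `ProfileSample` types, their fields, and the thread-plus-timestamp matching rule are all hypothetical; they are not taken from Beyla or the eBPF Profiler, and a real implementation would likely look quite different.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Span:
    """Hypothetical span record as an instrumentation tool might export it."""
    trace_id: str
    span_id: str
    thread_id: int
    start_ns: int
    end_ns: int


@dataclass(frozen=True)
class ProfileSample:
    """Hypothetical profiler sample: a stack captured on a thread at a time."""
    thread_id: int
    timestamp_ns: int
    stack: tuple


def find_span(sample: ProfileSample, spans: list) -> Optional[Span]:
    """Return the span active on the sample's thread at sample time, if any."""
    candidates = [
        s for s in spans
        if s.thread_id == sample.thread_id
        and s.start_ns <= sample.timestamp_ns < s.end_ns
    ]
    # Assumed tie-break: the shortest enclosing span is treated as innermost.
    return min(candidates, key=lambda s: s.end_ns - s.start_ns, default=None)


def correlate(samples: list, spans: list) -> list:
    """Pair every sample with the trace ID of its enclosing span, or None."""
    result = []
    for sample in samples:
        span = find_span(sample, spans)
        result.append((sample, span.trace_id if span else None))
    return result
```

Samples that fall outside every span, or that were taken on a different thread, keep a trace ID of `None`, so uncorrelated profile data is preserved rather than dropped.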

grcevski commented 2 days ago

> I’m also interested in how Beyla will eventually be integrated as a collector receiver in the OpenTelemetry architecture. To make this work smoothly, Beyla should be able to use existing components as dependencies rather than duplicating what’s already there.

I'm not sure there's much duplication there, except with the eBPF networking component, which we addressed in the "Relation with Other OpenTelemetry Projects" section. There's a recent request to add Beyla as a component in the OpenTelemetry Collector, which this donation would help with a lot. https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/34321

damemi commented 2 days ago

Thanks for the detailed proposal @grcevski! I think this is great for building progress on OpenTelemetry/eBPF and covering existing gaps.

To mirror what @edeNFed said, avoiding confusion and duplication is important. But I think you have explained that the idea is to vendor the existing Go Auto-Instrumentation as a dependency into the Beyla donation. That makes sense to me, as it fits with the goals we've been working on together in Go Auto (i.e., to make that repo a library/API/SDK that can be imported by other implementations).

To that end, it makes sense that OpenTelemetry would provide both (a) an open-source library/framework for eBPF instrumentation with a "raw" agent as the default artifact and (b) an open-source component consuming that framework to provide second-level functionality and usability. @jsuereth and I were actually talking about this, and he compared this situation roughly to how the Collector works.

I think the potential overlap with the OpenTelemetry Operator is in the fact that the Operator does deploy that default agent from Go Auto-Instrumentation, but that's about it. To draw back to the collector comparison, I would say that the Operator is to the Collector as Beyla is to Collector-Contrib: built on a stable, minimal core with added functionality. Both exist to give users options based on their needs.

All that said, we should make sure to apply the same standards for donation that we are also applying to the Compile-time Go Instrumentation donation. Specifically:

All in all, I wouldn't be surprised to see these 3 projects collaborate and converge more often as time goes on. Thanks for your work on this @grcevski!

svrnm commented 1 day ago

I am by no means an expert on ebpf but one thing I'd like to ask:

would it be possible to work towards one ebpf solution that combines what beyla does (auto instrumentation with traces, metrics I suppose + networking) + the profiler?

Because in the end what people want (see this discussion for example: https://github.com/open-telemetry/opentelemetry-specification/issues/4255) is a combination of all four signals, but if those 2 projects are separate, we either need a way to install them side-by-side or people have to choose.

damemi commented 1 day ago

I think that one ebpf solution would be something like Beyla. But, I don't think that idea means all of the code for every signal+language lives in one monorepo with the higher-level component. That's what I mean by separate repos at least

RonFed commented 1 day ago

I agree with @edeNFed's and @damemi's comments.

Having projects that handle auto-instrumentation and, on top of them, higher-level implementations (like the Operator or Beyla) that use multiple other projects is a good structure in my opinion.

As a maintainer in the go-auto-instrumentation project, I'd be happy to accept donations from Beyla to the current project.

dashpole commented 1 day ago

I'm excited to see this donation proposal! I have made a few contributions to Beyla in the past, and have found the maintainers knowledgeable, kind, and helpful. I also think Beyla fills an important gap by providing language-agnostic telemetry. There are definitely details to work out, but I'm very supportive of this proposal.