open-telemetry / opentelemetry-network

eBPF Collector
https://opentelemetry.io
Apache License 2.0
How to migrate the build environment GitHub repository into open-telemetry? #2

Closed yonch closed 1 year ago

yonch commented 2 years ago

opentelemetry-ebpf is written in C++. Unlike Go, C++ lacks a standard cross-platform package manager for its dependencies: developers on Mac and on different Linux distros are likely to have different package versions installed.

Non-uniform package versions create a problem: non-reproducible builds. Code that works for one developer can easily fail for another developer who has a different set of package versions. We also want developers to be able to control the versions, so that a release includes a known minimal set of security patches.

To enable reproducible builds, the project uses a build container (in this repository). At build time, dependencies are compiled from specific hashes of their open-source repositories. The base image provides the package dependencies that are not built explicitly into the build container. This enables all developers to build using the same packages.
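To make the pinning idea concrete, here is a minimal sketch of checking dependency checkouts against pinned commit hashes. It is an illustration only, not the build container's actual tooling: the manifest layout, the `find_drift` helper, and the placeholder hashes are all hypothetical.

```python
from typing import Dict, List

# Hypothetical manifest: dependency name -> pinned commit hash.
# The hashes are placeholders, not real commits of these projects.
PINNED: Dict[str, str] = {
    "llvm": "a" * 40,
    "bcc": "b" * 40,
    "libuv": "c" * 40,
}


def find_drift(pinned: Dict[str, str], actual: Dict[str, str]) -> List[str]:
    """Return the sorted names of dependencies that are missing from
    `actual` or whose checked-out commit differs from the pinned one."""
    return sorted(
        name for name, commit in pinned.items() if actual.get(name) != commit
    )
```

A build script could call `find_drift` with the commits it actually checked out and refuse to continue if the result is non-empty.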

OpenTelemetry appears to have several build repositories:

build-tools has a handful of tools used in various projects, including the C++ SDK (cpp_format_tools has linters). However, the eBPF build environment would overwhelm that repository (17 directories vs. 5 in build-tools). Further, many of the packages seem to be eBPF-specific and are less likely to be reused in other open-telemetry repositories: bcc, llvm, curl, libuv, openssl, etc.

We would like to complete the migration of the build environment into the open-telemetry GitHub organization.

Option A (the "default")

The lowest-effort option seems to be migrating the existing repository as-is to a repository in the open-telemetry GitHub organization. If we follow the Go example above, it might be opentelemetry-ebpf-build-tools. We could have the same teams configured for that repository (ebpf-maintainers, ebpf-approvers, ebpf-triagers).

Option B

Another option, which would require more work, would be to move the contents of the build environment repository into a subdirectory of the opentelemetry-ebpf repo. However, this would require changes to how the build environment itself is built.

Any thoughts?

cc @open-telemetry/ebpf-maintainers , @tigrannajaryan

EDIT: fixed the current repository URL

tigrannajaryan commented 2 years ago

I don't know enough about ebpf or our existing C++ tooling to have a reasonable opinion on this. Based on what little I know Option A does sound reasonable since it allows you to work independently.

cc @open-telemetry/build-tools-approvers and @open-telemetry/cpp-approvers if you have any thoughts.

bogdandrutu commented 2 years ago

Why not use Bazel or something like that?

Pryz commented 2 years ago

Any consideration for a Go-based (re-)implementation? C++ can make it hard to gain more contributors. There is a big Go (and even a growing Rust) community around eBPF.

yonch commented 2 years ago

> Why not use Bazel or something like that?

IIRC, LLVM alone takes 1-3 hours to build. We wanted to offload that so automated builds could compile it once, and then every developer could use the built libraries. Using containers worked for us, but is obviously not ideal. Does Bazel have mechanisms to better solve this? @bogdandrutu

yonch commented 2 years ago

> Any consideration for a Go-based (re-)implementation? C++ can make it hard to gain more contributors. There is a big Go (and even a growing Rust) community around eBPF.

Agreed that a higher level language would be much easier to develop in. The original implementation in 2016 was in Python, with C++ code wrapped in SWIG for the critical parts.

The biggest concern with moving languages would be overhead -- can we get comparable overhead with Go or Rust, and how much effort would that entail? Various versions of the C++ code were measured in multiple realistic deployments to have <0.25% CPU overhead, and the team spent significant effort to bring it there.

Achieving low overhead involved some low-level mechanisms. Some that come to mind are memory management, message encoding/decoding, RPC dispatch, shared in-memory queues, socket read and write batching, and "scatter-gather" metric aggregation. If we want to switch to Go or Rust, we would probably need to go through each of those and find an adequate solution. Alternatively, we can forgo the overhead requirements (but a regression might hurt adoption).
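As a rough illustration of one item on that list, write batching, here is a minimal sketch in Python. It is not the collector's implementation (which is C++); the `BatchingWriter` class and its `max_bytes` threshold are hypothetical. The idea is to buffer small messages and pass them to the underlying sender in one call, so many small writes cost a single send.

```python
from typing import Callable, List


class BatchingWriter:
    """Buffers small messages and hands them to `send` as one
    payload once at least `max_bytes` have accumulated."""

    def __init__(self, send: Callable[[bytes], None], max_bytes: int = 4096) -> None:
        self._send = send            # e.g. a socket's sendall method
        self._max_bytes = max_bytes  # flush threshold (hypothetical default)
        self._chunks: List[bytes] = []
        self._size = 0

    def write(self, msg: bytes) -> None:
        # Queue the message; flush only when the threshold is reached.
        self._chunks.append(msg)
        self._size += len(msg)
        if self._size >= self._max_bytes:
            self.flush()

    def flush(self) -> None:
        # Send everything buffered so far in a single call, if anything.
        if self._chunks:
            self._send(b"".join(self._chunks))
            self._chunks.clear()
            self._size = 0
```

With a real socket, `send` could be `sock.sendall`. The caller would also need to call `flush()` on a timer or at shutdown so that a partially filled buffer is not held indefinitely.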

I'm happy to collaborate on this line of inquiry further. @Pryz

lalitb commented 2 years ago

> https://github.com/orgs/open-telemetry/teams/cpp-approvers if you have any thoughts.

For OpenTelemetry C++, we keep it simple by distributing source code archives through GitHub releases. It is not practical for us to distribute binaries for all OS/architecture combinations. We do use Bazel and CMake as the build systems in our CI pipeline, which also lets applications using either of these build systems easily integrate the SDK.

Regarding package managers, as rightly mentioned, there is no standard offering for C++, so we don't push source or binary packages anywhere. There are OpenTelemetry C++ releases distributed through vcpkg and Conan, but these are not officially supported by us. More details here: https://github.com/open-telemetry/opentelemetry-cpp/blob/main/INSTALL.md#using-package-managers

bjandras commented 1 year ago

We went with option A.