open-telemetry / opentelemetry-operator

Kubernetes Operator for OpenTelemetry Collector
Apache License 2.0

Proposal: distribution image builder service #1817

Open frzifus opened 1 year ago

frzifus commented 1 year ago

This proposal is primarily about finding out whether there would be interest in such a service.

Background

In the past few months, discussions at the Collector-SIG meeting have raised the question of whether the contrib image should be published at all. With the growing number of components, the container size continues to increase. In addition, there are security concerns. As a result, the idea was born to create a community page to provide simple custom distributions.

Problem

To put a customized image together, you have to build it yourself with ocb (or with a community page that does not exist yet) and then host it yourself. If a security problem is found in a component or in Go itself, the whole process has to be repeated. It's just not straightforward.

The same problem also occurs when you use a productized operator/collector bundle and a certain processor or exporter is not supported. In this case, it is up to the user to extend the distribution with the desired component, rebuild and host/maintain it somehow.
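
For context, a manual build with ocb starts from a builder manifest roughly like the sketch below. The distribution name, output path, and pinned versions are assumptions chosen for illustration and are not taken from any existing setup.

```yaml
# builder-config.yaml: a minimal ocb manifest sketch (all names and versions are illustrative)
dist:
  name: otelcol-dev                 # hypothetical binary name
  description: Basic OTel Collector distribution for developers
  output_path: ./otelcol-dev        # hypothetical output directory

receivers:
  - gomod: go.opentelemetry.io/collector/receiver/otlpreceiver v0.79.0
processors:
  - gomod: go.opentelemetry.io/collector/processor/batchprocessor v0.79.0
exporters:
  - gomod: go.opentelemetry.io/collector/exporter/loggingexporter v0.79.0
```

Running `ocb --config builder-config.yaml` should then produce the binary, which still has to be packaged into an image, pushed, and rebuilt by hand whenever a component or Go receives a fix.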

Solution

The idea would be to provide a service that builds the desired distributions based on a given configuration inside a cluster.

Builder CR

From a user perspective, this could look like the following OpenTelemetryCollectorBuilder CR draft:

apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollectorBuilder
metadata:
  name: dev-image
spec:
  image:
    allowUnsupportedComponents: true
    allowDeviatingVersions: true
    description: "Basic OTEL Collector distribution for Developers"
    receivers:
      - gomod:
          go.opentelemetry.io/collector/receiver/otlpreceiver
    processors:
      - gomod:
          go.opentelemetry.io/collector/processor/batchprocessor v0.79.0
    exporters:
      - gomod:
          go.opentelemetry.io/collector/exporter/loggingexporter v0.79.1
      - gomod:
          github.com/open-telemetry/opentelemetry-collector-contrib/exporter/jaegerexporter
  destination:
    repository: ...
    credentials: ...
status:
  buildStatus: complete
  allowUnsupportedComponents: true
  allowDeviatingVersions: true
  checkSum: xxxxxxx
  publishedAt: 2023-08-06
  repository: docker.io/someone/dev-image:xxxxxxx

Image

Contains all the information needed to build and publish a new collector image.

e.g.:

Destination

Usage

To make this as easy as possible to use, it would be good if the OpenTelemetryCollector CR could reference the builder CR directly:

apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: simplest
spec:
  imageRef: dev-image
  config: ...
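
For comparison, here is a sketch of what the operator might resolve that reference to, assuming `imageRef` simply maps to the `status.repository` value of the builder CR. That mapping is an assumption and not something decided in this proposal. The existing `spec.image` field is what a user has to fill in by hand today.

```yaml
# Equivalent collector CR with the image spelled out by hand (sketch)
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: simplest
spec:
  # Assumed to be the value the builder CR reports in status.repository
  image: docker.io/someone/dev-image:xxxxxxx
  config: ...
```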

Build process

I still have a few open questions here. Building an image inside a container without access to the Docker daemon (or something similar) seems to be difficult. The problem could be bypassed by mounting docker.sock, for example. However, one cannot assume that Docker is available on the nodes.

Another option, although it would add a dependency, could be Kaniko.

How does kaniko work?

The kaniko executor image is responsible for building an image from a Dockerfile and pushing it to a registry. Within the executor image, we extract the filesystem of the base image (the FROM image in the Dockerfile). We then execute the commands in the Dockerfile, snapshotting the filesystem in userspace after each one. After each command, we append a layer of changed files to the base image (if there are any) and update image metadata.
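
As a rough sketch of how the operator could use this, it might create a Job like the following for each builder CR. The Job name, ConfigMap, Secret, and mount path are hypothetical. The ConfigMap is assumed to hold a Dockerfile plus the generated ocb manifest, and the Secret is assumed to be of type `kubernetes.io/dockerconfigjson`.

```yaml
# Hypothetical build Job; every resource name here is an assumption
apiVersion: batch/v1
kind: Job
metadata:
  name: build-dev-image
spec:
  backoffLimit: 0                     # surface build failures instead of retrying silently
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: kaniko
          image: gcr.io/kaniko-project/executor:latest
          args:
            - --dockerfile=/build-context/Dockerfile
            - --context=dir:///build-context
            - --destination=docker.io/someone/dev-image:xxxxxxx
          volumeMounts:
            - name: build-context
              mountPath: /build-context
            - name: registry-credentials
              mountPath: /kaniko/.docker   # kaniko reads config.json from here
      volumes:
        - name: build-context
          configMap:
            name: dev-image-build-context  # hypothetical: Dockerfile + ocb manifest
        - name: registry-credentials
          secret:
            secretName: dev-image-registry # hypothetical registry credentials
            items:
              - key: .dockerconfigjson
                path: config.json
```

No Docker daemon or socket mount is involved, so this would not depend on the container runtime of the nodes.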

Maybe buildah is another option?

https://insujang.github.io/2020-11-09/building-container-image-inside-container-using-buildah/

wdyt?


Update

Discussed in the Operator-SIG call on 08.06.2023.

jpkrohling commented 1 year ago

I love this proposal, especially because of its ability to rebuild a distribution in case something changes, like a new version of a component is released. I think there are several implications that need to be thought of, like: what to do when an image is built, but the pod can't start? Those and other failure scenarios have to be collected and planned for.

frzifus commented 1 year ago

Recently we had a short brainstorming session about how best to make the otel collector available in a Linux distro. It came down to the same problem: which components should be supported? There are edge cases where the image should only support a few necessary components, and cases where the collector has to cover a wide range of functionality.

One solution would be to split the collector.

Would that bring value to the Operator too? What I have in mind is that if the collector could be subdivided in this way, you could deliver selected components with the operator itself. Other components could be compiled externally and made available to the operator.

apiVersion: v1
kind: ConfigMap
metadata:
  name: basicauth-config
  annotations:
    component.opentelemetry.io/version: "v0.80.0"
data:
  basicauthextension: basicauthextension.{so,wasm,other}

frzifus commented 8 months ago

cc @chilleregeravi