open-telemetry / opentelemetry-helm-charts

OpenTelemetry Helm Charts
https://opentelemetry.io
Apache License 2.0
390 stars 469 forks source link

Allow option to create opentelemetry-collector CR along with otel-operator helm installation #69

Open VineethReddy02 opened 3 years ago

VineethReddy02 commented 3 years ago

I am currently evaluating opentelemetry-operator helm chart. I am stuck on an issue. I wanted to add the opentelemetry-collector custom resource along with otel-operator helm chart installation.

The otel-col CR creation fails during installation with the below error:

Internal error occurred: conversion webhook for cert-manager.io/v1alpha2, Kind=Certificate failed: Post "https://cert-manager-webhook.cert-manager.svc:443/convert?timeout=30s": dial tcp 10.102.92.197:443: connect: connection refused

Because the CR creation triggers the mutating webhook request to the operator and on creating otel-col CR during installation the otel-operator is still in the container creating phase, so the CR creation is failing. Is there any solution to fix this issue? I want to create otelcol CR along with the otel operator installation.

As a workaround, I am installing otel-col CR alongside otel-operator installation by adding helm pre-install webhook to create otel-col CR prior to the creation of webhooks and I am myself adding the otel-col default mode as deployment and labels that are configured using mutating webhook as below

apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: opentelemetry-collector
  labels:
    "app.kubernetes.io/managed-by": "opentelemetry-operator"
  annotations:
    "helm.sh/hook": "pre-install"
spec:
  mode: deployment

Are there any alternative solutions to do this? Is my workaround an appropriate fix? cc @Saber-W @jpkrohling

Saber-W commented 3 years ago

Are there any alternative solutions to do this? Is my workaround an appropriate fix?

I think using pre-install mechnism is an appropriate workaround if you want to "create otelcol CR along with the otel operator installation". If you can accept Collector to be installed slight after the Operator's readiness, I think kubectl wait would also be a good fix.

VineethReddy02 commented 3 years ago

But how can I use kubectl wait inside helm chart installation?

I'm adding a otel-col CR inside /templates directory to create a otel-col with provided config alongside the otel-operator installation.

jpkrohling commented 3 years ago

If you install the CR before the hook is installed, your CR will not be validated and we might not normalize it (defaulter). Not sure how to do it with Helm, but you do need to create the CR after the operator is ready.

VineethReddy02 commented 3 years ago

I am trying to create otel-col in deployment mode. So the validating webhook has very little to do with type deployment as more validations logic is on other types of deployment modes. And mutating webhook is responsible for injecting the labels "app.kubernetes.io/managed-by": "opentelemetry-operator" I am adding it myself in config as with pre-hook CR creation cannot perform mutation on existing resources.

It works for now. But I know adding new functionality into operator webhooks will definitely break this. :)

And I am not sure of any other alternatives to include otel-col CR within the otel-operator helm installation. :/

jpkrohling commented 3 years ago

It might be a short-term solution for you, but we definitely need a proper solution for the wider community.

mars64 commented 3 years ago

Isn't this more of an issue with helm/your release process, than an issue with this chart? (though, if your pre-install hook works for you then awesome!)

I think its generally recommended to not tie together the CRD and CR in the same chart (see Helm's docs on this topic and its caveats -- specifically that bit about unintentional data loss).

Also according to helm:

there is currently no community consensus around how to handle CRDs and their lifecycle

... which is why I generally prefer Method 2 - Separate Charts.

Precisely how to make that happen in a single step depends on your release system, which is why I think its not exactly an issue with this chart.

For example in my case I can use ArgoCD sync waves to ensure the CRD is stood up in a way that admission/validation webhooks fire correctly by the time I install the relevant CustomResource -- and I prefer keeping the CRD chart separate from the CR anyway, for the reasons above.

VineethReddy02 commented 3 years ago

@mars64 thanks for the input.

But dealing with separate charts makes it more complex right? I mean the upgrades and management of multiple charts, which represent a single system i.e. the actual components and the supporting components.

mars64 commented 3 years ago

But dealing with separate charts makes it more complex right? I mean the upgrades and management of multiple charts, which represent a single system i.e. the actual components and the supporting components.

The way I read your request, the complexity comes from the goal of combining two different resources (CRD and the CR that relies on the CRD -- each with different lifecycles) into a single step (i.e. install a single helm chart). I'm trying to point out that the helm community generally doesn't seem to have a good answer for this. In other words, it's additional complexity either way.

Personally, I think its less complex to rely on two different charts that more directly serve their purposes, than to attempt to make a chart designed for the CRD lifecycle also responsible for the lifecycle of what is potentially many CR's depending on the use case. This is why I mentioned my use of ArgoCD sync waves to control CRD and CR installation in separate charts but in a single "installation step". I'd love to hear if your or others have had success to the contrary!

dmitryax commented 2 years ago

I like the method 2 with separate helm charts, but not sure what's the established practice in other helm charts out there

jaronoff97 commented 2 years ago

Any updates here? I'm running in to the same problem now (as described here)

jaronoff97 commented 2 years ago

For context, the kube-prometheus-stack helm chart is able to install the CRDs and an instance of a Prometheus CRD in the same chart. To me, it's easier to manage a single chart that installs the whole prometheus ecosystem rather than two charts. I would like to do the same with Otel. They do specific orchestration around helm to achieve this

dmitryax commented 2 years ago

I don't think anyone is working on it. Any help would be appreciated

jaronoff97 commented 1 year ago

@TylerHelmuth I think this is still open in the operator world, I don't have a lot of cycles right now to work on fixing this, but can bring this up at the next operator SIG and see if someone else does.

TylerHelmuth commented 1 year ago

@jaronoff97 sounds good. I can dedicate cycles to this as well I think.

Allex1 commented 1 year ago

@jaronoff97 @VineethReddy02 I'm really interested in the use cases for having the CR's as part of the operator chart.

In the case of kube-prometheus-stack the bundle makes sense as the core prometheus/alertmanager config is pretty generic and the custom config is handled in decoupled CR's (servicemonitor/prometheusrule). In the case of otel where the otelcol/otelinst reuses vanilla config which varies wildly I don't yet see a generic use case.

We use both kube-prometheus-stack and otel-operator in different setups:

  1. 1 operator + 1 prometheus pair + webhooks per cluster for cluster/infra monitoring. Everything, including CRD's are deployed by the cluster admin on cluster create.
  2. multiple operator/prometheus pairs for multiple tenants that own multiple namespaces in the same cluster. CRD's are reused from 1 and webhooks are disabled.
  3. 1 admin operator per cluster that watches all namespaces for tenants to create their custom CR's via the same kps helm (with operator disabled) or other helm charts/templating tools/manifests. Everything except CR's are deployed by the cluster admin.

For otel-operator I see 3 as the best option or 1 if you'd want to replace Prometheus cluster monitoring with a collector+several backends. I believe this chart is trying to achieve 1. 2 doesn't really work in multi tenant k8s clusters as users cannot deploy webhooks which are crucial for the otelinst and otelcol sidecar mode to work. For 3 we'd need to find a way to reuse the collector chart for creating otelcol CR's while avoiding duplication.

illrill commented 1 year ago

Bumped into the same problem. After reading comments by @mars64, I wanted to pitch in a clarification that this problem isn't the classic "CRD must be installed before CR" scenario. For me it's caused by the operator's webhook not yet being ready when the CR is applied (when the operator and CR are both installed in the same release).

Outline of the sequential steps in my CI/CD pipeline:

  1. helm dependency update . to download opentemeletry-operator as a subchart.
  2. helm show crds . | kubectl apply --server-side --force-conflicts -f - to install CRDs separately as per helm method #2. This ensures that CRDs are there before the CR comes.
  3. helm upgrade -i opentelemetry . --skip-crds to install my parent chart which includes the downloaded opentemeletry-operator chart, as well as an OpenTelemetryCollector CR (all in the same release).


./Chart.yaml

apiVersion: v2
name: opentelemetry
type: application
version: 1.0.0
  - name: opentelemetry-operator
    version: 0.30.0
    repository: https://open-telemetry.github.io/opentelemetry-helm-charts/

./templates/opentelemetrycollector.yaml

apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: opentelemetry
spec:
  mode: deployment 
  config: |
    receivers:
      otlp:
        protocols:
          grpc:
          http:
    processors:
    exporters:
      logging:
    service:
      pipelines:
        traces:
          receivers: [otlp]
          processors: []
          exporters: [logging]

Ideally I'd like to deploy the operator and instantiate an operand in the same release, but the only reasonable workaround I can think of is to have them in separate releases/charts 😢

jaronoff97 commented 1 year ago

@illrill see here for more info on this... I have some open work i need to do to make a helm chart that enables exactly what you're talking about. I haven't had the time unfortunately.