open-telemetry / opentelemetry-operator

Kubernetes Operator for OpenTelemetry Collector
Apache License 2.0
1.23k stars 442 forks source link

Enabling `operator.targetallocator.mtls` results in non-existent certificate mounts for non-Prometheus collectors #3456

Open thefirstofthe300 opened 2 weeks ago

thefirstofthe300 commented 2 weeks ago

Component(s)

No response

What happened?

Description

After enabling the operator.targetallocator.mtls, all OTEL collector deployments are updated with MTLS certificates; however, only the deployments which create a Target Allocator actually appear to create the necessary certificates.

Steps to Reproduce

  1. Install operator
  2. Enable operator.targetallocator.mtls feature gate
  3. Create deployment without target allocator
  4. Wait infinitely for ContainerCreating to never complete.

Expected Result

Only collectors which use the target allocator mount the target allocator cert.

Actual Result

All collectors attempt to mount a target allocator cert

Kubernetes Version

1.30.6

Operator version

0.113.0

Collector version

0.113.0

Environment information

No response

Log output

No response

Additional context

No response

jaronoff97 commented 1 week ago

@ItielOlenick would you be able to take a look at this?

ItielOlenick commented 1 week ago

The feature was developed for use with the Target Allocator. Perhaps I should add a check to see if the TA deployment is disabled and in that case disable the feature.

I believe using mTLS between the Collector and another drop-in should be introduced as a new feature. It shouldn’t be hard to implement—essentially, it would involve extending the current solution by dropping the Target Allocator related operator tasks and only creating the relevant secrets so they can be used by the drop-in.

@thefirstofthe300 Could you share your use case?

thefirstofthe300 commented 1 week ago

The use case is a standard non-TA deployment. I'm not trying to do anything fancy. I have the OTEL and statsd receivers enabled and the Datadog exporter enabled. The issue to me seems to be that as soon as the operator is configured to make use of TLS between TA and the collectors, it becomes incapable of running a non-TA deployment.

We're running one daemonset for non-Prometheus metrics and a statefulset for Prometheus metric collection. We would like both to be created by the operator.

On Sun, Nov 24, 2024, at 7:52 AM, ItielOlenick wrote:

The feature was developed for use with the Target Allocator. Perhaps I should add a check to see if the TA deployment is disabled and in that case disable the feature.

I believe using mTLS between the Collector and another drop-in should be introduced as a new feature. It shouldn’t be hard to implement—essentially, it would involve extending the current solution by dropping the Target Allocator related operator tasks and only creating the relevant secrets so they can be used by the drop-in.

@thefirstofthe300 https://github.com/thefirstofthe300 Could you share your use case?

— Reply to this email directly, view it on GitHub https://github.com/open-telemetry/opentelemetry-operator/issues/3456#issuecomment-2496080977, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAYDZ4XNS3TJPPPCOHQIOHT2CHY5LAVCNFSM6AAAAABRXVLXW6VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDIOJWGA4DAOJXG4. You are receiving this because you were mentioned.Message ID: @.***>

ItielOlenick commented 6 days ago

Nice catch! The issue is that the secrets are never created since the TA is not deployed. I'll create a PR with a fix ensuring that the secrets are not mounted to the collector if the TA is not deployed and the feature gate is enabled.

@thefirstofthe300 out of curiosity—why would you enable mTLS between the TA and the Collector if the TA isn't being used? Im asking since that feature is specifically designed for use with the TA.

thefirstofthe300 commented 6 days ago

@thefirstofthe300 out of curiosity—why would you enable mTLS between the TA and the Collector if the TA isn't being used? Im asking since that feature is specifically designed for use with the TA.

We are using the TA. We have two deployments. One with the TA and one without. Due to memory consumption of the Prometheus receiver, we isolate it to its own StatefulSet deployment. We need that deployment to scrape from secure endpoints so we need the certificate management there. Our statsd and OTEL receivers run as a DaemonSet deployment and make no use of the TA. We don't want to deploy TA for that deployment and don't need to bother with certs for it.

Both of the collectors are managed by the same operator which is why we need to enable the mTLS feature.