open-telemetry / opentelemetry-collector-contrib

Contrib repository for the OpenTelemetry Collector
https://opentelemetry.io
Apache License 2.0
3.13k stars 2.4k forks source link

Ensure that the k8s annotation-based discovery cannot target other endpoints #36595

Open dmitryax opened 3 days ago

dmitryax commented 3 days ago

Is your feature request related to a problem? Please describe.

https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/35617 introduced a new way to create scrapers from particular k8s annotations. The annotations can include the configuration of the scraper to be created. It's important that the configuration provided in the annotation cannot create a scraper that would scrape data from any sources other than the annotated pod. Currently, we have a validation relying on the assumption that every scraper has endpoint configuration field. However, it's not always the case. For example, TLS Check Receiver has targets.url field for that purpose. In this particular case, the validation must check that only one target is set and the target matches the endpoint with the annotation.

Describe the solution you'd like

Every scrape should explicitly support the discovery and provide a validation function, even if it's primarily identical for scrapers with the endpoint config field. Scrapers that do not explicitly provide declare that should not be supported by the discovery mechanism.

We can introduce a new interface that scrapers supporting discovery have to introduce.

type Discoverable interface {
    // Validate validates the rawCfg unmarshals to a config making the scraper receiving data only from discoveredEndpoint.
    Validate(rawCfg map[string]any, discoveredEndpoint string)
}

Most of the scrapers should be able to use a common function built on https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/35617/files#diff-2db19e270904db9b7d87966c13aaa399b48566a81f520f91fa8af8032fdc9827R184

cc @ChrsMark

github-actions[bot] commented 3 days ago

Pinging code owners for receiver/receivercreator: @dmitryax. See Adding Labels via Comments if you do not have permissions to add labels yourself. For example, comment '/label priority:p2 -needs-triaged' to set the priority and remove the needs-triaged label.

ChrsMark commented 2 days ago

Thank's for checking this @dmitryax!

While I see the point of introducing such an interface, I'm not 100% sure if this is an actual problem right now though.

In the current implementation we set as default endpoint the discovered Pod's ip:port at https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/539042d0c718458227d0e6517893d94fc66901a4/receiver/receivercreator/discovery.go#L127. This is also mentioned in the docs. Users can override this through the annotation but all generated configs come with endpoint: setting anyways.

Hence, scrapers that do not support the endpoint setting will fail to get validated anyways because endpoint: key will be invalid key for their configuration.

For example for a redis config that includes a random setting like the following:

io.opentelemetry.discovery.metrics.6379/config: |
  collection_interval: "20s"
  timeout: "10s"
  some: foo

We get an error like this:

2024-11-29T09:14:53.107Z    info    receivercreator@v0.114.0/observerhandler.go:201 starting receiver   {"kind": "receiver", "name": "receiver_creator/metrics", "data_type": "metrics", "name": "redis/0f2c06b9-bef5-4245-b31e-5263097ff45e_6379", "endpoint": "10.244.0.7:6379", "endpoint_id": "k8s_observer/0f2c06b9-bef5-4245-b31e-5263097ff45e/redis(6379)", "config": {"collection_interval":"20s","endpoint":"10.244.0.7:6379","some":"foo","timeout":"10s"}}
2024-11-29T09:14:53.116Z    error   receivercreator@v0.114.0/observerhandler.go:217 failed to start receiver    {"kind": "receiver", "name": "receiver_creator/metrics", "data_type": "metrics", "receiver": "redis/0f2c06b9-bef5-4245-b31e-5263097ff45e_6379", "error": "failed to load \"redis/0f2c06b9-bef5-4245-b31e-5263097ff45e_6379\" template config: decoding failed due to the following error(s):\n\n'' has invalid keys: some"}
github.com/open-telemetry/opentelemetry-collector-contrib/receiver/receivercreator.(*observerHandler).startReceiver
    github.com/open-telemetry/opentelemetry-collector-contrib/receiver/receivercreator@v0.114.0/observerhandler.go:217
github.com/open-telemetry/opentelemetry-collector-contrib/receiver/receivercreator.(*observerHandler).OnAdd
    github.com/open-telemetry/opentelemetry-collector-contrib/receiver/receivercreator@v0.114.0/observerhandler.go:93
github.com/open-telemetry/opentelemetry-collector-contrib/extension/observer.(*EndpointsWatcher).updateAndNotifyOfEndpoints
    github.com/open-telemetry/opentelemetry-collector-contrib/extension/observer@v0.114.0/endpointswatcher.go:111

Not all receivers make sense for monitoring K8s Pod workloads, TLS Check Receiver feels a bit like one of them (it's still under development and not even part of the contrib distro). Curious though if we have more like this one 🤔 .

At the same time Collector operators can always use the ignore_receivers setting to disable receivers/scrapers that do not make sense to them or come with security risks.

Either way, maybe we can evaluate this better once we know more about https://github.com/open-telemetry/opentelemetry-collector/issues/11238#issuecomment-2364743278?