Open ChrsMark opened 3 months ago
Pinging code owners:
receiver/receivercreator: @rmfitzpatrick
See Adding Labels via Comments if you do not have permissions to add labels yourself.
Pinging code owners for extension/observer/k8sobserver: @rmfitzpatrick @dmitryax. See Adding Labels via Comments if you do not have permissions to add labels yourself.
That's a really interesting idea and I reckon it would be aligned with the auto instrumentation approach in OpenTelemetry. The K8s Operator is already available to inject auto-instrumentation for some SDKs, maybe we could reuse some of the annotations naming used there:
annotations:
instrumentation.opentelemetry.io/inject-collector-receiver: redis/1
instrumentation.opentelemetry.io/inject-collector-receiver-config: '`endpoint`:6379'
As I understood, the annotations will be on Pod spec level, if a Pod contains more than one container to be instrumented, how could we differentiate their different receiver's configurations? What do you think if we follow a similar approach as Metricbeat but the user needs to append a suffix to the annotation key?
annotations:
instrumentation.opentelemetry.io/inject-collector-receiver: redis/1
instrumentation.opentelemetry.io/inject-collector-receiver-config: '`endpoint`:6379'
instrumentation.opentelemetry.io/inject-collector-receiver/2: redis/2
instrumentation.opentelemetry.io/inject-collector-receiver-config/2: '`endpoint`:1234'
Sets of annotations can be provided with numeric or text prefixes. If there are annotations that don’t have a suffix, then they get grouped together into a single configuration. I think having a common prefix would ease the annotations detection and parsing.
There is already an issue suggested this feature: https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/17418. If we agree on this we can continue with one of these 2.
As I understood, the annotations will be on Pod spec level, if a Pod contains more than one container to be instrumented, how could we differentiate their different receiver's configurations? What do you think if we follow a similar approach as Metricbeat but the user needs to append a suffix to the annotation key?
@rogercoll I would expect this to be possible if we somehow handle the hints per container/port like type == "port" and port == xyz
as it was described at https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/25866#issuecomment-1683004547. But good point anyways, and we should take into account in the implementation.
Update: For covering this for Logs we will need https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/35491
Hey @dmitryax , following our offline discussion I have put together the part for how the hints' API would look like. It's the Specification
part in the description which lists the hints we could support.
I would suggest we start with the metrics use-case.
@rogercoll @dmitryax I tried my hand at implementing the proposal for metrics: https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/35617 (logs can be added in a different Pr since it's anyways blocked on https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/35544). That's more or less how I imagine the implementation but I still want to play a bit around with some details. However if you could take a look and provide your feedback that would be great so to ensure that we are aligned on the direction.
@ChrsMark I have taken a look at the https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/35617, the code looks good to me, but I have a concern regarding the extensibility of the Supported metrics annotations
.
The PR seems to be tied to specific configuration annotations, meaning that I can only "discover" the receivers that use those specific fields. The PR's supported annotations (receiver configuration) are endpoint, timeout, user, and password
, which seem to be some of the configuration fields available in the nginx or redis receivers.
initial_delay
, compression
, reload_interval
, etc (scrapers and http config).Do we plan to only support specific configuration annotations? Would it be feasible to just support the required ones for the k8sobserver logic's and use the other's as raw configuration? Something like:
// well-known annotations
io.opentelemetry.collector.receiver-creator.metrics/receiver: nginx
io.opentelemetry.collector.receiver-creator.metrics/endpoint: "http://`endpoint`/nginx_status"
// any other annotation will be used as raw configuration field
io.opentelemetry.collector.receiver-creator.metrics/collection_interval: '20s'
io.opentelemetry.collector.receiver-creator.metrics/initial_delay: '20s'
io.opentelemetry.collector.receiver-creator.metrics/read_buffer_size: '10'
io.opentelemetry.collector.receiver-creator.metrics/xyz: 'abc'
will be mapped to:
nginx:
endpoint: "http://`endpoint`/nginx_status"
collection_interval: 20s
initial_delay: 20s
read_buffer_size: 10
xyz: abc
Do we plan to only support specific configuration annotations? Would it be feasible to just support the required ones for the k8sobserver logic's and use the other's as raw configuration?
My proposal would be to start supporting specific settings as annotations so as to have full control of what is allowed as a hint. This would serve as making it straight-forward for the users what specific annotations are supported as well as ensuring no security sensitive settings are exposed over this API. Every user with access to deploy Pods can potentially configure the Collector and take advantage of its privileges by just deploying a hinted Pod.
In this, I would only allow io.opentelemetry.collector.receiver-creator.metrics/raw: ''
for advanced use-cases which should also be enabled-disabled (disabled by default) on Collector's startup.
Would love to hear @dmitryax's thoughts on this as well.
I have some concerns about the suggested naming convention. Let's brainstorm it more.
receiver-creator
or receiver
parts in it because we potentially can get rid of them, see https://github.com/open-telemetry/opentelemetry-collector/issues/11238. I would use scraper
term instead.nginx
to avoid potential conflicts.What about something like this (I'm open to further discussion):
// once we have proper discovery, this would control particular pods turning it off/on
opentelemetry.io/discovery/enabled: "true"
// this one should be required as of right now, until we have the discovery
opentelemetry.io/discovery/scraper: nginx
// extra annotations to control scraper behavior
opentelemetry.io/discovery/signals: metrics
opentelemetry.io/discovery/config/endpoint: "http://`endpoint`/nginx_status"
opentelemetry.io/discovery/config/collection_interval: '20s'
opentelemetry.io/discovery/config/initial_delay: '20s'
opentelemetry.io/discovery/config/read_buffer_size: '10'
opentelemetry.io/discovery/config/xyz: 'abc'
Thank's @dmitryax!
Regarding the 1st point, in case we want to use the same set of hints with docker it is suggested to use reverse dns naming convention: https://docs.docker.com/engine/manage-resources/labels/#key-format-recommendations
I'm fine with 2 and 3 points.
However K8s does not allow multiple /
, so our annotations should only contain one.
I think your suggestion would cover the needs of this feature and is quite future compliant with the potential proper discovery
.
2 questions:
1) Should we only expose specific settings through the annotations or allow everything? See https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/34427#issuecomment-2400208585
2) How about having an additional annotation category for container logs? opentelemetry.io.discovery.container_logs/enabled: "true"
. This would indicate that a filelog receiver instance will start collecting Pod's containers' logs.
// once we have proper discovery, this would control particular pods turning it off/on
opentelemetry.io.discovery/enabled: "true"
// this one should be required as of right now, until we have the discovery
opentelemetry.io.discovery/scraper: nginx
// extra annotations to control scraper behavior
opentelemetry.io.discovery/signals: "metrics,logs"
opentelemetry.io.discovery/config.endpoint: "http://`endpoint`/nginx_status"
opentelemetry.io.discovery/config.collection_interval: '20s'
opentelemetry.io.discovery/config.initial_delay: '20s'
opentelemetry.io.discovery/config.read_buffer_size: '10'
opentelemetry.io.discovery/config.xyz: 'abc'
// nested configs
opentelemetry.io/discovery/config.nested: |
zyz: foo
bar: baz
// enable container logs collection
opentelemetry.io.discovery.container_logs/enabled: "true"
opentelemetry.io.discovery.container_logs/config.max_log_size: "2MiB"
opentelemetry.io.discovery.container_logs/config.foobar: "xyz"
opentelemetry.io.discovery.container_logs/config.operators: |
- type: regex_parser
regex: '^(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (?P<sev>[A-Z]*) (?P<msg>.*)$'
We should also take into account the multiple containers case which would be covered by including the name of the port in the annotation for scraper:
opentelemetry.io.discovery.<container_port>/scraper: nginx
opentelemetry.io.discovery.<container_port>/config.endpoint: "http://`endpoint`/nginx_status"
and the name of the container for container_logs
:
opentelemetry.io.discovery.container_logs.<container_name>/enabled: "true"
opentelemetry.io.discovery.container_logs.<container_name>/config.max_log_size: "2MiB"
Let's try not to restrict users from achieving particular configuration use cases but don't provide different ways to achieve the same results. Otherwise, it'll be much harder to maintain and introduce any changes to the interface if we ever have to do that.
If we keep metrics
and logs
(I don't think we need container_
prefix) as a part of the key, let's make it the only way. We don't need the .../signals
key in that case.
Why do we need to provide annotation per a config option AND the nested
config? Can we have only one?
We can stick with the reverse domain notation if we have the one /
restriction and put the discovery-related parts before that.
What about starting with supporting the following:
io.opentelemetry.discovery.metrics/enabled: "true"
io.opentelemetry.discovery.logs/enabled: "true"
io.opentelemetry.discovery.metrics.8888/enabled: "false" // takes priority
io.opentelemetry.discovery.log.my-container/enabled: "false" // takes priority
io.opentelemetry.discovery.metrics/scraper: "nginx"
io.opentelemetry.discovery.metrics/config: |
endpoint: "http://`endpoint`/nginx_status"
collection_interval: '20s'
initial_delay: '20s'
read_buffer_size: '10'
xyz: 'abc'
io.opentelemetry.discovery.metrics.8888/scraper: "prometheus" // takes priority
Is there any limit on size of the annotation values in k8s so we can provide the whole config easily?
Thank's @dmitryax!
Why do we need to provide annotation per a config option AND the nested config? Can we have only one?
I thought that providing support for nested configurations was a nice to have extension in addition to the base option which is to provide one config option per line/annotation. From my perspective it's more user-friendly to provide "one-liners" like io.opentelemetry.discovery/config.endpoint: "http://`endpoint`/nginx_status"
in case you only want to define one single setting. Then, in case there is need for multi-line config settings this should also be allowed.
However since it was also found confusing at https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/35617#discussion_r1810268608 I think we can proceed with one single approach and only support providing the whole config for now.
Is there any limit on size of the annotation values in k8s so we can provide the whole config easily?
The validation code only restricts the total size of annotations to 256kB which equals to 262144 characters. I guess we should be fine with that fair limit.
If we keep metrics and logs (I don't think we need container_ prefix) as a part of the key, let's make it the only way. We don't need the .../signals key in that case.
I'm fine with the io.opnetelemetry.discovery/signals
key for tuning the scraper's signals. I think it's more tidy and makes the implementation simpler.
However, the logs
signal of the scraper does not indicate necessarily that the container's logs will be collected from stdout/file. That's why I propose the io.opentelemetry.discovery.container_logs/enabled: "true"
annotation which can be added extra to a Pod that has io.opentelemetry.discovery/signals: "metrics,logs"
. In this, additionally to the configured scraper (which can collect metrics and logs signals) a filelog
receiver with the container_parser
enabled will get started in order to collect the container's logs properly. Would that make sense?
As first step we can only support scraper annotations like the following:
io.opentelemetry.discovery/enabled
io.opentelemetry.discovery/signals
io.opentelemetry.discovery/scraper
io.opentelemetry.discovery/config
io.opentelemetry.discovery.80/scraper # takes priority
Then we can consider the additional opinionated hint/suggestion to collect container logs through the filelog receiver in follow up PR/issue. PR tuned accordingly @dmitryax, let me know what you think.
From my perspective it's more user-friendly to provide "one-liners" like io.opentelemetry.discovery/config.endpoint: "http://`endpoint`/nginx_status" in case you only want to define one single setting.
io.opentelemetry.discovery/config: "endpoint: http://`endpoint`/nginx_status"
is still pretty user friendly
I'm fine with the io.opnetelemetry.discovery/signals key for tuning the scraper's signals. I think it's more tidy and makes the implementation simpler.
It's unlikely that logs (once we have it) and metrics will be collected by the same receiver/scraper. So we won't able to reuse io.opentelemetry.discovery/config
when users want to configure collecting both signals. Also io.opentelemetry.discovery/scraper
likely won't be applicable to logs collection. So I think we have to keep the <signal-type>
as part of the annotation key and don't introduce io.opentelemetry.discovery/signals
So @dmitryax from your previous proposal the assumption is that io.opentelemetry.discovery.logs/enabled: "true"
spawns a filelog
receiver with container
parser enabled?
(with only metrics
signal supporting the io.opentelemetry.discovery/scraper
)
So in that case I believe it makes sense, yes, and we can conclude with the following:
io.opentelemetry.discovery.metrics/enabled: "true"
io.opentelemetry.discovery.logs/enabled: "true"
io.opentelemetry.discovery.metrics.8888/enabled: "false" // takes priority
io.opentelemetry.discovery.log.my-container/enabled: "false" // takes priority
io.opentelemetry.discovery.logs/config: | # additional filelog receiver configuration here
max_log_size: "2MiB"
operators:
- type: regex_parser
regex: '^(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (?P<sev>[A-Z]*) (?P<msg>.*)$'
io.opentelemetry.discovery.metrics/scraper: "nginx"
io.opentelemetry.discovery.metrics/config: |
endpoint: "http://`endpoint`/nginx_status"
collection_interval: '20s'
initial_delay: '20s'
read_buffer_size: '10'
xyz: 'abc'
io.opentelemetry.discovery.metrics.8888/scraper: "prometheus" // takes priority
@ChrsMark, looks good to me! 👍
Thank's @dmitryax @rogercoll for the feedback here!
I have updated the PR and the logic seems to work quite smooth so far.
I will keep testing (and possibly add more tests) but I would appreciate any reviews there already, mainly to validate the receiver_creator
's configuration approach (how the feature is turned on/off and its defaults).
Since the metrics part has been merged I have opened a PR that adds support for the logs part based on https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/34427#issuecomment-2436140051.
Component(s)
receiver/receivercreator
Prior related proposal: https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/17418
Is your feature request related to a problem? Please describe.
This issue proposes to add support for a hints/suggestions based autodiscovery mechanism for the
receivercreator
. This was initially discussed at https://cloud-native.slack.com/archives/C01N5UCHTEH/p1718893771362289.Describe the solution you'd like
The goal is to provide support to the users for automatically enabling receivers by annotating their Pods properly:
Inspiration comes from Filebeat's/Metricbeat's and Fluentbit's respective features.
Describe alternatives you've considered
While such a dynamic workload monitoring approach is already supported by the
receivercreator
and thek8s_observer
, it still requires the users to have already predefined thereceivercreator
's sub-configs. This means that if we want to later target and discover a different workload we need to extend the Collector's config and restart the Collector's instances.Providing the option to achieve the same result seamlessly would ease the experience for both the application developers who deploy workloads on K8s but also for the admins of the K8s cluster and/or the OTel based observability stack.
Additional context
Feature feasibility
I have already tried that out and it seems that the
receivercreator
can indeed be extended to support such a feature.There might be a need to distinguish the implementation for logs and metrics respectively, to best cover the default behaviors (i.e. we want to parse all logs from containers by default in a generic way unless we are hinted to use a technology specific "input". One thing to cover here would be to ensure that the feature would play well with a future templates provider)
A very dirty PoC can be found at https://github.com/open-telemetry/opentelemetry-collector-contrib/commit/4591ecfaaddb62c24f1bfe004d33ba4b4c0cb9cf. This is just to illustrate the idea for now. Here is how it works:
values.yaml
for installing the Collector's Helm Chart:Target Redis Pod:
Collector's output:
Specification
Prerequisite: https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/35544
In the context of this feature we can focus on 2 primary goals: 1) Collect Metrics from dynamically discovered workloads 2) Collect Logs from dynamically discovered workloads
For Pods with mutliple containers we need to ensure that we collect logs from each one of them and that we can scope the metrics collection accordingly per container/port.
Collect Metrics dynamically from workloads
⚠️ To support use cases where a Pod consists of multiple containers the configurations generated by metrics' hints will be of
type: port
.Supported hints:
io.opentelemetry.collector.receiver-creator.metrics/receiver: redis
io.opentelemetry.collector.receiver-creator.metrics/endpoint: "`endpoint`:6379"
io.opentelemetry.collector.receiver-creator.metrics/username: "admin"
io.opentelemetry.collector.receiver-creator.metrics/password: "passpass"
io.opentelemetry.collector.receiver-creator.metrics/collection_interval: 10s
io.opentelemetry.collector.receiver-creator.metrics/timeout: 1m
With some more advanced options:
io.opentelemetry.collector.receiver-creator.metrics/raw: ''
To support users that want to provide more advanced configurations.Extra Additionally we can support defining hints per Pod's container/port i.e.:
io.opentelemetry.collector.receiver-creator.metrics.<port_name>/receiver: redis
. To support use cases where a Pod consists of multiple containers and we want to specify a specific portCollect Logs dynamically from workloads
⚠️ To support use cases where a Pod consists of multiple containers the configurations generated by logs' hints will be of
type: pod.container
(see https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/35544).Supported hints:
io.opentelemetry.collector.receiver-creator.logs/enabled: true
-> This will enable a default filelog receiver for the specific pathio.opentelemetry.collector.receiver-creator.logs/template: redis
-> If/When we will have support for technology specific templates(https://github.com/open-telemetry/opentelemetry-collector/issues/8372)io.opentelemetry.collector.receiver-creator.logs/raw: ''
We could expose more settings of filelogreceiver through annotations. For example
io.opentelemetry.collector.receiver-creator.logs/max_log_size
,io.opentelemetry.collector.receiver-creator.logs/operators
Extra Additionally we can support defining hints per Pod's container/port i.e.:
io.opentelemetry.collector.receiver-creator.logs.<container_name>/operators: redis
Next steps
I would like to get feedback from the community/SIG on this. Some pieces might be missing from this proposal, so any kind of feedback would be helpful here :). Eventually, in case we find this a good fit for the Collector I'd be happy to contribute on its implementation (and future support of it).
We could potentially break this up to 2 different scopes, one for the metrics use-case and one for the logs use-case since the requirements are a bit different in each one of them.