Open djaglowski opened 1 year ago
Based on discussion in today's SIG meeting, I have compiled a list of differences between this approach and the "Template Receiver" proposed in https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/26312.
The collector is informed of templates at run time via a new `template` provider scheme. Basically, the user runs the collector with an additional `--config template:/path/to/my_template.yaml` for each template.
The collector loads each template file and validates the expected structure.
The receiver contains a config field, e.g. `path`, which indicates where to find the template file.
The receiver factory reads and validates the file when instantiating the receiver.
✅ We should be able to include templates by incorporating other types of providers. e.g. url, s3, etc
⚠️ It might be possible for the receiver to tap into "providers" directly. At best this seems like an awkward mechanism for a component to incorporate.
Each template file contains a unique "type", much like a receiver, processor, or exporter type.
✅ Once loaded and validated, each template may be referred to by its type. No additional information is required in order to refer to a given template.
⛔ Types must be unique. This can easily be caught at runtime, but a simple namespace pattern may be helpful.
There is no notion of types. Each instance of the receiver has a path to a template file.
⛔ If multiple receivers refer to the same template, they must each specify the path to the file.
⛔ To define a templated component, the component ID is prefixed with `template/` to indicate that the component is templated, e.g. `template/couchbase_metrics` or `template/couchbase_metrics/1`.
✅ Template parameters are specified just like any other component.
receivers:
  template/my_template:
    foo: bar
✅ `template` is an actual receiver type, so it is used in configuration just like any other type of receiver.
⛔ Template parameters are specified in a sub-section of the configuration because the top level must separately define the source of the template.
receivers:
  template/1:
    path: ./my_template.yaml
    parameters:
      foo: bar
When the template is rendered, the components and pipelines it contains are merged into the overall configuration.
⛔ Exposing the internals of the template is arguably against the general purpose of templates, which should abstract away complexity from the user.
✅ The effective configuration is accurate, even if it contains some elements which must be understood as internal to the template.
✅ The effective configuration closely matches the level of abstraction familiar to the user.
⛔ Technically, the effective configuration is not accurate because it omits the internals of the template. It's not clear that there is any mechanism which would surface these details.
The template is expanded and merged into the overall configuration.
✅ Service::Telemetry configuration is naturally the same as the rest of the collector.
✅ Rendered components and pipelines are started & stopped as part of normal collector lifecycle.
The receiver runs an internal service which contains the templated components.
⛔ It's not clear that there is a mechanism to inherit telemetry settings from the main service.
⛔ The receiver must manage the independent service instance.
@jpkrohling, I hope I've represented your perspective fairly in the above comparison. I'm happy to update the list if not.
cc: @tigrannajaryan @jsuereth @jkowall @joshdover
Sorry @djaglowski, I haven't been working on OTel for the last 10 months.
I believe your summary did capture what we talked about during the call. I'm convinced that the config provider approach is reasonable and provides a good set of features despite still thinking that the UX for the template provider is friendlier.
> I'm convinced that the config provider approach is reasonable and provides a good set of features despite still thinking that the UX for the template provider is friendlier.
@jpkrohling you mention provider twice, did you mean "UX for the template receiver is friendlier"?
@djaglowski is it possible to have 2 `template/couchbase_metrics` receivers with different endpoints and attached to 2 different pipelines? Do you have to make the pipeline name a template parameter in that case?
Are there any limitations on what keys may be used in a template definition? (Can I add exporters, connectors, etc?)
> @djaglowski is it possible to have 2 `template/couchbase_metrics` receivers with different endpoints and attached to 2 different pipelines?
Yes. What I'm proposing here is that each use of a template has the same level of uniqueness as you would have when using a normal component: just as you would use `otlp` and `otlp/2` as distinct receivers, you can use `template/foo` and `template/foo/2` as distinct receivers.
If the normal component ID format is `component_type[/instance_name]`, the templated component ID format would be `template/` followed by `template_type[/instance_name]`. Effectively, we reserve `template` as a special "receiver type" and trigger the special behavior of rendering a template and substituting it in place of the receiver.
> Do you have to make the pipeline name a template parameter in that case?
No, the user would use the template ID in whichever pipelines they choose, exactly as they would use a receiver ID.
receivers:
  template/couchbase_metrics:
    endpoint: couchbase:8091
    username: otelu
    password: otelpassword
  template/couchbase_metrics/another:
    endpoint: somethingelse:9999
    username: user
    password: pass
exporters:
  prometheus:
    endpoint: 0.0.0.0:9123
service:
  pipelines:
    metrics:
      receivers: [template/couchbase_metrics]
      exporters: [prometheus]
    metrics/2:
      receivers: [template/couchbase_metrics/another]
      exporters: [prometheus]
> Are there any limitations on what keys may be used in a template definition? (Can I add exporters, connectors, etc?)
What I am proposing immediately is that yes, there are restrictions. A rendered template would have a format very similar to a normal config file. However, it would be slimmed down slightly:
receivers:
  foo:
  foo/2:
processors:
  bar:
  bar/2:
# exporters not allowed
# connectors not allowed
# service not allowed. Use "pipelines" directly
pipelines:
  logs:
    receivers: [foo, foo/2]
    processors: [bar, bar/2]
    # exporters not allowed. The template _is_ a receiver in a pipeline, so there is implicitly an "exporter"
    # which passes data onto the pipeline(s) in which the template is used.
  # additional pipelines here, just as in a normal config, except "exporters" is implied.
That said, I believe we can reasonably add support for connectors and exporters from here, if we choose. Here's how:
Keeping in mind that this template is acting as a receiver, we should emit data onto the pipeline(s) in which the template is used as a receiver. To that end, I think that every template would be required to include an autogenerated "forward" connector as an exporter from the template. The definition of the connector is implied, but it must be used in at least one pipeline in order for the template to make sense as a receiver.
receivers:
  foo:
    endpoint: {{ .host }}:{{ .port }}
processors:
  bar:
    hello: {{ .name }}
exporters:
  {{ if .copy_data_file }}
  file: # we'll use this to send a copy of all data emitted by the template into a file
    path: {{ .copy_data_file }}
  {{ end }}
pipelines:
  logs:
    receivers: [ foo ]
    processors: [ bar ]
    exporters:
      - forward # autogenerated connector which is used to forward data onto the pipeline(s) in which the template is used.
      {{ if .copy_data_file }}
      - file
      {{ end }}
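The requirement that the implied `forward` connector be used in at least one pipeline could be checked after rendering. The following is a minimal sketch in Python, assuming the rendered template has already been parsed into a plain dict; the function name `forward_is_used` is hypothetical.

```python
def forward_is_used(rendered_template: dict) -> bool:
    """Return True if any pipeline in the rendered template exports to "forward".

    Assumption: "forward" is the name by which the template refers to its
    autogenerated connector, as in the example template above.
    """
    pipelines = rendered_template.get("pipelines") or {}
    return any(
        "forward" in ((pipeline or {}).get("exporters") or [])
        for pipeline in pipelines.values()
    )
```

A template for which this returns False would never emit data onto the pipelines where it is used as a receiver, so rejecting it at validation time seems reasonable.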
Tying this back in with your earlier question about using a template multiple times, it's important to understand that all components within the template will be "scoped" by having `/template_type[/instance_name]` appended to their ID, such that when we expand the template into a configuration, all components will still be uniquely identified. (Technically it is possible to have collisions, but this can be addressed in reasonable ways. Setting this aside for now.)
For example, let's say the above template type is called `foo_bar`. We could use it as follows and expect the corresponding effective configuration:
Actual configuration
receivers:
  template/foo_bar:
    host: localhost
    port: 1234
    name: Tigran
    copy_data_file: ./myfile.json
  template/foo_bar/2:
    host: localhost
    port: 6789
    name: Dan
exporters:
  otlp: ...
service:
  pipelines:
    logs:
      receivers: [ template/foo_bar, template/foo_bar/2 ]
      exporters: [ otlp ]
    logs/only_2:
      receivers: [ template/foo_bar/2 ]
      exporters: [ otlp ]
Effective configuration
receivers:
  foo/foo_bar: # Inserted from first use of template. Note the ID is scoped according to the template instance ID
    endpoint: localhost:1234
  foo/foo_bar/2: # Inserted from second use of template.
    endpoint: localhost:6789
processors:
  bar/foo_bar: # From first use of template
    hello: Tigran
  bar/foo_bar/2: # From second use of template
    hello: Dan
exporters:
  otlp: ... # included directly in the actual config
  file/foo_bar: # From first use of template. No corresponding component was rendered in second use.
    path: ./myfile.json
connectors:
  forward/foo_bar: # autogenerated to pass data from "template/foo_bar" to where it was used. Internally, the template just refers to this as "forward". The "/foo_bar" is added as "scope", just the same as all other components within the template.
  forward/foo_bar/2: # autogenerated to pass data from "template/foo_bar/2" to where it was used
service:
  pipelines:
    # These first two were generated from the template, one for each instance.
    logs/foo_bar:
      receivers: [ foo/foo_bar ]
      processors: [ bar/foo_bar ]
      exporters: [ forward/foo_bar, file/foo_bar ]
    logs/foo_bar/2:
      receivers: [ foo/foo_bar/2 ]
      processors: [ bar/foo_bar/2 ]
      exporters: [ forward/foo_bar/2 ]
    # The next two are the original pipelines defined in the config. Note that we've replaced the references
    # to the template instances with the corresponding forward connector instances.
    logs:
      receivers: [ forward/foo_bar, forward/foo_bar/2 ]
      exporters: [ otlp ]
    logs/only_2:
      receivers: [ forward/foo_bar/2 ]
      exporters: [ otlp ]
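The expansion from the actual configuration to the effective configuration can be sketched roughly as follows. This is an illustrative Python sketch, not the collector's code: it assumes the configuration and each already-rendered template are plain dicts, and the names `scoped` and `expand` are hypothetical. ID collisions, which were set aside above, are not handled here either.

```python
import copy


def scoped(component_id: str, scope: str) -> str:
    """Append the template scope, e.g. ("foo", "foo_bar/2") -> "foo/foo_bar/2"."""
    return f"{component_id}/{scope}"


def expand(config: dict, rendered: dict) -> dict:
    """Merge rendered templates into a copy of the user's configuration.

    config:   the user's configuration.
    rendered: template instance ID (e.g. "template/foo_bar/2") mapped to the
              rendered template for that instance.
    """
    out = copy.deepcopy(config)
    for section in ("receivers", "processors", "exporters", "connectors"):
        if out.get(section) is None:
            out[section] = {}
    pipelines = out.setdefault("service", {}).setdefault("pipelines", {})

    for template_id, template in rendered.items():
        scope = template_id.split("/", 1)[1]  # "template/foo_bar/2" -> "foo_bar/2"
        forward = scoped("forward", scope)

        # The template instance itself is replaced by its scoped components.
        out["receivers"].pop(template_id, None)
        for section in ("receivers", "processors", "exporters"):
            for cid, settings in (template.get(section) or {}).items():
                out[section][scoped(cid, scope)] = copy.deepcopy(settings)

        # One autogenerated forward connector per template instance.
        out["connectors"][forward] = None

        # Templated pipelines are added with scoped pipeline and component IDs.
        for pid, pipeline in (template.get("pipelines") or {}).items():
            pipelines[scoped(pid, scope)] = {
                role: [scoped(cid, scope) for cid in ids]
                for role, ids in pipeline.items()
            }

        # Pipelines that referenced the template now receive from its connector.
        for pipeline in pipelines.values():
            if "receivers" in pipeline:
                pipeline["receivers"] = [
                    forward if rid == template_id else rid
                    for rid in pipeline["receivers"]
                ]
    return out
```

Applied to the example above, with the two rendered instances of `foo_bar` as input, this sketch produces the same component and pipeline IDs as the effective configuration shown.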
OK, I think I understand now.
I think it is important to add to the original issue description these 2 key ideas that you refer to in your comment:
`forward` connector that attaches the instantiated pipelines to the service pipelines.
The net result is that essentially the template declares its own connected graph of receivers, processors and pipelines, and once instanced the output of that graph autoconnects using a `forward` connector to the input of the pipeline with which the instantiated receiver is associated via the `service.pipelines.<name>.receivers` key. Is that correct?
Question: is it required for a template definition to contain `processors` and `pipelines` entries, or are they optional and, if absent, we just connect the instantiated receiver to the service pipeline directly?
Thanks @tigrannajaryan, I've updated the issue to include these key ideas.
> is it required for template definition to contain processors and pipelines entries or they are optional and if absent we just connect the instantiated receiver to the service pipeline directly?
I believe what you are suggesting is possible. It should not be difficult to support both cases.
@codeboten, yes! I did mean to say that I find the receiver having better UX. After @djaglowski's presentations and arguments, I no longer have a strong opinion in favor of the receiver despite still lightly leaning toward it.
Is your feature request related to a problem? Please describe.
Configuration of the collector is a major barrier to entry for users because the process of "developing" a configuration solution often requires detailed knowledge of one or more collector components, a sophisticated understanding of how to interface with an external technology, or just a non-trivial amount of effort working through necessary data manipulations.
Describe the solution you'd like
We should provide an abstraction mechanism that allows expert users to abstract away complex "configuration solutions" and provide novice users with a simplified configuration experience.
I propose that expert users should write templated configuration files which are natively recognized by the collector. Novice users may then include a templated solution in their configuration by defining only a simplified set of parameters.
For example, consider the following configuration (source) for scraping and normalizing metrics from couchbase.
The configuration includes a complex receiver and multiple complex processors. Ideally, a novice user should only need to be concerned with the endpoint and auth values. Most of the complexities of the receiver, and all of the complexities of the processors can be abstracted away such that the following configuration is equivalent:
The template file would look like this.
The parameters specified by the novice user are rendered into this template. Then, the collector merges the rendered template into the overall configuration, yielding an effective configuration that achieves the exact same functionality.
Implementation details
Template Instance IDs
Templated components are used in place of receivers, but their usage in the configuration should seem somewhat intuitive. The notable difference is in the way the template component is identified. The standard for component IDs is `<component_type>[/instance_name]`. Templated components would be identified very similarly, with the format `template/<template_type>[/instance_name]`.
- `template` indicates that the component is templated.
- `template_type` is required and must reference a known template type, just as a `component_type` would reference a known component type.
- `instance_name` is optional. It serves exactly the same purpose as an instance name for a component. That is, it allows the user to define multiple instances of the template and refer to them distinctly in the configuration.
Template use in pipelines
Just like a normal receiver, a template is defined with an ID, and this ID must be used in one or more pipelines to indicate where data will be emitted.
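As a rough illustration of this rule, a validation step could report template instances that are defined but never referenced by a pipeline. This is a minimal Python sketch over a configuration parsed into a plain dict; `unused_template_ids` is a hypothetical name, and how the collector would actually surface such a condition is not specified in this proposal.

```python
from typing import List


def unused_template_ids(config: dict) -> List[str]:
    """List template instance IDs that no pipeline uses as a receiver."""
    receivers = config.get("receivers") or {}
    declared = [rid for rid in receivers if rid.startswith("template/")]

    pipelines = (config.get("service") or {}).get("pipelines") or {}
    used = {
        rid
        for pipeline in pipelines.values()
        for rid in ((pipeline or {}).get("receivers") or [])
    }
    return [rid for rid in declared if rid not in used]
```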
Template expansion
The collector will render a template and integrate its components and pipelines into the service graph.
`my_template/1` appended to each component and pipeline ID defined within the template.
Each pipeline within the template is integrated into the overall service graph as follows:
`forward` connector that attaches the templated pipelines to the service pipelines.
Describe alternatives you've considered
Previously proposed as a receiver. This issue contains discussion about what is currently possible via various configuration merging strategies.
Additional context