Open tigrannajaryan opened 1 year ago
I like the concept. Would it be possible to achieve this as a config convertor that rewrites the pipeline configuration to expand a set of components? That could make it a zero-cost abstraction at runtime with low startup overhead.
I like the concept. Would it be possible to achieve this as a config convertor that rewrites the pipeline configuration to expand a set of components? That could make it a zero-cost abstraction at runtime with low startup overhead.
This is zero-cost abstraction for the receivergroup
and a tiny cost for processorgroup
and exportergroup
(just one for
loop going over the list of nested component and a single extra function call per nested component - I think it will be negligible). We can benchmark, but my gut feeling is that the a straightforward, trivial implementation should be good enough.
I think this is a great idea. I believe @rmfitzpatrick implemented this approach in our config to allow to create config modules. We can port over/adopt this approach.
I like the idea a lot. What do you think about using group
as the component name?
receivers:
group/all:
hostmetrics: ...
otlp: ....
redis: ...
service:
pipelines:
metrics:
receivers: [group/all]
processors: ...
exporters: ...
I like the idea a lot. What do you think about using
group
as the component name?
I like the shorter name.
Can we have a component named group
which is a receiver, a processor or an exporter depending on the context it is used in?
Can we have a component named group which is a receiver, a processor or an exporter depending on the context it is used in?
Yes, all are always in context of one type (receiver/processor/exporter). We have existing examples of overlap such as otlp
, though of course this is only receiver/exporter, but the same concept applies to processors.
I updated the PR title/description to use group
as the component name.
Lets avoiding jumping to the implementation details Converter/Proper Component/etc. I think the idea is very good and helpful. Let'a agree for the configuration.
For some reasons I don't like the name "group". It does not signal to me that this is just a "config grouping". I have some questions, maybe that will clarify:
Is it fair to say this proposal attempts to solve issues arising from component definitions and in-sequence reference limitations with anchors, aliases, and merge keys? I wonder if a config provider that offers starlark or something like ytt* support could solve this without requiring new components or converter adoption.
- Can we reference a receiver from a group in a pipeline or only the whole group?
Only the whole group. The group
is just a regular component from Service's perspective. The inner structure of the group
and the fact that it instantiates other components is invisible from the outside. This is very similar to existing receivercreator
component.
- For processors, if we don't allow individual references then what is the order of execution?
Ahh, that's a good point. I started with receivers and forgot to specify ordering for processors. We can make the config a sequence for group
processor, e.g.:
processors:
group/myprocessors:
- batch:
# batch settings
- attributes:
# attributes settings
service:
pipelines:
metrics:
receivers: [otlp]
processors: [group/myprocessors]
exporters: [otlp]
This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers
. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.
This issue has been closed as inactive because it has been stale for 120 days with no activity.
@open-telemetry/collector-contrib-approvers we had some interest from approvers/maintainers in the past. Should we reopen this or keep closed?
In my experience I haven't felt the need for this kind of grouping of components, so I don't see the benefits. On the other hand, I see the disadvantage of making the config more complex and hard to read by regular users, by introducing another level of complexity to it, making it easier to make mistakes and harder to understand resulting error messages. As I said, the benefits might justify it, but I personally don't see them. :smile:
If I ever wanted to split configs, it is by functionality. For example, here's how the config is split in the Sumo distro:
file_storage
extension https://github.com/SumoLogic/sumologic-otel-collector/blob/v0.80.0-sumo-0/examples/sumologic.yamlOf course I don't aspire to know about all useful use cases in the world. Apparently there are people who find this useful, so this might be implemented, and that's OK. :smile: Just wanted to make sure all opinions are aired.
This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers
. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.
Is using the forward
connector enough here? For example, rewriting the first post example:
receivers:
hostmetrics: ...
otlp: ....
redis: ...
connector:
forward/all:
service:
pipelines:
metrics/all:
receivers: [hostmetrics, otlp, redis]
exporters: [forward/all]
metrics:
receivers: [forward/all]
processors: ...
exporters: ...
A similar approach can be taken for processors and exporters (processors is a bit more complex). It is a bit uglier/needs a bit more config, but it still gets 80% of the way down what we want (having a way to manage group of components easily in pipelines)
Is using the forward connector enough here?
This doesn't seem to solve the original problem stated in the opening comment:
Every time a new receiver needs to be added, it needs to be added in 2 places in the config, which is annoying and error prone
I was focusing on this part
With a config like this if I need to add or remove a receiver I only need to do it in one place, under the group/all section.
Not saying it gets everything fixed, but it does address this and makes configuration a bit less repetitive
Every time a new receiver needs to be added, it needs to be added in 2 places in the config, which is annoying and error prone
This was indeed my primary problem and the forward connector does not seem to help with that.
This approach can also be applicable to extensions to solve a use case described in https://github.com/open-telemetry/opentelemetry-collector/issues/8394.
I believe we need a way to define order for all component types, not just processors. Even if the order for other components is not that important, it still defines the collector's behavior like the order of start/shutdown and data exporting. So I think there should be a way to set an order, maybe optional, applying alphabetical order by default.
Defining order as a list makes the overriding of the group difficult. For example, if I want to extend the group by applying another config or by including multiple files with globs, I would need to override the whole list.
Not sure what would be the best solution for defining the order. Maybe another field for that in every group's component, the components will be sorted alphabetically by a value of that field. Or we can use the name of components for that purpose, e.g. processors list would be compiled into [memory_limiter/1, k8sattributes/2, batch/3, transform/4_add_attr_a, transform/5_remove_attr_b]
I believe we need a way to define order for all component types, not just processors. Even if the order for other components is not that important, it still defines the collector's behavior like the order of start/shutdown and data exporting. So I think there should be a way to set an order, maybe optional, applying alphabetical order by default.
Pipeline components are currently ordered according to a topological sort. I do not think we can easily or necessarily should worry about ordering the start/stop sequence of pipeline components (except processors, which are naturally ordered in each pipeline due to topological ordering).
I don't understand the motivation for this. What is the reason why the user needs to control this beyond what the service graph already does?
Is your feature request related to a problem? Please describe.
Collector configuration requires listing receivers at least in 2 places: once in the top-level
receivers
section and one more time thereceivers
setting of the pipeline.This gives flexibility to use the same receiver in more than one pipeline, but complicates the most common use case when the receiver is used in one pipeline only.
Every time a new receiver needs to be added, it needs to be added in 2 places in the config, which is annoying and error prone (and prevents an interesting use case described below).
Describe the solution you'd like
I would like to have a new
group
receiver component that can instantiate other receivers. The config for example can look like this:With a config like this if I need to add or remove a receiver I only need to do it in one place, under the
group/all
section.Functionality the
group
component will be a significantly simplified version of the existing receivercreator component.The exact same problem applies to processors and exporters for which equivalent processor
group
and exportergroup
component would be also highly desirable.Additional context
Here is a use case.
We can break down Collector config file into smaller, more manageable config files, where receivers, processors and exporters are in their own files. For example: