Implement receiver/processor/exporter "group" components

tigrannajaryan commented 1 year ago

Is your feature request related to a problem? Please describe.

Collector configuration requires listing receivers at least in 2 places: once in the top-level receivers section and one more time the receivers setting of the pipeline.

This gives flexibility to use the same receiver in more than one pipeline, but complicates the most common use case when the receiver is used in one pipeline only.

Every time a new receiver needs to be added, it needs to be added in 2 places in the config, which is annoying and error prone (and prevents an interesting use case described below).

Describe the solution you'd like

I would like to have a new group receiver component that can instantiate other receivers. The config for example can look like this:

receivers:
  group/all:
    hostmetrics: ...
    otlp: ....
    redis: ...

service:
  pipelines:
    metrics:
      receivers: [group/all]
      processors: ...
      exporters: ...

With a config like this if I need to add or remove a receiver I only need to do it in one place, under the group/all section.

Functionality the group component will be a significantly simplified version of the existing receivercreator component.

The exact same problem applies to processors and exporters for which equivalent processor group and exporter group component would be also highly desirable.

Additional context

Here is a use case.

We can break down Collector config file into smaller, more manageable config files, where receivers, processors and exporters are in their own files. For example:

# this is the main config file

receivers:
  group/all: ${file: receivers.yaml}

processors:
  group/all: ${file: processors.yaml}

exporters:
  group/all: ${file: exporters.yaml}

service:
  pipelines:
    metrics:
      receivers: [group/all]
      processors: [group/all]
      exporters: [group/all]

# this is receivers.yaml

hostmetrics: ...

otlp: ...

redis: ...

# this is exporters.yaml

otlphttp:
  endpoint: ...

Aneurysm9 commented 1 year ago

I like the concept. Would it be possible to achieve this as a config convertor that rewrites the pipeline configuration to expand a set of components? That could make it a zero-cost abstraction at runtime with low startup overhead.

tigrannajaryan commented 1 year ago

I like the concept. Would it be possible to achieve this as a config convertor that rewrites the pipeline configuration to expand a set of components? That could make it a zero-cost abstraction at runtime with low startup overhead.

This is zero-cost abstraction for the receivergroup and a tiny cost for processorgroup and exportergroup (just one for loop going over the list of nested component and a single extra function call per nested component - I think it will be negligible). We can benchmark, but my gut feeling is that the a straightforward, trivial implementation should be good enough.

atoulme commented 1 year ago

I think this is a great idea. I believe @rmfitzpatrick implemented this approach in our config to allow to create config modules. We can port over/adopt this approach.

djaglowski commented 1 year ago

I like the idea a lot. What do you think about using group as the component name?

receivers:
  group/all:
    hostmetrics: ...
    otlp: ....
    redis: ...

service:
  pipelines:
    metrics:
      receivers: [group/all]
      processors: ...
      exporters: ...

tigrannajaryan commented 1 year ago

I like the idea a lot. What do you think about using group as the component name?

I like the shorter name.

Can we have a component named group which is a receiver, a processor or an exporter depending on the context it is used in?

djaglowski commented 1 year ago

Can we have a component named group which is a receiver, a processor or an exporter depending on the context it is used in?

Yes, all are always in context of one type (receiver/processor/exporter). We have existing examples of overlap such as otlp, though of course this is only receiver/exporter, but the same concept applies to processors.

tigrannajaryan commented 1 year ago

I updated the PR title/description to use group as the component name.

bogdandrutu commented 1 year ago

Lets avoiding jumping to the implementation details Converter/Proper Component/etc. I think the idea is very good and helpful. Let'a agree for the configuration.

For some reasons I don't like the name "group". It does not signal to me that this is just a "config grouping". I have some questions, maybe that will clarify:

Can we reference a receiver from a group in a pipeline or only the whole group?
For processors, if we don't allow individual references then what is the order of execution?

rmfitzpatrick commented 1 year ago

Is it fair to say this proposal attempts to solve issues arising from component definitions and in-sequence reference limitations with anchors, aliases, and merge keys? I wonder if a config provider that offers starlark or something like ytt* support could solve this without requiring new components or converter adoption.

tigrannajaryan commented 1 year ago

Can we reference a receiver from a group in a pipeline or only the whole group?

Only the whole group. The group is just a regular component from Service's perspective. The inner structure of the group and the fact that it instantiates other components is invisible from the outside. This is very similar to existing receivercreator component.

For processors, if we don't allow individual references then what is the order of execution?

Ahh, that's a good point. I started with receivers and forgot to specify ordering for processors. We can make the config a sequence for group processor, e.g.:

processors:
  group/myprocessors:
    - batch:
        # batch settings
    - attributes:
        # attributes settings

service:
  pipelines:
    metrics:
      receivers: [otlp]
      processors: [group/myprocessors]
      exporters: [otlp]

github-actions[bot] commented 1 year ago

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

github-actions[bot] commented 1 year ago

This issue has been closed as inactive because it has been stale for 120 days with no activity.

tigrannajaryan commented 1 year ago

@open-telemetry/collector-contrib-approvers we had some interest from approvers/maintainers in the past. Should we reopen this or keep closed?

andrzej-stencel commented 1 year ago

In my experience I haven't felt the need for this kind of grouping of components, so I don't see the benefits. On the other hand, I see the disadvantage of making the config more complex and hard to read by regular users, by introducing another level of complexity to it, making it easier to make mistakes and harder to understand resulting error messages. As I said, the benefits might justify it, but I personally don't see them. :smile:

If I ever wanted to split configs, it is by functionality. For example, here's how the config is split in the Sumo distro:

The "main" sumologic.yaml file has the basics, like the OTLP receivers and pipelines, the file_storage extension https://github.com/SumoLogic/sumologic-otel-collector/blob/v0.80.0-sumo-0/examples/sumologic.yaml
Additional config to scrape Linux host logs and metrics defines the receivers, processors and pipelines required for this: https://github.com/SumoLogic/sumologic-otel-collector/blob/v0.80.0-sumo-0/examples/conf.d/linux.yaml

Of course I don't aspire to know about all useful use cases in the world. Apparently there are people who find this useful, so this might be implemented, and that's OK. :smile: Just wanted to make sure all opinions are aired.

github-actions[bot] commented 1 year ago

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

mx-psi commented 1 year ago

Is using the forward connector enough here? For example, rewriting the first post example:

receivers:
  hostmetrics: ...
  otlp: ....
  redis: ...

connector:
  forward/all:

service:
  pipelines:
    metrics/all:
       receivers: [hostmetrics, otlp, redis]
       exporters: [forward/all]
    metrics:
      receivers: [forward/all]
      processors: ...
      exporters: ...

A similar approach can be taken for processors and exporters (processors is a bit more complex). It is a bit uglier/needs a bit more config, but it still gets 80% of the way down what we want (having a way to manage group of components easily in pipelines)

andrzej-stencel commented 1 year ago

Is using the forward connector enough here?

This doesn't seem to solve the original problem stated in the opening comment:

Every time a new receiver needs to be added, it needs to be added in 2 places in the config, which is annoying and error prone

mx-psi commented 1 year ago

I was focusing on this part

With a config like this if I need to add or remove a receiver I only need to do it in one place, under the group/all section.

Not saying it gets everything fixed, but it does address this and makes configuration a bit less repetitive

tigrannajaryan commented 1 year ago

Every time a new receiver needs to be added, it needs to be added in 2 places in the config, which is annoying and error prone

This was indeed my primary problem and the forward connector does not seem to help with that.

dmitryax commented 1 year ago

This approach can also be applicable to extensions to solve a use case described in https://github.com/open-telemetry/opentelemetry-collector/issues/8394.

I believe we need a way to define order for all component types, not just processors. Even if the order for other components is not that important, it still defines the collector's behavior like the order of start/shutdown and data exporting. So I think there should be a way to set an order, maybe optional, applying alphabetical order by default.

Defining order as a list makes the overriding of the group difficult. For example, if I want to extend the group by applying another config or by including multiple files with globs, I would need to override the whole list.

Not sure what would be the best solution for defining the order. Maybe another field for that in every group's component, the components will be sorted alphabetically by a value of that field. Or we can use the name of components for that purpose, e.g. processors list would be compiled into [memory_limiter/1, k8sattributes/2, batch/3, transform/4_add_attr_a, transform/5_remove_attr_b]

djaglowski commented 1 year ago

I believe we need a way to define order for all component types, not just processors. Even if the order for other components is not that important, it still defines the collector's behavior like the order of start/shutdown and data exporting. So I think there should be a way to set an order, maybe optional, applying alphabetical order by default.

Pipeline components are currently ordered according to a topological sort. I do not think we can easily or necessarily should worry about ordering the start/stop sequence of pipeline components (except processors, which are naturally ordered in each pipeline due to topological ordering).

I don't understand the motivation for this. What is the reason why the user needs to control this beyond what the service graph already does?

open-telemetry / opentelemetry-collector-contrib