open-telemetry / opentelemetry-collector-contrib

Contrib repository for the OpenTelemetry Collector
https://opentelemetry.io
Apache License 2.0

Default routing to receiver-specific data streams #34246

Open felixbarny opened 1 month ago

felixbarny commented 1 month ago

Component(s)

exporter/elasticsearch

Is your feature request related to a problem? Please describe.

When dynamic indexing to data streams is enabled, we currently route signals to <type>-generic.otel-default, for example logs-generic.otel-default. The drawback is that data from all receivers ends up in the same data streams. We should instead separate the data according to the data stream naming scheme, without risking a data stream explosion.

Because this changes the default routing, we should implement it before GA: changing the data streams afterwards could be considered a breaking change.

Describe the solution you'd like

If the scope.name matches the regex ^otelcol/(.*receiver\/?[^\s]*?), the dataset will be set to capture group 1 ($1). For example, the scope name otelcol/hostmetricsreceiver/process 8.15.0 yields the dataset hostmetricsreceiver_process (or hostmetricsreceiver_process.otel in the OTel output mode). This ensures that we don't send all metrics from well-known receivers to metrics-generic.otel-default.

We don't want to route by arbitrary scope names. Tracing instrumentations typically set a different scope name for each instrumented library, and the scope name can be user-defined with unknown cardinality, so routing on it would risk an explosion of data streams. Instead, we only route based on receivers by default, where the granularity and cardinality are limited and match the definition of the data stream naming scheme well.
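The routing rule described above can be sketched in a few lines of Python. This is only an illustration of the proposal, not the exporter's implementation: the function names, the `otel_mode` flag, and the `<type>-<dataset>-<namespace>` assembly are assumptions made for the sketch, and the replacement of `/` with `_` is inferred from the hostmetricsreceiver_process example. Note also that the regex as written ends in a lazy quantifier with nothing after it, so it would capture only `hostmetricsreceiver/`; the sketch therefore assumes a greedy variant to reproduce the dataset given in the example.

```python
import re

# Regex exactly as written in the proposal (trailing quantifier is lazy).
PROPOSED = re.compile(r"^otelcol/(.*receiver\/?[^\s]*?)")
# Assumed variant for this sketch: identical except the trailing quantifier is greedy.
GREEDY = re.compile(r"^otelcol/(.*receiver\/?[^\s]*)")


def dataset_for_scope(scope_name: str, otel_mode: bool = True) -> str:
    """Return the dataset for a scope name, falling back to 'generic'."""
    match = GREEDY.match(scope_name)
    # Assumption: '/' becomes '_', inferred from the example in the proposal.
    dataset = match.group(1).replace("/", "_") if match else "generic"
    # In the OTel output mode the dataset carries an '.otel' suffix.
    return dataset + ".otel" if otel_mode else dataset


def data_stream(signal_type: str, scope_name: str,
                namespace: str = "default", otel_mode: bool = True) -> str:
    """Assemble <type>-<dataset>-<namespace> (hypothetical helper)."""
    return f"{signal_type}-{dataset_for_scope(scope_name, otel_mode)}-{namespace}"


if __name__ == "__main__":
    # Receiver scope: routed to a receiver-specific data stream.
    print(data_stream("metrics", "otelcol/hostmetricsreceiver/process 8.15.0"))
    # Library instrumentation scope (illustrative name): stays in the generic stream.
    print(data_stream("traces", "io.opentelemetry.jdbc"))
```

Only scope names starting with `otelcol/` and containing `receiver` get their own dataset; everything else falls back to `generic`, which is what keeps the number of data streams bounded.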

Describe alternatives you've considered

No response

Additional context

No response

github-actions[bot] commented 1 month ago

Pinging code owners: