open-telemetry / opentelemetry-collector

[metrics builder] Ability to re-aggregate metric by attributes #10726

Open dmitryax opened 2 years ago

dmitryax commented 2 years ago

We need to provide users with the ability to change a set of attributes emitted by a metrics receiver by applying an automatic re-aggregation of the data points. For example, system.cpu.time is emitted by default with state and cpu attributes. Many users don't need metrics per CPU core, so they would like to get metrics per host and state instead. Currently, they would need to set up an additional metricstransform processor to achieve that. But this can happen inside the metrics builder instead.
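To make the intended behavior concrete, here is a minimal sketch of what re-aggregating system.cpu.time by dropping the cpu attribute could look like. It is illustrative Python, not the metrics builder's actual code; the `reaggregate` helper and the sample data points are hypothetical, and only the sum case is shown.

```python
from collections import defaultdict

# Hypothetical data points for system.cpu.time: (attributes, value in seconds).
points = [
    ({"cpu": "cpu0", "state": "idle"}, 100.0),
    ({"cpu": "cpu1", "state": "idle"}, 120.0),
    ({"cpu": "cpu0", "state": "user"}, 30.0),
    ({"cpu": "cpu1", "state": "user"}, 25.0),
]


def reaggregate(points, keep):
    """Sum data points that share the same values for the kept attributes."""
    totals = defaultdict(float)
    for attrs, value in points:
        # The grouping key contains only the attributes that stay enabled.
        key = tuple((name, attrs[name]) for name in keep if name in attrs)
        totals[key] += value
    return [(dict(key), total) for key, total in totals.items()]


# Dropping "cpu" yields one data point per state instead of one per core and state.
per_state = reaggregate(points, keep=["state"])
```

With the sample data, the four per-core points collapse into two per-host points, one for idle and one for user.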

Changes to metadata.yaml interface for receiver builders

metrics:
  system.cpu.time:
    enabled: true
    description: Total CPU seconds broken down by different states.
    unit: s
    sum:
      value_type: double
      aggregation: cumulative  # Probably to be renamed to avoid confusion with attributes_aggregation
      monotonic: true
    attributes_aggregation: sum  # NEW FIELD, name of the field is TBD
    attributes:
      cpu:
        enabled: true  # NEW FIELD
        description: CPU number starting at 0.

      state:
        enabled: true  # NEW FIELD
        description: Breakdown of CPU usage by type.
        enum: [idle, interrupt, nice, softirq, steal, system, user, wait]

This will also allow us to introduce the notion of optional attributes that are disabled by default. The cpu attribute (one series per CPU core) is a good candidate for that.

This also requires moving the attributes section from the top level to under each metric. That removes the naming confusion between the name of an attribute key and the value field, which will not be needed anymore. Receiver authors who don't like the additional repetition can use YAML anchors instead.

Additional interface to user configuration

The user-facing metrics configuration will get additional fields to enable or disable individual attributes and to change the aggregation type. For example:

metrics:
  system.cpu.time:
    attributes_aggregation: [sum|avg|min|max]
    attributes:
      cpu:
        enabled: [true|false]
      state:
        enabled: [true|false]
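The following sketch shows one way the user configuration above could be resolved against the metadata.yaml defaults and then applied. It is an assumption-laden illustration in Python, not the proposed implementation: `DEFAULTS`, `USER`, `effective_settings`, and `apply_settings` are hypothetical names, and it does not address whether avg, min, or max are meaningful for a cumulative monotonic sum.

```python
from statistics import mean

# Aggregation functions matching the proposed attributes_aggregation values.
AGGREGATORS = {"sum": sum, "avg": mean, "min": min, "max": max}

# Hypothetical defaults, as they might be derived from metadata.yaml.
DEFAULTS = {
    "attributes_aggregation": "sum",
    "attributes": {"cpu": {"enabled": True}, "state": {"enabled": True}},
}

# Hypothetical user override: drop the cpu attribute and keep the per-state max.
USER = {"attributes_aggregation": "max", "attributes": {"cpu": {"enabled": False}}}


def effective_settings(defaults, user):
    """Overlay user settings on the defaults without mutating either input."""
    merged = {
        "attributes_aggregation": user.get(
            "attributes_aggregation", defaults["attributes_aggregation"]
        ),
        "attributes": {n: dict(a) for n, a in defaults["attributes"].items()},
    }
    for name, override in user.get("attributes", {}).items():
        if name not in merged["attributes"]:
            raise KeyError(f"unknown attribute: {name}")
        merged["attributes"][name].update(override)
    return merged


def apply_settings(points, settings):
    """Group points by the enabled attributes and aggregate each group."""
    keep = [n for n, a in settings["attributes"].items() if a["enabled"]]
    aggregate = AGGREGATORS[settings["attributes_aggregation"]]
    groups = {}
    for attrs, value in points:
        key = tuple((n, attrs[n]) for n in keep if n in attrs)
        groups.setdefault(key, []).append(value)
    return [(dict(key), aggregate(values)) for key, values in groups.items()]


points = [
    ({"cpu": "cpu0", "state": "idle"}, 100.0),
    ({"cpu": "cpu1", "state": "idle"}, 120.0),
    ({"cpu": "cpu0", "state": "user"}, 30.0),
    ({"cpu": "cpu1", "state": "user"}, 25.0),
]

settings = effective_settings(DEFAULTS, USER)
result = apply_settings(points, settings)
```

Rejecting unknown attribute names in the user config is a design choice made for this sketch; the issue does not say how such input should be handled.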

Action items

dmitryax commented 1 year ago

As discussed in https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/16533, we will not be moving the attributes to the metrics section in metadata.yaml, so I'm closing the corresponding action item as "Won't do".

github-actions[bot] commented 1 year ago

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

github-actions[bot] commented 1 year ago

Pinging code owners for cmd/telemetrygen: @mx-psi @amenasria @codeboten. See Adding Labels via Comments if you do not have permissions to add labels yourself.

dmitryax commented 1 year ago

It's cmd/mdatagen not telemetrygen :)

github-actions[bot] commented 1 year ago

This issue has been closed as inactive because it has been stale for 120 days with no activity.

atoulme commented 1 year ago

"Rename sum.aggregation field to not confuse it with attributes_aggregation"

Rename sum.aggregation to sum.aggregation_temporality? That would follow https://github.com/open-telemetry/opentelemetry-collector/blob/main/pdata/pmetric/generated_sum.go#L49

github-actions[bot] commented 9 months ago

This issue has been closed as inactive because it has been stale for 120 days with no activity.