open-telemetry / opentelemetry-collector-contrib

Contrib repository for the OpenTelemetry Collector
https://opentelemetry.io
Apache License 2.0

receiver/hostmetrics: Paging device is populated on Windows but not other platforms #5030

Closed punya closed 1 year ago

punya commented 3 years ago

Describe the bug When the host metrics receiver is enabled with the paging scraper, it produces metric data points describing paging usage. According to the metadata

  system.paging.usage:
    description: Swap (unix) or pagefile (windows) usage.
    unit: By
    data:
      type: int sum
      aggregation: cumulative
      monotonic: false
    labels: [paging.device, paging.state]

it's supposed to have a label for the paging device. In reality, that label exists on Windows but not on any other platform.

Possible fixes

  1. Implement the device label on non-Windows OSes, possibly by contributing to shirou/gopsutil.
  2. Fix the docs to say that the device is only populated on Windows.
  3. Paging device isn't covered by the spec right now. Figure out if we should add it to the spec or mark it as an implementation-specific extension.

Steps to reproduce On {Windows, Linux}:

  1. Configure the collector to enable the hostmetrics receiver with the paging scraper.
  2. Create a pipeline that logs metric data points for debugging purposes.

What did you expect to see? Labels for paging device and paging state.

What did you see instead? On Linux: only paging state, with no paging device label. For example,

Metric #0
Descriptor:
     -> Name: system.paging.usage
     -> Description: Swap (unix) or pagefile (windows) usage.
     -> Unit: By
     -> DataType: IntSum
     -> IsMonotonic: false
     -> AggregationTemporality: AGGREGATION_TEMPORALITY_CUMULATIVE
IntDataPoints #0
Data point labels:
     -> state: used
StartTimestamp: 1970-01-01 00:00:00 +0000 UTC
Timestamp: 2021-06-15 15:31:03.896993103 +0000 UTC
Value: 1310720
IntDataPoints #1
Data point labels:
     -> state: free
StartTimestamp: 1970-01-01 00:00:00 +0000 UTC
Timestamp: 2021-06-15 15:31:03.896993103 +0000 UTC
Value: 1070592000
IntDataPoints #2
Data point labels:
     -> state: cached
StartTimestamp: 1970-01-01 00:00:00 +0000 UTC
Timestamp: 2021-06-15 15:31:03.896993103 +0000 UTC
Value: 1835008
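
To check for the missing label without reading the log by eye, the debug output can be scanned for the label keys attached to each data point. The following is a minimal sketch, not part of the collector: the function names are hypothetical, the parsing assumes the logging exporter format shown above, and the `device` key is an assumption about how the paging device label would be spelled in this output (the logged state label appears as `state`, not `paging.state`).

```python
def label_keys_per_point(log_text):
    """Return one set of label keys per logged data point."""
    points = []
    current = None  # keys of the point being read; None outside a label block
    for line in log_text.splitlines():
        stripped = line.strip()
        if stripped == "Data point labels:":
            current = set()
            points.append(current)
        elif current is not None and stripped.startswith("->"):
            # Lines look like "-> state: used"; keep only the key.
            key = stripped[2:].split(":", 1)[0].strip()
            current.add(key)
        else:
            # Any other line ends the current label block.
            current = None
    return points


def points_missing(log_text, required):
    """Return indices of data points lacking any of the required label keys."""
    needed = set(required)
    return [i for i, keys in enumerate(label_keys_per_point(log_text))
            if not needed <= keys]


# Shortened copy of the Linux output from this issue.
SAMPLE = """
Metric #0
Descriptor:
     -> Name: system.paging.usage
IntDataPoints #0
Data point labels:
     -> state: used
StartTimestamp: 1970-01-01 00:00:00 +0000 UTC
Value: 1310720
IntDataPoints #1
Data point labels:
     -> state: free
Value: 1070592000
"""
```

Calling `points_missing(SAMPLE, ["device", "state"])` on the Linux output lists every data point, since none of them carries a device label.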

What version did you use? Version: v0.28.0 (contrib collector installed from .deb on the releases page)

What config did you use?

receivers:
  hostmetrics:
    scrapers:
      paging: {}

exporters:
  logging:
    loglevel: debug

processors:
  batch:

extensions:
  health_check:
  pprof:
  zpages:

service:
  extensions: [pprof, zpages, health_check]
  pipelines:
    metrics:
      receivers: [hostmetrics]
      exporters: [logging]
      processors: [batch]

Environment OS: Debian 10 on GCE

Additional context Based on looking at the source code, the Windows-specific scraper impl populates the label and the others don't. The other ones delegate to shirou/gopsutil, which (at least on Linux) uses underlying mechanisms that don't provide a breakdown by device.
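
For the first possible fix, one candidate source of a per-device breakdown on Linux is `/proc/swaps`, which lists each swap area with its size and used space in KiB. The sketch below is only an illustration of that idea under stated assumptions: it is not what the collector or gopsutil does, the function name and the `/dev/sda2` device are hypothetical, and the sample numbers are chosen so that the result matches the `used` and `free` values logged above.

```python
KIB = 1024  # /proc/swaps reports the Size and Used columns in KiB


def parse_proc_swaps(text):
    """Parse /proc/swaps content into per-device used/free byte counts."""
    devices = []
    for line in text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4:
            continue  # ignore blank or malformed rows
        name, _kind, size_kib, used_kib = fields[:4]
        size = int(size_kib) * KIB
        used = int(used_kib) * KIB
        devices.append({"device": name, "used": used, "free": size - used})
    return devices


# Hypothetical file content; the device name is made up for illustration.
SAMPLE_SWAPS = (
    "Filename\t\t\t\tType\t\tSize\t\tUsed\t\tPriority\n"
    "/dev/sda2                               partition\t1046780\t\t1280\t\t-2\n"
)
```

On a real Linux host the input would come from reading `/proc/swaps` itself. This file does not report a per-device `cached` value, so that state would still need another source.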

github-actions[bot] commented 1 year ago

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.


github-actions[bot] commented 1 year ago

This issue has been closed as inactive because it has been stale for 120 days with no activity.