open-telemetry / opentelemetry-collector-contrib

Contrib repository for the OpenTelemetry Collector
https://opentelemetry.io
Apache License 2.0

Enhance hostmetrics receiver filesystem scraper to bring it closer to parity with telegraf disk plugin provided metrics #14117

Closed: jspaleta closed this issue 1 year ago

jspaleta commented 2 years ago

Is your feature request related to a problem? Please describe.

While migrating from telegraf-based collection to otelcol, I found that several equivalent metrics are missing from the otelcol hostmetrics filesystem scraper.

Describe the solution you'd like

Implement additional metrics in the hostmetrics filesystem scraper to bring metric coverage closer to parity with the telegraf disk plugin. All the missing metrics are derived from the gopsutil package.

Describe alternatives you've considered

None

Additional context

While migrating supported SumoLogic dashboard apps from telegraf-based fields to otelcol-native metric names, I found that several metrics appear to be missing from the otelcol-native receivers. The metrics listed below are an identified subset of the differences between the telegraf disk plugin and the hostmetrics filesystem scraper.

missing metrics

  1. telegraf name: inodes_free (number of free inodes). Suggested name: system.filesystem.inodes.free
  2. telegraf name: inodes_total (number of inodes). Suggested name: system.filesystem.inodes.total
  3. telegraf name: free (filesystem free bytes). Suggested name: system.filesystem.free

telegraf plugin ref: https://github.com/influxdata/telegraf/blob/3d2b7bd210356c20701ce51eaf60480b314ad1c8/plugins/inputs/disk/disk.go#L48

github-actions[bot] commented 2 years ago

Pinging code owners: @dmitryax. See Adding Labels via Comments if you do not have permissions to add labels yourself.

andrzej-stencel commented 1 year ago

I believe all this information is already provided by the hostmetrics receiver's filesystem scraper, even if not in the most intuitive way:

  1. Number of free inodes: system.filesystem.inodes.usage with the attribute state=free
  2. Total number of inodes: you would need to sum system.filesystem.inodes.usage{state="free"} + system.filesystem.inodes.usage{state="used"}. Proof in gopsutil code: Linux, MacOS, FreeBSD, OpenBSD, AIX.
  3. Number of free bytes: system.filesystem.usage with the attribute state=free.

I believe this issue can be closed as resolved given this information.

As an example, here's a configuration for the contrib distribution (v0.60.0) and its output on my Linux machine, scraping the filesystem metrics for the root filesystem (mount point /):

exporters:
  logging:
    loglevel: debug

receivers:
  hostmetrics:
    collection_interval: 3s
    scrapers:
      filesystem:
        include_mount_points:
          match_type: strict
          mount_points:
          - /

service:
  pipelines:
    metrics:
      receivers:
      - hostmetrics
      exporters:
      - logging

$ ./otelcol-contrib-0.60.0-linux_amd64 --config config.yaml
2022/10/03 16:03:44 proto: duplicate proto type registered: jaeger.api_v2.PostSpansRequest
2022/10/03 16:03:44 proto: duplicate proto type registered: jaeger.api_v2.PostSpansResponse
2022-10-03T16:03:44.439+0200    info    service/telemetry.go:115        Setting up own telemetry...
2022-10-03T16:03:44.439+0200    info    service/telemetry.go:156        Serving Prometheus metrics      {"address": ":8888", "level": "basic"}
2022-10-03T16:03:44.439+0200    info    components/components.go:30     In development component. May change in the future.     {"kind": "exporter", "data_type": "metrics", "name": "logging", "stability": "in development"}
2022-10-03T16:03:44.459+0200    info    service/service.go:112  Starting otelcol-contrib...     {"Version": "0.60.0", "NumCPU": 16}
2022-10-03T16:03:44.459+0200    info    extensions/extensions.go:42     Starting extensions...
2022-10-03T16:03:44.459+0200    info    pipelines/pipelines.go:74       Starting exporters...
2022-10-03T16:03:44.459+0200    info    pipelines/pipelines.go:78       Exporter is starting... {"kind": "exporter", "data_type": "metrics", "name": "logging"}
2022-10-03T16:03:44.459+0200    info    pipelines/pipelines.go:82       Exporter started.       {"kind": "exporter", "data_type": "metrics", "name": "logging"}
2022-10-03T16:03:44.459+0200    info    pipelines/pipelines.go:86       Starting processors...
2022-10-03T16:03:44.459+0200    info    pipelines/pipelines.go:98       Starting receivers...
2022-10-03T16:03:44.459+0200    info    pipelines/pipelines.go:102      Receiver is starting... {"kind": "receiver", "name": "hostmetrics", "pipeline": "metrics"}
2022-10-03T16:03:44.459+0200    info    pipelines/pipelines.go:106      Receiver started.       {"kind": "receiver", "name": "hostmetrics", "pipeline": "metrics"}
2022-10-03T16:03:44.459+0200    info    service/service.go:129  Everything is ready. Begin running and processing data.
2022-10-03T16:03:47.462+0200    info    MetricsExporter {"kind": "exporter", "data_type": "metrics", "name": "logging", "#metrics": 2}
2022-10-03T16:03:47.462+0200    info    ResourceMetrics #0
Resource SchemaURL: https://opentelemetry.io/schemas/1.9.0
ScopeMetrics #0
ScopeMetrics SchemaURL: 
InstrumentationScope otelcol/hostmetricsreceiver/filesystem 0.60.0
Metric #0
Descriptor:
     -> Name: system.filesystem.inodes.usage
     -> Description: FileSystem inodes used.
     -> Unit: {inodes}
     -> DataType: Sum
     -> IsMonotonic: false
     -> AggregationTemporality: AGGREGATION_TEMPORALITY_CUMULATIVE
NumberDataPoints #0
Data point attributes:
     -> device: STRING(/dev/dm-0)
     -> mode: STRING(rw)
     -> mountpoint: STRING(/)
     -> type: STRING(ext4)
     -> state: STRING(used)
StartTimestamp: 2022-09-26 10:04:08 +0000 UTC
Timestamp: 2022-10-03 14:03:47.461611613 +0000 UTC
Value: 1869988
NumberDataPoints #1
Data point attributes:
     -> device: STRING(/dev/dm-0)
     -> mode: STRING(rw)
     -> mountpoint: STRING(/)
     -> type: STRING(ext4)
     -> state: STRING(free)
StartTimestamp: 2022-09-26 10:04:08 +0000 UTC
Timestamp: 2022-10-03 14:03:47.461611613 +0000 UTC
Value: 28751708
Metric #1
Descriptor:
     -> Name: system.filesystem.usage
     -> Description: Filesystem bytes used.
     -> Unit: By
     -> DataType: Sum
     -> IsMonotonic: false
     -> AggregationTemporality: AGGREGATION_TEMPORALITY_CUMULATIVE
NumberDataPoints #0
Data point attributes:
     -> device: STRING(/dev/dm-0)
     -> mode: STRING(rw)
     -> mountpoint: STRING(/)
     -> type: STRING(ext4)
     -> state: STRING(used)
StartTimestamp: 2022-09-26 10:04:08 +0000 UTC
Timestamp: 2022-10-03 14:03:47.461611613 +0000 UTC
Value: 146400174080
NumberDataPoints #1
Data point attributes:
     -> device: STRING(/dev/dm-0)
     -> mode: STRING(rw)
     -> mountpoint: STRING(/)
     -> type: STRING(ext4)
     -> state: STRING(free)
StartTimestamp: 2022-09-26 10:04:08 +0000 UTC
Timestamp: 2022-10-03 14:03:47.461611613 +0000 UTC
Value: 321101488128
NumberDataPoints #2
Data point attributes:
     -> device: STRING(/dev/dm-0)
     -> mode: STRING(rw)
     -> mountpoint: STRING(/)
     -> type: STRING(ext4)
     -> state: STRING(reserved)
StartTimestamp: 2022-09-26 10:04:08 +0000 UTC
Timestamp: 2022-10-03 14:03:47.461611613 +0000 UTC
Value: 25097928704
        {"kind": "exporter", "data_type": "metrics", "name": "logging"}
^C2022-10-03T16:03:49.075+0200  info    service/collector.go:192        Received signal from OS {"signal": "interrupt"}
2022-10-03T16:03:49.075+0200    info    service/service.go:138  Starting shutdown...
2022-10-03T16:03:49.075+0200    info    pipelines/pipelines.go:118      Stopping receivers...
2022-10-03T16:03:49.075+0200    info    pipelines/pipelines.go:125      Stopping processors...
2022-10-03T16:03:49.075+0200    info    pipelines/pipelines.go:132      Stopping exporters...
2022-10-03T16:03:49.075+0200    info    extensions/extensions.go:56     Stopping extensions...
2022-10-03T16:03:49.075+0200    info    service/service.go:152  Shutdown complete.
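
To make the mapping from the numbered list above concrete, here is a minimal Python sketch that applies it to the data points printed in this output. The helper name `telegraf_style_fields` and the tuple layout are assumptions made for illustration; they are not part of the collector or of telegraf.

```python
from collections import defaultdict

# Data points copied from the collector output above for mount point "/":
# (metric name, value of the "state" attribute, data point value).
POINTS = [
    ("system.filesystem.inodes.usage", "used", 1869988),
    ("system.filesystem.inodes.usage", "free", 28751708),
    ("system.filesystem.usage", "used", 146400174080),
    ("system.filesystem.usage", "free", 321101488128),
    ("system.filesystem.usage", "reserved", 25097928704),
]


def telegraf_style_fields(points):
    """Hypothetical helper: map state-attributed data points to the
    telegraf disk field names requested in this issue."""
    by_metric = defaultdict(dict)
    for name, state, value in points:
        by_metric[name][state] = value

    inodes = by_metric["system.filesystem.inodes.usage"]
    usage = by_metric["system.filesystem.usage"]
    return {
        # 1. free inodes: inodes.usage with state=free
        "inodes_free": inodes["free"],
        # 2. total inodes: state=free plus state=used
        "inodes_total": inodes["free"] + inodes["used"],
        # 3. free bytes: usage with state=free
        "free": usage["free"],
    }


fields = telegraf_style_fields(POINTS)
for key, value in fields.items():
    print(f"{key} = {value}")
```

In a real pipeline the same sum would more likely be done in the metrics backend's query language, grouping by the device and mountpoint attributes; the sketch only shows the arithmetic.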

And the output of df to compare (I've only included the output for the root filesystem /):

$ df | grep '/$'
Filesystem                  1K-blocks      Used Available Use% Mounted on
/dev/mapper/nvme0n1p4_crypt 481054288 142989052 313555540  32% /
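
As a rough cross-check, the df numbers can be compared with the system.filesystem.usage data points from the collector output. The sketch below assumes df is reporting its default 1K (1024-byte) blocks, as the column header suggests, and simply copies the values from this comment.

```python
KIB = 1024  # assumption: df's "1K-blocks" are 1024-byte blocks

# Values copied from the df line for "/" (in KiB).
df_total_kib, df_used_kib, df_avail_kib = 481054288, 142989052, 313555540

# system.filesystem.usage data points from the collector output (in bytes),
# keyed by the "state" attribute.
usage = {"used": 146400174080, "free": 321101488128, "reserved": 25097928704}

# The three states together should account for the whole filesystem.
collector_total = sum(usage.values())
df_total = df_total_kib * KIB

free_delta = usage["free"] - df_avail_kib * KIB
used_delta = usage["used"] - df_used_kib * KIB

print("total matches df 1K-blocks:", collector_total == df_total)
print("free minus df Available (bytes):", free_delta)
print("used minus df Used (bytes):", used_delta)
```

With these numbers the totals agree exactly (used + free + reserved equals df's 1K-blocks column in bytes), and state=free lines up with df's Available column rather than with available plus reserved. The used and free values differ from df by about 20 MB in opposite directions, which is what you would expect if the two snapshots were simply taken at slightly different moments.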

jspaleta commented 1 year ago

ah sorry i missed the state attribute

github-actions[bot] commented 1 year ago

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

andrzej-stencel commented 1 year ago

Given my comment above, I believe this issue can be closed.