open-telemetry / semantic-conventions

Defines standards for generating consistent, accessible telemetry across a variety of domains
Apache License 2.0

Add system metrics reporting total memory capacity or clarify how to recover existing ones #127

Open mx-psi opened 1 year ago

mx-psi commented 1 year ago

The current system metrics cover usage of memory, paging/swap memory and filesystems, but they don't report the total capacity of any of these as a separate metric.

The following metric can't be recovered from the existing system metrics and would need to be added to support this:

| Name | Description | Units | Instrument Type | Value Type | Attribute Key | Attribute Values |
|---|---|---|---|---|---|---|
| system.disk.limit | Total memory available in the disk. | By | UpDownCounter | Int64 | device | (identifier) |

The following metric may be recoverable from system.memory.usage, but the current description of the attribute values of state is not precise enough to do so.

| Name | Description | Units | Instrument Type | Value Type | Attribute Key | Attribute Values |
|---|---|---|---|---|---|---|
| system.memory.limit | Total memory available in the machine. Does not include paging/swap memory. | By | UpDownCounter | Int64 | n/a | n/a |

The following metrics can already be computed as the sum of used and free (plus reserved, for the filesystem one). They could be added as convenience metrics:

| Name | Description | Units | Instrument Type | Value Type | Attribute Key | Attribute Values |
|---|---|---|---|---|---|---|
| system.filesystem.limit | Total memory available in the disk. | By | UpDownCounter | Int64 | device | (identifier) |
| | | | | | state | used, free, reserved |
| | | | | | type | ext4, tmpfs, etc. |
| | | | | | mode | rw, ro, etc. |
| | | | | | mountpoint | (path) |
| system.paging.limit | Total paging/swap memory available. | By | UpDownCounter | Int64 | n/a | n/a |
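To illustrate the "sum of used and free" derivation, here is a minimal sketch of how a consumer could compute a per-filesystem limit from system.filesystem.usage data points today. The data layout, the `derive_limit` helper and all values are hypothetical and not part of any OpenTelemetry API. The sketch also assumes the state values are disjoint and together cover the whole filesystem.

```python
from collections import defaultdict

# Hypothetical system.filesystem.usage data points: (attributes, bytes).
# The numbers are made up for illustration only.
points = [
    ({"device": "/dev/sda1", "mountpoint": "/", "state": "used"}, 40_000),
    ({"device": "/dev/sda1", "mountpoint": "/", "state": "free"}, 55_000),
    ({"device": "/dev/sda1", "mountpoint": "/", "state": "reserved"}, 5_000),
    ({"device": "tmpfs", "mountpoint": "/run", "state": "used"}, 1_000),
    ({"device": "tmpfs", "mountpoint": "/run", "state": "free"}, 3_000),
]


def derive_limit(points, state_key="state"):
    """Sum usage over all states, grouped by the remaining attributes."""
    totals = defaultdict(int)
    for attrs, value in points:
        # Drop the state attribute so every state of one filesystem
        # lands under the same grouping key.
        key = tuple(sorted((k, v) for k, v in attrs.items() if k != state_key))
        totals[key] += value
    return dict(totals)


limits = derive_limit(points)
for key, total in limits.items():
    print(dict(key), total)
```

The same aggregation would apply to system.paging.usage, which has no per-device attributes in the table above. A dedicated .limit metric would spare every consumer from repeating this grouping and from depending on the exact set of state values.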

Items for this issue:

This would be part of open-telemetry/opentelemetry-specification/issues/3556 if approved.

trask commented 1 year ago

check out https://github.com/open-telemetry/semantic-conventions/blob/main/docs/general/metrics.md#do-not-use-total

.limit is probably the closest existing convention: https://github.com/open-telemetry/semantic-conventions/blob/main/docs/general/metrics.md#instrument-naming

mx-psi commented 1 year ago

Thanks, I replaced all .total suffixes with .limit suffixes above :)

yevgentrukhin commented 1 year ago

Curious for next steps, will there be a PR for this change to semantic convention?

mx-psi commented 1 year ago

> Curious for next steps, will there be a PR for this change to semantic convention?

@yevgentrukhin yes, this is part of the system semantic conventions WG roadmap and will be addressed before stabilization of system metrics. If you are interested in having this sooner, I am happy to review PRs related to this, and you can join our weekly meeting to discuss if necessary (see here for details).

mx-psi commented 11 months ago

> Clarify set of attribute values for system.memory.usage state and consider using system.memory.total

I have marked this one as done: since #89 we have the total value for state, which would be system.memory.limit.

mx-psi commented 8 months ago

Discussed at the January 18th System Semantic Conventions WG meeting: we don't consider this one to be a blocker for system metrics GA unless the new metrics affect existing metrics. @mx-psi to double check this.

rogercoll commented 1 week ago

@joaopgrassi Sorry, I think in the https://github.com/open-telemetry/semantic-conventions/pull/1356 PR I did not specify that it was partially solving this issue. I reckon system.filesystem.limit and system.paging.limit metrics still need to be added.