open-telemetry / opentelemetry-collector-contrib

Contrib repository for the OpenTelemetry Collector
https://opentelemetry.io
Apache License 2.0

Add available memory state for hostmetrics receiver #7417

Open rogercoll opened 2 years ago

rogercoll commented 2 years ago

The current Linux memory states provided by the hostmetricsreceiver might not be the most appropriate ones for an end user. The available memory states for Linux are:

[buffered, cached, free, slab_reclaimable, slab_unreclaimable, used]

All of those values are retrieved with gopsutil, which reads them from the /proc/meminfo file.

A user might think that the free value represents the amount of memory that the system can still use, but free (MemFree) is only a count of how many pages are free in the buddy allocator, and there are many other ways pages can be returned to the buddy allocator in time of need. The actual amount of usable memory is provided by the Linux kernel (>= 3.14) as MemAvailable in /proc/meminfo.

Can we get MemAvailable field from current states?

No, it is an independent field that is not computed from the current values. gopsutil provides this value regardless of the kernel version, as it can be estimated from MemFree, Active(file), Inactive(file), and SReclaimable, as well as the "low" watermarks from /proc/zoneinfo.
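For reference, a minimal, hypothetical sketch of reading that value through gopsutil (it assumes the github.com/shirou/gopsutil/v3/mem package; the package and field names below are gopsutil's, not the receiver's):

```go
package main

import (
	"fmt"

	"github.com/shirou/gopsutil/v3/mem"
)

func main() {
	// On Linux, VirtualMemory parses /proc/meminfo.
	vm, err := mem.VirtualMemory()
	if err != nil {
		panic(err)
	}

	// Free maps to MemFree: pages currently sitting unused in the buddy allocator.
	fmt.Printf("free:      %d bytes\n", vm.Free)
	// Available maps to MemAvailable when the kernel exposes it; otherwise
	// gopsutil falls back to its own estimate built from MemFree, the
	// reclaimable page cache/slab, and the zone low watermarks.
	fmt.Printf("available: %d bytes\n", vm.Available)
	fmt.Printf("total:     %d bytes\n", vm.Total)
}
```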

Repercussion

The user will be able to know the actual available memory of the system.

The sum assumption (that the states add up to the total system memory) is already broken, and it is quite difficult to satisfy because some of the memory state values are inclusive of others, for example the cached state. The assumption would be defined as:

total = buffered + cached + free + used + slab_reclaimable + slab_unreclaimable

But the used value is computed by gopsutil as: total - free - buffered - cached

Then:

total = buffered + cached + free + (total - free - buffered - cached) + slab_reclaimable + slab_unreclaimable

total = total + slab_reclaimable + slab_unreclaimable

The sum(state) will always be higher than the total.

(The following proposals do not solve the sum assumption, they only add more memory information for the user)

Proposal 1

Add a new memory state for Linux to provide the available memory. State name: available

Proposal 2

Change the used state to: total - available. This proposal requires adding the memory.limit metric, as we would no longer be able to compute the total memory from the states. Currently, we can get the total amount of memory with the following PromQL query: scalar(sum without (state) (system_memory_usage{state =~ "free|used|cached|buffered"}))

jpkrohling commented 2 years ago

cc @dmitryax

bogdandrutu commented 2 years ago

/cc @dashpole I think you are using this as well

quentinmit commented 2 years ago

I don't think either of those proposals addresses the problem of the values not adding up. I'd like to add proposal 3: fix the set of states to exactly cover the memory on the system. If there are states that double-count memory, they should be separate metrics. As inspiration, here are the non-overlapping states from Munin and how it calculates them from /proc/meminfo:

apps (MemTotal-MemFree-Buffers-Cached-Slab-PageTables-Percpu-SwapCached)
buffers (Buffers)
swap (SwapTotal-SwapFree)
cached (Cached-Shmem)
free (MemFree)
shmem (Shmem)
slab_cache (Slab)
swap_cache (SwapCached)
page_tables (PageTables)
per_cpu (Percpu)
vmalloc_used (VmallocUsed)

(I'm not sure if it makes sense to include swap; I'm guessing that with the current set of states we're not including swap? You could leave that out and I think the remaining states would represent total physical memory.)

I think the main difference between this and proposal 2 is that here we're changing the definition of used (what Munin calls apps) instead of changing the definition of free.
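To make the bookkeeping concrete, here is a rough sketch of how those non-overlapping buckets could be derived from gopsutil's VirtualMemoryStat fields (an assumption-laden illustration, not part of the proposal: gopsutil does not expose Percpu, so that amount ends up inside apps, and mapping Shared to Shmem is an assumption):

```go
package main

import (
	"fmt"

	"github.com/shirou/gopsutil/v3/mem"
)

// muninBuckets splits physical memory into non-overlapping buckets following
// Munin's formulas. Caveats: gopsutil does not expose Percpu, so that amount
// stays inside "apps", and uint64 subtraction is not guarded against
// underflow - this is a sketch, not production code.
func muninBuckets(vm *mem.VirtualMemoryStat) map[string]uint64 {
	apps := vm.Total - vm.Free - vm.Buffers - vm.Cached -
		vm.Slab - vm.PageTables - vm.SwapCached
	return map[string]uint64{
		"apps":        apps,                  // everything not split out below (includes Percpu)
		"buffers":     vm.Buffers,            // Buffers
		"cached":      vm.Cached - vm.Shared, // Cached - Shmem
		"free":        vm.Free,               // MemFree
		"shmem":       vm.Shared,             // Shmem
		"slab_cache":  vm.Slab,               // Slab
		"swap_cache":  vm.SwapCached,         // SwapCached
		"page_tables": vm.PageTables,         // PageTables
	}
}

func main() {
	vm, err := mem.VirtualMemory()
	if err != nil {
		panic(err)
	}
	var sum uint64
	for name, v := range muninBuckets(vm) {
		fmt.Printf("%-12s %d\n", name, v)
		sum += v
	}
	// With these formulas the buckets sum to exactly MemTotal
	// (Percpu is absorbed into "apps" since gopsutil doesn't expose it).
	fmt.Printf("sum=%d total=%d\n", sum, vm.Total)
}
```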

rogercoll commented 2 years ago

@quentinmit I really like your proposal to solve the sum assumption; it provides much more information to the user and prevents overlapping states.

I tested it without the swap states, as they are covered by the paging scraper, and the usage sum is correct. Nonetheless, I have a couple of concerns:

In my opinion, the available (or non-available) state is a very useful metric when defining alerts. The used (or apps) metric should not be understood as non-available memory; the latter is usually quite a bit higher.

rogercoll commented 2 years ago

Proposal 3

Another possible solution would be to add a new memory metric just for this purpose, as making any change to the current states would break the sum assumption.

The system memory scraper will generate the following metrics:

- system.memory.usage: state attributes
- system.memory.utilization: disabled by default, state attributes
- system.memory.available: no state attributes

I would appreciate any opinion on this

rogercoll commented 2 years ago

Whether you use the used state or total - available can be truly significant in alerting systems. I ran a small test to check the average variation between the two values while stressing the memory:

[chart: used vs. total - available memory while stressing the system]

The average difference is around 7-8%. By providing both values, we let the user choose whichever fits best.
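For anyone who wants to reproduce a comparison like this, here is a minimal sketch (not the original test setup; output formatting is made up) that samples both definitions via gopsutil, e.g. while a memory stress tool is running:

```go
package main

import (
	"fmt"
	"time"

	"github.com/shirou/gopsutil/v3/mem"
)

func main() {
	// Sample once per second and compare the two candidate definitions of
	// "memory in use": gopsutil's Used vs. Total - Available.
	for {
		vm, err := mem.VirtualMemory()
		if err != nil {
			panic(err)
		}
		used := vm.Used                         // total - free - buffered - cached
		notAvailable := vm.Total - vm.Available // total - MemAvailable
		diff := float64(notAvailable) - float64(used)
		fmt.Printf("used=%dMiB total-available=%dMiB diff=%.1f%% of total\n",
			used>>20, notAvailable>>20, diff/float64(vm.Total)*100)
		time.Sleep(time.Second)
	}
}
```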

TylerHelmuth commented 2 years ago

Another possible solution would be to add a new memory metric only for this purpose

Is this the same as your proposed solution 1?

I like the idea of adding a new metric; it allows those who currently rely on used to keep using it, while providing a more accurate measure for those who want it.

rogercoll commented 2 years ago

@TylerHelmuth No, thanks for pointing that out. It would be a new proposal, let me update the comment.

Proposals 1 and 2 are based on the "state" attribute used in the usage and utilization metrics. Proposal 3 won't modify the state attribute or the current metrics; instead, it adds a new one.

dmitryax commented 2 years ago

Adding another metric, system.memory.available, may be a good solution to one of the problems described in this issue. But it's still unclear how to solve the total mismatch issue, and adding this metric may also confuse end users wondering what the difference is between system.memory.available and system.memory.usage.

Potentially, we can solve both problems by introducing a concept of optional metric attribute values in the metrics builder. The attribute values could be enabled or disabled in metadata.yaml (the default) and in user configs. If an attribute value is disabled, data points with that attribute value won't be emitted.

With this functionality added to the metrics builder, we will have the following sets of state attribute values in this receiver:

  • Enabled by default: buffered, cached, free and used. Sums up to a correct total value.
  • Disabled by default: slab_reclaimable, slab_unreclaimable and available. Optional values that the user can enable.

What do you think?

@bogdandrutu @djaglowski please share your opinion on the additional feature to the metrics builder

djaglowski commented 2 years ago

I think we should aim for a design that adheres to the specification's aggregation expectations (i.e. data points sum to a meaningful total). Supporting metrics that violate these expectations seems like the wrong approach to me.

There seem to be two layers to the problem here:

  1. What is the correct data model for this situation?
  2. How do we migrate from a flawed data model to the correct one?

I don't have a strong opinion on the first question, but in regards to the second, I think we basically have to decide whether to fix it or to deprecate and replace it. The right approach depends on the stability level. In my opinion, a per-metric stability level is necessary in order to have a clear approach (e.g. alpha -> beta -> stable -> deprecated). If this metric is less than stable, then we should fix it. If it is "stable" but realized to be fundamentally flawed in its design, we should deprecate it and add one or more new metrics that amount to a more accurate model.

dmitryax commented 2 years ago

I think we should aim for a design that adheres to the specification's aggregation expectations (i.e. data points sum to a meaningful total). Supporting metrics that violate these expectations seems like the wrong approach to me.

That's true, we should always aim to have meaningful total values. By proposing optional attribute values, I mean that the sum of the attribute values enabled by default should represent the most reasonable total value. At the same time, I can think of use cases where a user wants to disable some attribute values, or enable other optional ones, and build their own custom meaningful sums out of them. In that case, moving slab_reclaimable and slab_unreclaimable to a different metric just to get a correct total value would significantly reduce the flexibility of using them, compared to keeping them as part of the same memory usage metric.

quentinmit commented 2 years ago

I think it's okay to keep SlabReclaimable and SlabUnreclaimable in the regular summable metric; their sum is just Slab. So you can include either slab or slab_reclaimable and slab_unreclaimable but not both. Other states are the problematic ones.
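As a quick sanity check of that identity, a tiny sketch using what I believe are gopsutil's field names for those two rows of /proc/meminfo (Sreclaimable and Sunreclaim):

```go
package main

import (
	"fmt"

	"github.com/shirou/gopsutil/v3/mem"
)

func main() {
	vm, err := mem.VirtualMemory()
	if err != nil {
		panic(err)
	}
	// Slab should equal SReclaimable + SUnreclaim, so a receiver can expose
	// either the combined value or the two parts (but not both) and the
	// overall sum stays consistent.
	fmt.Printf("slab=%d reclaimable+unreclaimable=%d\n",
		vm.Slab, vm.Sreclaimable+vm.Sunreclaim)
}
```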


lootek commented 1 year ago

I think this issue is still valid and worth addressing (@dmitryax), as we're clearly missing a system.memory.available metric.

And so I'd like to reopen the discussion, as it seems to me that adding system.memory.available is something all of us agree on (especially since it's just a matter of using a value already exposed by gopsutil).

As for fixing the flawed sum, I like this approach:

With this functionality added to the metrics builder, we will have the following sets of state attribute values in this receiver:

  • Enabled by default: buffered, cached, free and used. Sums up to a correct total value.
  • Disabled by default: slab_reclaimable, slab_unreclaimable and available. Optional values that the user can enable.

lootek commented 1 year ago

@fatsheep9146 maybe I could take this one and try implementing those changes?

fatsheep9146 commented 1 year ago

@lootek It's yours!

lootek commented 1 year ago

@fatsheep9146 That's great, thanks! I'll follow up soon :)

lootek commented 1 year ago

I have a draft PR prepared that adds the available memory state attribute, though for now it implements only the simplest solution, which just adds that state.

As for making some of the attributes optional: since I'm new to the codebase and don't yet have a full grasp of it, I'd love to hear your feedback and ask for some initial guidance. I spent quite a long time staring at the metadata.yaml file and a few solutions came to mind, but none that I'm happy with... particularly because I was focused on a solution that would be fully backward-compatible, and all of those required changes to mdatagen, at least adding another field alongside enum. The only one I can see that would require changes to the memoryscraper alone was inspired by the filters available in e.g. the filesystemscraper configuration. Eventually, I landed in the file containing the AttributesMatcher implementation - do you think it would be of any use here?

If the changes to mdatagen turn out to be inevitable, my understanding is that the desired order of work would be to first make those changes in a separate PR, and then open another PR addressing the particular case of available memory using them?

// EDIT: I made a PR to the upstream repo out of this work as well:

lootek commented 1 year ago

It looks like some related work has been merged just a few hours ago:

Given that, I wonder if we still need special logic for grouping/choosing memory state attributes here. In case we do, I've prepared a PR (built on top of the previous one) that adds a config option for that - please take a look:

lootek commented 1 year ago

The discussion follows in https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/19149


ItsLastDay commented 1 year ago

Hi folks. I'm interested in having available memory. Below I summarize my understanding of what has been done, which decisions have been made and which are still pending approval, and what the next steps could be.

What is done

Mainly, discussions on two issues:

  A. The values of the system.memory.usage metric's state attribute do not sum to the "total memory of the system".
  B. How do we start exposing the Linux kernel's available memory to users?

My impression is that these issues can be solved independently. Hence, I'll describe them in detail separately.

Issue A: sum over state is bad

The problematic states are slab_reclaimable and slab_unreclaimable: [1]. Because of them, the sum over the state attribute is higher than the total memory. The "slab" is a term internal to the Linux kernel: [2].

dmitryax@ proposed having the slab_... states disabled by default. lootek@ implemented this proposal in https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/19149.

The implementation sparked more discussion: e.g. https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/19149#discussion_r1134025928 by djaglowski@. He pointed out that if we write down all the mathematical relationships between the memory states, we'll be able to converge faster on a correct data model. The implementation was rejected (or abandoned - either way, it's not merged).

As a result, another proposal took precedence: introduce a new metric, system.memory.slab, and deprecate the slab states in system.memory.usage. rogercoll@ created an issue to add such a metric to the specification first: https://github.com/open-telemetry/semantic-conventions/issues/531. The issue has had no comments since March.

Issue B: expose available memory from the Linux kernel

rogercoll@ showed that available is really a different state: [3]. He also showed that available is specific to the Linux kernel: [4].

dmitryax@ proposed to add a separate metric system.linux.memory.available. He asked other maintainers for agreement before starting the implementation. This is where the story ends so far.

Next steps

Long-term

Based on the above, I think the main step is to understand the correct data model (quoting djaglowski@'s https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/7417#issuecomment-1115126721), i.e. to:

  1. Understand for which OSes (operating systems) we want to provide the system.memory.usage metric
  2. Write down the memory states that we want to track, independent of the OS
    • currently we use gopsutil's nomenclature: code
    • my opinion on this: we really care about free and used. Everything else (like buffered) is OS-specific and can go into OS-specific metrics. However, I don't know whether splitting metrics by OS is common practice in OTel semconv.
  3. Write down the mathematical relationships between those states, like in https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/19149#discussion_r1134025928
  4. Look at the actual OSes from step one. See how our states map to their states. Iterate on our model further.

IMO, this will help us build a long-term solution that is robust to library and OS changes. However, such a solution might be hard to use: e.g. users would query system.memory.usage for general memory usage, and then query system.linux.memory.some_linux_specific_metric for details. I'm not sure I'm qualified enough to reason about this, because I haven't talked to any OTel customers. Tagging @dmitryax @braydonk from the system semantic conventions WG.

Short-term

Alternatively, if we want to build a short-term solution, which still looks very pleasant, the next steps are:

For issue A:

  1. Approve https://github.com/open-telemetry/semantic-conventions/issues/531
  2. Implement the approved specification
  3. Deprecate the slab-related states in system.memory.usage

For issue B:

  1. Create a semconv proposal, similar to https://github.com/open-telemetry/semantic-conventions/issues/531, to add system.linux.memory.available
  2. Approve the proposal
  3. Implement it

As a result, both issues are solved.

[1] https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/7417#issue-1117282239
[2] https://github.com/open-telemetry/semantic-conventions/issues/531
[3] https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/7417#issuecomment-1109571183
[4] https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/19149#issuecomment-1467634451

ItsLastDay commented 1 year ago

I propose to walk the "short-term" path. I created https://github.com/open-telemetry/semantic-conventions/issues/257 to add system.linux.memory.available.

ItsLastDay commented 11 months ago

https://github.com/open-telemetry/semantic-conventions/issues/257 is done. Next step: I'll implement system.linux.memory.available in hostmetricsreceiver.

ItsLastDay commented 9 months ago

I implemented all steps for "Issue B" from https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/7417#issuecomment-1668082340.

So, answering the original question from this bug - "Can we get MemAvailable field from current states?": now yes. I propose to close this bug as done.

Feel free to open a separate thread for "Issue A" from https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/7417#issuecomment-1668082340. I see that https://github.com/open-telemetry/semantic-conventions/issues/531 is still open, and IMO it's not critical.


quentinmit commented 7 months ago

Feel free to open a separate thread for "Issue A" from #7417 (comment). I see that open-telemetry/semantic-conventions#531 is still open, and IMO it's not critical.

The linked semconv issue doesn't address this bug - it doesn't fix the free state to represent something meaningful, and it doesn't propose fixing anything about the existing metric. I think we should keep using this issue for that, given that most of the comments are about it.


rogercoll commented 3 months ago

I would keep this issue open too; the existing memory metrics won't be aligned with the semantic conventions until the slab state is replaced.

SemConv progress update: https://github.com/open-telemetry/semantic-conventions/pull/1078


rogercoll commented 1 month ago

/unstale The system.linux.memory.slab metric will be available in the next semconv release: https://github.com/open-telemetry/semantic-conventions/pull/1078