Closed tetesh closed 2 years ago
Hi,
I am not sure I follow. I see the following on my M1:
# HELP mem_active Telegraf collected metric
# TYPE mem_active gauge
mem_active{host="mbp"} 6.01956352e+09
# HELP mem_available Telegraf collected metric
# TYPE mem_available gauge
mem_available{host="mbp"} 9.091268608e+09
# HELP mem_available_percent Telegraf collected metric
# TYPE mem_available_percent gauge
mem_available_percent{host="mbp"} 52.918148040771484
# HELP mem_free Telegraf collected metric
# TYPE mem_free gauge
mem_free{host="mbp"} 8.527872e+08
# HELP mem_inactive Telegraf collected metric
# TYPE mem_inactive gauge
mem_inactive{host="mbp"} 8.238481408e+09
# HELP mem_total Telegraf collected metric
# TYPE mem_total gauge
mem_total{host="mbp"} 1.7179869184e+10
# HELP mem_used Telegraf collected metric
# TYPE mem_used gauge
mem_used{host="mbp"} 8.088600576e+09
# HELP mem_used_percent Telegraf collected metric
# TYPE mem_used_percent gauge
mem_used_percent{host="mbp"} 47.081851959228516
# HELP mem_wired Telegraf collected metric
# TYPE mem_wired gauge
mem_wired{host="mbp"} 1.386201088e+09
Those percent values look correct for my system.
Can you point out what you think should be different?
Telegraf says that 90% of RAM is occupied, although this is not so.
I attached the output of the top command above, where you can see that the memory is only half occupied.
We have more than 500 iMacs, and some of them get this bug from time to time.
@tetesh in the same terminal, can you please get the output of vm_stat and then run telegraf to collect the memory information using the following config and command:
[[inputs.mem]]
[[outputs.file]]
telegraf --config config.toml --once --debug
On macOS, Telegraf uses the gopsutil library to find the following fields:
fields["active"] = vm.Active
fields["free"] = vm.Free
fields["inactive"] = vm.Inactive
fields["wired"] = vm.Wired
top:
PhysMem: 4329M used (1426M wired), 3861M unused.
metrics:
# TYPE mem_total gauge
mem_total{dc="denver-1",host="pr-h4.kzn.21-school.ru"} 8.589934592e+09
# HELP mem_used Telegraf collected metric
# TYPE mem_used gauge
mem_used{dc="denver-1",host="pr-h4.kzn.21-school.ru"} 7.3059328e+09
# HELP mem_used_percent Telegraf collected metric
# TYPE mem_used_percent gauge
mem_used_percent{dc="denver-1",host="pr-h4.kzn.21-school.ru"} 85.0522518157959
vm_stat and debug telegraf
[root] pr-h4 [~] # vm_stat
Mach Virtual Memory Statistics: (page size of 4096 bytes)
Pages free: 98230.
Pages active: 522286.
Pages inactive: 217409.
Pages speculative: 893692.
Pages throttled: 0.
Pages wired down: 365059.
Pages purgeable: 5001.
"Translation faults": 25978118772.
Pages copy-on-write: 3394766164.
Pages zero filled: 3347016987.
Pages reactivated: 38870.
Pages purged: 27637.
File-backed pages: 1193587.
Anonymous pages: 439800.
Pages stored in compressor: 0.
Pages occupied by compressor: 0.
Decompressions: 0.
Compressions: 0.
Pageins: 1415887.
Pageouts: 9.
Swapins: 0.
Swapouts: 0.
[root] pr-h4 [~] #
[root] pr-h4 [~] # telegraf --config config.toml --once --debug
2022-02-14T15:55:17Z I! Starting Telegraf 1.17.2
2022-02-14T15:55:17Z D! [agent] Initializing plugins
2022-02-14T15:55:17Z D! [agent] Connecting outputs
2022-02-14T15:55:17Z D! [agent] Attempting connection to [outputs.file]
2022-02-14T15:55:17Z D! [agent] Successfully connected to outputs.file
2022-02-14T15:55:17Z D! [agent] Starting service inputs
2022-02-14T15:55:17Z D! [agent] Stopping service inputs
2022-02-14T15:55:17Z D! [agent] Input channel closed
2022-02-14T15:55:17Z I! [agent] Hang on, flushing any cached metrics before shutdown
mem,host=pr-h4.kzn.21-school.ru wired=1495707648i,used=7311187968i,available_percent=14.88656997680664,active=2153500672i,inactive=890507264i,total=8589934592i,available=1278746624i,used_percent=85.11343002319336,free=388239360i 1644854118000000000
2022-02-14T15:55:17Z D! [outputs.file] Wrote batch of 1 metrics in 62.409µs
2022-02-14T15:55:17Z D! [outputs.file] Buffer fullness: 0 / 10000 metrics
2022-02-14T15:55:17Z D! [agent] Stopped Successfully
The difference of opinion here is that top and gopsutil do not determine the used % the same way.
Here is the calc Telegraf does for used percentage:
used percent = used / total * 100
used percent = 7311187968 / 8589934592 * 100
used percent = 85.11343002319336
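The used value itself can be reproduced from the other fields in the Telegraf output above. The following is a minimal Python sketch, assuming that on macOS available is free + inactive and used is total - available. That relationship is inferred from the numbers posted in this thread, not quoted from the gopsutil source.

```python
# Field values copied from the Telegraf line-protocol output in this thread.
total = 8589934592
free = 388239360
inactive = 890507264

# Assumption: available memory is free + inactive, and used is the remainder.
available = free + inactive
used = total - available
used_percent = used / total * 100

print(available)               # 1278746624, same as the reported "available"
print(used)                    # 7311187968, same as the reported "used"
print(round(used_percent, 2))  # 85.11
```

Under this assumption, active, wired, and speculative pages all count as used, which is why the figure is much higher than what top shows.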
Now let's look at the vm_stat output. I've multiplied the values by the page size and converted to MB:
Pages free:          98230    402 MB
Pages active:       522286   2139 MB
Pages inactive:     217409    890 MB
Pages speculative:  893692   3661 MB
Pages wired down:   365059   1495 MB
------------------------------------
Total:             2096676   8588 MB
Some definitions:
Here is one way to calculate used percentage using vm_stat:
used percent = (total - free - inactive) / total * 100
used percent = (2096676 - 98230 - 217409) / 2096676 * 100
used percent = 84.95%
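The same calculation can be repeated directly from the raw vm_stat text. This is a rough Python sketch that treats both free and inactive pages as not used. The helper name `parse_vm_stat` is made up for illustration, and the counters summed into the total are the same five used in the table above.

```python
import re

# Excerpt of the vm_stat output posted in this thread.
VM_STAT = """\
Mach Virtual Memory Statistics: (page size of 4096 bytes)
Pages free: 98230.
Pages active: 522286.
Pages inactive: 217409.
Pages speculative: 893692.
Pages wired down: 365059.
"""

def parse_vm_stat(text):
    """Return (page_size, {counter name: page count}) from vm_stat output."""
    page_size = int(re.search(r"page size of (\d+) bytes", text).group(1))
    pages = {}
    for name, count in re.findall(r"^(.+?):\s+(\d+)\.$", text, re.MULTILINE):
        pages[name.strip('"')] = int(count)
    return page_size, pages

page_size, pages = parse_vm_stat(VM_STAT)
names = ["Pages free", "Pages active", "Pages inactive",
         "Pages speculative", "Pages wired down"]

total = sum(pages[n] for n in names)
used = total - pages["Pages free"] - pages["Pages inactive"]

for n in names:
    print(f"{n}: {pages[n] * page_size / 1e6:.0f} MB")
print(f"used percent: {used / total * 100:.2f}")  # 84.95
```

The result lands close to the 85% that Telegraf reports, so the two agree once the same page classes are counted as used.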
top is reporting a different set of values:
PhysMem: 4150M used (1419M wired), 4042M unused.
My guess is they are using pages active + wired or active + inactive + wired.
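One way to check that guess against the numbers in this thread is to convert both candidate sums to the unit top prints. The sketch below assumes top's "M" means MiB (1024 * 1024 bytes). That assumption is supported only by the wired figure: 365059 wired pages come out at 1426M, the same as the top reading posted together with the vm_stat output.

```python
PAGE_SIZE = 4096   # bytes, from the vm_stat header
MIB = 1024 * 1024  # assumption: top's "M" is MiB

# Page counts from the vm_stat output in this thread.
active, inactive, wired = 522286, 217409, 365059

def pages_to_mib(pages):
    """Convert a page count to MiB."""
    return pages * PAGE_SIZE / MIB

candidates = {
    "active + wired": active + wired,
    "active + inactive + wired": active + inactive + wired,
}

print(f"wired: {pages_to_mib(wired):.0f}M")  # 1426M
for label, pages in candidates.items():
    print(f"{label}: {pages_to_mib(pages):.0f}M")
# active + wired: 3466M
# active + inactive + wired: 4315M
```

The second candidate is fairly close to the "4329M used" that top printed alongside that vm_stat sample, but the two readings were not taken at exactly the same instant, so this does not confirm how top derives its number.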
The way memory usage is calculated varies from one tool to another, depending on which classifications of memory each tool counts as used. Consider looking at linuxatemyram.com.
As I said, these are using different ways to calculate free. As such I do not consider this a bug in Telegraf and will be closing this.
Relevant telegraf.conf
Logs from Telegraf
System info
Telegraf 1.17.2 (git: HEAD 74011e22), macOS Mojave 10.14.6
Docker
No response
Steps to reproduce
Expected behavior
I want to see real mem_used_percent indicators (90%)
Actual behavior
The values I see are not realistic: no one is using the iMac, and the top utility reports:
Load Avg: 1.50, 1.02, 0.70
CPU usage: 0.32% user, 0.32% sys, 99.34% idle
SharedLibs: 119M resident, 39M data, 28M linkedit.
MemRegions: 14416 total, 1087M resident, 77M private, 560M shared.
PhysMem: 4150M used (1419M wired), 4042M unused.
VM: 802G vsize, 1372M framework vsize, 0(0) swapins, 0(0) swapouts.
Networks: packets: 18793774/4189M in, 8271302/2404M out.
Disks: 1902868/27G read, 11020419/85G written.
Additional info
No response