prometheus-community / windows_exporter

Prometheus exporter for Windows machines
MIT License

Process memory metrics max out at 4294967295 bytes? (equal to 2^32 - 1 bytes or roughly 4.29 GB) #955

Closed parmsib closed 6 months ago

parmsib commented 2 years ago

Hello!

I've had issues with some process metrics that never go above 4294967295, even though the quantities they represent clearly do.

One of these metrics is windows_process_virtual_bytes. 4294967295 B is roughly 4.29 GB, which is far less than the ~30 GB a process was actually using, according to the commit size Task Manager reported for it.

From some googling, Prometheus gauges should be 64-bit floating point values, so representing values larger than an unsigned 32-bit integer should not be an issue. The OS is Windows Server 2019 (64-bit).
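As a quick check of the float64 reasoning above, here is a minimal Go sketch. It relies only on the fact that a float64 stores every integer up to 2^53 exactly; the helper name `safelyRepresentable` is made up for illustration and is not part of windows_exporter.

```go
package main

import (
	"fmt"
	"math"
)

// maxExactInt is 2^53. Every non-negative integer up to this value is
// stored exactly in a float64.
const maxExactInt = 1 << 53

// safelyRepresentable reports whether v is within the range where a
// float64 holds integers without rounding. Hypothetical helper name.
func safelyRepresentable(v uint64) bool {
	return v <= maxExactInt
}

func main() {
	// The plateau value from this issue is the largest unsigned 32-bit value.
	fmt.Println(uint64(math.MaxUint32)) // 4294967295

	// A ~30 GiB commit size is far above 2^32 and still far below 2^53.
	const thirtyGiB uint64 = 30 << 30
	fmt.Println(thirtyGiB > math.MaxUint32, safelyRepresentable(thirtyGiB))
}
```

So the gauge type itself is not what limits the value; the cap has to come from somewhere earlier in the collection path.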

Is this expected behavior?

Is there some configuration I'm missing, or could it be a limitation in WMI?

Here's a screenshot where a process's memory spike appears to plateau at 4.29 GB, when in fact it continues up to about 30 GB. (screenshot)

Thankful for any help!

breed808 commented 2 years ago

Prometheus gauges do indeed use float64 values, though this is cast from an int64 in the perflib_exporter library during collection.

Could you:

1) Run the Perflib exporter directly to see if the 2^32 limit occurs there as well?
2) Run Get-Counter -Counter '\Process(*)\Virtual Bytes' and confirm whether the displayed values exceed 2^32?

parmsib commented 2 years ago
  1. Run the Perflib exporter directly to see if the 2^32 limit is experienced here?

    Not sure exactly what you mean by perflib exporter.

  2. Running Get-Counter -Counter '\Process(*)\Virtual Bytes' gives e.g. the following, which is a lot higher than 2^32.

    \\process()\virtual bytes : 2211221737472

    The same process shows the following from windows_exporter's /metrics endpoint

    windows_process_virtual_bytes{creating_process_id="652",process="",process_id="2832"} 4.294967295e+09
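    One observation about these two numbers, sketched in Go: a value pinned at exactly 4294967295 looks like clamping (saturation) to the 32-bit range rather than a plain 32-bit truncation, because truncating 2211221737472 keeps only the low 32 bits and yields a different number. This is only an arithmetic illustration; it makes no claim about where in Windows or the exporter the cap is applied, and both function names are hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// wrapToUint32 keeps only the low 32 bits, which is what a plain Go
// conversion uint32(v) does.
func wrapToUint32(v uint64) uint32 {
	return uint32(v)
}

// saturateToUint32 clamps anything above the 32-bit range to the maximum
// 32-bit value instead of wrapping.
func saturateToUint32(v uint64) uint32 {
	if v > math.MaxUint32 {
		return math.MaxUint32
	}
	return uint32(v)
}

func main() {
	// Value reported by Get-Counter in this thread.
	const reported uint64 = 2211221737472

	fmt.Println("wrapped:  ", wrapToUint32(reported))
	fmt.Println("saturated:", saturateToUint32(reported))
}
```

    Only the saturated result matches the 4.294967295e+09 shown by windows_exporter.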

breed808 commented 2 years ago

Hmm, I can't reproduce this one. I've spun up a testing Windows 11 VM and run both perflib_exporter and windows_exporter (master branch), and am seeing values greater than 4294967295 for both:

/prometheus $ promtool query instant http://localhost:9090 'count(perflib_process_virtual_bytes{name = "svchost"} > 4294967295)' && promtool query instant http://localhost:9090 'count(windows_process_virtual_bytes{process = "svchost"} > 4294967295)'
{} => 76 @[1647511876.86]
{} => 76 @[1647511876.89]

/prometheus $ promtool query instant http://localhost:9090 'topk(10, windows_process_virtual_bytes{process = "svchost"} > 4294967295)'
windows_process_virtual_bytes{creating_process_id="656", instance="xx.xx.xx.26:9182", job="windows", process="svchost", process_id="1788"} => 2207793954816 @[1647512085.319]
windows_process_virtual_bytes{creating_process_id="656", instance="xx.xx.xx.26:9182", job="windows", process="svchost", process_id="6700"} => 2203586228224 @[1647512085.319]
windows_process_virtual_bytes{creating_process_id="656", instance="xx.xx.xx.26:9182", job="windows", process="svchost", process_id="3140"} => 2203523735552 @[1647512085.319]
windows_process_virtual_bytes{creating_process_id="656", instance="xx.xx.xx.26:9182", job="windows", process="svchost", process_id="7772"} => 2203513102336 @[1647512085.319]
windows_process_virtual_bytes{creating_process_id="656", instance="xx.xx.xx.26:9182", job="windows", process="svchost", process_id="3100"} => 2203480330240 @[1647512085.319]
windows_process_virtual_bytes{creating_process_id="656", instance="xx.xx.xx.26:9182", job="windows", process="svchost", process_id="5068"} => 2203472465920 @[1647512085.319]
windows_process_virtual_bytes{creating_process_id="656", instance="xx.xx.xx.26:9182", job="windows", process="svchost", process_id="4536"} => 2203468918784 @[1647512085.319]
windows_process_virtual_bytes{creating_process_id="656", instance="xx.xx.xx.26:9182", job="windows", process="svchost", process_id="812"} => 2203458842624 @[1647512085.319]
windows_process_virtual_bytes{creating_process_id="656", instance="xx.xx.xx.26:9182", job="windows", process="svchost", process_id="6572"} => 2203456008192 @[1647512085.319]
windows_process_virtual_bytes{creating_process_id="656", instance="xx.xx.xx.26:9182", job="windows", process="svchost", process_id="7144"} => 2203449651200 @[1647512085.319]

Is this occurring for all devices that have windows_exporter installed?

parmsib commented 2 years ago

First of all, sorry for getting back to you so slowly on this. I really appreciate the help.

This is occurring for all our devices with the windows_exporter installed. There are, however, other metrics from the same exporter (e.g the windows_logical_disk_read_bytes_total metric with a value of 508505029259264) which don't have this issue.

Running the perflib_exporter without any arguments gave me a bunch of errors about duplicate metrics related to usb when visiting its /metrics endpoint. I managed to limit it to only the process metrics (I think) by running .\perflib_exporter-0.1.0-amd64.exe --perflib.objects=230. Then the /metrics endpoint gave me, among others, the following value: perflib_process_private_bytes{creating_process_id="644",name="<redacted>",process_id="7688"} 4.7055732736e+10 which is well beyond 2^32.

Some more examples with a different metric.

windows_process_virtual_bytes{creating_process_id="3356",process="firefox",process_id="3080"} 4.294967295e+09
windows_process_virtual_bytes{creating_process_id="3356",process="firefox",process_id="4824"} 4.294967295e+09
windows_process_virtual_bytes{creating_process_id="3356",process="firefox",process_id="5480"} 4.294967295e+09
windows_process_virtual_bytes{creating_process_id="3356",process="firefox",process_id="5600"} 4.294967295e+09
windows_process_virtual_bytes{creating_process_id="3356",process="firefox",process_id="7520"} 4.294967295e+09
windows_process_virtual_bytes{creating_process_id="3356",process="firefox",process_id="9204"} 4.294967295e+09
perflib_process_virtual_bytes{creating_process_id="3356",name="firefox",process_id="3080"} 2.205907861504e+12
perflib_process_virtual_bytes{creating_process_id="3356",name="firefox",process_id="4824"} 2.205924323328e+12
perflib_process_virtual_bytes{creating_process_id="3356",name="firefox",process_id="5480"} 2.20588660736e+12
perflib_process_virtual_bytes{creating_process_id="3356",name="firefox",process_id="5600"} 2.204014026752e+12
perflib_process_virtual_bytes{creating_process_id="3356",name="firefox",process_id="7520"} 2.205926486016e+12
perflib_process_virtual_bytes{creating_process_id="3356",name="firefox",process_id="9204"} 2.205918953472e+12
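
To find every affected series in a scrape like the one above, a small Go sketch can scan the text exposition output for samples that sit exactly at 2^32 - 1. It assumes one sample per line with the value as the last field and no trailing timestamp; `cappedSeries` is a hypothetical helper, not an exporter function.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// cap32 is 2^32 - 1, the plateau value seen in this issue.
const cap32 = 4294967295

// cappedSeries returns the series (metric name plus labels) of every
// sample whose value is exactly cap32. Comment lines and lines that do
// not end in a parsable number are skipped.
func cappedSeries(exposition string) []string {
	var out []string
	sc := bufio.NewScanner(strings.NewReader(exposition))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		// Assumption: the value is the last space-separated field.
		i := strings.LastIndex(line, " ")
		if i < 0 {
			continue
		}
		v, err := strconv.ParseFloat(line[i+1:], 64)
		if err != nil {
			continue
		}
		if v == cap32 {
			out = append(out, line[:i])
		}
	}
	return out
}

func main() {
	sample := `windows_process_virtual_bytes{process="firefox",process_id="3080"} 4.294967295e+09
perflib_process_virtual_bytes{name="firefox",process_id="3080"} 2.205907861504e+12`
	for _, s := range cappedSeries(sample) {
		fmt.Println("capped:", s)
	}
}
```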

breed808 commented 2 years ago

Normally I'd be inclined to believe this could be an issue with a solitary Windows host, but if all your hosts are experiencing this behavior, there may indeed be a bug in the exporter (especially as the Perflib exporter isn't experiencing the issue).

At this stage I'd try running a debugger to track the value of the metric variable(s) to see exactly where the 2^32 value is being introduced. I'd recommend Delve if you're familiar with operating a debugger.

If not, I'm unsure of any further troubleshooting steps to follow at this time :disappointed:

parmsib commented 2 years ago

I'm familiar with debuggers but not at all with Go. I'll give debugging a shot, hoping it won't take me too long to understand the environment.

dved commented 11 months ago

Hello, I can confirm this bug on a bunch of our Windows 2019 systems, using the latest windows_exporter version, 0.24. The bug seems to affect all memory-related windows_process metrics, such as: windows_process_private_bytes, windows_process_virtual_bytes, windows_process_working_set_private_bytes, windows_process_working_set_peak_bytes, windows_process_working_set_bytes.

I've looked at the Delve debugging tool and will try to debug this once I understand how to start it properly (I mean, I need to open the Go project with the exporter source code and catch the point where the memory-related metrics are captured). (Screenshots: metrics browser, proc exp.)

jkroepke commented 10 months ago

@parmsib @dved Can one of you check whether windows_exporter is running as a 32-bit or a 64-bit process?

https://www.tenforums.com/tutorials/60878-how-see-if-process-32-bit-64-bit-windows-10-a.html


I can reproduce the issue if windows_exporter is running as a 32-bit binary. Please check your logs for the build context line:

ts=2023-11-15T18:32:37.080Z caller=exporter.go:296 level=info msg="Build context" build_context="(go=go1.21.1, platform=windows/386, user=runneradmin@fv-az282-478, date=20230926-08:58:03, tags=unknown)"
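
For anyone checking many hosts, here is a rough Go sketch that pulls the platform out of a "Build context" log line like the one above and flags 32-bit builds. The regular expression and the helper names `archFromBuildContext` and `is32BitArch` are assumptions made for this example, and the architecture list only covers the 32-bit Windows targets of Go (386 and arm).

```go
package main

import (
	"fmt"
	"regexp"
	"runtime"
	"strconv"
)

// platformRe matches the "platform=<os>/<arch>" field of the log line.
var platformRe = regexp.MustCompile(`platform=([a-z0-9]+)/([a-z0-9]+)`)

// archFromBuildContext extracts the architecture part of the platform
// field. The second result is false if the field is not present.
func archFromBuildContext(line string) (string, bool) {
	m := platformRe.FindStringSubmatch(line)
	if m == nil {
		return "", false
	}
	return m[2], true
}

// is32BitArch reports whether arch is one of the 32-bit GOARCH values
// relevant to Windows builds. Other 32-bit targets are not listed.
func is32BitArch(arch string) bool {
	switch arch {
	case "386", "arm":
		return true
	}
	return false
}

func main() {
	line := `msg="Build context" build_context="(go=go1.21.1, platform=windows/386, user=runneradmin@fv-az282-478, date=20230926-08:58:03, tags=unknown)"`
	if arch, ok := archFromBuildContext(line); ok {
		fmt.Printf("arch=%s 32-bit=%v\n", arch, is32BitArch(arch))
	}

	// A Go program can also report its own build target directly.
	fmt.Println(runtime.GOOS+"/"+runtime.GOARCH, "int size:", strconv.IntSize)
}
```

A platform of windows/386 means the 32-bit build is installed, which is the configuration in which the cap at 4294967295 was reproduced.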

github-actions[bot] commented 7 months ago

This issue has been marked as stale because it has been open for 90 days with no activity. This thread will be automatically closed in 30 days if no further activity occurs.