prometheus-community / windows_exporter

Prometheus exporter for Windows machines
MIT License

metrics windows_terminal_services_local_session_count missing in 0.26+ #1568

Closed: pflesar closed this issue 2 months ago

pflesar commented 3 months ago

Current Behavior

If the terminal_services collector is enabled, the metric "windows_terminal_services_local_session_count" is not produced. Additionally, scraping takes unexpectedly long (10 seconds in my case). There is no associated error in the event log: msiexec finishes without issues, and no logs are produced while the service is running.

Expected Behavior

The metric "windows_terminal_services_local_session_count" is produced and the scrape takes a reasonable amount of time. On 0.25.1, with the same collectors enabled, it took ~0.5 seconds.

Steps To Reproduce

1. Remove the previous windows_exporter service, if there is one (e.g. using msiexec /x).
2. In an admin PowerShell: msiexec /i windows_exporter-0.27.0-amd64.msi ENABLED_COLLECTORS="[defaults],terminal_services,memory,process"
3. Look for "windows_terminal_services_local_session_count" at http://localhost:9182/metrics; it's missing.
4. In Prometheus, check the scrape_duration_seconds metric for that instance; the scrape runs into the timeout (10 seconds in my case).
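Steps 3 and 4 can also be checked from a script on the Windows host. This is a minimal sketch, assuming Python 3.9+ and the default listen address localhost:9182 seen in the logs; the helper names has_metric, scrape and check are hypothetical and not part of windows_exporter.

```python
import time
import urllib.request

# Default windows_exporter endpoint, as used in the steps above (assumption: not reconfigured).
METRICS_URL = "http://localhost:9182/metrics"


def has_metric(exposition: str, name: str) -> bool:
    """Return True if any sample line in a Prometheus text exposition belongs to `name`."""
    for line in exposition.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # A sample line is either `name value` or `name{labels} value`.
        metric = line.split("{", 1)[0].split(" ", 1)[0]
        if metric == name:
            return True
    return False


def scrape(url: str = METRICS_URL, timeout: float = 10.0) -> tuple[str, float]:
    """Fetch the metrics page; return (body, elapsed seconds). Raises on timeout or refusal."""
    start = time.monotonic()
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        body = resp.read().decode("utf-8")
    return body, time.monotonic() - start


def check(url: str = METRICS_URL) -> None:
    """Print the scrape duration and whether the old session-count metric is present."""
    body, elapsed = scrape(url)
    print(f"scrape took {elapsed:.2f}s")
    print("local_session_count present:",
          has_metric(body, "windows_terminal_services_local_session_count"))
```

Calling check() locally measures the scrape without Prometheus in the path, which separates a slow exporter from a problem between the Prometheus server and the host.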

Environment

windows_exporter logs

Microsoft-Windows-RestartManager 8/12/2024 2:41:13 PM Ending session 0 started 2024-08-12T12:41:11.894333100Z.
MsiInstaller 8/12/2024 2:41:13 PM Windows Installer installed the product. Product Name: windows_exporter. Product Version: 0.27.0. Product Language: 1033. Manufacturer: prometheus-community. Installation success or error status: 0.
MsiInstaller 8/12/2024 2:41:13 PM Product: windows_exporter -- Installation completed successfully.
MsiInstaller 8/12/2024 2:41:13 PM Ending a Windows Installer transaction: C:\Users\pflesar\Documents\Monitoring\windows_exporter-0.27.0-amd64.msi. Client Process Id: 7712.
windows_exporter 8/12/2024 2:41:13 PM ts=2024-08-12T12:41:13.031Z caller=tls_config.go:316 level=info msg="TLS is disabled." http2=false address=[::]:9182
windows_exporter 8/12/2024 2:41:13 PM ts=2024-08-12T12:41:13.031Z caller=tls_config.go:313 level=info msg="Listening on" address=[::]:9182
windows_exporter 8/12/2024 2:41:13 PM ts=2024-08-12T12:41:13.029Z caller=exporter.go:265 level=info msg="Build context" build_context="(go=go1.22.6, platform=windows/amd64, user=runneradmin@fv-az1495-484, date=20240811-13:54:18, tags=unknown)"
windows_exporter 8/12/2024 2:41:13 PM ts=2024-08-12T12:41:13.029Z caller=exporter.go:264 level=info msg="Starting windows_exporter" version="(version=0.27.0, branch=HEAD, revision=ca4ad46e2df498e0317d09bc2c037922cd879898)"
windows_exporter 8/12/2024 2:41:13 PM ts=2024-08-12T12:41:13.029Z caller=exporter.go:229 level=info msg="Enabled collectors: logical_disk, system, memory, process, cpu, physical_disk, net, os, service, terminal_services, cs"
windows_exporter 8/12/2024 2:41:13 PM ts=2024-08-12T12:41:13.029Z caller=exporter.go:222 level=info msg="Running as NT AUTHORITY\\SYSTEM"
windows_exporter 8/12/2024 2:41:12 PM ts=2024-08-12T12:41:12.952Z caller=service.go:102 level=warn collector=service msg="No where-clause specified for service collector. This will generate a very large number of metrics!"
Microsoft-Windows-RestartManager Starting session 0 - 2024-08-12T12:41:11.894333100Z.
windows_exporter 8/12/2024 2:41:12 PM Cannot create another system semaphore. 
MsiInstaller 8/12/2024 2:41:11 PM Beginning a Windows Installer transaction: C:\Users\pflesar\Documents\Monitoring\windows_exporter-0.27.0-amd64.msi. Client Process Id: 7712.

Anything else?

No response

jkroepke commented 3 months ago

Are there any logs when calling http://localhost:9182/metrics?

Could you post the output of http://localhost:9182/metrics?

pflesar commented 3 months ago

No logs appear when calling the metrics page, and there are no browser console errors (apart from the missing favicon). I could post the output, but I would have to redact it first, as it now contains full usernames.

However, I checked the recent commits, and I think you removed the local_session_count metric altogether while adding the session_info metric: https://github.com/prometheus-community/windows_exporter/commit/7044b556c27d6ac2b97a08bea9620d5642d337ff#diff-8e34fa71b73f05033f914496ccfd57a8e6cb7dd4c88684a2a032c8c357e54823L103-L106

If you considered it superfluous to collect the number of sessions, since a list of sessions is collected that can be counted by a query, then the documentation should be updated here: https://github.com/prometheus-community/windows_exporter/blob/master/docs/collector.terminal_services.md That would, however, be a breaking change for anyone who used the local_session_count metric in alerts and dashboards.
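A session count can be derived from the session list instead. This is a minimal sketch in Python, assuming (not confirmed in this thread) that windows_terminal_services_session_info emits one series per session and state, with value 1 for the session's current state and 0 for the others, and that it carries a "state" label. Under that assumption, the PromQL counterpart would be something like count by (state) (windows_terminal_services_session_info == 1). The function name count_sessions_by_state is hypothetical.

```python
import re
from collections import Counter

# Matches `name{labels} value`; assumes label values never contain a literal `}`.
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)'
)
# Matches key="value" pairs, allowing escaped characters inside the quotes.
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def count_sessions_by_state(
    exposition: str, metric: str = "windows_terminal_services_session_info"
) -> Counter:
    """Count sessions per state from a Prometheus text exposition."""
    counts: Counter = Counter()
    for line in exposition.splitlines():
        m = SAMPLE_RE.match(line)
        if not m or m.group("name") != metric:
            continue
        if float(m.group("value")) != 1.0:
            continue  # assumed: value 0 marks a state the session is NOT in
        labels = dict(LABEL_RE.findall(m.group("labels")))
        counts[labels.get("state", "unknown")] += 1
    return counts
```

Summing the returned counter gives a total comparable to a plain session count; grouping by state keeps the per-state breakdown that dashboards may expect.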

jkroepke commented 3 months ago

Hi @pflesar

You are right. I made those changes months ago and couldn't fully remember that one. I will adjust the documentation soon. You already mentioned the reason why I removed that metric.

0.26 took a while to release and includes tons of changes. The next releases will be shorter, and the release notes will inform better about breaking changes.

For now, we may still decide to make some breaking changes. We will stop making breaking changes once we reach v1. I'm aware that this may not be to your satisfaction, and I understand your criticism. In the coming versions I will inform end users better about breaking changes, create smaller releases, and keep an eye on the documentation.

pflesar commented 3 months ago

I am perfectly fine with deriving the statistics from other metrics. I also don't mind breaking changes before a full release. As long as they are documented, I think everyone should be satisfied.

Thank you

pflesar commented 3 months ago

Summary:

jkroepke commented 3 months ago

I have no idea what the root cause of the high scrape time is.

I saw

windows_exporter 8/12/2024 2:41:12 PM Cannot create another system semaphore.

in your logs, which seems suspicious to me and could be the root cause of that issue.

jkroepke commented 3 months ago

I'm reopening the issue as a reminder that I have to fix the docs.

pflesar commented 2 months ago

I had unintentionally removed a firewall exception, so scraping was actually failing at the time I checked, with the connection timeout set to 10 seconds. Apologies; please feel free to ignore that part.
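A blocked port surfaces in Prometheus as a scrape that runs into the timeout, which looks much like a slow exporter. This is a minimal sketch for telling the two apart from the Prometheus server, assuming Python 3 and the default port 9182; port_reachable is a hypothetical helper, not part of windows_exporter or Prometheus.

```python
import socket


def port_reachable(host: str, port: int = 9182, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        # Only the TCP handshake is tested; no HTTP request is sent.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections and timeouts (socket.timeout is an OSError)
        return False
```

If this returns False from the Prometheus server while the metrics page loads fine on the Windows host itself, the problem is the network path (for example a firewall rule), not the exporter.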