influxdata / telegraf

Agent for collecting, processing, aggregating, and writing metrics, logs, and other arbitrary data.
https://influxdata.com/telegraf
MIT License

Windows Perfmon Counters - Plugin fails to start when dynamic instances are added for monitoring #9809

Closed muralinareddy closed 11 months ago

muralinareddy commented 3 years ago

Relevant telegraf.conf:

[[inputs.win_perf_counters]]
  [[inputs.win_perf_counters.object]]
    ObjectName = "Connected Clients"
    Instances = ["*"]
    Counters = ["Messages Received"]
    Measurement = "client_connections"

System info:

telegraf-1.19.3_windows_amd64
Windows Server 2016 Standard
Windows 10 Enterprise

Steps to reproduce:

ObjectName = "Connected Clients"
Instances = ["*"]
Counters = ["Messages Received"]
Measurement = "client_connections"

In the above example, whenever a new client connects to my server, I dynamically create a perfmon instance for that client under the perfmon object "Connected Clients". The instance name includes the client IP and the port it is connected on. Because I only learn about clients when they connect, the instances cannot be created in advance. The counters track messages sent, messages received, message queue size, and so on.

Because I don't know the instance names in advance, I use the wildcard option to get the counters for all instances.

The Telegraf input plugin fails to start when there are no connected clients, because it doesn't find any perfmon instance to pull values from. Since it fails to start, it can't monitor any other counters either.

Expected behavior:

Since this is a wildcard fetch, I expect the Telegraf input plugin to ignore this error when there are no instances to monitor. Note that it works fine when the perfmon object does not exist at all; it fails only when the object exists but has no instances.

Actual behavior:

Telegraf input plugin fails to start.

Additional info:

E:\telegraf>E:\telegraf\telegraf.exe --config E:\telegraf\telegraf.conf --config-directory E:\telegraf\conf --test
2021-09-22T20:44:34Z I! Starting Telegraf 1.19.3
2021-09-22T20:44:35Z E! [inputs.win_perf_counters] Error in plugin: error while getting value for counter \Connected Clients(*)\Messages Received: The returned data is valid.
2021-09-22T20:44:35Z E! [telegraf] Error running agent: input plugins recorded 1 errors

muralinareddy commented 3 years ago

While troubleshooting this, I found that enabling wildcard expansion together with a counters refresh interval takes care of dynamic instances.

inputs.conf:

[[inputs.win_perf_counters]]
  UseWildcardsExpansion = true
  CountersRefreshInterval = "1m"
  [[inputs.win_perf_counters.object]]
  …

Details:

Source file: telegraf\plugins\inputs\win_perf_counters.go

When UseWildcardsExpansion is set, all available counters are expanded and GetFormattedCounterValueDouble is called. When UseWildcardsExpansion is NOT set, GetFormattedCounterArrayDouble is called.

GetFormattedCounterArrayDouble

GetFormattedCounterValueDouble

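When no dynamic instances have been created yet, PdhGetFormattedCounterArrayDouble returns ERROR_SUCCESS on the very first call, which is normally only a buffer-size probe. The current implementation treats this as an error condition, so the plugin fails. That would also explain the odd message in the log above: "The returned data is valid" appears to be the text of a success status being reported as an error.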

Fix:

We can either fix GetFormattedCounterArrayDouble to accept ERROR_SUCCESS on the first call and treat it as a no-data condition, or document that wildcard expansion and a counters refresh interval should be used to handle dynamic instances.
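The first option could look roughly like the following Go sketch. It is a simplified, platform-independent model and not the plugin's actual code: `arrayFunc`, `collect`, `fakeQuery`, and `counterValue` are hypothetical names, and the PDH call is simulated so the example runs anywhere. The status values are the ones defined in the Windows SDK headers.

```go
package main

import "fmt"

// Status codes as defined in the Windows SDK headers.
const (
	errorSuccess uint32 = 0x00000000 // ERROR_SUCCESS
	pdhMoreData  uint32 = 0x800007D2 // PDH_MORE_DATA
)

// counterValue is a hypothetical instance/value pair.
type counterValue struct {
	Instance string
	Value    float64
}

// arrayFunc is a hypothetical stand-in for the PDH array call. With a buffer
// that is too small it reports PDH_MORE_DATA and the required item count;
// with a large enough buffer it fills it and reports ERROR_SUCCESS.
type arrayFunc func(buf []counterValue) (status uint32, count int)

// collect implements the two-call pattern, but accepts ERROR_SUCCESS on the
// size probe and treats it as "no instances, no data" instead of an error.
func collect(query arrayFunc) ([]counterValue, error) {
	status, count := query(nil) // first call: size probe with an empty buffer
	switch status {
	case errorSuccess:
		return nil, nil // nothing to buffer: no instances exist yet
	case pdhMoreData:
		// expected: count holds the required number of items
	default:
		return nil, fmt.Errorf("size probe failed: status 0x%08X", status)
	}

	buf := make([]counterValue, count)
	status, count = query(buf) // second call: read the values
	if status != errorSuccess {
		return nil, fmt.Errorf("reading values failed: status 0x%08X", status)
	}
	return buf[:count], nil
}

// fakeQuery simulates the PDH behavior described in this issue.
func fakeQuery(instances []counterValue) arrayFunc {
	return func(buf []counterValue) (uint32, int) {
		if len(instances) == 0 {
			return errorSuccess, 0
		}
		if len(buf) < len(instances) {
			return pdhMoreData, len(instances)
		}
		return errorSuccess, copy(buf, instances)
	}
}

func main() {
	vals, err := collect(fakeQuery(nil))
	fmt.Println(len(vals), err) // 0 <nil>

	vals, err = collect(fakeQuery([]counterValue{
		{Instance: "10.0.0.1:5000", Value: 3},
		{Instance: "10.0.0.2:5001", Value: 7},
	}))
	fmt.Println(len(vals), err) // 2 <nil>
}
```

With this shape, an object that exists but has no instances produces an empty result for that gather cycle, while any other unexpected status is still reported as an error.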

Suggestions?

srebhan commented 12 months ago

@muralinareddy can you please test the binary in PR #1424 and let me know if this fixes your issue?