Typeperf Results in Incorrect Counter Values

Under certain conditions the plugin reports incorrect values for a performance counter, or for other, unrelated performance counters.

For instance, including the following counter typically causes other counter values to be reported incorrectly:

\HTTP Service Request Queues(*)\MaxQueueItemAge

On every machine tested that has multiple IIS sites configured, typeperf returns a different number of counter names than counter values in its results.

Here is an example of the output from this typeperf command:

typeperf "\HTTP Service Request Queues(*)\MaxQueueItemAge"

"(PDH-CSV 4.0)","\\ABCD1234\HTTP Service Request Queues(???3)\MaxQueueItemAge","
\\ABCD1234\HTTP Service Request Queues(???2)\MaxQueueItemAge","\\ABCD1234\HTTP S
ervice Request Queues(???1)\MaxQueueItemAge"
"10/25/2013 11:00:39.984","-1","-1","-1","-1","-1","-1","-1","-1","-1","-1","-1"
,"-1","-1","-1","-1","-1","-1","-1","-1","0.000000","0.000000","0.000000"

typeperf omits the names of the counters whose values are -1, creating a name/value mismatch: the header above lists 3 counter names, but the sample row contains 22 values. The actual values for the three reported instances should be 0, not -1.
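The mismatch can be checked mechanically. The following is a minimal sketch, not the plugin's actual code: `pdh_fields` and `name_value_counts` are hypothetical helpers, and the parsing assumes every field is double-quoted with no embedded quotes. The console-wrapped lines from the sample above are joined back into one header line and one sample row.

```ruby
# Hypothetical helper (not part of the plugin): split one PDH-CSV line into
# its fields. Assumes every field is double-quoted with no embedded quotes.
def pdh_fields(line)
  line.scan(/"([^"]*)"/).flatten
end

# Count counter names in the header and values in a sample row. The first
# field of each line (format tag or timestamp) is not a counter, so skip it.
def name_value_counts(header_line, sample_line)
  names  = pdh_fields(header_line).drop(1)
  values = pdh_fields(sample_line).drop(1)
  { names: names.size, values: values.size, match: names.size == values.size }
end

# Header from the sample output, unwrapped. A single-quoted heredoc keeps the
# backslashes literal.
header = <<~'CSV'.chomp
  "(PDH-CSV 4.0)","\\ABCD1234\HTTP Service Request Queues(???3)\MaxQueueItemAge","\\ABCD1234\HTTP Service Request Queues(???2)\MaxQueueItemAge","\\ABCD1234\HTTP Service Request Queues(???1)\MaxQueueItemAge"
CSV

# Sample row from the output: a timestamp, nineteen -1 values, three zeros.
sample = (['10/25/2013 11:00:39.984'] + ['-1'] * 19 + ['0.000000'] * 3)
         .map { |field| %("#{field}") }
         .join(',')

counts = name_value_counts(header, sample)
puts "#{counts[:names]} names, #{counts[:values]} values"
# prints "3 names, 22 values"
```

A check like this could also be run against any counter before it is enabled, since a counter is only safe to map by index when the two counts agree.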

Checking the results in the perfmon GUI shows that there really are additional counter instances for each website, but that they are not reporting any performance counter data.

At a minimum, this causes the HTTP Service Request Queues instance values to be reported as -1 when they are in fact 0. Worse, when multiple counters are included on the same thread, it can also produce incorrect values for the other counters on that thread, making their data unreliable.

To reproduce this, start by setting the number of threads for the plugin to 1 in the perfmon_metrics.rb file (more than 1 thread obscures the issue, but it is still present). Requesting a second counter in the same typeperf call then shows the effect:

typeperf "\HTTP Service Request Queues(*)\MaxQueueItemAge" "\Processor(0)\% Processor Time" -sc 1

"(PDH-CSV 4.0)","\\ABCD1234\HTTP Service Request Queues(???3)\MaxQueueItemAge","
\\ABCD1234\HTTP Service Request Queues(???2)\MaxQueueItemAge","\\ABCD1234\HTTP S
ervice Request Queues(???1)\MaxQueueItemAge","\\ABCD1234\Processor(0)\% Processo
r Time"
"10/25/2013 11:25:29.601","-1","-1","-1","-1","-1","-1","-1","-1","-1","-1","-1"
,"-1","-1","-1","-1","-1","-1","-1","-1","0.000000","0.000000","0.000000","3.664
721"

Because the plugin maps names to values by index, the four counter names are paired with the first four values in the row. Every counter is therefore reported as -1, which is not correct for any of them.
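The effect of index-based pairing can be illustrated with the data from the output above. This is only a sketch: the plugin's real mapping code may look different, and the names are shortened by dropping the machine prefix. The second pairing, aligned from the end of the row, happens to give the expected values for this one sample; that is an observation about this output, not a verified property of typeperf.

```ruby
# Counter names from the header of the sample above (machine prefix omitted).
names = [
  'HTTP Service Request Queues(???3)\MaxQueueItemAge',
  'HTTP Service Request Queues(???2)\MaxQueueItemAge',
  'HTTP Service Request Queues(???1)\MaxQueueItemAge',
  'Processor(0)\% Processor Time'
]

# Values from the sample row: nineteen -1 values with no matching names,
# then the three queue values and the processor value.
values = ['-1'] * 19 + ['0.000000'] * 3 + ['3.664721']

# Pairing by index: Array#zip stops at the length of `names`, so each name
# is matched with one of the first four values, all of which are -1.
by_index = names.zip(values).to_h
by_index.each { |name, value| puts "#{name} => #{value}" }

# Assumption for illustration only: in this sample the unnamed values come
# first, so aligning from the end of the row gives the expected pairing.
from_end = names.zip(values.last(names.size)).to_h
puts from_end['Processor(0)\% Processor Time']
# prints "3.664721"
```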

The current workaround is to avoid specifying all instances (*) for HTTP Service Request Queues, or to remove the MaxQueueItemAge counter entirely. It also means that, if the data is to be trusted, every desired performance counter must first be verified with typeperf to return a matching count of names and values.

Worse, a null or -1 value is intermittently reported for other counters that are typically reliable, making it difficult to tell whether a change in a performance counter is genuine or the result of another misreporting counter.

Possible Solutions / Workarounds:

Use 1 counter per thread so that, at a minimum, a bad counter cannot skew other performance counters. This will hurt performance, so it should probably be configurable.
Throw out counters when the name and value counts don't match. This will also throw out additional, unrelated counters if there are still multiple counters per thread.
Throw out all values that are "-1", optionally combined with matching name/value counts. This probably needs additional test cases with other counters to determine whether the option is valid.
Create a counter blacklist file for known unreliable counters. Users would have to remove an item from the blacklist file to opt into that counter, explicitly acknowledging the implications.
Possibly use another performance counter interface, such as PowerShell.
Log every counter name/value count mismatch, to help identify unreliable counters that could be misreporting their own values or skewing results for other counters, so that they can be blacklisted or avoided.
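
Several of these options can be combined. The sketch below is one possible shape, not a proposed patch: `COUNTER_BLACKLIST`, `allowed_counters`, and `accept_sample` are hypothetical names, the blacklist is an in-memory array rather than a file, and the mismatch message is written with `warn` unless a different callback is passed in.

```ruby
# Hypothetical blacklist of known unreliable counters. In practice this
# could be loaded from a file that users edit to opt in to a counter.
COUNTER_BLACKLIST = [
  '\HTTP Service Request Queues(*)\MaxQueueItemAge'
].freeze

# Remove blacklisted counters before they are passed to typeperf.
def allowed_counters(requested, blacklist = COUNTER_BLACKLIST)
  requested.reject { |counter| blacklist.include?(counter) }
end

# Pair names with values only when the counts agree. On a mismatch, report
# it through the callback and discard the whole sample. Optionally drop
# "-1" values from an otherwise valid sample.
def accept_sample(names, values, drop_minus_one: false,
                  on_mismatch: ->(message) { warn message })
  if names.size != values.size
    on_mismatch.call(
      "typeperf name/value mismatch: #{names.size} names, #{values.size} values"
    )
    return {}
  end

  pairs = names.zip(values).to_h
  drop_minus_one ? pairs.reject { |_name, value| value == '-1' } : pairs
end

requested = [
  '\HTTP Service Request Queues(*)\MaxQueueItemAge',
  '\Processor(0)\% Processor Time'
]
puts allowed_counters(requested)
# prints "\Processor(0)\% Processor Time"
```

Discarding the whole sample on a mismatch is the conservative choice here, because once the counts differ there is no reliable way to tell which values belong to which names.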
