newrelic / nri-perfmon

Windows Perfmon / WMI On-Host Integration for New Relic Infrastructure

Exceptions reported in EventLog for "ASP.NET Applications" category #23

Closed: brunotag closed this issue 4 years ago

brunotag commented 4 years ago

Commit b01decc from PR #17 (https://github.com/newrelic/nri-perfmon/pull/17/commits/b01decc1d17770153bbb8dfe44da2e58e89dbe75) seems to cause exceptions for the "ASP.NET Applications" category :-(

Only version https://github.com/newrelic/nri-perfmon/releases/tag/0.5.1 seems to be affected.

EventCode = 0;
EventIdentifier = 0;
Logfile = "Application";
RecordNumber = 32001402;
SourceName = "nri-perfmon";
TimeGenerated = "20200930005723.000000-000";
TimeWritten = "20200930005723.000000-000";
Type = "Error";
EventType = 1;
Category = 0;
CategoryString = "None";
Message = "Exception occurred in processing next value of Perfmon Counter:
Category: ASP.NET Applications
Counter: Requests Total (WebSockets)
Message: Instance '_LM_W3SVC_49_ROOT' does not exist in the specified Category.
Trace:    at System.Diagnostics.CounterDefinitionSample.GetInstanceValue(String instanceName)
   at System.Diagnostics.PerformanceCounter.NextSample()
   at System.Diagnostics.PerformanceCounter.NextValue()
   at NewRelic.PerfmonPlugin.PollPerfCounters() in Z:\eclipse\nri-perfmon\nri-perfmon\Plugin.cs:line 510";
    InsertionStrings = {"Exception occurred in processing next value of Perfmon Counter:
Category: ASP.NET Applications
Counter: Requests Total (WebSockets)
Message: Instance '_LM_W3SVC_49_ROOT' does not exist in the specified Category.
Trace:    at System.Diagnostics.CounterDefinitionSample.GetInstanceValue(String instanceName)
   at System.Diagnostics.PerformanceCounter.NextSample()
   at System.Diagnostics.PerformanceCounter.NextValue()
   at NewRelic.PerfmonPlugin.PollPerfCounters() in Z:\eclipse\nri-perfmon\nri-perfmon\Plugin.cs:line 510"};
sschwartzman commented 4 years ago

@brunotag So you're saying this is attributable to the version you pushed? Do we need to roll it back, or do you have an idea of how to fix it?

sschwartzman commented 4 years ago

Also @brunotag, can you run in verbose mode and send me the output?

brunotag commented 4 years ago

I can't replicate it locally (so I can't easily send the verbose output); it only happens on certain servers, after some time.

Windows Server 2012 R2, IIS 8.5.

We could turn on verbose mode on such servers but that might take some time.

It seems to affect only the "ASP.NET Applications" category, so I suspect it is somehow related to reading perf counters for AppDomain instances that no longer exist.

On Tue, 6 Oct 2020, 06:06 Seth Schwartzman, notifications@github.com wrote:

Also @brunotag, can you run in verbose mode and send me the output?


sschwartzman commented 4 years ago

So perhaps we could add some checks to catch such an exception and remove instances that no longer exist when it happens?

brunotag commented 4 years ago

Yes, the exception is thrown here https://github.com/newrelic/nri-perfmon/blob/master/nri-perfmon/Plugin.cs#L510 and caught here https://github.com/newrelic/nri-perfmon/blob/master/nri-perfmon/Plugin.cs#L522; the problem is that it clutters the logs.

I can't tell the type of the exception from the logs, but I suspect it is an InvalidOperationException thrown by this piece of code from the .NET Framework (https://referencesource.microsoft.com/#System/services/monitoring/system/diagnosticts/PerformanceCounterLib.cs,1569). Again, I can't reproduce it :-(

I thought about doing exactly what you are suggesting, but I am not sure where to do it: the code where the exception is thrown loops over the performance counters and "gets" the instance from each counter. I don't understand why / where / how it gets instances that no longer exist.

On Wed, 7 Oct 2020 at 04:05, Seth Schwartzman notifications@github.com wrote:

So perhaps if we add some controls to catch such an exception and remove instances that don't exist when that happens?


-- Bruno Tagliapietra bruno.tagliapietra@gmail.com 0226441495 +64226441495

sschwartzman commented 4 years ago

@brunotag I think I have fixed this issue. The code had no way to remove stale counter instances, so to resolve it I made that message a VERBOSE one (so it is not normally seen) and removed the offending counter from future executions. The whole list of counters is repopulated every time anyway, so if the instance reappears, it will show up again. I also trimmed some of the code in the Populate method. Commit: https://github.com/newrelic/nri-perfmon/commit/a07c698fe24a0aa93f62ddde4ecba729255f779d
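For anyone following along, the approach described in that comment (log at verbose level, drop the stale counter, rely on the next full repopulation to bring it back) can be sketched roughly like this. It is a minimal Python illustration under stated assumptions, not the plugin's actual C# code: `FakeCategory`, `InstanceNotFoundError`, `populate` and `poll` are hypothetical stand-ins for the .NET `PerformanceCounter` machinery, and Python's DEBUG log level is assumed to play the role of the plugin's VERBOSE mode.

```python
import logging

log = logging.getLogger("nri-perfmon-sketch")  # hypothetical logger name


class InstanceNotFoundError(Exception):
    """Hypothetical stand-in for the .NET exception raised when an instance vanishes."""


class FakeCategory:
    """Hypothetical Perfmon category whose instances can disappear (e.g. app pool recycle)."""

    def __init__(self, name, instances):
        self.name = name
        self.instances = dict(instances)  # instance name -> current counter value

    def instance_names(self):
        return list(self.instances)

    def read(self, instance):
        if instance not in self.instances:
            raise InstanceNotFoundError(
                f"Instance '{instance}' does not exist in the specified Category.")
        return self.instances[instance]


def populate(category):
    # Rebuild the whole counter list from scratch each cycle, so an instance
    # that reappears is picked up again automatically.
    return [(category, inst) for inst in category.instance_names()]


def poll(counters):
    results = {}
    for counter in list(counters):  # iterate over a copy so removal is safe
        category, inst = counter
        try:
            results[inst] = category.read(inst)
        except InstanceNotFoundError as exc:
            # Record at DEBUG level (hidden unless verbose logging is enabled)
            # instead of as an error, and stop polling this stale counter.
            log.debug("Dropping stale counter %s/%s: %s", category.name, inst, exc)
            counters.remove(counter)
    return results


cat = FakeCategory("ASP.NET Applications",
                   {"_LM_W3SVC_1_ROOT": 10, "_LM_W3SVC_49_ROOT": 5})
counters = populate(cat)

# The app pool behind _LM_W3SVC_49_ROOT recycles after the list was built.
del cat.instances["_LM_W3SVC_49_ROOT"]
first = poll(counters)      # the stale counter is dropped, no error is raised
remaining = len(counters)   # only the live counter is left

# The instance comes back; the next full repopulation picks it up again.
cat.instances["_LM_W3SVC_49_ROOT"] = 7
counters = populate(cat)
second = poll(counters)
```

The key design point, as described in the comment, is that eviction is cheap and safe because nothing is lost permanently: a dropped counter only stays gone until the next repopulation finds its instance again.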

Here's the release, if you want to try it out. Let me know and hopefully we can close this out. https://github.com/newrelic/nri-perfmon/releases/tag/0.5.2-alpha

sschwartzman commented 4 years ago

@brunotag ever have a chance to look at my fix? Can we close this out?

brunotag commented 4 years ago

@sschwartzman I just got word that the fix in the 0.5.2-alpha version seems to work, so I think we can close the issue :)