newrelic / nri-perfmon

Windows Perfmon / WMI On-Host Integration for New Relic Infrastructure

events are not sent to newrelic for perfmon counters with no instances #36

Open ayounas opened 2 years ago

ayounas commented 2 years ago

Hello

We are monitoring a few Windows hypervisor servers with nri-perfmon and discovered that metrics/events from some of the configured counters are not sent to New Relic. On further troubleshooting, it seems that if a counter has no instances, nri-perfmon won't parse the metric.

E.g. the config below does not collect and send counters for Hyper-V Hypervisor:

     {
        "provider":"PerfCounter",
        "category":"Hyper-V Hypervisor",
        "eventname":"Perfmon_HyperVHypervisor",
        "counters":[
          {
            "counter":"Logical Processors"
          },
          {
            "counter":"Partitions"
          },
          {
            "counter":"Virtual Processors"
          }
        ]
      },

The output below shows that Name for the counter is blank (no instances), and I guess that's probably why the counter is not parsed:

PS C:\temp> Get-CimInstance Win32_PerfFormattedData_HvStats_HyperVHypervisor

Caption                 :
Description             :
Name                    :
Frequency_Object        :
Frequency_PerfTime      :
Frequency_Sys100NS      :
Timestamp_Object        :
Timestamp_PerfTime      :
Timestamp_Sys100NS      :
HypervisorStartupCost   : 20599148
LogicalProcessors       : 48
ModernStandbyEntries    : 0
MonitoredNotifications  : 6
Partitions              : 4
PlatformIdleTransitions : 0
TotalPages              : 1619117
VirtualProcessors       : 60
PSComputerName          :

I tried using "instance":"*" and "instance":"", but still no luck.

Similarly, there is no output from:

      {
        "provider":"PerfCounter",
        "category":"Hyper-V Virtual Machine Health Summary",
        "eventname":"Perfmon_HyperVVirtualMachineHealthSummary",
        "counters":[
          {
            "counter":"Health Critical"
          },
          {
            "counter":"Health Ok"
          }
        ]
      }

PS C:\temp> Get-CimInstance Win32_PerfFormattedData_VmmsVirtualMachineStats_HyperVVirtualMachineHealthSummary

Caption            :
Description        :
Name               :
Frequency_Object   :
Frequency_PerfTime :
Frequency_Sys100NS :
Timestamp_Object   :
Timestamp_PerfTime :
Timestamp_Sys100NS :
HealthCritical     : 0
HealthOk           : 3
PSComputerName     :

All the counters with something in Name, or with instances, work just fine.
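For anyone who wants to check other counter sets the same way, here is a minimal sketch of the blank-Name check described above. It only parses the text that Get-CimInstance prints; the function names `parse_cim_listing` and `is_single_instance` are made up for illustration, and the sample is trimmed from the Hyper-V Hypervisor listing above.

```python
def parse_cim_listing(text):
    """Parse 'Key : Value' lines from a Get-CimInstance listing into a dict."""
    props = {}
    for line in text.splitlines():
        # Split on the first colon only, so values containing ':' survive.
        key, sep, value = line.partition(":")
        if sep and key.strip():
            props[key.strip()] = value.strip()
    return props


def is_single_instance(props):
    """Assumption: a blank Name property means the counter set has no instances."""
    return props.get("Name", "") == ""


sample = """\
Name                    :
LogicalProcessors       : 48
Partitions              : 4
VirtualProcessors       : 60
"""

props = parse_cim_listing(sample)
print(is_single_instance(props))   # True
print(props["LogicalProcessors"])  # 48
```

Note that the values stay as strings; converting them to numbers is left out to keep the sketch short.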

Seems similar to #18 and #17

Verbose Logs

C:\Program Files\New Relic\newrelic-infra\custom-integrations\nri-perfmon>nri-perfmon.exe -r -v  -c c:\temp\nrc.json
Thread-1 : nri-perfmon version 0.6.1.0 starting with options:
{
  "ConfigFile": "c:\\temp\\nrc.json",
  "PollingInterval": 10000,
  "RunOnce": true,
  "MachineName": "****",
  "UserName": "***",
  "DomainName": "**",
  "Password": "",
  "Verbose": true
}
Thread-1 : nri-perfmon counters:
[
  {
    "provider": "PerfCounter",
    "category": "Hyper-V Virtual Machine Health Summary",
    "instance": null,
    "counters": [
      {
        "counter": "Health Critical",
        "attrname": "using_counter_name",
        "parser": ""
      },
      {
        "counter": "Health Ok",
        "attrname": "using_counter_name",
        "parser": ""
      }
    ],
    "query": null,
    "eventname": "Perfmon_HyperVVirtualMachineHealthSummary",
    "querytype": "wmi_query",
    "querynamespace": "root\\cimv2"
  },
  {
    "provider": "PerfCounter",
    "category": "Hyper-V Hypervisor",
    "instance": null,
    "counters": [
      {
        "counter": "Logical Processors",
        "attrname": "using_counter_name",
        "parser": ""
      },
      {
        "counter": "Partitions",
        "attrname": "using_counter_name",
        "parser": ""
      },
      {
        "counter": "Virtual Processors",
        "attrname": "using_counter_name",
        "parser": ""
      }
    ],
    "query": null,
    "eventname": "Perfmon_HyperVHypervisor",
    "querytype": "wmi_query",
    "querynamespace": "root\\cimv2"
  },
  {
    "provider": "PerfCounter",
    "category": "System",
    "instance": null,
    "counters": [
      {
        "counter": "Context Switches/sec",
        "attrname": "using_counter_name",
        "parser": ""
      },
      {
        "counter": "Processes",
        "attrname": "using_counter_name",
        "parser": ""
      },
      {
        "counter": "Processor Queue Length",
        "attrname": "using_counter_name",
        "parser": ""
      },
      {
        "counter": "System Calls/sec",
        "attrname": "using_counter_name",
        "parser": ""
      },
      {
        "counter": "Threads",
        "attrname": "using_counter_name",
        "parser": ""
      }
    ],
    "query": null,
    "eventname": "Perfmon_System",
    "querytype": "wmi_query",
    "querynamespace": "root\\cimv2"
  }
]
Thread-1 : Running once and exiting.
Thread-1 : Polling time: 00:00:00.0156237

C:\Program Files\New Relic\newrelic-infra\custom-integrations\nri-perfmon>

sschwartzman commented 2 years ago

Hi @ayounas, sorry for the delay. Thanks for all of that detail; it made the issue easy for me to re-create, as I have a Hyper-V rig to test with. However, I actually got back the expected data, without making any changes to your JSON config stanzas.

Here are the results (excerpted from -v output):

Thread-1 : Metric output:
{
  "name": "HYPERV-HV-1",
  "protocol_version": "1",
  "integration_version": "0.1.0",
  "events": [],
  "inventory": {},
  "metrics": [
    {
      "event_type": "Hyper_V_Virtual_Machine_Health_Summary",
      "name": "",
      "HealthCritical": 0.0,
      "HealthOk": 2.0
    },
    {
      "event_type": "Hyper_V_Hypervisor",
      "name": "",
      "LogicalProcessors": 72.0,
      "Partitions": 1.0,
      "VirtualProcessors": 72.0
    }
  ]
}

Here is the config.json I used:

{
  "counterlist": [
    {
        "provider":"PerfCounter",
        "category":"Hyper-V Hypervisor",
        "eventname":"Perfmon_HyperVHypervisor",
        "counters":[
          {
            "counter":"Logical Processors"
          },
          {
            "counter":"Partitions"
          },
          {
            "counter":"Virtual Processors"
          }
        ]
    },
    {
        "provider":"PerfCounter",
        "category":"Hyper-V Virtual Machine Health Summary",
        "eventname":"Perfmon_HyperVVirtualMachineHealthSummary",
        "counters":[
          {
            "counter":"Health Critical"
          },
          {
            "counter":"Health Ok"
          }
        ]
    }
  ]
}

It's worth noting that when you're using the PerfCounter functionality, "eventname" is ignored; that attribute is only used for the WMI query functionality. It doesn't cause a failure, though. The integration just ignores the attribute and names events after the category name, as you can see in the results above.
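As a rough illustration of that naming, the sketch below reproduces the two event types seen in the results above. The rule it uses (every non-alphanumeric character becomes an underscore) is only inferred from those two samples and is an assumption, not the actual nri-perfmon code; `event_type_for` is a hypothetical helper name.

```python
import re


def event_type_for(category):
    """Assumed rule: replace each non-alphanumeric character with '_'."""
    return re.sub(r"[^A-Za-z0-9]", "_", category)


print(event_type_for("Hyper-V Hypervisor"))
# Hyper_V_Hypervisor
print(event_type_for("Hyper-V Virtual Machine Health Summary"))
# Hyper_V_Virtual_Machine_Health_Summary
```

This can help when writing queries against the data, since the event type to look for comes from the category rather than from "eventname".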

Is it possible that you're running an older version of nri-perfmon? Also, what OS and architecture are you running on?

ayounas commented 2 years ago

Hello @sschwartzman, thanks for looking into it. I checked the version and we are using the latest, 0.6.1.0. I downloaded a fresh copy from GitHub and reinstalled just in case, but still have the same problem.

Versions:

Thread-1 : nri-perfmon version 0.6.1.0 starting with options:
Microsoft Windows [Version 10.0.14393]
Windows Server 2016
Version 1607 (OS Build 14393.4530)

Output using the config.json you pasted.

C:\Program Files\New Relic\newrelic-infra\custom-integrations\nri-perfmon>nri-perfmon.exe -r -v  -c c:\temp\config.json
Thread-1 : nri-perfmon version 0.6.1.0 starting with options:
{
  "ConfigFile": "c:\\temp\\config.json",
  "PollingInterval": 10000,
  "RunOnce": true,
  "MachineName": "redacted",
  "UserName": "redacted",
  "DomainName": "redacted",
  "Password": "",
  "Verbose": true
}
Thread-1 : nri-perfmon counters:
[
  {
    "provider": "PerfCounter",
    "category": "Hyper-V Hypervisor",
    "instance": null,
    "counters": [
      {
        "counter": "Logical Processors",
        "attrname": "using_counter_name",
        "parser": ""
      },
      {
        "counter": "Partitions",
        "attrname": "using_counter_name",
        "parser": ""
      },
      {
        "counter": "Virtual Processors",
        "attrname": "using_counter_name",
        "parser": ""
      }
    ],
    "query": null,
    "eventname": "Perfmon_HyperVHypervisor",
    "querytype": "wmi_query",
    "querynamespace": "root\\cimv2"
  },
  {
    "provider": "PerfCounter",
    "category": "Hyper-V Virtual Machine Health Summary",
    "instance": null,
    "counters": [
      {
        "counter": "Health Critical",
        "attrname": "using_counter_name",
        "parser": ""
      },
      {
        "counter": "Health Ok",
        "attrname": "using_counter_name",
        "parser": ""
      }
    ],
    "query": null,
    "eventname": "Perfmon_HyperVVirtualMachineHealthSummary",
    "querytype": "wmi_query",
    "querynamespace": "root\\cimv2"
  }
]
Thread-1 : Running once and exiting.
Thread-1 : Polling time: 00:00:00.0156255

sschwartzman commented 2 years ago

So what can you tell me about your env? OS, arch, user running nri-perfmon.exe, etc?

Also, can you try running without the "run-once" flag? Maybe that's the issue; it's a relatively new addition to nri-perfmon, so I want to rule it out.

Update: I tried with "-r" flag and got the same results as without it.

ayounas commented 2 years ago

Hello,

These are physical hosts running Windows Server 2016 (Version 1607, OS Build 14393.4530) with Microsoft Hyper-V. They are all domain joined.

The nri-perfmon version we are running is 0.6.1.0, and the New Relic agent is 1.18.

I was running nri-perfmon under my own username, which is a domain account with full admin rights. The cmd prompt I used to run the tests was launched as Administrator.

ayounas commented 2 years ago

Hello @sschwartzman, I spent some more time today testing on different operating systems (Server 2016, Server 2019, Server 2022) using the latest 0.6.1 release, but no luck. Then I thought I would try version 0.5.1, because that was the first version with the fix to get metrics for counters without any instance. With version 0.5.1 I was able to get the metrics for the counters without instances. I tried all the versions above 0.5.1 and got the same issue, i.e. no metrics for counters without instances.

Thread-1 : Metric output:
{
  "name": "<redacted>",
  "protocol_version": "1",
  "integration_version": "0.1.0",
  "events": [],
  "inventory": {},
  "metrics": [
    {
      "event_type": "Hyper_V_Virtual_Machine_Health_Summary",
      "name": "",
      "HealthOk": 3.0,
      "HealthCritical": 0.0
    },
    {
      "event_type": "Hyper_V_Hypervisor",
      "name": "",
      "LogicalProcessors": 48.0,
      "Partitions": 4.0,
      "TotalPages": 1619199.0,
      "VirtualProcessors": 60.0,
      "MonitoredNotifications": 6.0,
      "ModernStandbyEntries": 0.0,
      "PlatformIdleTransitions": 0.0,
      "HypervisorStartupCost": 20520132.0
    }
  ]
}
Thread-1 : Polling time: 00:00:01.4794740
Thread-1 : Sleeping for: 00:00:08.5205260
PS C:\temp\nri-perfmon0.5.1>

It seems all the versions above 0.5.1 regressed the fix introduced in 0.5.1. Also, with 0.5.1 the metrics are not filtered: the output shows every metric for the perf counter instead of only the named ones, whereas the output you pasted shows only the filtered metrics. Would you be able to have a look and see if the issue can be fixed? Or let me know if you need more information from me.
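To make the filtering difference concrete, here is a small sketch of what "only the named metrics" would look like when applied to the 0.5.1 output above. It assumes the attribute name is the counter name with spaces removed, which matches the counters in this thread but may not hold for names such as "Context Switches/sec". `filter_sample` is a hypothetical helper, not part of nri-perfmon.

```python
def filter_sample(sample, counters):
    """Keep event_type, name, and only the attributes named in the config."""
    # Assumption: "Logical Processors" in the config maps to "LogicalProcessors".
    wanted = {c["counter"].replace(" ", "") for c in counters}
    always_keep = {"event_type", "name"}
    return {k: v for k, v in sample.items() if k in always_keep or k in wanted}


configured = [
    {"counter": "Logical Processors"},
    {"counter": "Partitions"},
    {"counter": "Virtual Processors"},
]

unfiltered = {
    "event_type": "Hyper_V_Hypervisor",
    "name": "",
    "LogicalProcessors": 48.0,
    "Partitions": 4.0,
    "TotalPages": 1619199.0,
    "VirtualProcessors": 60.0,
    "MonitoredNotifications": 6.0,
}

print(filter_sample(unfiltered, configured))
```

With the sample above, TotalPages and MonitoredNotifications are dropped and the three configured counters remain.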

Eroriko commented 1 year ago

I'm just wondering if this is planned to get fixed; a long time has passed since this was reported. I tried ayounas's findings and installed version 0.5.1, which solved the problem, but I would rather be on a more up-to-date version.