opserver / Opserver

Stack Exchange's Monitoring System
https://opserver.github.io/Opserver/
MIT License
4.5k stars 827 forks source link

IsVMHost incorrect on Windows Server 2016 #305

Closed KennethScott closed 6 years ago

KennethScott commented 6 years ago

Has anyone else had trouble with Opserver monitoring Windows Server 2016 - specifically the inability to poll CPU utilization statistics via the WMI call?

We pointed Opserver at a 2016 server for the first time yesterday and quickly noticed the missing stats. I've traced it down to the GetIsVMHost() call that's testing for the presence of the WMI class Win32_PerfRawData_HvStats_HyperVHypervisorLogicalProcessor to determine whether or not it's a VMHost. For some reason on our 2016 servers this class does actually exist (even though they aren't VM hosts - and there are no actual properties within the class) - and it causes him to run the wrong WMI query to get the CPU stats.

Relevant code in WmiDataProvider.Polling.cs - PollCpuUtilizationAsync():

 private async Task PollCpuUtilizationAsync()
 {
     var query = IsVMHost
         ? "SELECT Name, Timestamp_Sys100NS, PercentTotalRunTime FROM Win32_PerfRawData_HvStats_HyperVHypervisorLogicalProcessor WHERE Name = '_Total'"
         : "SELECT Name, Timestamp_Sys100NS, PercentProcessorTime FROM Win32_PerfRawData_PerfOS_Processor WHERE Name = '_Total'";

     var property = IsVMHost
         ? "PercentTotalRunTime"
         : "PercentProcessorTime";

     using (var q = Wmi.Query(Endpoint, query))

If I force the IsVMHost property to false, I get the stats just fine.

I'm curious if there's something about the way our 2016 servers are being setup that's causing this, or if this is some anomaly with 2016 itself? 'Hoping someone else has tried a 2016 server.

UPDATE: For now I've made a small change in Wmi.cs - ClassExists() to ensure the result of the query is also non-null instead of just relying on an exception to be thrown (original line commented):

  internal static async Task<bool> ClassExists(string machineName, string @class, string wmiNamespace = defaultWmiNamespace)
  {
      // it's much faster trying to query something potentially non existent and catching an exception than to query the "meta_class" table.
      var query = $"SELECT * FROM {@class}";

      try
      {
          using (var q = Query(machineName, query, wmiNamespace))
          {
              //await q.GetFirstResultAsync().ConfigureAwait(false);
              var data = await q.GetFirstResultAsync().ConfigureAwait(false);
              if (data == null)
                  return false;
          }
      }
      catch
      {
          return false;
      }

      return true;
  }

It seems to do the trick, and a reasonable change - I'm just really curious if this a shop-specific problem.

Thanks-

KennethScott commented 6 years ago

@NickCraver, what do you think? Would you be open to this null check? I can do a PR if you're not opposed. Thanks-

NickCraver commented 6 years ago

I'm not as familiar with WMI - I'm assuming null is distinctly different from an empty list here? I'm definitely okay with adding this if so. You can simplify quite a bit though:

  internal static async Task<bool> ClassExists(string machineName, string @class, string wmiNamespace = defaultWmiNamespace)
  {
      // it's much faster trying to query something potentially non existent and catching an exception than to query the "meta_class" table.
      var query = $"SELECT * FROM {@class}";
      try
      {
          using (var q = Query(machineName, query, wmiNamespace))
          {
              return (await q.GetFirstResultAsync().ConfigureAwait(false)) != null;
          }
      }
      catch
      {
          return false;
      }
  }
KennethScott commented 6 years ago

Yeah I think it's really about the way GetFirstResultAsync() is coded to try to get the FirstOrDefault ManagementObject, but there are none returned so it translates to null in this particular scenario.

public async Task<dynamic> GetFirstResultAsync()
{
    ManagementObject obj;
    using (MiniProfiler.Current.CustomTiming("WMI", _rawQuery, _machineName))
        obj = (await Result.ConfigureAwait(false)).Cast<ManagementObject>().FirstOrDefault();
    return obj == null ? null : new WmiDynamic(obj);
}

To be honest, it's really strange to me why the class exists in 2016, but isn't populated. But I think it's just an unexpected scenario based on the way this is coded really expecting an exception to be thrown. The null check seems like a pretty safe test to me.

I'll do the PR shortly and you can decide if you want to pull it in. Thanks again!

NickCraver commented 6 years ago

Fixed via #308 merged in - thank you!