opserver / Opserver

Stack Exchange's Monitoring System
https://opserver.github.io/Opserver/
MIT License
4.51k stars 828 forks source link

Cpu usage for SQL Server showing wrong values #372

Open maurmun opened 4 years ago

maurmun commented 4 years ago

The CPU graph is showing wrong values, which are always higher than the normal values. This is the way how values are calculated inside the RING_BUFFER_SCHEDULER_MONITOR record of the sys.dm_os_ring_buffers. in the following query

Select /* SQL\SQLInstance.CPUHistory.cs@16 */
 DateAdd(s, (timestamp - (osi.cpu_ticks / Convert(Float, (osi.cpu_ticks / osi.ms_ticks)))) / 1000, GETDATE()) AS EventTime,
       Record.value('(./Record/SchedulerMonitorEvent/SystemHealth/SystemIdle)[1]', 'int') as SystemIdle,
       Record.value('(./Record/SchedulerMonitorEvent/SystemHealth/ProcessUtilization)[1]', 'int') as ProcessUtilization,
       Record.value('(./Record/SchedulerMonitorEvent/SystemHealth/MemoryUtilization)[1]', 'int') as MemoryUtilization,
  From (Select timestamp, 
               convert(xml, record) As Record 
          From sys.dm_os_ring_buffers 
         Where ring_buffer_type = N'RING_BUFFER_SCHEDULER_MONITOR'
           And record Like '%<SystemHealth>%') x
        Cross Join sys.dm_os_sys_info osi
Order By timestamp

You should use 100 -Record.value('(./Record/SchedulerMonitorEvent/SystemHealth/SystemIdle)[1]', 'int') to get correct values.

This discrepancy is more evident with servers with more NUMA Nodes/Cores

NickCraver commented 4 years ago

I'm unclear what the bug is here: the SQL tab is showing how much CPU SQL Server itself is using, but this calculation would be how much the machine is using, which is not the same thing (and not useful for multiple instances on the same machine, for instance). Is the mismatch in expectations here?

maurmun commented 4 years ago

Well, the RING_BUFFER_SCHEDULER_MONITOR record returns wrong values for the ProcessUtilization as the number of cores/logical processor/sockets goes higher or the cores per socket goes higher. It's clear that it reports the CPU usage of SQL Server and not the whole box, but oh high-end machines it's always higher (and of a great degree) than the real usage; this value is correct only on 1 core or 1 socket boxes. Showing this value in the SQL tab of the dashboard could lead to wrong assumptions (in the case a non SQL expert is looking into it). On a 56 core box for example You can see (correctly) the CPU Usage in the Dashboard (collected with WMI) to be around 5% while the SQL tab could show values as high as 20%. On a high-end box with 192 cores running on an average of 20% that query show values also in the range of 75%-90%.

maurmun commented 4 years ago

https://support.solarwinds.com/SuccessCenter/s/article/CPU-utilization-is-not-displayed-properly-in-DPA