microsoft / sqlmanagementobjects

Sql Management Objects, an API for scripting and managing SQL Server and Azure SQL Database
Other
127 stars 20 forks source link

ResourcePoolAffinityInfo is null #169

Open chadbaldwin opened 4 weeks ago

chadbaldwin commented 4 weeks ago

I'm really not sure if this is an issue with my instance or an issue with SMO but I'm happy to help with testing however I can. I'm assuming it's an issue with SMO because I'm receiving an exception for a situation I would expect to be handled within SMO in the case that this was normal to happen.

For some reason, out of 100 SQL Instances, just 1 Microsoft.SqlServer.Management.Smo.Server object is showing $server.ResourceGovernor.ResourcePools.ResourcePoolAffinityInfo as null for all ResourcePools. I have tried comparing every single resource governor related system view with all the instances that worked, and I see no differences that would indicate why it's null.

Here's the sample script I'm using:

$cs = 'Data Source=MYSERVERNAME;Database=master;Integrated Security=True;Encrypt=True;Trust Server Certificate=True;Application Name=DBADash;Multi Subnet Failover=True'
$cn = [Microsoft.Data.SqlClient.SqlConnection]::new($cs)
$sc = [Microsoft.SqlServer.Management.Common.ServerConnection]::new($cn)
$instance = [Microsoft.SqlServer.Management.Smo.Server]::new($sc)
$instance.ResourceGovernor.ResourcePools | select Name, ResourcePoolAffinityInfo

Which is returning:

Name     ResourcePoolAffinityInfo
----     ------------------------
default
internal

Whereas on all other instances I tested this against, I received:

Name     ResourcePoolAffinityInfo
----     ------------------------
default  Microsoft.SqlServer.Management.Smo.ResourcePoolAffinityInfo
internal Microsoft.SqlServer.Management.Smo.ResourcePoolAffinityInfo

The reason this is a problem for me is because an open source monitoring tool I use calls Microsoft.SqlServer.Management.Smo.ResourcePool.Script(). However, because ResourcePoolAffinityInfo is null, it is returning the following error:

Exception             : 
    Type           : System.Management.Automation.MethodInvocationException
    ErrorRecord    : 
        Exception             : 
            Type    : System.Management.Automation.ParentContainsErrorRecordException
            Message : Exception calling "Script" with "0" argument(s): "Script failed for  'default'. "
            HResult : -2146233087
        CategoryInfo          : NotSpecified: (:) [], ParentContainsErrorRecordException
        FullyQualifiedErrorId : FailedOperationException
        InvocationInfo        : 
            ScriptLineNumber : 6
            OffsetInLine     : 1
            HistoryId        : 114
            Line             : $instance.ResourceGovernor.ResourcePools[0].Script()
            Statement        : $instance.ResourceGovernor.ResourcePools[0].Script()
            PositionMessage  : At line:6 char:1
                               + $instance.ResourceGovernor.ResourcePools[0].Script()
                               + ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            CommandOrigin    : Internal
        ScriptStackTrace      : at <ScriptBlock>, <No file>: line 6
    TargetSite     : 
        Name          : ConvertToMethodInvocationException
        DeclaringType : [System.Management.Automation.ExceptionHandlingOps]
        MemberType    : Method
        Module        : System.Management.Automation.dll
    Message        : Exception calling "Script" with "0" argument(s): "Script failed for  'default'. "
    Data           : System.Collections.ListDictionaryInternal
    InnerException : 
        Type             : Microsoft.SqlServer.Management.Smo.FailedOperationException
        SmoExceptionType : FailedOperationException
        Operation        : Script
        FailedObject     : [default]
        Message          : Script failed for  'default'. 
        HelpLink         : https://go.microsoft.com/fwlink?ProdName=Microsoft+SQL+Server&ProdVer=17.100.18.0&EvtSrc=Microsoft.SqlServer.Management.Smo.ExceptionTemplates.FailedOperationExceptionText&EvtID=Script+ResourcePool&LinkId=20476
        TargetSite       : 
            Name          : EnumScriptImpl
            DeclaringType : [Microsoft.SqlServer.Management.Smo.SqlSmoObject]
            MemberType    : Method
            Module        : Microsoft.SqlServer.Smo.dll
        Data             : System.Collections.ListDictionaryInternal
        InnerException   : 
            Type       : System.ArgumentException
            Message    : An item with the same key has already been added. Key: 0
            TargetSite : 
                Name          : ThrowAddingDuplicateWithKeyArgumentException
                DeclaringType : [System.ThrowHelper]
                MemberType    : Method
                Module        : System.Private.CoreLib.dll
            Source     : System.Private.CoreLib
            HResult    : -2147024809
            StackTrace : 
   at System.Collections.Generic.Dictionary`2.TryInsert(TKey key, TValue value, InsertionBehavior behavior)
   at Microsoft.SqlServer.Management.Smo.NumaNodeCollection.get_NumaCollectionFromId()
   at Microsoft.SqlServer.Management.Smo.NumaNodeCollection.GetByID(Int32 numanodeId)
   at Microsoft.SqlServer.Management.Smo.ResourcePoolAffinityInfo.SetSchedulerValues()
   at Microsoft.SqlServer.Management.Smo.ResourcePoolAffinityInfo.PopulateDataTable()
   at Microsoft.SqlServer.Management.Smo.ResourcePoolAffinityInfo..ctor(ResourcePool parent)
   at Microsoft.SqlServer.Management.Smo.ResourcePool.get_ResourcePoolAffinityInfo()
   at Microsoft.SqlServer.Management.Smo.ResourcePool.GetAllParams(StringBuilder sb, ScriptingPreferences sp, Int32& count)
   at Microsoft.SqlServer.Management.Smo.ResourcePool.ScriptCreate(StringCollection queries, ScriptingPreferences sp)
   at Microsoft.SqlServer.Management.Smo.SqlSmoObject.ScriptCreateInternal(StringCollection query, ScriptingPreferences sp, Boolean skipPropagateScript)
   at Microsoft.SqlServer.Management.Smo.ScriptMaker.ScriptCreateObject(Urn urn, ScriptingPreferences sp, ObjectScriptingType& scriptType)
   at Microsoft.SqlServer.Management.Smo.ScriptMaker.ScriptCreate(Urn urn, ScriptingPreferences sp, ObjectScriptingType& scriptType)
   at Microsoft.SqlServer.Management.Smo.ScriptMaker.ScriptCreateObjects(IEnumerable`1 urns)
   at Microsoft.SqlServer.Management.Smo.ScriptMaker.ScriptUrns(List`1 orderedUrns)
   at Microsoft.SqlServer.Management.Smo.ScriptMaker.DiscoverOrderScript(IEnumerable`1 urns)
   at Microsoft.SqlServer.Management.Smo.ScriptMaker.ScriptWorker(List`1 urns, ISmoScriptWriter writer)
   at Microsoft.SqlServer.Management.Smo.ScriptMaker.Script(SqlSmoObject[] objects, ISmoScriptWriter writer)
   at Microsoft.SqlServer.Management.Smo.ScriptMaker.Script(SqlSmoObject[] objects)
   at Microsoft.SqlServer.Management.Smo.SqlSmoObject.EnumScriptImplWorker(ScriptingPreferences sp)
   at Microsoft.SqlServer.Management.Smo.SqlSmoObject.EnumScriptImpl(ScriptingPreferences sp)
        Source           : Microsoft.SqlServer.Smo
        HResult          : -2146233088
        StackTrace       : 
   at Microsoft.SqlServer.Management.Smo.SqlSmoObject.EnumScriptImpl(ScriptingPreferences sp)
   at Microsoft.SqlServer.Management.Smo.SqlSmoObject.ScriptImpl(ScriptingPreferences sp)
   at Microsoft.SqlServer.Management.Smo.SqlSmoObject.ScriptImpl()
   at CallSite.Target(Closure, CallSite, Object)
    Source         : System.Management.Automation
    HResult        : -2146233087
    StackTrace     : 
   at System.Management.Automation.ExceptionHandlingOps.ConvertToMethodInvocationException(Exception exception, Type typeToThrow, String methodName, Int32 numArgs, MemberInfo memberInfo)
   at CallSite.Target(Closure, CallSite, Object)
   at System.Dynamic.UpdateDelegates.UpdateAndExecute1[T0,TRet](CallSite site, T0 arg0)
   at System.Management.Automation.Interpreter.DynamicInstruction`2.Run(InterpretedFrame frame)
   at System.Management.Automation.Interpreter.EnterTryCatchFinallyInstruction.Run(InterpretedFrame frame)
CategoryInfo          : NotSpecified: (:) [], MethodInvocationException
FullyQualifiedErrorId : FailedOperationException
InvocationInfo        : 
    ScriptLineNumber : 6
    OffsetInLine     : 1
    HistoryId        : 114
    Line             : $instance.ResourceGovernor.ResourcePools[0].Script()
    Statement        : $instance.ResourceGovernor.ResourcePools[0].Script()
    PositionMessage  : At line:6 char:1
                       + $instance.ResourceGovernor.ResourcePools[0].Script()
                       + ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    CommandOrigin    : Internal
ScriptStackTrace      : at <ScriptBlock>, <No file>: line 6

My assumption here is that if it is possible for ResourcePoolAffinityInfo to be null, then I would expect SMO to be aware of that situation and would check for null prior to running Script(). But since it is not checking for that situation, I assume this is a bug, or at the very least, an unhandled exception.

shueybubbles commented 4 weeks ago

thx for opening an issue. Why would a tool try to script a system pool for "Create"? SSMS only enables Script for Alter on those pools. The exception indicates there are 2 rows in a catalog view with the same id.

I think the query it ran to populate the collection is constructed based on src/Microsoft/SqlServer/Management/SqlEnum/xml/ResourcePoolAffinityInfo.xml

See if that instance has more than one row in dm_os_nodes with the same memory_node_id and processor_group.

You could find the exact query being run by looking at profiler or using xperf

xperf -start trace -f myTrace.etl -on *Microsoft.Data.SqlClient.EventSource
chadbaldwin commented 4 weeks ago

The monitoring tool does exclude system pools using if (pool.IsSystemObject && pool.ID == 1) continue;, but I left that out in my test script because it didn't seem to make a difference when attempting to recreate the issue as this still happens on Default.

I'm not sure how to use xperf, but I'll try to figure that out, in the meantime I ran that query from the XML file, this is what it returned on the problematic instance:

EDIT: The results below were removed due to being incorrect. They are actually from a non-problematic instance. Query was run on wrong instance by mistake. See: https://github.com/microsoft/sqlmanagementobjects/issues/169#issuecomment-2294448475

EDIT: REMOVED

That said, I also ran it on a non-problematic instance, and received very similar results:

| cpu_affinity_mask | online_scheduler_mask | processor_group | node_state_desc | memory_node_id | 
|-------------------|-----------------------|-----------------|-----------------|----------------| 
| 72057594037927935 | 72057594037927935     | 0               | ONLINE          | 0              | 
| 72057594037927935 | 72057594037927935     | 1               | ONLINE          | 1              | 
| 72057594037927935 | 72057594037927935     | 2               | ONLINE          | 2              | 
| 72057594037927935 | 72057594037927935     | 3               | ONLINE          | 3              | 
| 0                 | 89478485              | 0               | ONLINE DAC      | 64             | 
chadbaldwin commented 4 weeks ago

I just wrote a script to check all of our instances...across 260, this problem exists for 41 of them, so it's certainly not isolated to just one instance. Unfortunately, I don't see any instances with this issue where I have permissions to set up a trace, but I'm happy to provide any info I can to help track down or re-create this issue.

shueybubbles commented 4 weeks ago

you don't need permissions on the server to see the queries. xperf is something you run on the client to see what your app is sending to the server. Curious - if you open Object Explorer in SSMS to the resource governor entries for those instances, can you open the property sheets and script as alter for the pools and their workload groups?

SSMS also emits the sql client query traces to the output window, you can turn it on using tools/options/output window

chadbaldwin commented 4 weeks ago

Sorry, I meant permissions to do a SQL Profiler trace, not an xperf trace. I still haven't really figured out how to get xperf working. I installed the toolkit, but when I run the command it just returns immediately with an empty trace file. I tried using PerfView.exe but that's a bit of a learning curve as well 😂

Looks like I get the same error in SSMS as well.

When I tried scripting the entire Resource Governor item as "Create to" (alter isn't available at that level), it works fine.

When I tried scripting the workload group as "Alter to", it works fine.

BUT, when I tried scripting the "default" pool as "Alter to", it fails and returns that same exception with the same call stack (same with "drop and create to"):

===================================

An item with the same key has already been added. (mscorlib)

------------------------------
Program Location:

   at System.ThrowHelper.ThrowArgumentException(ExceptionResource resource)
   at System.Collections.Generic.Dictionary`2.Insert(TKey key, TValue value, Boolean add)
   at Microsoft.SqlServer.Management.Smo.NumaNodeCollection.get_NumaCollectionFromId()
   at Microsoft.SqlServer.Management.Smo.ResourcePoolAffinityInfo.SetSchedulerValues()
   at Microsoft.SqlServer.Management.Smo.ResourcePoolAffinityInfo.PopulateDataTable()
   at Microsoft.SqlServer.Management.Smo.ResourcePool.GetAllParams(StringBuilder sb, ScriptingPreferences sp, Int32& count)
   at Microsoft.SqlServer.Management.Smo.ResourcePool.ScriptAlter(StringCollection queries, ScriptingPreferences sp)
   at Microsoft.SqlServer.Management.Smo.ScriptMaker.ScriptAlterObjects(List`1 urns, Boolean isCreateOrAlter)
   at Microsoft.SqlServer.Management.Smo.ScriptMaker.ScriptWorker(List`1 urns, ISmoScriptWriter writer)
   at Microsoft.SqlServer.Management.Smo.ScriptMaker.Script(Urn[] urns, ISmoScriptWriter writer)
   at Microsoft.SqlServer.Management.Smo.ScriptMaker.Script(Urn[] urns)
   at Microsoft.SqlServer.Management.UI.VSIntegration.ObjectExplorer.ScriptGenerator.ScriptAlter(SqlTextWriter sqlwriter, Server server, Urn[] urns, ScriptingOptions options, Func`1 onScriptComplete)
   at Microsoft.SqlServer.Management.UI.VSIntegration.ObjectExplorer.ScriptNodeActionContext.<>c__DisplayClass8_0.<ScriptWithProgress>b__0()
   at System.Threading.Tasks.Task.InnerInvoke()
   at System.Threading.Tasks.Task.Execute()

I am able to pull up the Properties dialog.

I should note that Resource Governor is disabled on all of these instances, including the problematic ones. I'm just trying to resolve this exception that keeps causing the monitoring tool to log errors.

shueybubbles commented 4 weeks ago

sounds like you could just remove && pool.ID == 1 from your filter. Isn't default also a system object?

chadbaldwin commented 4 weeks ago

I think their goal was to just exclude the internal workload group, but include Default so they could show a script in the UI. That said, one of the other problematic instances I found does have a non-system pool (RG still disabled, must have been enabled in the past), and this problem is still occurring on the non-system pool...but it seems to be working fine with the two system pools. So it's not limited to only default/system pools.

For example, when I run my test script on that instance, I see this:

Name          IsSystemObject ResourcePoolAffinityInfo
----          -------------- ------------------------
default                 True Microsoft.SqlServer.Management.Smo.ResourcePoolAffinityInfo
internal                True Microsoft.SqlServer.Management.Smo.ResourcePoolAffinityInfo
pool_REDACTED          False

When I run .Script() on pool_REDACTED, it fails with the same error.

Also, this is the query I see SSMS using to populate the list of pools:

SELECT
wkg.name AS [Name],
'Server[@Name=' + quotename(CAST(
       serverproperty(N'Servername')
      AS sysname),'''') + ']' + '/ResourceGovernor' + '/ResourcePool[@Name=' + quotename(p.name,'''') + ']' + '/WorkloadGroup[@Name=' + quotename(wkg.name,'''') + ']' AS [Urn]
FROM
sys.resource_governor_configuration AS c,
sys.resource_governor_resource_pools AS p
INNER JOIN sys.resource_governor_workload_groups AS wkg ON CAST(wkg.pool_id AS int)=CAST(p.pool_id AS int)
INNER JOIN sys.resource_governor_external_resource_pools AS pls ON wkg.external_pool_id = pls.external_pool_id
WHERE
(p.name=default)
ORDER BY
[Name] ASC

However, the output window does not log any sort of query when scripting the pool. It's only showing the queries used for populating the object explorer tree.

shueybubbles commented 4 weeks ago

check this box in tools/options/output window to see all sqlclient queries

image

shueybubbles commented 4 weeks ago

i see this. swap out @_msparam_0 with a real id

SELECT
[mn].[memory_node_id] AS [NumaNodeId],
[smap].[cpu_id] AS [CpuId],
[smap].[scheduler_id] AS [SchedulerId],
[pls].[pool_id] AS [PoolId],
ISNULL([aff].[scheduler_mask], 0) AS [SchedulerMask]
FROM
[sys].[dm_os_memory_nodes] AS [mn]
INNER JOIN 
                         (SELECT [osn].[memory_node_id] AS [numa_node_id], [sc].[cpu_id], [sc].[scheduler_id]
                          FROM [sys].[dm_os_nodes] AS [osn]
                          LEFT JOIN [sys].[dm_os_schedulers] AS [sc] 
                              ON [osn].[node_id] = [sc].[parent_node_id] 
                              AND [sc].[scheduler_id] < 1048576
                          WHERE [osn].[memory_node_id] < (CASE WHEN @@version like '%(X64)%' THEN 64 ELSE 32 END)) AS [smap] ON  [mn].[memory_node_id] = [smap].[numa_node_id]

INNER JOIN [sys].[dm_resource_governor_resource_pools] AS [pls] ON  1 = 1 
LEFT OUTER JOIN [sys].[dm_resource_governor_resource_pool_affinity] AS [aff] ON 
        [pls].[pool_id] = [aff].[pool_id] 
        AND ([smap].[scheduler_id] / (CASE WHEN @@version like '%(X64)%' THEN 64 ELSE 32 END)) = [aff].[processor_group]

WHERE
([pls].[pool_id]=@_msparam_0)
ORDER BY [smap].[numa_node_id], [smap].[scheduler_id] ASC
chadbaldwin commented 4 weeks ago

Running against the instance where there is also a non-system pool, but RG still disabled: For ID 0 (internal), it returned nothing, which I assume is expected for the internal pool. For ID 1 (default), it returned 24 rows (I can include the results if you want them) For ID 257, which is the non-system pool, it returned nothing.

Running against the original instance that caused me to create this issue, which only has 2 pools (internal and default, both system): For ID 0 (internal), it returned nothing For ID 1 (default), it returned 20 rows

Running against a non-problematic instance: For ID 0 (internal), it returned nothing For ID 1 (default), it returned 224 rows

shueybubbles commented 4 weeks ago

i don't know anything about resource pools so I don't know what the right results would be. The exception indicates the columns it uses as a key have identical values in more than one row. I'm not sure if it's one column or a combination of a few columns.

chadbaldwin commented 4 weeks ago

i don't know anything about resource pools so I don't know what the right results would be

That makes two of us

shueybubbles commented 4 weeks ago

SMO has been using this same query for at least a decade. something could have changed in SQL server since 2012 that makes the query no longer correct. If you can somehow figure out what the duplicate key value is, you could see if your DBA can fix something on the server. Otherwise we'd need to find out how they created the offending pool in the first place so I can reproduce the issue and try to fix it.

chadbaldwin commented 4 weeks ago

Well.........I'd like to start this comment off with an apology for (partially) wasting a couple hours of your time. This is still an issue, but I screwed up earlier and ran the sys.dm_os_nodes query on the wrong instance. I ran it on a non-problematic instance...Had I run it on the correct instance, we would have seen the problem immediately. (I'll update the comment from earlier noting the mistake).

THIS is what it looks like when run against the problematic instance:

| cpu_affinity_mask | online_scheduler_mask | processor_group | node_state_desc             | memory_node_id | 
|-------------------|-----------------------|-----------------|-----------------------------|----------------| 
| 992               | 992                   | 0               | ONLINE                      | 0              | 
| 31                | 31                    | 0               | ONLINE THREAD_RESOURCES_LOW | 0              | 
| 1047552           | 1047552               | 0               | ONLINE                      | 1              | 
| 0                 | 31                    | 0               | ONLINE DAC                  | 64             | 

Query for reference:

SELECT SUM ([cpu_affinity_mask]) AS [cpu_affinity_mask], 
SUM([online_scheduler_mask]) AS [online_scheduler_mask],
[processor_group],
[node_state_desc],
[memory_node_id]
FROM [sys].[dm_os_nodes] 
GROUP BY [memory_node_id], [node_state_desc], [processor_group]

Notice the duplicate rows (by memory_node_id and processor_group).

This is the result of the full query run by SSMS:

| PoolID | AffinityType | ID | GroupID | CpuIds  | CpuAffinityMask | 
|--------|--------------|----|---------|---------|-----------------| 
| 1      | 2            | 0  | 0       | 992     | 992             | 
| 1      | 2            | 0  | 0       | 31      | 31              | 
| 1      | 2            | 1  | 0       | 1047552 | 1047552         | 

For reference, this is the full query captured from SSMS:

SELECT
[pls].[pool_id] AS [PoolID],

          CASE WHEN [plaff].[pool_id] IS NULL
          THEN 2 ELSE 1
          END
         AS [AffinityType],
[mn].[memory_node_id] AS [ID],
[mn].[processor_group] AS [GroupID],
[mn].[cpu_affinity_mask] AS [CpuIds],
[mn].[online_scheduler_mask] AS [CpuAffinityMask]
FROM
sys.dm_resource_governor_resource_pools AS [pls]
INNER JOIN 
                     (SELECT SUM ([cpu_affinity_mask]) AS [cpu_affinity_mask], 
                      SUM([online_scheduler_mask]) AS [online_scheduler_mask],
                      [processor_group],
                      [node_state_desc],
                      [memory_node_id]
                      FROM [sys].[dm_os_nodes] 
                      GROUP BY [memory_node_id], [node_state_desc], [processor_group]) AS [mn] ON  [mn].[memory_node_id] < (CASE WHEN @@version like '%(X64)%' THEN 64 ELSE 32 END)

LEFT OUTER JOIN (SELECT DISTINCT [pa].[pool_id] FROM [sys].[dm_resource_governor_resource_pool_affinity] AS [pa]) AS [plaff] ON 
        [pls].[pool_id] = [plaff].[pool_id]

WHERE
([pls].[pool_id]=@_msparam_0)
ORDER BY [pls].[pool_id], [mn].[memory_node_id] ASC

When the above query is run against a non-problematic instance, the results look like this:

| PoolID | AffinityType | ID | GroupID | CpuIds            | CpuAffinityMask   | 
|--------|--------------|----|---------|-------------------|-------------------| 
| 1      | 2            | 0  | 0       | 72057594037927935 | 72057594037927935 | 
| 1      | 2            | 1  | 1       | 72057594037927935 | 72057594037927935 | 
| 1      | 2            | 2  | 2       | 72057594037927935 | 72057594037927935 | 
| 1      | 2            | 3  | 3       | 72057594037927935 | 72057594037927935 | 
chadbaldwin commented 4 weeks ago

One last update for the day...Looked deeper into this.

I noted earlier that I had 260 instances and 41 of them have this problem. Turns out that's not true. The exception message is the same, but the inner inner exception message was not...Unfortunately, PowerShell only displays the top level exception message unless you force it. So when I saw they had all the same top level exception, I didn't think to drill in for each one.

4 of them hit VIEW SERVER STATE permission errors. So Script() had an inner inner exception of:

Failed to retrieve data for this request.

And 36 returned this:

The resource governor resource pool information is not complete. This can happen if a pool was created but the resource governor is not reconfigured.

So I would consider those not a bug/unhandled exception and is the expected behavior in those scenarios.

It seems the one and only instance I have where this appears to be a bug is when "ONLINE THREAD_RESOURCES_LOW" is involved. Which is the one returning:

An item with the same key has already been added. Key: 0

(the original exception that this issue is based on)

shueybubbles commented 4 weeks ago

thx for all the info! I'll find someone in the engine team who can explain why there'd be two rows that match the query and see if it's something we can fix.

My theory is we can remove [node_state_desc] from the inner query.