Open chadbaldwin opened 4 weeks ago
thx for opening an issue. Why would a tool try to script a system pool for "Create"? SSMS only enables Script for Alter on those pools. The exception indicates there are 2 rows in a catalog view with the same id.
I think the query it ran to populate the collection is constructed based on src/Microsoft/SqlServer/Management/SqlEnum/xml/ResourcePoolAffinityInfo.xml
See if that instance has more than one row in dm_os_nodes
with the same memory_node_id
and processor_group
.
You could find the exact query being run by looking at profiler or using xperf
xperf -start trace -f myTrace.etl -on *Microsoft.Data.SqlClient.EventSource
The monitoring tool does exclude system pools using if (pool.IsSystemObject && pool.ID == 1) continue;
, but I left that out in my test script because it didn't seem to make a difference when attempting to recreate the issue as this still happens on Default
.
I'm not sure how to use xperf, but I'll try to figure that out, in the meantime I ran that query from the XML file, this is what it returned on the problematic instance:
EDIT: The results below were removed due to being incorrect. They are actually from a non-problematic instance. Query was run on wrong instance by mistake. See: https://github.com/microsoft/sqlmanagementobjects/issues/169#issuecomment-2294448475
EDIT: REMOVED
That said, I also ran it on a non-problematic instance, and received very similar results:
| cpu_affinity_mask | online_scheduler_mask | processor_group | node_state_desc | memory_node_id |
|-------------------|-----------------------|-----------------|-----------------|----------------|
| 72057594037927935 | 72057594037927935 | 0 | ONLINE | 0 |
| 72057594037927935 | 72057594037927935 | 1 | ONLINE | 1 |
| 72057594037927935 | 72057594037927935 | 2 | ONLINE | 2 |
| 72057594037927935 | 72057594037927935 | 3 | ONLINE | 3 |
| 0 | 89478485 | 0 | ONLINE DAC | 64 |
I just wrote a script to check all of our instances...across 260, this problem exists for 41 of them, so it's certainly not isolated to just one instance. Unfortunately, I don't see any instances with this issue where I have permissions to set up a trace, but I'm happy to provide any info I can to help track down or re-create this issue.
you don't need permissions on the server to see the queries. xperf
is something you run on the client to see what your app is sending to the server.
Curious - if you open Object Explorer in SSMS to the resource governor entries for those instances, can you open the property sheets and script as alter for the pools and their workload groups?
SSMS also emits the sql client query traces to the output window, you can turn it on using tools/options/output window
Sorry, I meant permissions to do a SQL Profiler trace, not an xperf trace. I still haven't really figured out how to get xperf working. I installed the toolkit, but when I run the command it just returns immediately with an empty trace file. I tried using PerfView.exe
but that's a bit of a learning curve as well 😂
Looks like I get the same error in SSMS as well.
When I tried scripting the entire Resource Governor item as "Create to" (alter isn't available at that level), it works fine.
When I tried scripting the workload group as "Alter to", it works fine.
BUT, when I tried scripting the "default" pool as "Alter to", it fails and returns that same exception with the same call stack (same with "drop and create to"):
===================================
An item with the same key has already been added. (mscorlib)
------------------------------
Program Location:
at System.ThrowHelper.ThrowArgumentException(ExceptionResource resource)
at System.Collections.Generic.Dictionary`2.Insert(TKey key, TValue value, Boolean add)
at Microsoft.SqlServer.Management.Smo.NumaNodeCollection.get_NumaCollectionFromId()
at Microsoft.SqlServer.Management.Smo.ResourcePoolAffinityInfo.SetSchedulerValues()
at Microsoft.SqlServer.Management.Smo.ResourcePoolAffinityInfo.PopulateDataTable()
at Microsoft.SqlServer.Management.Smo.ResourcePool.GetAllParams(StringBuilder sb, ScriptingPreferences sp, Int32& count)
at Microsoft.SqlServer.Management.Smo.ResourcePool.ScriptAlter(StringCollection queries, ScriptingPreferences sp)
at Microsoft.SqlServer.Management.Smo.ScriptMaker.ScriptAlterObjects(List`1 urns, Boolean isCreateOrAlter)
at Microsoft.SqlServer.Management.Smo.ScriptMaker.ScriptWorker(List`1 urns, ISmoScriptWriter writer)
at Microsoft.SqlServer.Management.Smo.ScriptMaker.Script(Urn[] urns, ISmoScriptWriter writer)
at Microsoft.SqlServer.Management.Smo.ScriptMaker.Script(Urn[] urns)
at Microsoft.SqlServer.Management.UI.VSIntegration.ObjectExplorer.ScriptGenerator.ScriptAlter(SqlTextWriter sqlwriter, Server server, Urn[] urns, ScriptingOptions options, Func`1 onScriptComplete)
at Microsoft.SqlServer.Management.UI.VSIntegration.ObjectExplorer.ScriptNodeActionContext.<>c__DisplayClass8_0.<ScriptWithProgress>b__0()
at System.Threading.Tasks.Task.InnerInvoke()
at System.Threading.Tasks.Task.Execute()
I am able to pull up the Properties dialog.
I should note that Resource Governor is disabled on all of these instances, including the problematic ones. I'm just trying to resolve this exception that keeps causing the monitoring tool to log errors.
sounds like you could just remove && pool.ID == 1
from your filter. Isn't default also a system object?
I think their goal was to just exclude the internal workload group, but include Default so they could show a script in the UI. That said, one of the other problematic instances I found does have a non-system pool (RG still disabled, must have been enabled in the past), and this problem is still occurring on the non-system pool...but it seems to be working fine with the two system pools. So it's not limited to only default/system pools.
For example, when I run my test script on that instance, I see this:
Name IsSystemObject ResourcePoolAffinityInfo
---- -------------- ------------------------
default True Microsoft.SqlServer.Management.Smo.ResourcePoolAffinityInfo
internal True Microsoft.SqlServer.Management.Smo.ResourcePoolAffinityInfo
pool_REDACTED False
When I run .Script()
on pool_REDACTED
, it fails with the same error.
Also, this is the query I see SSMS using to populate the list of pools:
SELECT
wkg.name AS [Name],
'Server[@Name=' + quotename(CAST(
serverproperty(N'Servername')
AS sysname),'''') + ']' + '/ResourceGovernor' + '/ResourcePool[@Name=' + quotename(p.name,'''') + ']' + '/WorkloadGroup[@Name=' + quotename(wkg.name,'''') + ']' AS [Urn]
FROM
sys.resource_governor_configuration AS c,
sys.resource_governor_resource_pools AS p
INNER JOIN sys.resource_governor_workload_groups AS wkg ON CAST(wkg.pool_id AS int)=CAST(p.pool_id AS int)
INNER JOIN sys.resource_governor_external_resource_pools AS pls ON wkg.external_pool_id = pls.external_pool_id
WHERE
(p.name=default)
ORDER BY
[Name] ASC
However, the output window does not log any sort of query when scripting the pool. It's only showing the queries used for populating the object explorer tree.
check this box in tools/options/output window to see all sqlclient queries
i see this. swap out @_msparam_0 with a real id
SELECT
[mn].[memory_node_id] AS [NumaNodeId],
[smap].[cpu_id] AS [CpuId],
[smap].[scheduler_id] AS [SchedulerId],
[pls].[pool_id] AS [PoolId],
ISNULL([aff].[scheduler_mask], 0) AS [SchedulerMask]
FROM
[sys].[dm_os_memory_nodes] AS [mn]
INNER JOIN
(SELECT [osn].[memory_node_id] AS [numa_node_id], [sc].[cpu_id], [sc].[scheduler_id]
FROM [sys].[dm_os_nodes] AS [osn]
LEFT JOIN [sys].[dm_os_schedulers] AS [sc]
ON [osn].[node_id] = [sc].[parent_node_id]
AND [sc].[scheduler_id] < 1048576
WHERE [osn].[memory_node_id] < (CASE WHEN @@version like '%(X64)%' THEN 64 ELSE 32 END)) AS [smap] ON [mn].[memory_node_id] = [smap].[numa_node_id]
INNER JOIN [sys].[dm_resource_governor_resource_pools] AS [pls] ON 1 = 1
LEFT OUTER JOIN [sys].[dm_resource_governor_resource_pool_affinity] AS [aff] ON
[pls].[pool_id] = [aff].[pool_id]
AND ([smap].[scheduler_id] / (CASE WHEN @@version like '%(X64)%' THEN 64 ELSE 32 END)) = [aff].[processor_group]
WHERE
([pls].[pool_id]=@_msparam_0)
ORDER BY [smap].[numa_node_id], [smap].[scheduler_id] ASC
Running against the instance where there is also a non-system pool, but RG still disabled: For ID 0 (internal), it returned nothing, which I assume is expected for the internal pool. For ID 1 (default), it returned 24 rows (I can include the results if you want them) For ID 257, which is the non-system pool, it returned nothing.
Running against the original instance that caused me to create this issue, which only has 2 pools (internal and default, both system): For ID 0 (internal), it returned nothing For ID 1 (default), it returned 20 rows
Running against a non-problematic instance: For ID 0 (internal), it returned nothing For ID 1 (default), it returned 224 rows
i don't know anything about resource pools so I don't know what the right results would be. The exception indicates the columns it uses as a key have identical values in more than one row. I'm not sure if it's one column or a combination of a few columns.
i don't know anything about resource pools so I don't know what the right results would be
That makes two of us
SMO has been using this same query for at least a decade. something could have changed in SQL server since 2012 that makes the query no longer correct. If you can somehow figure out what the duplicate key value is, you could see if your DBA can fix something on the server. Otherwise we'd need to find out how they created the offending pool in the first place so I can reproduce the issue and try to fix it.
Well.........I'd like to start this comment off with an apology for (partially) wasting a couple hours of your time. This is still an issue, but I screwed up earlier and ran the sys.dm_os_nodes
query on the wrong instance. I ran it on a non-problematic instance...Had I run it on the correct instance, we would have seen the problem immediately. (I'll update the comment from earlier noting the mistake).
THIS is what it looks like when run against the problematic instance:
| cpu_affinity_mask | online_scheduler_mask | processor_group | node_state_desc | memory_node_id |
|-------------------|-----------------------|-----------------|-----------------------------|----------------|
| 992 | 992 | 0 | ONLINE | 0 |
| 31 | 31 | 0 | ONLINE THREAD_RESOURCES_LOW | 0 |
| 1047552 | 1047552 | 0 | ONLINE | 1 |
| 0 | 31 | 0 | ONLINE DAC | 64 |
Query for reference:
SELECT SUM ([cpu_affinity_mask]) AS [cpu_affinity_mask],
SUM([online_scheduler_mask]) AS [online_scheduler_mask],
[processor_group],
[node_state_desc],
[memory_node_id]
FROM [sys].[dm_os_nodes]
GROUP BY [memory_node_id], [node_state_desc], [processor_group]
Notice the duplicate rows (by memory_node_id
and processor_group
).
This is the result of the full query run by SSMS:
| PoolID | AffinityType | ID | GroupID | CpuIds | CpuAffinityMask |
|--------|--------------|----|---------|---------|-----------------|
| 1 | 2 | 0 | 0 | 992 | 992 |
| 1 | 2 | 0 | 0 | 31 | 31 |
| 1 | 2 | 1 | 0 | 1047552 | 1047552 |
For reference, this is the full query captured from SSMS:
SELECT
[pls].[pool_id] AS [PoolID],
CASE WHEN [plaff].[pool_id] IS NULL
THEN 2 ELSE 1
END
AS [AffinityType],
[mn].[memory_node_id] AS [ID],
[mn].[processor_group] AS [GroupID],
[mn].[cpu_affinity_mask] AS [CpuIds],
[mn].[online_scheduler_mask] AS [CpuAffinityMask]
FROM
sys.dm_resource_governor_resource_pools AS [pls]
INNER JOIN
(SELECT SUM ([cpu_affinity_mask]) AS [cpu_affinity_mask],
SUM([online_scheduler_mask]) AS [online_scheduler_mask],
[processor_group],
[node_state_desc],
[memory_node_id]
FROM [sys].[dm_os_nodes]
GROUP BY [memory_node_id], [node_state_desc], [processor_group]) AS [mn] ON [mn].[memory_node_id] < (CASE WHEN @@version like '%(X64)%' THEN 64 ELSE 32 END)
LEFT OUTER JOIN (SELECT DISTINCT [pa].[pool_id] FROM [sys].[dm_resource_governor_resource_pool_affinity] AS [pa]) AS [plaff] ON
[pls].[pool_id] = [plaff].[pool_id]
WHERE
([pls].[pool_id]=@_msparam_0)
ORDER BY [pls].[pool_id], [mn].[memory_node_id] ASC
When the above query is run against a non-problematic instance, the results look like this:
| PoolID | AffinityType | ID | GroupID | CpuIds | CpuAffinityMask |
|--------|--------------|----|---------|-------------------|-------------------|
| 1 | 2 | 0 | 0 | 72057594037927935 | 72057594037927935 |
| 1 | 2 | 1 | 1 | 72057594037927935 | 72057594037927935 |
| 1 | 2 | 2 | 2 | 72057594037927935 | 72057594037927935 |
| 1 | 2 | 3 | 3 | 72057594037927935 | 72057594037927935 |
One last update for the day...Looked deeper into this.
I noted earlier that I had 260 instances and 41 of them have this problem. Turns out that's not true. The exception message is the same, but the inner inner exception message was not...Unfortunately, PowerShell only displays the top level exception message unless you force it. So when I saw they had all the same top level exception, I didn't think to drill in for each one.
4 of them hit VIEW SERVER STATE
permission errors. So Script()
had an inner inner exception of:
Failed to retrieve data for this request.
And 36 returned this:
The resource governor resource pool information is not complete. This can happen if a pool was created but the resource governor is not reconfigured.
So I would consider those not a bug/unhandled exception and is the expected behavior in those scenarios.
It seems the one and only instance I have where this appears to be a bug is when "ONLINE THREAD_RESOURCES_LOW" is involved. Which is the one returning:
An item with the same key has already been added. Key: 0
(the original exception that this issue is based on)
thx for all the info! I'll find someone in the engine team who can explain why there'd be two rows that match the query and see if it's something we can fix.
My theory is we can remove [node_state_desc]
from the inner query.
I'm really not sure if this is an issue with my instance or an issue with SMO but I'm happy to help with testing however I can. I'm assuming it's an issue with SMO because I'm receiving an exception for a situation I would expect to be handled within SMO in the case that this was normal to happen.
For some reason, out of 100 SQL Instances, just 1
Microsoft.SqlServer.Management.Smo.Server
object is showing$server.ResourceGovernor.ResourcePools.ResourcePoolAffinityInfo
asnull
for all ResourcePools. I have tried comparing every single resource governor related system view with all the instances that worked, and I see no differences that would indicate why it'snull
.Here's the sample script I'm using:
Which is returning:
Whereas on all other instances I tested this against, I received:
The reason this is a problem for me is because an open source monitoring tool I use calls
Microsoft.SqlServer.Management.Smo.ResourcePool.Script()
. However, becauseResourcePoolAffinityInfo
isnull
, it is returning the following error:My assumption here is that if it is possible for
ResourcePoolAffinityInfo
to be null, then I would expect SMO to be aware of that situation and would check for null prior to runningScript()
. But since it is not checking for that situation, I assume this is a bug, or at the very least, an unhandled exception.