opserver / Opserver

Stack Exchange's Monitoring System
https://opserver.github.io/Opserver/
MIT License

AG Cluster Info not displaying on SQL Dashboard #423

Closed cdhunt closed 1 year ago

cdhunt commented 1 year ago

The entire AG panel isn't rendering on an instance of Opserver. We have the configuration defined in envvars like this.

Modules__Sql__clusters__0__description = Azure Channels Cluster
Modules__Sql__clusters__0__name = Az Channels
Modules__Sql__clusters__0__nodes__0__name = chsql001.ds.stackexchange.com
Modules__Sql__clusters__0__nodes__1__name = chsql002.ds.stackexchange.com
Modules__Sql__defaultConnectionString = Server=$ServerName$,1433;Persist Security Info=False;User ID=user;Password="password";MultipleActiveResultSets=False;Encrypt=True;TrustServerCertificate=True;Connection Timeout=30;
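For readers unfamiliar with this naming style: in .NET configuration, a double underscore in an environment variable name stands for the `:` hierarchy separator, and numeric segments index into arrays. The sketch below only illustrates that mapping with a plain string replace. `EnvKeySketch` and `ToConfigKey` are hypothetical names made up for this example, and it assumes Opserver reads these values through the standard environment-variable configuration provider.

```csharp
using System;

public static class EnvKeySketch
{
    // Illustration only: "__" in an environment variable name corresponds to
    // the ":" separator used in hierarchical .NET configuration keys.
    public static string ToConfigKey(string envName) => envName.Replace("__", ":");

    public static void Main()
    {
        var envNames = new[]
        {
            "Modules__Sql__clusters__0__name",
            "Modules__Sql__clusters__0__nodes__0__name",
            "Modules__Sql__clusters__0__nodes__1__name",
            "Modules__Sql__defaultConnectionString",
        };

        // Print each variable name next to the configuration key it maps to.
        foreach (var name in envNames)
            Console.WriteLine($"{name} -> {ToConfigKey(name)}");
    }
}
```

Read this way, the variables above describe one cluster (index 0) with two nodes (indexes 0 and 1) under the Sql module.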

It looks like Opserver is pulling AG info.

image

NickCraver commented 1 year ago

oo interesting - what are you seeing where the AG view should be?

cdhunt commented 1 year ago

image

NickCraver commented 1 year ago

In that case, it sounds like it doesn't have any clusters or AGs beneath - has this worked against Azure before? I'm not sure if the information available on-prem/local is actually exposed in all the Azure DMVs, or if anyone has tried yet...so not sure what our baseline is. Any idea if this has worked?

cdhunt commented 1 year ago

These are VMs with fully managed SQL in an AG that spans on-prem and Azure. The Azure-hosted Opserver can only connect to the two Az nodes of the 4 total nodes. The on-prem instance can only connect to the 2 on-prem nodes, but in that instance, the AG panel shows up and includes all four. The "Primary" is on-prem.

NickCraver commented 1 year ago

Gotcha - @tarynpratt do you have any idea what is or isn't exposed in the DMVs on the Azure side here? It's possible that it's not exposed or we need to tweak queries here to consume it from the Azure side if available.

Queries we're running today are:

I don't have a setup to connect this to...but if we had an environment, could take a peek at what it's doing.

cdhunt commented 1 year ago

There should be nothing Azure-specific other than network access.

This is from one of the secondary VMs.

image image

NickCraver commented 1 year ago

I'm not sure I agree with "there should be nothing Azure-specific" - the normal case we had was inside a Windows cluster, which is what the view was assuming and aggregating data on to represent child AGs. Isn't this a very different scenario spanning outside a Windows cluster and not Distributed AGs which also had to be figured out along the way? I'm all for making this work, but as I understand it, this isn't the same topology under the covers. That almost certainly needs some love, but I'm a bit blindly guessing at what that should look like...it needs hands-on love to figure out queries and see what we should be connecting where. Happy to help soon as time allows, but I'd need a repro to query as a starting point.

If y'all are working on this with the repro/code, happy to do a call helping (will be free some this afternoon).

cdhunt commented 1 year ago

It looks like the issue is connectivity to the "Primary". The Az-hosted Opserver can't currently connect to that node.

        public List<SQLNode.AGInfo> AvailabilityGroups
        {
            get { return Nodes.SelectMany(n => n.AvailabilityGroups.Data?.Where(ag => ag.IsPrimaryReplica) ?? Enumerable.Empty<SQLNode.AGInfo>()).ToList(); }
        }

NickCraver commented 1 year ago

@cdhunt ACK - if it still isn't working after the connectivity issue is fixed, please let me know, happy to help get you going here. If it just works that'd be a pleasant surprise - often we had to make a few tweaks for the additional scenarios, so don't hesitate to poke as you get further!

cdhunt commented 1 year ago

Confirmed. It works once connectivity to the Primary works.
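A minimal sketch of why the panel stayed empty until the Primary was reachable, assuming each node only reports `IsPrimaryReplica = true` for AGs where it currently holds the primary role. `Node`, `AGInfo`, `AgPanelSketch` and the node names are hypothetical, simplified stand-ins, not Opserver's actual `SQLNode` types. Only the `SelectMany`/`Where` filter follows the property quoted earlier in the thread.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, simplified stand-ins for Opserver's SQLNode and SQLNode.AGInfo.
public class AGInfo
{
    public string Name { get; set; } = "";
    public bool IsPrimaryReplica { get; set; }
}

public class Node
{
    public string Name { get; set; } = "";
    // null models a node whose AG data has not been fetched successfully.
    public List<AGInfo>? AvailabilityGroups { get; set; }
}

public static class AgPanelSketch
{
    // Same shape as the quoted property: keep only AGs reported by a node
    // that is the primary replica; nodes without data contribute nothing.
    public static List<AGInfo> VisibleGroups(IEnumerable<Node> polledNodes) =>
        polledNodes
            .SelectMany(n => n.AvailabilityGroups?.Where(ag => ag.IsPrimaryReplica)
                             ?? Enumerable.Empty<AGInfo>())
            .ToList();

    private static Node MakeNode(string name, bool isPrimary) => new Node
    {
        Name = name,
        AvailabilityGroups = new List<AGInfo>
        {
            new AGInfo { Name = "AG1", IsPrimaryReplica = isPrimary },
        },
    };

    public static void Main()
    {
        // Only the two secondaries are reachable: every AG entry is filtered out.
        var secondariesOnly = new List<Node>
        {
            MakeNode("az-node-1", isPrimary: false),
            MakeNode("az-node-2", isPrimary: false),
        };
        Console.WriteLine(VisibleGroups(secondariesOnly).Count); // 0

        // Once the primary can be polled too, its AG passes the filter.
        var withPrimary = new List<Node>(secondariesOnly)
        {
            MakeNode("onprem-primary", isPrimary: true),
        };
        Console.WriteLine(VisibleGroups(withPrimary).Count); // 1
    }
}
```

Under these assumptions, an Opserver instance that can reach only secondary replicas ends up with an empty list, which matches the missing panel on the Azure-hosted instance and its reappearance once the Primary was reachable.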

NickCraver commented 1 year ago

Oo awesome! Glad it's the easy path :)

tarynpratt commented 1 year ago

Easiest issue I've ever been tagged on. :)