zbx-sadman / WSFC

Windows Server Failover Cluster (WSFC) metrics fetcher
GNU General Public License v3.0
24 stars 15 forks source link
windows-server-failover-cluster zabbix

WSFC Miner

This is a little Powershell script help to fetch metric's values from Windows Server Failover Cluster (WSFC).

Actual release 1.2.3

Tested on:

Supported objects:

Virtual keys for 'Cluster', 'ClusterNode' objects:

Virtual keys for all object which contains in ClusterParameter (see Get-ClusterParameter cndlet) table

Actions

How to use standalone

# Get Cluster name
powershell -NoProfile -ExecutionPolicy "RemoteSigned" -File "wsfc.ps1" -Action "Get" -ObjectType "Cluster" -Key "Name" -Id "f4479814-35d4-41c5-babd-c0697769ac31"

# Get PercentFree metric value from SharedVolumeInfo.Partition table for volume with ID=b8b67dbf-e66f-443e-926e-be1d1621ece5
..."wsfc.ps1" -Action "Get" -ObjectType "ClusterSharedVolume" -Key "SharedVolumeInfo.Partition.PercentFree" -Id "b8b67dbf-e66f-443e-926e-be1d1621ece5"

# Get total number of vCPUs assigned to all clustered VMs which hosted on Node with ID=00000000-0000-0000-0000-000000000001
... "wsfc.ps1" -Action "Sum" -ObjectType "ClusterNode" -Key "SummaryInformation.VirtualMachine.NumberOfProcessors" -Id "00000000-0000-0000-0000-000000000001"

# Get total number of Memory assigned (dynamically for WS2008 R2 SP1+) to all clustered VMs which placed in Cluster with ID=f4479814-35d4-41c5-babd-c0697769ac31
... "wsfc.ps1" -Action "Sum" -ObjectType "Cluster" -Key "SummaryInformation.VirtualMachine.NumberOfProcessors" -Id "00000000-0000-0000-0000-000000000001"

# Get formatted list of 'ClusterSharedVolume' object metrics accessed with property 'SharedVolumeInfo.Partition'. Verbose messages is enabled. 
... "wsfc.ps1" -Action "Get" -ObjectType "ClusterSharedVolume" -Key "SharedVolumeInfo.Partition" -ID "8e8fb118-2601-4a06-ab9a-f0a1260bd247" -DefaultConsoleWidth -Verbose

How to use with Zabbix

I recommend start use WSFC Miner as non-clustered service, tune it with Zabbix and make its clustered then.

Use as non-clustered Service

  1. Include zbx_wsfc.conf to Zabbix Agent config on any cluster node;
  2. Put wsfc.ps1 to C:\zabbix\scripts dir. If you want to place script to other directory, you must edit zbx_wsfc.conf to properly set script's path;
  3. Set Zabbix Agent's / Server's Timeout to more that 3 sec (may be 10 or 30);
  4. If you need to use .SummaryInformation. metrics - you must change Zabbix Service account from "Local System" to any account, that have local admin rights to use FailoverClusters Cmdlet's and have rights to make WMI-queries to all cluster nodes over network. Otherwise you will got script error;
  5. Import template to Zabbix Server;
  6. Think twice before link Template to host and disable discovery rules that not so important (may be "Virtual Machines", "Generic Services", "Cluster Networks"). Otherwise u can get over 9000% CPU load with PowerShell calls;
  7. Pray and link template;
  8. Enjoy.

Use as failover Generic Service

  1. If you want use local (non-clustered) and failover (clustered) Zabbix Agent at the same time - you must to change Zabbix Agent's directive ListenPort in "clustered agent" config from default to another unused (may be 16092 or so). Otherwise you can sometime get error 1067 when clustered Zabbix Agent will migrate. This is due first started instance of Agent bind to all available host's addresses and second instance just exit when started;
  2. Create copy of Zabbix Agent config (call it zabbix_agentd_WSFC-A.conf for example) on the one cluster node;
  3. Include zbx_wsfc.conf to Zabbix Agent config, if you have not done this before;
  4. Choose new IP-address and domain name for using with Generic Service. It's should not be Cluster's IP and Hostname. Change ListenIP & Hostname directive to new values in zabbix_agentd_WSFC-A.conf;
  5. Put wsfc.ps1 and zabbix_agentd_WSFC-A.conf to every node in cluster (or try to use Windows Shares);
  6. On every node deinstall service of local Zabbix Agent and install it again with -m key (zabbix_agentd.exe -c ... -x, zabbix_agentd.exe -c ... -d, zabbix_agentd.exe -c ... -i -m, zabbix_agentd.exe -c ... -s -m);
  7. On every node install second Zabbix Agent's service with zabbix_agentd_WSFC-A.conf and -m key (zabbix_agentd.exe -c ...zabbix_agentd_WSFC-A.conf -i -m). Don't start that service manually - its will auto-started by WSFC on service's Owner node;
  8. If you need to use .SummaryInformation. metrics - you must change Zabbix Service account from "Local System" to any account, that have local admin rights to use FailoverClusters Cmdlet's and have rights to make WMI-queries to all cluster nodes over network. Otherwise you will got script error;
  9. Create new "Generic Service" for your cluster with Failover Cluster MMC, assign to its an IP-address and hostname, which was defined on step 4;
  10. Import template to Zabbix Server;
  11. Think twice before link Template to host and disable discovery rules that not so important (may be "Virtual Machines", "Generic Services", "Cluster Networks"). Otherwise u can get over 9000% CPU load with PowerShell calls;
  12. On Zabbix server create new host with IP-address and hostname from step 4;
  13. Pray and link template;
  14. Start Generic Service, that you create on step 9 with with Failover Cluster MMC;
  15. Enjoy. May be.

Note Do not try import Zabbix v2.4 template to Zabbix pre v2.4. You need to edit .xml file and make some changes at discoveryrule - filter tags area and change # to <>_ in trigger expressions. I will try to make template to old Zabbix.

Note In template used Item's type Zabbix Agent (active). You must set up ServerActive directive of Zabbix Agent or change Item's type to Zabbix Agent. In this case number of pollers of Zabbix Server must be increased, because any run of PowerShell script will freeze poller thread to 2 sec (on my hardware).

Hints

Beware frequent requests to PowerShell script eat CPU and increase Load. To avoid it - don't use small update intervals with Zabbix's Data Items and disable unused.