perlestius / Zabbix_Templates

Zabbix Templates

RFBacklog failing #6

Closed Paging6681 closed 1 year ago

Paging6681 commented 1 year ago

Hello,

Zabbix Version: 6.4.6
DFSR Servers: Windows Server 2022

In the Zabbix dashboard we are seeing problems: HOST "RG 'RG_Name': partner 'Partner-Name' WMI check failed using 'domain\user' account"

We see this reported for all DFSR servers, with an entry for each partner server in the replication group. We currently only have one replication group.

The user reported is the user under which the Zabbix server runs. This user is a local administrator on the server, and I can run PowerShell commands interactively on the server to query DFSR health and backlogs. These all run fine. For example, when querying backlog against one of the reported failures:

PS C:\Users\user\Documents> Get-DfsrBacklog -GroupName RG_Name -SourceComputerName HOST -DestinationComputerName Partner-Name -Verbose
VERBOSE: No backlog for the replicated folder named "RG_Name"

In the zabbix_agent_2.log file on the DFSR server we see:
2023/09/12 17:58:39.487491 [UserParameter] command:'powershell -NoProfile -ExecutionPolicy Bypass -File "C:\Program Files\Zabbix Agent 2\Get-DFSRObjectParam.ps1" "RFBacklog" "GUID1" "GUID2"' length:49 output:'Couldn't retrieve in'
2023/09/12 17:58:39.487491 executed direct exporter task for key 'dfsr.params[RFBacklog,GUID1,GUID2]'
2023/09/12 17:58:39.487491 sending passive check response: 'Couldn't retrieve info from partner 'Partner-Name'' to 'zabbix_server_ip'

We get the same error when manually running the command in an interactive, elevated shell while logged in as a domain administrator:
PS C:\Users\user\Documents> powershell -NoProfile -ExecutionPolicy Bypass -File "C:\Program Files\Zabbix Agent 2\Get-DFSRObjectParam.ps1" "RFBacklog" "GUID1" "GUID2"
Couldn't retrieve info from partner 'Partner-Name'

Any suggestions would be appreciated.

Thanks.

Paging6681 commented 1 year ago

As an update, I have tested the modification recommended by @pcamelio in https://github.com/perlestius/Zabbix_Templates/issues/5 and this looks to have resolved the issue I was having. It appears to work in my environment with Server 2022, at least to the extent of confirming that I have no backlogs (it returns 0). I do not have access to other versions to test, so I can't say what the effect or need would be for other Windows Server versions. Nor do I currently have a backlog, so I cannot tell whether one would be accurately reported.

perlestius commented 1 year ago

Hello! Thank you for your feedback. Please change line 283 of Get-DFSRObjectParam.ps1 to this text:
"Couldn't retrieve info from partner '$RServerName', job state is $($j.State), timeout is $RequestTimeout"
Then show me the output of this command:
powershell -NoProfile -ExecutionPolicy Bypass -File "C:\Program Files\Zabbix Agent 2\Get-DFSRObjectParam.ps1" "RFBacklog" "GUID1" "GUID2"

Paging6681 commented 1 year ago

Further update...

While making the change above stopped the errors appearing in the logs, it is not a full fix. We still see the warnings in the Zabbix dashboard:

HOST "RG 'RG_Name': partner 'Partner-Name' WMI check failed using 'domain\user' account"

Paging6681 commented 1 year ago

Reverted back to original file. Ran command, result is:

Couldn't retrieve info from partner 'Partner-Name', job state is Failed

perlestius commented 1 year ago

Ok. Please, try the script from Get-DFSRObjectParam-debug.zip and show me the output: DFSRObjectParam-debug.ps1" "RFBacklog" "GUID1" "GUID2"

Paging6681 commented 1 year ago

Thanks. See below:


backlog size: 0
backlog request duration: 508.2573 ms

perlestius commented 1 year ago

Ok. Try this: Get-DFSRObjectParam-debug2.zip

Paging6681 commented 1 year ago

The RPC server is unavailable. (Exception from HRESULT: 0x800706BA)

perlestius commented 1 year ago

Ok. I will try to reproduce the problem in my test lab and will get back to you with an answer.

Paging6681 commented 1 year ago

Let me know if you need any further information. For context, this is a newly built environment: the Active Directory domain controllers and file servers are all newly built on Windows Server 2022. There is currently just one RG, one RF, and three DFSR member servers, one at each of three sites linked by SD-WAN. There are only a few files on them, as we have not yet migrated data to this new environment.

We see the same errors in Zabbix reported for each member server for each of its partners, so all connections between servers.

perlestius commented 1 year ago

Ok. Can you check the behavior with the firewall disabled on all your servers, and follow these recommendations? https://theitbros.com/the-rpc-server-is-unavailable-0x800706ba/

Paging6681 commented 1 year ago

Hello @perlestius

Thank you for your help, and apologies that it looks to be my fault!

In my previous testing, because I was able to query the replication backlog and other stats from DFSR, I assumed the problem was something the script was doing. Even when I followed the page you linked to (which I had looked at previously), everything seemed to be fine. Their first test, which checks whether RPC is available, worked:

PS C:\Temp> Get-WmiObject Win32_ComputerSystem -ComputerName "remote_server"

Domain              : corp.domain.com
Manufacturer        : Microsoft Corporation
Model               : Virtual Machine
Name                : remote_server
PrimaryOwnerName    : Windows User
TotalPhysicalMemory : 4293861376

So I assumed that this was all fine. However, as you suggested, I tested with the firewall disabled, and this worked. That confused me, because I had tried disabling the firewall previously. Then I discovered that it did not matter whether the remote server's firewall was on or off; only the local firewall was the issue. If I disabled the firewall on the local server that I was testing from, the script worked fine.

This caused me to go and double-check my firewall GPO for the DFS servers. I thought I had configured it correctly, but it seems I had not. I will leave this here in case anyone else is having similar issues and can learn from my mistakes...

In the GPO, under Windows Defender Firewall with Advanced Security, create a new inbound rule. Use the Predefined option and select DFS Replication from the drop-down. I also added File Server Remote Management.
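For anyone who wants to check the same rule groups from PowerShell rather than the GPO editor, here is a rough sketch. It acts on the local firewall policy of a single server, not on the GPO, and it assumes the group display names match the ones above (an English-language installation with the DFS Replication role installed). It has not been tested in the environment described in this issue, so treat it as a starting point only.

```powershell
# Assumed display names of the predefined rule groups mentioned above.
$groups = 'DFS Replication', 'File Server Remote Management'

foreach ($group in $groups) {
    # Collect the inbound rules that belong to this predefined group.
    $rules = Get-NetFirewallRule -DisplayGroup $group |
        Where-Object Direction -eq 'Inbound'

    # Show their current state before changing anything.
    $rules | Select-Object DisplayName, Enabled, Profile, Action

    # Enable them in the local policy store (requires an elevated session).
    $rules | Enable-NetFirewallRule
}
```

Rules delivered by a GPO take precedence over local settings, so in a domain environment the GPO change described above is still the place to make this permanent.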

Thanks again for the help, and sorry for it ending up being such a newbie error!

Paging6681 commented 1 year ago

As stated, user error...

perlestius commented 1 year ago

@Paging6681, thanks a lot for the information. I will add this to the readme file.