patricegautier / unifiZabbix

Zabbix templates to monitor pretty much all Unifi devices
181 stars, 36 forks

Sudden timeouts after restarting zabbix server container #110

Closed: jasongillis closed this issue 9 months ago

jasongillis commented 9 months ago

I'm seeing the following error when trying to get statistics from my UXG Lite, which I've configured as a UDMP in Zabbix. It had worked since I deployed it in early December, but it started failing earlier this week after I restarted the zabbix-server container. The container image did not change; it is the same one that has been in use for six weeks.

199:20240125:193142.233 Failed to execute command "/usr/lib/zabbix/externalscripts/mca-dump-short.sh '-v' '-d' '172.16.0.1' '-u' 'root' '-i' '/var/lib/zabbix/ssh_keys/zb_id_rsa' '-t' 'UDMP' '-p' '{$UNIFI_SSHPASS_PASSWORD_PATH}' '-o' '30' '-b'": Timeout while executing a shell script.

I can run the mca-dump-short.sh command from the command line in the zabbix-server container fine and it returns a blob of JSON data as expected. No errors are returned when running it directly as the zabbix user in the container.

root@c4cad0f167d4:/tmp# /usr/lib/zabbix/externalscripts/mca-dump-short.sh -d 172.16.0.1 -u root -i /var/lib/zabbix/ssh_keys/zb_id_rsa -t UDMP -o 30
{"anon_id":"f72acbcb-bd4d-4f00-85ed-3cf17371282c","antenna_table":[],"architecture":"aarch64","ble_caps":0,"board_rev":4,"bomrev":"113-01266-04","bomrev_id":"0004f204","bootid":-1,"bootrom_version":"unknown","cfgversion":"9205e2d15897b38a","cfgversion_effective":"4d69566c5ed32816","config_network_wan":{"ip":"192.168.100.10","netmask":"255.255.255.0","type":"static"},"connect_request_ip":"172.16.18.1","connect_request_port":"34325","country_code":0,"countrycode_table":[],"default":false,"discovery_response":false,"dualboot":false,"ever_crash":false,"fingerprint_req":true,"fw2_caps":7,"fw_caps":1676582....
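Running the script by hand, as above, has no time limit, whereas the Zabbix server abandons an external check once its Timeout expires, which is what produces "Timeout while executing a shell script". So "it works from the command line" does not mean "it finishes in time". The sketch below is a hypothetical helper (the name `would_time_out` is made up for illustration) that runs a command under GNU coreutils `timeout` with a given limit and reports whether it would have been cut off; it relies on `timeout` exiting with status 124 when the limit is hit.

```shell
#!/bin/sh
# Hypothetical helper, not part of unifiZabbix or Zabbix.
# Usage: would_time_out LIMIT_SECONDS COMMAND [ARGS...]
# Returns success (0) if COMMAND is still running after LIMIT_SECONDS,
# i.e. a check with that timeout would have failed.
would_time_out() {
    limit="$1"
    shift
    # Discard the command's output; only its run time matters here.
    timeout "$limit" "$@" > /dev/null 2>&1
    # GNU timeout exits with 124 when it had to stop the command.
    [ "$?" -eq 124 ]
}
```

For example, `would_time_out 3 /usr/lib/zabbix/externalscripts/mca-dump-short.sh -d 172.16.0.1 -u root -i /var/lib/zabbix/ssh_keys/zb_id_rsa -t UDMP -o 30 && echo "too slow for a 3 s Timeout"` checks the same invocation against a 3-second limit; substitute whatever Timeout your server actually uses.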

The script also runs without errors against my switches and AP.

I've tried disabling and re-enabling the SSH settings on the UXGL and rebooting it twice, with no change in behavior.

The mca-*.err files in /tmp are all empty, zero byte files.

This is a Zabbix 6.4.10 environment running in Docker. The server container image uses the ubuntu-6.4-latest tag.

I'm unsure how to debug this further, so any suggestions would be appreciated.

patricegautier commented 9 months ago

In the Zabbix config, what is the timeout set to, if anything?

https://bestmonitoringtools.com/increasing-timeout-on-the-zabbix-server-or-agent/
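To answer that question quickly inside the container, something like the sketch below can print the effective value. It is a hypothetical helper (`effective_timeout` is an illustrative name) that looks for the first uncommented `Timeout=` line in the given config file and otherwise falls back to 3, the documented default for the Zabbix 6.4 server; the config path in the usage line is an assumption based on where the Ubuntu-based images usually keep it.

```shell
#!/bin/sh
# Hypothetical helper: print the Timeout the server would use.
# Usage: effective_timeout PATH_TO_zabbix_server.conf
effective_timeout() {
    conf="$1"
    # Match only uncommented lines such as "Timeout=30" (spaces allowed
    # around '='); take the first match.
    value=$(sed -n 's/^[[:space:]]*Timeout[[:space:]]*=[[:space:]]*\([0-9][0-9]*\)[[:space:]]*$/\1/p' "$conf" | head -n 1)
    # No explicit setting: fall back to the Zabbix 6.4 default of 3 seconds.
    echo "${value:-3}"
}
```

Usage: `effective_timeout /etc/zabbix/zabbix_server.conf`. A result of 3 means the script gets only three seconds per run, which is easy to exceed over SSH.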

Also, have you tried restarting the Zabbix server again? I have seen something like this before, and I could never get to the bottom of it. It's as if Zabbix does issue new requests again for a timed-out item...

jasongillis commented 9 months ago

The timeout in the server conf was the problem. It looks like it got reset to the default when I restarted the container earlier this week, and I didn't realize it had happened. Luckily, I was able to restore the config from a backup, and it's back in action!
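One way to keep a hand-edited value from being lost on a container restart is to set it from outside the container. The sketch below assumes the official zabbix-server Docker images, which (as far as we know) write `Timeout=` into zabbix_server.conf from the `ZBX_TIMEOUT` environment variable at startup; check the image documentation before relying on that. `write_timeout_env` is a hypothetical helper that writes a Docker env file and rejects values outside the 1 to 30 second range the 6.4 server accepts.

```shell
#!/bin/sh
# Hypothetical helper: write a Docker env file setting ZBX_TIMEOUT.
# Usage: write_timeout_env ENV_FILE SECONDS
write_timeout_env() {
    file="$1"
    seconds="$2"
    # Reject empty or non-numeric input.
    case "$seconds" in
        ''|*[!0-9]*) echo "not a number: $seconds" >&2; return 1 ;;
    esac
    # Zabbix 6.4 accepts server Timeout values from 1 to 30 seconds.
    if [ "$seconds" -lt 1 ] || [ "$seconds" -gt 30 ]; then
        echo "Timeout must be between 1 and 30: $seconds" >&2
        return 1
    fi
    printf 'ZBX_TIMEOUT=%s\n' "$seconds" > "$file"
}
```

For example, `write_timeout_env ./zabbix_server.env 30` followed by passing `--env-file ./zabbix_server.env` to `docker run` (or `env_file:` in a Compose file) for the server container. The value 30 is only an illustration; pick one that covers how long the script actually takes.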

Thank you so much for giving me the pointer here!

Also, thank you for developing these templates for Unifi equipment. They're very helpful.