riskersen / Monitoring

Monitoring plugins which are Nagios/Icinga compatible

cluster check not working on slave #16

Closed Napsty closed 8 years ago

Napsty commented 8 years ago

It seems the "-T cluster" check is not working correctly when the check is launched on the slave. Output is:

./check_fortigate.pl -H slaveip -S AAAAAAAAAAAAAAAA -T cluster 
CRITICAL: fw002 (Master: BBBBBBBBBBBBBBBB, Slave: AAAAAAAAAAAAAAAA): HA (Active/Passive) is active, preferred master AAAAAAAAAAAAAAAA is not master!, Sync-State: Not Synchronized

The plugin thinks that serial BBBBBBBBBBBBBBBB should be master. But in the SNMP output it is shown that AAAAAAAAAAAAAAAA is master:

# snmpwalk -v 2c -c public slave 1.3.6.1.4.1.12356.101.13.2
iso.3.6.1.4.1.12356.101.13.2.1.1.1.1 = INTEGER: 1
iso.3.6.1.4.1.12356.101.13.2.1.1.1.2 = INTEGER: 2
iso.3.6.1.4.1.12356.101.13.2.1.1.2.1 = STRING: "BBBBBBBBBBBBBBBB"
iso.3.6.1.4.1.12356.101.13.2.1.1.2.2 = STRING: "AAAAAAAAAAAAAAAA"
iso.3.6.1.4.1.12356.101.13.2.1.1.3.1 = Gauge32: 0
iso.3.6.1.4.1.12356.101.13.2.1.1.3.2 = Gauge32: 0
iso.3.6.1.4.1.12356.101.13.2.1.1.4.1 = Gauge32: 29
iso.3.6.1.4.1.12356.101.13.2.1.1.4.2 = Gauge32: 30
iso.3.6.1.4.1.12356.101.13.2.1.1.5.1 = Gauge32: 6
iso.3.6.1.4.1.12356.101.13.2.1.1.5.2 = Gauge32: 7
iso.3.6.1.4.1.12356.101.13.2.1.1.6.1 = Gauge32: 8
iso.3.6.1.4.1.12356.101.13.2.1.1.6.2 = Gauge32: 11
iso.3.6.1.4.1.12356.101.13.2.1.1.7.1 = Counter32: 33634
iso.3.6.1.4.1.12356.101.13.2.1.1.7.2 = Counter32: 637158
iso.3.6.1.4.1.12356.101.13.2.1.1.8.1 = Counter32: 5748710
iso.3.6.1.4.1.12356.101.13.2.1.1.8.2 = Counter32: 152509278
iso.3.6.1.4.1.12356.101.13.2.1.1.9.1 = Counter32: 0
iso.3.6.1.4.1.12356.101.13.2.1.1.9.2 = Counter32: 0
iso.3.6.1.4.1.12356.101.13.2.1.1.10.1 = Counter32: 0
iso.3.6.1.4.1.12356.101.13.2.1.1.10.2 = Counter32: 0
iso.3.6.1.4.1.12356.101.13.2.1.1.11.1 = STRING: "fw002"
iso.3.6.1.4.1.12356.101.13.2.1.1.11.2 = STRING: "fw001"
iso.3.6.1.4.1.12356.101.13.2.1.1.12.1 = INTEGER: 1
iso.3.6.1.4.1.12356.101.13.2.1.1.12.2 = INTEGER: 0
iso.3.6.1.4.1.12356.101.13.2.1.1.13.1 = ""
iso.3.6.1.4.1.12356.101.13.2.1.1.13.2 = ""
iso.3.6.1.4.1.12356.101.13.2.1.1.14.1 = ""
iso.3.6.1.4.1.12356.101.13.2.1.1.14.2 = ""
iso.3.6.1.4.1.12356.101.13.2.1.1.15.1 = STRING: "A061A044BA6725817CD726C0C38529A2"
iso.3.6.1.4.1.12356.101.13.2.1.1.15.2 = STRING: "A061A044BA6725817CD726C0C38529A2"
iso.3.6.1.4.1.12356.101.13.2.1.1.16.1 = ""
iso.3.6.1.4.1.12356.101.13.2.1.1.16.2 = STRING: "AAAAAAAAAAAAAAAA"

I assume the plugin simply takes the first found entry (.1) and considers it to be master. The actual master serial ID is stored in .1.3.6.1.4.1.12356.101.13.2.1.1.16.2 (fgHaStatsMasterSerial). So the plugin should take this value.
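To illustrate the idea, here is a minimal Python sketch (not the plugin's Perl code) that reads the master serial from column 16 of the walk instead of trusting row .1. The helper names `parse_ha_table` and `master_serial` are made up for this example, and the sample input is an excerpt of the walk above.

```python
import re

# Matches snmpwalk lines such as:
#   iso.3.6.1.4.1.12356.101.13.2.1.1.16.2 = STRING: "AAAAAAAAAAAAAAAA"
# and captures (column, row index, value).
LINE_RE = re.compile(
    r'\.12356\.101\.13\.2\.1\.1\.(\d+)\.(\d+) = (?:\w+: )?"?([^"]*)"?$'
)

SERIAL_COL = 2          # member serials (.2.x in the walk above)
MASTER_SERIAL_COL = 16  # fgHaStatsMasterSerial (.16.x in the walk above)


def parse_ha_table(walk_output):
    """Return {row_index: {column: value}} parsed from snmpwalk text."""
    rows = {}
    for line in walk_output.splitlines():
        match = LINE_RE.search(line.strip())
        if match:
            column, index, value = match.groups()
            rows.setdefault(int(index), {})[int(column)] = value
    return rows


def master_serial(rows):
    """Return the first non-empty master serial (column 16), or None."""
    for index in sorted(rows):
        value = rows[index].get(MASTER_SERIAL_COL, "")
        if value:
            return value
    return None


# Excerpt of the walk posted above (columns 2 and 16 only).
WALK = '''\
iso.3.6.1.4.1.12356.101.13.2.1.1.2.1 = STRING: "BBBBBBBBBBBBBBBB"
iso.3.6.1.4.1.12356.101.13.2.1.1.2.2 = STRING: "AAAAAAAAAAAAAAAA"
iso.3.6.1.4.1.12356.101.13.2.1.1.16.1 = ""
iso.3.6.1.4.1.12356.101.13.2.1.1.16.2 = STRING: "AAAAAAAAAAAAAAAA"
'''

rows = parse_ha_table(WALK)
first_entry = rows[1][SERIAL_COL]    # what a "first entry wins" approach picks
actual_master = master_serial(rows)  # what column 16 reports
print("first entry:", first_entry, "| reported master:", actual_master)
```

On this excerpt the two values differ, which would explain the wrong "Master:" in the plugin output if it does pick the first row.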

What I'm not sure about, however, is why 1.3.6.1.4.1.12356.101.13.2.1.1.12.2 shows 0 (Not Synchronized). Any idea? Or is it normal that, from the slave's point of view, the master (fw001) is not in sync?

riskersen commented 8 years ago

These are good points. Maybe you could submit a patch, and I'll test and merge it. As for the sync state, I really don't know; FortiGate often doesn't behave as expected.

If I find a free time slot, I'll contact FortiGate support. Which version are you using? 5.2?

Napsty commented 8 years ago

Version is: v5.2.5,build701

Have you heard anything from support about the "Not Synchronized" status as seen from the slave?

Napsty commented 8 years ago

Pull request #18 created. The -T cluster check in combination with -S now works:

# ./check_fortigate.pl -H masterip -C monitoring -T cluster -S AAAAAAAAAAAAAAAA
OK: fwrap001 (Master: AAAAAAAAAAAAAAAA, Slave: BBBBBBBBBBBBBBBB): HA (Active/Passive) is active, Sync-State: Synchronized

# ./check_fortigate.pl -H slaveip -C monitoring -T cluster -S AAAAAAAAAAAAAAAA
CRITICAL: fwrap002 (Master: AAAAAAAAAAAAAAAA, Slave: BBBBBBBBBBBBBBBB): HA (Active/Passive) is active, Sync-State: Not Synchronized

The correct master is now defined by using the dedicated OID in which the master serial is declared.

However, you should review how the help_serials construct currently works. The serial IDs seem to come out in round-robin order: sometimes the plugin reports the master's serial as the slave, sometimes the slave's serial.
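One possible explanation, offered as an assumption because the plugin source is not quoted in this thread: if help_serials is a Perl hash, its iteration order is not guaranteed to be stable, so "first key" and "second key" can swap between runs. The Python sketch below shows an order-independent alternative: derive the slave list from the known master serial instead of from position. The function name `assign_roles` is hypothetical.

```python
def assign_roles(serials, master):
    """Split cluster serials into (master, slaves) regardless of input order.

    `serials` is any iterable of member serials; `master` is the serial
    reported by the dedicated master-serial OID.
    """
    unique = set(serials)
    if master not in unique:
        raise ValueError(f"master serial {master!r} is not a cluster member")
    # Sorting makes the slave list deterministic across runs.
    return master, sorted(unique - {master})


# The same cluster, listed in both possible orders.
roles_a = assign_roles(["BBBBBBBBBBBBBBBB", "AAAAAAAAAAAAAAAA"], "AAAAAAAAAAAAAAAA")
roles_b = assign_roles(["AAAAAAAAAAAAAAAA", "BBBBBBBBBBBBBBBB"], "AAAAAAAAAAAAAAAA")
print(roles_a == roles_b)
```

Both calls return the same pair, so the "Slave:" field no longer depends on the order in which the serials were collected.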

riskersen commented 8 years ago

@Napsty As @arigaud said in the pull request, this will not work completely, so I would advise using direct SNMP access for each host.

To achieve that, you could define host variables such as _fw1_serial and _fw2_serial and append them to the community, e.g. check_fortigate_cluster!public-$_HOSTFW1_SERIAL$ and check_fortigate_cluster!public-$_HOSTFW2_SERIAL$.
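As a rough sketch of that suggestion, the Python snippet below builds one command line per cluster member, with the member's serial appended to the base community as "community-serial". The host names, the `member_check_args` helper, and the choice to query each member's own address are assumptions for illustration; only the -H, -C and -T cluster flags and the "public-<serial>" form come from this thread.

```python
def member_community(base_community, serial):
    """Per-member community string in the 'community-serial' form."""
    return f"{base_community}-{serial}"


def member_check_args(host, base_community, serial):
    """Argument list for one direct, per-member cluster check."""
    return [
        "./check_fortigate.pl",
        "-H", host,
        "-C", member_community(base_community, serial),
        "-T", "cluster",
    ]


# Hypothetical host names mapped to the placeholder serials used above.
members = {
    "fw001": "AAAAAAAAAAAAAAAA",
    "fw002": "BBBBBBBBBBBBBBBB",
}

commands = [member_check_args(host, "public", serial)
            for host, serial in members.items()]
for args in commands:
    print(" ".join(args))
```

In a Nagios or Icinga setup the serial would come from the host's custom variable rather than from a dictionary like this one.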

But I saw that you lost your FortiGate cluster, so we'll keep this for the archive.