sni / mod_gearman

Distribute Naemon Host/Service Checks & Eventhandler with Gearman Queues. Host/Servicegroups affinity included.
http://www.mod-gearman.org
GNU General Public License v3.0

Dup server #11

Closed: mclarkson closed 13 years ago

mclarkson commented 13 years ago

Hi Sven, the dup server feature works flawlessly when both job servers (server and dupserver) are on the same network. It has been running for two weeks now with no problems, although on the version from before I merged your latest changes. I tested with around 16,000 checks per 5 minutes, then reduced to 8,000 checks per 5 minutes on the production network. I could not see any difference in load, CPU usage or memory usage.

The feature works okay on poor network links with many drop-outs but, of course, some service checks are dropped. Errors can be seen in the log, but sometimes an active check gets 'stuck' and does not update. This is with:

[nagios server]
[mod_gearman]
      |
      V
gearman_worker -----local_lan----> job server receiving active results (server)
              \--------wan_link----> job server receiving passive results (dupserver)

In this scenario, some service checks stop updating on the [nagios server]. They can be kicked back to life by issuing a reschedule-check command on the command pipe. I had to write a script that checks service latency and writes to the command pipe, because I'm not sure why this occurs (it doesn't make sense, since the active check result is sent before the passive check result) and I'm leaving Nokia in two days.
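The workaround described above could look roughly like the following. This is only a minimal sketch, not the script that was actually used: the function names, the `max_latency` threshold, and the shape of the `services` records are all hypothetical, and how the last-check times are obtained (status file, Livestatus, etc.) is left out. The only fixed part is the Nagios external command `SCHEDULE_FORCED_SVC_CHECK`, written in the usual `[timestamp] COMMAND;args` form.

```python
import time


def stale_services(services, now, max_latency=600):
    """Return (host, service) pairs that are overdue by more than max_latency.

    `services` is a hypothetical list of dicts with the keys
    host, service, last_check (epoch seconds) and interval (seconds).
    """
    stale = []
    for svc in services:
        # How many seconds past its expected next check the service is.
        overdue = now - (svc["last_check"] + svc["interval"])
        if overdue > max_latency:
            stale.append((svc["host"], svc["service"]))
    return stale


def reschedule_command(host, service, now):
    """Format one external command line that forces an immediate check."""
    ts = int(now)
    return f"[{ts}] SCHEDULE_FORCED_SVC_CHECK;{host};{service};{ts}\n"


def kick_stale(services, command_pipe, now=None, max_latency=600):
    """Write a forced-check command for every stale service to the pipe.

    `command_pipe` is the path of the Nagios command file; the location
    depends on the installation, so it is passed in rather than assumed.
    """
    now = time.time() if now is None else now
    lines = [
        reschedule_command(host, service, now)
        for host, service in stale_services(services, now, max_latency)
    ]
    if lines:
        with open(command_pipe, "w") as pipe:
            pipe.writelines(lines)
    return lines
```

Run from cron every few minutes, something like this would only paper over the stuck checks; it does not explain why they stop updating.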

So, this feature can only really be used when the dupserver is also on the local LAN, or maybe over a very reliable WAN link.

Hopefully I will get to debug more in a future job.

Anyway, the whole installation was handed over to Middleware Ops today, and they will be continuing the global roll-out over time, so Nokia now uses mod_gearman. :)

Thanks, Mark.

sni commented 13 years ago

Hi,

Thanks for your work. It is possible to set retries within gearman; I just have not tried that so far. Maybe that helps a little on bad links, but I cannot say anything about possible side effects.

Thanks, Sven