Closed mclarkson closed 13 years ago
Hi,
thanks for your work. It is possible to set retries within Gearman; I just have not tried that so far. Maybe that would help a little on bad links, but I cannot say anything about possible side effects.
Thanks, Sven
Hi Sven, the dupserver feature works flawlessly when both job servers (server and dupserver) are on the same network. It has been running for two weeks now with no problems, although on the version from before I merged your latest changes. I tested with around 16,000 checks per 5 min, then reduced to 8,000 checks per 5 min on the production network. I could not see any difference in load, CPU usage or memory usage.
The feature works okay on poor network links with many drop-outs but, of course, some service checks are dropped. Errors can be seen in the log, but sometimes an active check gets 'stuck' and does not update. This is with:
In this scenario some service checks will stop updating on the [nagios server]. Service checks can be kicked back to life by issuing a reschedule check on the command pipe. I had to write a script to check service latency and write to the command pipe, as I'm not sure why this occurs (it doesn't make sense, since the active check result is sent before the passive check result) and I'm leaving Nokia in two days.
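For anyone hitting the same thing, here is a rough sketch of what such a workaround could look like. It is not the script I used, just an illustration: the pipe path is the common default and may differ on your install, the function names and the `(host, service, last_check)` input format are made up for the example, and how you collect the last check times (status file, livestatus, etc.) is left out. The only Nagios-specific part is the `SCHEDULE_FORCED_SVC_CHECK` external command.

```python
import time

# Assumed default location of the Nagios external command pipe; adjust for your install.
COMMAND_PIPE = "/usr/local/nagios/var/rw/nagios.cmd"


def stale_services(services, max_age, now):
    """Return (host, service) pairs whose last check is older than max_age seconds.

    `services` is a hypothetical list of (host, service, last_check_epoch) tuples.
    """
    return [(host, svc) for host, svc, last_check in services
            if now - last_check > max_age]


def reschedule_command(host, service, now):
    """Format one SCHEDULE_FORCED_SVC_CHECK external command line."""
    ts = int(now)
    return "[{0}] SCHEDULE_FORCED_SVC_CHECK;{1};{2};{0}\n".format(ts, host, service)


def kick_stale(services, max_age, pipe_path=COMMAND_PIPE, now=None):
    """Write a forced reschedule for every stale service to the command pipe."""
    now = time.time() if now is None else now
    commands = [reschedule_command(host, svc, now)
                for host, svc in stale_services(services, max_age, now)]
    if commands:
        # Opening a real FIFO blocks until Nagios has it open for reading.
        with open(pipe_path, "w") as pipe:
            pipe.writelines(commands)
    return commands
```

Run from cron with something like `kick_stale(services, max_age=600)`, where the threshold is a bit longer than the normal check interval so healthy checks are not rescheduled needlessly.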
So, this feature can only really be used when the dupserver is also on the local LAN, or maybe on a very reliable WAN link.
Hopefully I will get to debug more in a future job.
Anyway, the whole installation was handed over to Middleware Ops today and they will be continuing the global roll-out over time, so Nokia now uses mod_gearman. :)
Thanks, Mark