Closed by willkelly 10 years ago
What kind of zpipes traffic is going across the cluster? I'll add a little tracing to Zyre.
(I've got a fairly large refactor of Zyre and zbroker in the works, using the new actor model in CZMQ. Hope this doesn't destabilize things too much...)
Very little traffic. Only test traffic that consisted of very small transactions (open a pipe, write a byte, read a byte, close the pipe), and only a matter of 10 or so running concurrently. For the most part, the cluster was idle.
OK, so it's not caused by high-water marks or the like, but rather by some interconnection failure. Let's start by switching off automatic interface detection and configuring everything explicitly, so we can bring up the Zyre cluster gradually in the production environment and isolate any problems as they hit.
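To illustrate the "configured, not detected" idea, here is a rough sketch of a pre-flight check a deployment script could run before starting a broker on a multi-homed host. It is not part of Zyre or zbroker; the function name `pick_interface` and the interface names are hypothetical, and it only assumes a Unix-like host where Python's `socket.if_nameindex()` is available.

```python
import socket


def pick_interface(configured, available=None):
    """Return the explicitly configured interface name, or fail loudly.

    Hypothetical helper: it never falls back to guessing an interface.
    """
    if available is None:
        # socket.if_nameindex() yields (index, name) pairs on Unix-like hosts
        available = [name for _, name in socket.if_nameindex()]
    if not configured:
        raise ValueError("no interface configured; refusing to auto-detect")
    if configured not in available:
        raise ValueError(
            f"interface {configured!r} not found; host has {sorted(available)}"
        )
    return configured
```

Running a check like this on each host as it joins would surface a mismatched interface name right away, instead of as test failures later.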
Is this resolved by setting the interface explicitly? If so, can we close it?
Probably so. Will re-open under new issue if it happens again.
We recently pushed zbroker to production (!)
This would be a joyous event, except it doesn't seem to be working in the new environment. There are a few factors here -- we've got more network interfaces than in staging, and we're running on more hosts. The current code seems to work fine in staging but not at all in prod -- we're seeing high error rates even on very simple tests.
Here's an example: Error in test "scripts/simple_noop.yml"
Host "reader" (10.4.48.5)
Test script
Script Log
Broker Log
Host "writer" (10.4.48.7)
Test script
Script Log
Broker Log