bmeck closed this issue 10 years ago.
It might also be related to restarting a Broker while keeping the Worker processes alive?
That error should only happen when a Worker dies while processing a request. The message is sent by the Broker, which monitors Worker heartbeats. Are you sure none of your Workers is dying?
The Worker process does not die, and it does not call worker.stop().
Are you using any timeout option in your client requests?
nope
OK, I'll look into this.
Could you try the new release, 0.0.86?
No fix, but I managed to isolate a different oddity based on the error you mentioned:
foo
foo
foo
causes a W_ERROR. I'm trying to force a disconnect/reconnect in our Worker without restarting it, to see whether it shows the same symptoms.
Please test 0.0.87. I have managed to fix some issues that could cause a race condition.
However, I'd like to clarify a key point about zmq-omdp (and zmq in general). When you reconnect a Worker, the Broker doesn't recognise the disconnection instantly. The reason is that zmq doesn't support socket disconnect events, so this state has to be handled with heartbeating. So if a Worker disconnects and reconnects
I have reduced the default heartbeat interval to 2.5 seconds, so the Broker now considers a Worker dead after 2.5 * 3 = 7.5 seconds without heartbeats.
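Since zmq raises no disconnect event, the Broker can only infer that a Worker is gone once its heartbeats stop arriving. The following is a minimal sketch of that bookkeeping, assuming a liveness factor of 3 (the `3` in 2.5 * 3) and millisecond timestamps. `HeartbeatTracker`, `beat` and `purge` are hypothetical names for illustration, not zmq-omdp's actual API.

```javascript
// Hypothetical sketch of broker-side heartbeat expiry; not zmq-omdp's actual code.
class HeartbeatTracker {
  constructor(heartbeatMs, liveness = 3) {
    this.heartbeatMs = heartbeatMs;
    this.liveness = liveness;   // heartbeat intervals tolerated before a worker is dead
    this.lastSeen = new Map();  // workerId -> timestamp (ms) of its last heartbeat
  }

  // Silence longer than this marks a worker as dead.
  get expiryMs() {
    return this.heartbeatMs * this.liveness;
  }

  // Record a heartbeat (or any message) from a worker.
  beat(workerId, now) {
    this.lastSeen.set(workerId, now);
  }

  // Remove and return the workers that have been silent longer than expiryMs.
  purge(now) {
    const dead = [];
    for (const [id, seen] of this.lastSeen) {
      if (now - seen > this.expiryMs) {
        dead.push(id);
      }
    }
    for (const id of dead) this.lastSeen.delete(id);
    return dead;
  }
}

const tracker = new HeartbeatTracker(2500);
tracker.beat('w1', 0);
tracker.beat('w2', 5000);
console.log(tracker.expiryMs);    // 7500
console.log(tracker.purge(8000)); // [ 'w1' ]
```

Until `purge` runs past the expiry window, the stale Worker entry is still listed, which is why the Broker does not notice a disconnection instantly.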
You can configure this option by passing an options object to the Client, Broker and Worker constructors. For example:
new Client(broker, { heartbeat: 500 })
new Worker(broker, 'echo', { heartbeat: 500 })
new Broker(..., { heartbeat: 500 })
Make sure to use the same heartbeat value for all three.
Let me know
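To illustrate why the values should match, suppose (an assumption for this sketch) that the Broker derives its expiry window from its own `heartbeat` option while each Worker sends heartbeats at its own interval. The hypothetical `workerSurvives` helper checks whether a Worker's interval fits inside that window, again assuming a liveness factor of 3.

```javascript
// Hypothetical check, not part of zmq-omdp: does a worker's heartbeat interval
// fit inside the broker's expiry window (broker heartbeat * liveness)?
function workerSurvives(brokerHeartbeatMs, workerHeartbeatMs, liveness = 3) {
  const brokerExpiryMs = brokerHeartbeatMs * liveness;
  // The worker must send a heartbeat before the broker's window closes.
  return workerHeartbeatMs < brokerExpiryMs;
}

console.log(workerSurvives(500, 500));  // true  (500 < 1500)
console.log(workerSurvives(500, 2500)); // false (2500 >= 1500): purged while alive
```

Under these assumptions, a Broker configured with 500 while its Workers keep a 2500 default would purge healthy Workers, which would surface as W_ERROR on in-flight requests.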
That seems to fix the race. I'll have to think about adding W_ERROR protection for new requests (via retry?).
On Wed, Oct 1, 2014 at 6:51 PM, Paolo Ardoino notifications@github.com wrote:
— Reply to this email directly or view it on GitHub https://github.com/prdn/zmq-omdp/issues/6#issuecomment-57559725.
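One way to get the "W_ERROR protection via retry" idea on the client side is to resend a request when it fails because its Worker died. This is a sketch under stated assumptions: `send` is a hypothetical stand-in for whatever issues a request and returns a Promise, and we assume a dead-Worker failure rejects with an error whose `code` is `'W_ERROR'`; zmq-omdp may report it differently.

```javascript
// Hypothetical client-side retry wrapper. Assumes `send()` returns a Promise that
// rejects with err.code === 'W_ERROR' when the worker dies (an assumption).
async function withRetry(send, { retries = 3, delayMs = 0 } = {}) {
  let lastErr;
  // One initial attempt plus up to `retries` retries.
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await send();
    } catch (err) {
      if (err.code !== 'W_ERROR') throw err; // only retry worker failures
      lastErr = err;
      if (delayMs > 0) await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  throw lastErr;
}

// Usage with a fake request that fails twice, then succeeds.
let calls = 0;
const flaky = async () => {
  calls++;
  if (calls < 3) {
    const err = new Error('worker died');
    err.code = 'W_ERROR';
    throw err;
  }
  return 'ok';
};
withRetry(flaky).then((res) => console.log(res, calls)); // ok 3
```

Blind client-side retries are only safe for idempotent requests, since a Worker may have partially processed the request before dying.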
Yes, we should add an automatic retry flag for requests. If the worker dies for any reason the request should be requeued and submitted to a new worker.
Paolo

On 2 Oct 2014 02:26, "Bradley Meck" notifications@github.com wrote:
— Reply to this email directly or view it on GitHub https://github.com/prdn/zmq-omdp/issues/6#issuecomment-57562348.
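The Broker-side requeue Paolo proposes (when a Worker dies, its request goes back on the queue for another Worker) can be sketched as follows. `ServiceQueue` and its methods are hypothetical names; this is a simplified in-memory model assuming one request per Worker at a time, not zmq-omdp's implementation.

```javascript
// Hypothetical broker-side requeue model: a dead worker's in-flight request is
// put back at the front of the queue and handed to the next idle worker.
class ServiceQueue {
  constructor() {
    this.pending = [];          // requests waiting for a worker
    this.inFlight = new Map();  // workerId -> request currently assigned
    this.idle = [];             // ids of idle workers
  }

  addWorker(workerId) {
    this.idle.push(workerId);
    this.dispatch();
  }

  submit(request) {
    this.pending.push(request);
    this.dispatch();
  }

  // Pair pending requests with idle workers, oldest first.
  dispatch() {
    while (this.pending.length > 0 && this.idle.length > 0) {
      const workerId = this.idle.shift();
      const request = this.pending.shift();
      this.inFlight.set(workerId, request);
    }
  }

  // A worker finished its request and becomes idle again.
  complete(workerId) {
    this.inFlight.delete(workerId);
    this.idle.push(workerId);
    this.dispatch();
  }

  // A worker was declared dead (e.g. heartbeat expiry): requeue its request.
  workerDied(workerId) {
    this.idle = this.idle.filter((id) => id !== workerId);
    const request = this.inFlight.get(workerId);
    this.inFlight.delete(workerId);
    if (request !== undefined) {
      this.pending.unshift(request); // retry it before newer requests
    }
    this.dispatch();
  }
}

const q = new ServiceQueue();
q.addWorker('w1');
q.submit('req-1');  // assigned to w1
q.addWorker('w2');  // w2 idle, nothing pending
q.workerDied('w1'); // req-1 requeued and handed to w2
console.log(q.inFlight.get('w2')); // req-1
```

An unconditional requeue could loop forever on a request that crashes every Worker, which is one reason to make retry an opt-in flag with a cap on attempts.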
@bmeck can I close this issue?
sure
On Tue, Oct 7, 2014 at 1:16 PM, Paolo Ardoino notifications@github.com wrote:
— Reply to this email directly or view it on GitHub https://github.com/prdn/zmq-omdp/issues/6#issuecomment-58232686.
Every so often we see a W_ERROR. I think it has to do with a premature purge, but I'm not sure.
Server trace when the W_ERROR occurs:
Client trace: