stompgem / stomp

A ruby gem for sending and receiving messages from a Stomp protocol compliant message queue. Includes: failover logic, ssl support.
http://stomp.github.com
Apache License 2.0

Initial connection does not failover on startup when first broker is down #98

Closed johntdyer closed 9 years ago

johntdyer commented 10 years ago

I am trying to connect to two brokers with the following connection hash:

hosts: [{:login=>"xxxx", :passcode=>"xxxx", :host=>"amq1.xxxxx.com", :port=>61613, :ssl=>false}, {:login=>"xxxx", :passcode=>"xxxx", :host=>"amq2.prod.xxxxx.com", :port=>61613, :ssl=>false}], initial_reconnect_delay: 5000, randomize: false, use_exponential_back_off: false, reliable: true

However, the first broker in this case is down. Shouldn't the client immediately try the second?

/Users/jdyer/.rbenv/versions/2.0.0-p353/lib/ruby/gems/2.0.0/gems/stomp-1.3.2/lib/stomp/client.rb:97:in `rescue in initialize': Client failed to start in 10 seconds (Stomp::Error::StartTimeoutException)
    from /Users/jdyer/.rbenv/versions/2.0.0-p353/lib/ruby/gems/2.0.0/gems/stomp-1.3.2/lib/stomp/client.rb:89:in `initialize'
    from /Users/jdyer/.rbenv/versions/2.0.0-p353/lib/ruby/gems/2.0.0/gems/stomp-1.3.2/lib/stomp/client.rb:126:in `new'
    from /Users/jdyer/.rbenv/versions/2.0.0-p353/lib/ruby/gems/2.0.0/gems/stomp-1.3.2/lib/stomp/client.rb:126:in `open'
    from /Users/jdyer/Projects/tropo/ruby_transfer_agent/lib/transfer_agent/queue_listener.rb:38:in `create_connection'
    from /Users/jdyer/Projects/tropo/ruby_transfer_agent/lib/transfer_agent/queue_listener.rb:34:in `initialize'
    from lib/transfer_agent.rb:55:in `new'
    from lib/transfer_agent.rb:55:in `<module:TransferAgent>'
    from lib/transfer_agent.rb:5:in `<main>'

gmallard commented 10 years ago

@johntdyer

Sorry for the delay. When this message hit my private mail, it went to a spam bucket. Not sure why.

Status: (I can recreate) accepted. It appears the default 10 second connection timeout broke this.

Usage notes for you:

a) If you can use 1.3.1, this will probably be OK for you at present.

b) Or, if you can use a Stomp#Connection, this will probably be OK for you at present.

c) Regardless, you absolutely must add to the connection hash the parameter:

:max_reconnect_attempts => #

and specify some non-zero number. It defaults to zero, which in turn effectively means 'do not retry'.

d) Food for thought: your 'initial_reconnect_delay' (5000) seems excessive to me. The value is in seconds (not milliseconds), so 5000 is approximately 83min20sec.
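
To make the arithmetic in d) concrete, here is a minimal sketch that converts a delay in seconds to minutes and seconds. The helper name `humanize_seconds` is hypothetical and is not part of the stomp gem.

```ruby
# Hypothetical helper (not part of the stomp gem): render a delay given in
# seconds as a minutes/seconds string.
def humanize_seconds(total_seconds)
  # divmod returns the quotient (whole minutes) and remainder (seconds)
  minutes, seconds = total_seconds.divmod(60)
  "#{minutes}min#{seconds}sec"
end

puts humanize_seconds(5000) # prints "83min20sec"
```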

So, this is on the to do list. Targeted for the next gem version.

Regards, G.

gmallard commented 10 years ago

c) above is wrong / misstated. 0 should be OK for max_reconnect_attempts.

johntdyer commented 10 years ago

Thanks, this helped

gmallard commented 10 years ago

Comments for now .....

Stomp#Connection use is not affected.

The flaw is triggered by the condition:

:initial_reconnect_delay > :start_timeout

It can also be triggered when the total time spent attempting fail overs exceeds :start_timeout.

The problem can be totally bypassed by using the following in the Client's connect hash:

:start_timeout => 0
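
As a rough illustration of the trigger condition and the bypass, the sketch below checks a connect hash before it is handed to the Client. The helper `start_timeout_at_risk?` and the constant `DEFAULT_START_TIMEOUT` are hypothetical names, not part of the gem. The 10 second default is taken from the error message in this thread, and the fallback of 0 for a missing `:initial_reconnect_delay` is only a placeholder. It covers the first condition only, not the case where the total time spent on fail overs exceeds the timeout.

```ruby
# Hypothetical helper, not part of the stomp gem.
# Assumption: the Client's start timeout defaults to 10 seconds, as the
# "Client failed to start in 10 seconds" error in this thread suggests.
DEFAULT_START_TIMEOUT = 10

# Returns true when :initial_reconnect_delay > :start_timeout, the
# condition described above as triggering the flaw.
def start_timeout_at_risk?(params)
  timeout = params.fetch(:start_timeout, DEFAULT_START_TIMEOUT)
  # A :start_timeout of 0 bypasses the problem, per the comment above.
  return false if timeout.zero?
  # Placeholder fallback of 0 when no delay is given (an assumption).
  params.fetch(:initial_reconnect_delay, 0) > timeout
end

# The connection hash from the original report (credentials masked).
connect_hash = {
  :hosts => [
    {:login => "xxxx", :passcode => "xxxx", :host => "amq1.xxxxx.com", :port => 61613, :ssl => false},
    {:login => "xxxx", :passcode => "xxxx", :host => "amq2.prod.xxxxx.com", :port => 61613, :ssl => false},
  ],
  :initial_reconnect_delay => 5000,
  :randomize => false,
  :use_exponential_back_off => false,
  :reliable => true,
}

puts start_timeout_at_risk?(connect_hash)                            # prints "true"
puts start_timeout_at_risk?(connect_hash.merge(:start_timeout => 0)) # prints "false"

# With the gem installed and a broker reachable, the bypass would look like:
# client = Stomp::Client.new(connect_hash.merge(:start_timeout => 0))
```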

The difficulty is a fundamental clash between:

Final resolution is TBD at present. Resolution is very likely to favor retention of previous fail over logic in almost all cases.

gmallard commented 10 years ago

Considering 4 possible resolutions:

Any others?

Any preferences or other thoughts?

PaulGale commented 10 years ago

If there are no scenarios where :initial_reconnect_delay > :start_timeout is meaningful then throw an exception when this condition is detected, perhaps?

Thoughts?
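
One way the suggestion above could look is sketched here. This is not the gem's actual behaviour or the fix that was eventually committed. `ReconnectDelayError` and `validate_connect_hash!` are hypothetical names, and the 10 second default for `:start_timeout` is assumed from the error message earlier in this thread.

```ruby
# Hypothetical validation, not the stomp gem's actual behaviour.
class ReconnectDelayError < ArgumentError; end

# Raises when :initial_reconnect_delay exceeds a non-zero :start_timeout.
# Assumption: the start timeout defaults to 10 seconds.
def validate_connect_hash!(params, default_start_timeout = 10)
  timeout = params.fetch(:start_timeout, default_start_timeout)
  delay   = params.fetch(:initial_reconnect_delay, 0)
  # A timeout of 0 is treated as "no start timeout", so nothing to check.
  if timeout > 0 && delay > timeout
    raise ReconnectDelayError,
          ":initial_reconnect_delay (#{delay}) exceeds :start_timeout (#{timeout})"
  end
  params
end

begin
  validate_connect_hash!(:initial_reconnect_delay => 5000)
rescue ReconnectDelayError => e
  puts "rejected: #{e.message}"
end
```

Failing fast like this trades flexibility for clarity: a misconfigured hash is reported at construction time instead of surfacing later as a start timeout.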

gmallard commented 9 years ago

This should be fixed in v1.3.4.

Specifically:

847d346

7474222

Perhaps several other commits in the 1.3.3 -> 1.3.4 chain.