Closed iande closed 12 years ago
Two possibilities to consider:
The first option won't fully solve the problem, the aforementioned read loop will still run indefinitely. The second option would resolve the problem, allowing a reconnect handler to rescue the re-raised exception and attempt to connect again.
I still have a concern with installing the event handlers before the CONNECT/CONNECTED exchange is complete: those events will be triggered from within the same thread that performs client.connect
. Consider the following
def connect_to_broker c
begin
c.connect
rescue
sleep 2
retry
end
end
client = OnStomp::Client.new(...)
client.on_connection_closed do
connect_to_broker client
end
connect_to_broker client
# More code ...
If the TCP/IP connection is refused, that error is raised before connect
is called on the OnStomp::Connections::Stomp_1_0 instance created by OnStomp::Connections.connect
, thus no event handlers are installed, and everything works as expected. However, if the TCP/IP connection is established, and an EOF error occurs during the CONNECT/CONNECTED handshake, two things will happen:
on_connection_closed
handler will be invoked.So, let's suppose an EOFError occurs (or the broker takes too long to respond) while waiting for the CONNECTED frame. The callback is invoked, which calls client.connect
again. Suppose this time, the connection succeeds, the handler completes, the client is connected with a brand new connection object. We unwind back into the first connection, which then raises an exception, this in turn is rescued by the begin/end block in our initial call to connect_to_broker
, which in turn sleeps for 2 seconds, and then attempts to reconnect the client again. This time, the threaded processor of the client is up and running, so it is stopped before we reconnect, and then yet another connection is created. Assuming this one also connects properly, we continue execution at #More code ...
There are two main problems here. The first is that stopping the threaded processor does not close the connection, so the first successful connection is still open. However, since it is not attached to an IO loop, it will likely remain open for the duration of the program. The second problem is that we had a perfectly good connection and then threw it away when the reconnect code made the final call to c.connect
. Both issues could be prevented, in this case, by first checking the client's connection status before attempting to connect, e.g.:
begin
client.connect
rescue
unless client.connected?
sleep 2
retry
end
end
or by a fast return in Client#connect
:
def connect headers={}
return if connected?
# ...
end
However, this example is assuming the reconnect handler operates on the main thread only. If we used a separate stopped thread to make the connection, waking it up when a connection attempt was needed, we could still create the problem since Client#connect
is not synchronized.
Let's see if simply raising exceptions during the CONNECT/CONNECTED process is sufficient.
To further improve this, we'll raise a new error, ConnectionTimeoutError < FatalConnectionError
, if the read timeout is exceeded during the CONNECT/CONNECTED phase, and default read timeout to 120 seconds.
v1.0.7 has addressed this problem, though the solution could be a little more elegant.
The IO loop can run indefinitely if the connection is lost during the CONNECT/CONNECTED handshake. This is largely due to the fact that while
io_process_read
will trigger closing events, none have been bound to the connection at this stage. The problem is exacerbated by the fact that the read timeout check also only triggers an event without raising an exception, which causes this block in theconnect
method:to run indefinitely.
Thanks to Paul Gale for bringing this issue to my attention.