Reconnecting on initial tcp_connection_failure

johnae commented 13 years ago

I've been having trouble with this. I know I could just let the app exit and let whatever tools we use to run and monitor the app restart it, but I'm in a situation where starting the app takes a lot of time and processing (among other things, we're running on small low-power systems). If there is a network problem or the rabbitmq-server is unreachable (down or something), I would like to try recovering every n seconds or so. I've got reconnecting nailed through on_tcp_connection_loss, so that is no longer a problem, but I can't seem to get on_tcp_connection_failure to do the same for me.
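What is being asked for here is essentially a retry loop around the initial connection attempt. A minimal Ruby sketch, assuming the failed attempt surfaces as an exception that can be rescued; `ConnectionFailed` and `with_connection_retries` are hypothetical names, not part of the amqp gem:

```ruby
# Hypothetical error class standing in for whatever is raised when the
# initial TCP connection cannot be established.
class ConnectionFailed < StandardError; end

# Runs the block, retrying every `interval` seconds while it raises
# ConnectionFailed, for at most `max_attempts` attempts in total.
# Returns the block's value on success; re-raises the last error otherwise.
def with_connection_retries(interval: 5, max_attempts: 10)
  attempts = 0 # lives outside `begin` so `retry` does not reset it
  begin
    attempts += 1
    yield attempts
  rescue ConnectionFailed => e
    raise if attempts >= max_attempts
    warn "Failed to connect (#{e.message}), retrying in #{interval} seconds"
    sleep interval
    retry # re-runs the `begin` block
  end
end

# Simulated connection that fails twice, then succeeds.
result = with_connection_retries(interval: 0, max_attempts: 5) do |attempt|
  raise ConnectionFailed, "attempt #{attempt}" if attempt < 3
  "connected on attempt #{attempt}"
end
puts result # prints "connected on attempt 3"
```

Capping the number of attempts keeps a permanently misconfigured host from looping forever; drop the cap if the process should wait indefinitely for the broker.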

What I've tried so far (simplified examples):

class Something

  def reconnect_on_connection_failure(settings)
    puts "Failed to connect, retrying in 5 seconds"
    sleep 5
    start
  end

  def start
    @options[:on_tcp_connection_failure] = method(:reconnect_on_connection_failure)
    AMQP.start(@host, @options) do
      ## whatever we need
    end
  end

end

class Something

  def reconnect_on_connection_failure(settings)
    puts "Failed to connect, retrying in 5 seconds"
    AMQP.stop
    EM.stop
    sleep 5
    start
  end

  def start
    @options[:on_tcp_connection_failure] = method(:reconnect_on_connection_failure)
    AMQP.start(@host, @options) do
      ## whatever we need
    end
  end

end

class Something

  def reconnect_on_connection_failure(settings)
    puts "Failed to connect, retrying in 5 seconds"
    AMQP.connection.reconnect(false, 5) ## perhaps all channels etc need to be recreated here?
  end

  def start
    @options[:on_tcp_connection_failure] = method(:reconnect_on_connection_failure)
    AMQP.start(@host, @options) do
      ## whatever we need
    end
  end

end

None of these do what I expect. One of them generated a possible_authentication_failure, so I even tried catching that and doing the same reconnection there to no avail. What am I missing?

michaelklishin commented 13 years ago

I think the problem is that an authentication exception is raised in addition to the TCP connection failure.

johnae commented 13 years ago

Well, as I said, I tried catching the authentication exception as well and running the same procedure when I caught it...

michaelklishin commented 13 years ago

It is being raised in the EventMachine thread. This case (reconnecting on the initial connection) is not covered right now because I had never heard of anybody asking for it.

johnae commented 13 years ago

I see. Well, I can understand that most people wouldn't bother and would just let the whole app restart at some point. I guess I can do the same; it's just that restarting is quite painful on some of our systems, and they're also not on the most reliable networks at times. That's why I investigated this.

michaelklishin commented 13 years ago

That's a valid request. I am not sure it will be solved before 0.8.0 is out, but this functionality lives in amq-client, so maybe releasing an amq-client update will be enough for amqp gem users.

gerhard commented 12 years ago

Johnae, I had the same problem. Not sure if you managed to get this working in the end, but this does the trick:

require 'amqp'
require 'config'
require 'logger'

module ConnectionManager
  extend self
  include Config

  def logger
    @logger ||= Logger.new(STDOUT)
  end

  def connection_settings
    {
      :host     => RABBITMQ_CLUSTER.sample,
      :port     => RABBITMQ_PORT,
      :vhost    => RABBITMQ_VHOST,
      :user     => RABBITMQ_USER,
      :password => RABBITMQ_PASSWORD,
      :timeout  => RABBITMQ_TIMEOUT
    }
  end

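  def connect
    logger.info("Connecting to RabbitMQ...")
    # Sample the cluster once, so the logs report the host actually used
    # (each call to connection_settings picks a new random host)
    settings = connection_settings
    begin
      EM.run do
        connection = AMQP.connect(settings)
        connection.on_open do
          logger.info("AMQP connection to #{settings[:host]} established")
        end
      end
    rescue
      # Any StandardError escaping EM.run triggers another attempt
      reconnect.(settings)
    end
  end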

  def reconnect
    Proc.new do |settings|
      logger.info("Failed to connect: #{settings.inspect}")
      EM.stop if EM.reactor_running?
      logger.info("Retrying in #{RABBITMQ_RECONNECT_EVERY} seconds")
      sleep RABBITMQ_RECONNECT_EVERY
      connect
    end
  end
end

ConnectionManager.connect

The only gotcha is reconnect.(connection_settings), which is the Ruby 1.9 shorthand for calling procs.
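The connect/reconnect pair above recovers by recursion: every failed attempt calls `connect` again from inside the `rescue`, so the call stack grows by a few frames per failure. That is harmless for short outages, but an iterative loop avoids it entirely. A minimal sketch under stated assumptions: `connect_forever` is a hypothetical helper, its block stands in for the `EM.run { AMQP.connect(...) }` call, and a failure is assumed to surface as a `StandardError` escaping that block (which is what the bare `rescue` above relies on). Hosts are tried round-robin with `Array#cycle` instead of being picked with `sample`.

```ruby
# Tries each host in turn (round-robin) until the block returns normally.
# Hypothetical helper: the block stands in for the EM.run/AMQP.connect call.
# Any StandardError escaping the block counts as a failed attempt, mirroring
# the bare `rescue` in the ConnectionManager example.
def connect_forever(hosts, interval)
  hosts.cycle.with_index(1) do |host, attempt|
    begin
      return yield(host) # success: stop looping and hand back the result
    rescue StandardError => e
      warn "Attempt #{attempt} to #{host} failed: #{e.message}"
      warn "Retrying in #{interval} seconds"
      sleep interval # waits between attempts, outside the block
    end
  end
end

# Real use (not executed here; names taken from the ConnectionManager example):
#   connect_forever(RABBITMQ_CLUSTER, RABBITMQ_RECONNECT_EVERY) do |host|
#     EM.run { AMQP.connect(connection_settings.merge(:host => host)) }
#   end

# Simulated run with hypothetical host names: two failures, then success.
calls = 0
result = connect_forever(%w[rabbit1 rabbit2 rabbit3], 0) do |host|
  calls += 1
  raise IOError, "connection refused" if calls < 3
  "connected to #{host}"
end
puts result # prints "connected to rabbit3"
```

Note that `EM.run` only returns once the reactor stops, so in real use `connect_forever` returns after a normal shutdown rather than right after the connection opens.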

gerhard commented 12 years ago

IMO, this is now a non-issue, Michael.

michaelklishin commented 12 years ago

Good to know. By the way, reconnect.(connection_settings) is the same as reconnect.call(connection_settings).
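The `.()` shorthand is plain syntactic sugar for `Proc#call`, and `Proc` has a couple of other aliases too. A quick illustration; the `reconnect` proc here is a hypothetical stand-in for the one returned by `ConnectionManager#reconnect`:

```ruby
reconnect = Proc.new { |settings| "reconnecting to #{settings[:host]}" }
settings = { :host => "localhost" }

# All four forms invoke the proc with the same argument.
a = reconnect.call(settings)  # explicit method call
b = reconnect.(settings)      # Ruby 1.9+ shorthand for .call
c = reconnect[settings]       # Proc#[] is an alias of call
d = reconnect.yield(settings) # Proc#yield is another alias

puts [a, b, c, d].uniq.inspect # prints ["reconnecting to localhost"]
```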

gerhard commented 12 years ago

Call feels forced IMO. The shorter version feels a lot more Ruby-ish, not to mention that it's a great hint to upgrade ;).

johnae commented 12 years ago

I ditched amqp and Rabbit and went with ZeroMQ instead. That wasn't really because of this issue, though ZeroMQ doesn't have this problem by design. It's also much more efficient. As always, it depends on what your needs are; I'm not saying ZeroMQ is "better" than AMQP, just that it was a better fit for us.

gerhard commented 12 years ago

Well, some of our services have been using it for a few years now, and while HA wasn't a priority back then, it is now, so we're exploring new parts of RabbitMQ. I must say, Erlang clustering is not the nicest or the most stable I have come across, but when it works, it's a real joy to have.

For me, the ruby-amqp documentation is one of the best I have come across; it's relatively easy to find all the answers, so the experience has been really good. Also, Michael is always around, and I know Jakub (the former ruby-amqp contributor) personally, so it's familiar ground.

Having said all that, ZMQ has been on my list for some time now, I will definitely check it out sooner rather than later. Cheers for chipping in!

michaelklishin commented 12 years ago

ZeroMQ is nice and has good use cases. Once I am through with documentation sites and 1.0 releases of my most active projects, I want to try making the ZeroMQ story for JVM-based languages and Ruby a little bit better (the libraries are mostly complete but not always easy to get started with, and documentation guides do not exist, period).

gerhard commented 12 years ago

Looking good Michael! Never tried Clojure myself, would be more inclined to go JRuby rather than Clojure. Any good reasons to reconsider?

Yeah, ZMQ still feels like very early days in Ruby land. I assume it must be different for Python or Java.

michaelklishin commented 12 years ago

JRuby gets you access to the excellent VM and tooling (VisualVM, YourKit) and the java.util.concurrent parts (which are excellent: even though you won't typically hear about an "awesome concurrency story" in Java, it is there for common cases). But Ruby's libraries and development community are often blissfully unaware of concurrency: everything is written as if there were only ever one thread, everything is mutable (and monkey-patchable!), and async is being sold as a cure for cancer. Beyond Web development, these are all major problems if you ask me.

I use Clojure for productivity reasons more than performance. In Clojure, the language is radically smaller, simpler and very stable (alphas have way fewer issues than CRuby and JRuby GA releases, because the language is so much smaller). Java interop and macros make it easy to build expressive libraries and apps from building blocks that have been around for a good decade and have been heavily used and optimized. You can extend existing protocols and data types without modifying the source, in a safe, efficient way (while in Ruby, monkey patching is anything but safe and makes implementing Ruby efficiently a lot harder; ask the JRuby folks if you don't trust me).

Leiningen is my favorite build tool by far (and I have used Rake, Maven, Ant and Buildr, and tried Gradle): you get the best of Bundler-like dependency management and an extensible build tool that uses Clojure both for configuration data and for the actual actions you may want to write.

There are great REPLs and the whole culture of REPL-driven development and even operations (using remote REPLs with authentication, over HTTP or SSH).

Data structures are immutable by default. Baseline performance is in practice often 10x that of JRuby (although JRuby 1.7 on JDK 7 will be much better). There are about 5 concurrency features in addition to what the JDK has (4 ref types + promises). On top of that, I usually work on problems that functional languages fit better, and the community is full of very competent people (I am not sure this is true of the Ruby community today).

I don't know if all this matters to you but if you choose to use Clojure, ClojureWerkz libraries will be there to hook you up with most popular modern data stores, messaging technology and all kinds of utilities.

Having said all that, there are rough edges (documentation beyond the reference and books is poor, some compiler exceptions may be cryptic, few people have a lot of real experience with Clojure, there is a need for a Clojure-oriented debugger, IDE plugins are not mature), and I am not saying I'd use Clojure for everything, but it is my primary choice for anything other than user-facing Web apps and infrastructure automation (where Chef is more than good enough for me).

By the way, my favorite Clojure book, Clojure Programming from O'Reilly, is 50% off today (Friday May 4th, 2012).

gerhard commented 12 years ago

We seriously hijacked this issue, but I'm very grateful for your post. Reading it, something "clicked" and it makes me want to learn Clojure more than node.js. Where do the competent Clojure peeps hang around? IRC? Mailing list?

michaelklishin commented 12 years ago

#clojure and #leiningen on freenode, plus the Clojure Google group.

michaelklishin commented 12 years ago

Also, a good way to start learning is 4clojure; don't miss the Clojure cheatsheet and clojuredocs.org. Learning with a toy Leiningen project is definitely better than running scripts, so get Leiningen 2 and run lein2 new toying.

gerhard commented 12 years ago

Thanks so much! You're awesome!

ruby-amqp / amqp

Reconnecting on initial tcp_connection_failure #106