twitter / thrift_client

A Thrift client wrapper that encapsulates some common failover behavior
http://github.com/twitter/thrift_client
Apache License 2.0
196 stars 83 forks source link

Differentiate retries for NOT_OPEN and TIMED_OUT transport exceptions #56

Open ryangreenberg opened 10 years ago

ryangreenberg commented 10 years ago

The underlying thrift Ruby library raises TransportException under a variety of circumstances:

From thrift's socket.rb:

raise TransportException.new(TransportException::NOT_OPEN, "Connection timeout to #{@desc}")
raise TransportException.new(TransportException::NOT_OPEN, "Could not connect to #{@desc}: #{e}")
raise TransportException.new(TransportException::TIMED_OUT, "Socket: Timed out writing #{str.length} bytes to #{@desc}")
raise TransportException.new(TransportException::NOT_OPEN, e.message)
raise TransportException.new(TransportException::TIMED_OUT, "Socket: Timed out reading #{sz} bytes from #{@desc}")
raise TransportException.new(TransportException::NOT_OPEN, e.message)
raise TransportException.new(TransportException::UNKNOWN, "Socket: Could not read #{sz} bytes from #{@desc}")

The type of transport exception is a property of the exception instance (NOT_OPEN, TIMED_OUT, or UNKNOWN). Currently thrift_client provides the ability to retry only at the level of an exception class (e.g. TransportException). For non-idempotent requests, TIMED_OUT and UNKNOWN are not safe to retry, but NOT_OPEN should be safe. An ideal solution would be to have the Ruby thrift library itself raise subclasses of TransportException. In the interim, does this seem like a reasonable retry strategy to add to thrift_client? If so, I'm happy to do the implementation but would like some input on a good interface for configuring ThriftClient.

Some ideas I had:

ryangreenberg commented 10 years ago

@klingerf and @sprsquish, I know this library isn't something that you work on day-to-day, but if you have a moment I'd love to hear your take on this issue.

sprsquish commented 10 years ago

I think the second option might be your best bet here as it maintains backward compatibility.

matasar commented 10 years ago

Wouldn't the block option also preserve compatibility? I think that would be a better API in general, and it would be less likely to break if people are directly comparing exception classes with ==.

sprsquish commented 10 years ago

What does the API look like in that case?

ryangreenberg commented 10 years ago

Current options (from the gem docs):

:exception_classes: Which exceptions to catch and retry when sending a request. Defaults to [IOError, Thrift::Exception, Thrift::ApplicationException, Thrift::TransportException, NoServersAvailable] exception_class_overrides: For specifying children of classes in exception_classes for which you don't want to retry or reconnect.

Ways to allow a block for retry behavior and maintain backwards compatability:

In order to use this new syntax to retry NOT_OPEN TransportException and raise TIMED_OUT TransportException, I would do something like:

Option A.

@client = ThriftClient.new(
  MyCustomService::Client,
  servers,
  :transport_wrapper => Thrift::FramedTransport,
  :timeout           => 0.500,
  :connect_timeout   => 0.50,
  :exception_classes => lambda do |ex|
    # whatever logic you want here
    [IOError, Thrift::ApplicationException, Thrift::TransportException, NoServersAvailable].any? {|ea| ex.is_a?(ea) } ||
      ( ex.is_a?(TransportException) && ex.type == NOT_OPEN )
  end,
  :raise             => true,
)

Option B.

@client = ThriftClient.new(
  MyCustomService::Client,
  servers,
  :transport_wrapper => Thrift::FramedTransport,
  :timeout           => 0.500,
  :connect_timeout   => 0.50,
  :exception_classes => [IOError, Thrift::ApplicationException, Thrift::TransportException, NoServersAvailable], # have to provide so defaults with Thrift::Transport aren't used
  :retry_exception_if  => lambda do |ex|
    ( ex.is_a?(TransportException) && ex.type == NOT_OPEN )
  end
)

I'm very flexible on the actual key name retry_exception_if in option B.

I have a slight preference for A, just because I think exception_classes + exception_class_overrides + retry_exception_if is a more confusing interface for users to consider, but I can always be convinced otherwise.

sprsquish commented 10 years ago

I think A is a bit confusing, I'd expect something called "exception_classes" to contain a list of classes. How about option B where, if the block is provided then exception_classes is ignored? That would provide backward compatibility.

Given that you want to change the default behavior though, you'd have to add the new exception classes rather than using the block. Maybe a hybrid approach: do the exception unwrapping, add the new exceptions to the default list, and add an option to provide a block that overrides exception_classes completely.

ryangreenberg commented 10 years ago

@sprsquish good point--providing something other than a list of classes as exception_classes is weird. Let me first try pursuing this from another angle to see if the Thrift project is receptive to the idea of adding Thrift::TransportException::NotOpen and Thrift::TransportException::TimedOut to the underlying library.

ryangreenberg commented 10 years ago

See https://issues.apache.org/jira/browse/THRIFT-2403.

divtxt commented 9 years ago

Please consider stale idle connections in the group of safe to retry. I have the following scenario:

Here the client should check for a remotely closed idle connection and treat it like NOT_OPEN.