instacart / makara

A Read-Write Proxy for Connections; Also provides an ActiveRecord adapter.
http://tech.taskrabbit.com/
MIT License
930 stars 170 forks

ActiveRecord connection timeouts on non-makara, empty connection pool #147

Open aks opened 7 years ago

aks commented 7 years ago

We are evaluating makara as a replacement for our use of octopus within our Rails 4 app using Postgres 9.5.

One unresolved problem we are having in our evaluation is that there are some inexplicable ActiveRecord connection timeouts, but the stacktraces do not appear to involve makara at all.

From my review of the makara code, it appears that it "hijacks" and proxies all of the AR DB connections.

Is it expected that some AR connections will not go through makara?

I've attached a stack trace of one of the connection timeout errors. 1.txt

We have other errors that correctly involve makara, but which are not makara-induced. See the second stack trace. m1.txt

For the most part, makara is working correctly, except that we are having these strange DB connection timeouts, for which it appears the connection is not using makara.

Thanks for any insights.

aks commented 7 years ago

I've reviewed the makara code, and the AR code that was in the stacktrace. Here are my findings:

In makara_abstract_adapter.rb, line 108, makara hijacks the core AR methods:

hijack_method :execute, :select_rows, :exec_query, :transaction

It also wraps some other AR methods to cause their effects to be distributed to all the connections in both the master and slave pools:

send_to_all :connect, :reconnect!, :verify!, :clear_cache!, :reset!
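To make the two macros concrete, here is a minimal sketch of how class-level helpers like these could be written. It is not makara's actual implementation: `FakeConnection`, `SketchProxy`, `pick_connection`, and the "SELECTs go to a replica" routing rule are all hypothetical names and assumptions made for illustration.

```ruby
# Hypothetical sketch; not makara's actual implementation.
class FakeConnection
  attr_reader :name, :log

  def initialize(name)
    @name = name
    @log = []
  end

  def execute(sql)
    @log << [:execute, sql]
    "#{name}: #{sql}"
  end

  def reconnect!
    @log << [:reconnect!]
    true
  end
end

class SketchProxy
  # Define each listed method so the call is routed to one chosen connection.
  def self.hijack_method(*names)
    names.each do |method_name|
      define_method(method_name) do |*args, &block|
        pick_connection(method_name, args).public_send(method_name, *args, &block)
      end
    end
  end

  # Define each listed method so the call is repeated on every connection.
  def self.send_to_all(*names)
    names.each do |method_name|
      define_method(method_name) do |*args, &block|
        all_connections.map { |conn| conn.public_send(method_name, *args, &block) }
      end
    end
  end

  hijack_method :execute
  send_to_all :reconnect!

  def initialize(master:, replicas:)
    @master = master
    @replicas = replicas
  end

  private

  def all_connections
    [@master, *@replicas]
  end

  # Naive routing rule (assumption): SELECTs go to the first replica,
  # everything else goes to the master.
  def pick_connection(method_name, args)
    sql = args.first.to_s
    if method_name == :execute && sql =~ /\A\s*select\b/i && !@replicas.empty?
      @replicas.first
    else
      @master
    end
  end
end
```

In this sketch, only the methods named in the two macro calls pass through the proxy. Anything not listed is never intercepted, so whatever acquires the underlying connection object stays outside the proxy's control.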

However, I notice that in our copy of Rails AR (4.2.6), the find method has this code in core.rb, starting at line 148:

s = find_by_statement_cache[key] || find_by_statement_cache.synchronize {
  find_by_statement_cache[key] ||= StatementCache.create(connection) { |params|
    where(key => params.bind).limit(1)
  }
}
record = s.execute([id], self, connection).first

The stacktrace shows that the connection call, passed as an argument on that last line, is the one that hangs: it descends through many nested AR methods and ends up waiting for an available connection. However, none of that nested code in the stacktrace involves makara. It's pure AR code, thread-safe, waiting for an available connection behind multiple nested mutex locks.

When I examine the connection handler code in makara, it becomes clear that the connection pools that makara is managing are distinct from the connection pools that AR is waiting on. Because makara hijacked most of AR's connection management, it's unclear whether AR even has a connection pool to work with.
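For anyone unfamiliar with how this kind of timeout arises, here is a minimal sketch of a bounded pool whose checkout blocks until a connection is returned or a deadline passes. It is an illustration only, not ActiveRecord's ConnectionPool: `TinyPool`, `CheckoutTimeout`, and the string "connections" are hypothetical.

```ruby
# Hypothetical sketch of a bounded pool; not ActiveRecord's ConnectionPool.
class TinyPool
  class CheckoutTimeout < StandardError; end

  def initialize(size:, timeout:)
    @available = Array.new(size) { |i| "conn-#{i}" }
    @timeout = timeout
    @mutex = Mutex.new
    @cond = ConditionVariable.new
  end

  # Block until a connection is free, or raise once the deadline passes.
  def checkout
    deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + @timeout
    @mutex.synchronize do
      while @available.empty?
        remaining = deadline - Process.clock_gettime(Process::CLOCK_MONOTONIC)
        raise CheckoutTimeout, "no connection available within #{@timeout}s" if remaining <= 0
        @cond.wait(@mutex, remaining)
      end
      @available.pop
    end
  end

  # Return a connection and wake one waiting thread, if any.
  def checkin(conn)
    @mutex.synchronize do
      @available.push(conn)
      @cond.signal
    end
  end
end

pool = TinyPool.new(size: 1, timeout: 0.1)
held = pool.checkout          # the only connection is now taken
begin
  pool.checkout               # nothing is checked back in, so this waits and then raises
rescue TinyPool::CheckoutTimeout => e
  puts "timed out: #{e.message}"
end
pool.checkin(held)
```

If connections are held or managed somewhere that never checks them back in to the pool a caller is waiting on, every later checkout ends the same way, which would be consistent with the stacktrace described above.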

Is there some reason that the connection method isn't hijacked to make the connection retrieval also flow through makara?

bleonard commented 7 years ago

Thanks for your investigation. To my knowledge (which is somewhat limited), there is no reason. Seems like something to try. It's possible @mnelson remembers something. It's been a while.

swordfish444 commented 7 years ago

@aks Did you ever figure this out? I'm also facing the same issue and would like to know what you ended up doing. Thanks!

aks commented 7 years ago

@swordfish444 -- we switched to evaluating two other gems: octopus and fresh_connection. Both are less complex gems, work with multi-threaded environments, and also manage the cache coherency problem.