instacart / makara

A Read-Write Proxy for Connections; Also provides an ActiveRecord adapter.
http://tech.taskrabbit.com/
MIT License

All slaves down, but no fallback to the master #186

Open darksoul42 opened 6 years ago

darksoul42 commented 6 years ago

I have a Redmine cluster with one master and two slaves running Postgres 9.5 on FreeBSD servers, and everything works as expected when all nodes are up. (Updates go to the master, selects go to the slaves.)

However, if one slave node goes down, I get a set_client_encoding error (though I suspect this is due to how the "pg" gem handles its own errors), and this does not happen when "encoding" is not defined. I managed to get that "invalid encoding error" handled gracefully, but this only revealed an underlying issue.

If both slave nodes are down or blacklisted, 0.3.9 seems never to fall back to the master. It retries forever on the slaves only, leading to an application timeout, so things do not work as advertised in the README.

I was wondering whether there should be a tunable to say whether one wants to fall back to the master or not. I looked in the source code but could not find one.

(Also, as a side note: if the master is down, having only a slave alive is not enough, given that Redmine needs to update things like authentication tokens.)

Here is my database.yml :

production:
  adapter: postgresql_makara
  database: redmine
  username: redmine
  encoding: utf-8
  pool: 10
  makara:
    master_ttl: 10
    sticky: true
    connections:
      - role: master
        host: master.host
      - role: slave
        host: slave1.host
      - role: slave
        host: slave2.host
    connection_error_matchers:
      - '/invalid encoding name/'

darksoul42 commented 6 years ago

I could narrow it down to non-select queries (i.e. queries that absolutely require a master) being attempted at this point in proxy.rb:

    def any_connection
      @master_pool.provide do |con|
        yield con
      end
    rescue ::Makara::Errors::AllConnectionsBlacklisted, ::Makara::Errors::NoConnectionsAvailable
      begin
        @master_pool.disabled = true
        @slave_pool.provide do |con|
          yield con
        end
      ensure
        @master_pool.disabled = false
      end
    end

Either I end up with an error that leads to blacklisting of the master node, and since there are no live slaves it falls flat completely, or it just stalls endlessly, probably because it can't find a live master that it "can" use. (Note that restarting one slave node instantly restores functionality.)

I wonder whether this is a case of refusing to use the same context because of the current strategy/stickiness logic. I haven't dived deep into the internals yet, so I can't confirm this, but it really feels like it tries to avoid using the master for "update" queries (or anything not matching the appropriate regexp) when the master has already been used for "select" as a fallback, until a slave comes back. I can also confirm that it insistently tries to connect to the slaves before getting a connection refused. (This might be even more troublesome if the host were down and the connection had to time out.)
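
To make the expected behaviour concrete, here is a minimal toy model of the fallback being described: reads prefer the slaves and fall back to the master when no slave is usable, while writes can only go to the master. This is only a sketch under stated assumptions. `ToyPool`, `ToyProxy` and `ToyAllConnectionsBlacklisted` are hypothetical stand-ins written for illustration. They are not Makara's classes and do not reproduce its stickiness or blacklisting logic.

```ruby
# Hypothetical stand-in for Makara's "nothing usable in this pool" errors.
class ToyAllConnectionsBlacklisted < StandardError; end

# Hypothetical pool: maps host name => alive? (true/false).
class ToyPool
  def initialize(name, hosts)
    @name = name
    @hosts = hosts
  end

  # Yield the first live host, or raise if the pool has none.
  def provide
    host, _alive = @hosts.find { |_name, alive| alive }
    raise ToyAllConnectionsBlacklisted, "#{@name}: no live connection" unless host
    yield host
  end
end

# Hypothetical proxy showing the read fallback one would expect.
class ToyProxy
  def initialize(master_pool, slave_pool)
    @master_pool = master_pool
    @slave_pool = slave_pool
  end

  # Reads: try the slaves first, fall back to the master if all are down.
  def read(&block)
    @slave_pool.provide(&block)
  rescue ToyAllConnectionsBlacklisted
    @master_pool.provide(&block)
  end

  # Writes: only the master can serve them, so there is no fallback.
  def write(&block)
    @master_pool.provide(&block)
  end
end

master = ToyPool.new("master", { "master.host" => true })
slaves = ToyPool.new("slave", { "slave1.host" => false, "slave2.host" => false })
proxy  = ToyProxy.new(master, slaves)

# With both slaves down, the read is served by the master.
puts proxy.read { |host| host }
```

In this model a dead slave pool never blocks a read as long as the master is alive, which is the behaviour the report above expects and does not observe.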

fmundaca commented 6 years ago

Hello, did you solve this? Apparently I'm experiencing the same problem.

Thanks!

NKeerthi commented 5 years ago

@darksoul42 One way I solved this problem was by setting slave_strategy: failover, which falls back to the master when the slave connection is lost. Setup: I tested this on my local machine by configuring a read-only user on the master and killing the MySQL connection for the slave user, to check whether it falls back to the master. Another way to solve this is to use connection_error_matchers, as described in the README. You can list known errors, which helps in blacklisting the node, e.g.:

    connection_error_matchers:
      - '/Query execution was interrupted/'
      - '/Access denied/'
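
Applied to the database.yml from the original report, the two suggestions above might look roughly like the following. This is a sketch, not a verified fix. It assumes slave_strategy sits directly under the makara key alongside the other options. The test described above used MySQL, so the behaviour with postgresql_makara on 0.3.9 is unconfirmed.

```yaml
production:
  adapter: postgresql_makara
  database: redmine
  username: redmine
  encoding: utf-8
  pool: 10
  makara:
    master_ttl: 10
    sticky: true
    # Assumption: placed at this level, next to the other makara options.
    slave_strategy: failover
    connections:
      - role: master
        host: master.host
      - role: slave
        host: slave1.host
      - role: slave
        host: slave2.host
    # Errors matching these patterns are treated as connection errors,
    # which lets the affected node be blacklisted.
    connection_error_matchers:
      - '/invalid encoding name/'
```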

psahni commented 9 months ago

@NKeerthi Will this fall back to the master?

Are these errors listed specifically to handle blacklisting?