ledgetech / lua-resty-redis-connector

Connection utilities for lua-resty-redis

Example of sentinel master failover #47

Open esatterwhite opened 2 years ago

esatterwhite commented 2 years ago

I'm trying to understand the best way to deal with master failover. Is this something that the module does inherently? Or does every application need to implement this?

What is the best way to do this? An example would be very helpful.

pintsized commented 2 years ago

What do you mean by "deal with master failover" exactly? Redis Sentinel can promote a replica to master in the event of failure, so from the point of view of this module, as long as the sentinels you provide in the config are still available, they should give you the new master when it is ready. You'll need to handle failures during any failover period, and one way of handling that is to choose a replica for read-only operations (by specifying the role).
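For example, here's a minimal sketch of falling back to a replica for reads (the hostnames and master name are placeholders; in the sentinel URL the role letter selects the node type):

-- A sketch of reading from a replica during failover; hostnames and the
-- master name "mymaster" are illustrative. In the sentinel URL the role
-- letter is "m" (master), "s" (slave/replica) or "a" (any).
local rc = require("resty.redis.connector").new()

local replica, err = rc:connect({
  url = "sentinel://mymaster:s",
  sentinels = {
    { host = "sentinel-1", port = 26379 },
    { host = "sentinel-2", port = 26379 },
  },
})
if not replica then
  return nil, err
end

-- Read-only work only: writes against a replica will be rejected.
local value, err = replica:get("foo")
rc:set_keepalive(replica)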

This document explains what to expect: https://redis.io/topics/sentinel

esatterwhite commented 2 years ago

No, nothing that complex. I only need to stay connected to the current primary. I'm new to lua/resty and how it manages connections / pools is a bit confusing to me.

local function do_redis()
  local rc = require("resty.redis.connector").new()

  local master, err = rc:connect({
    url = "sentinel://mymaster:m",
    sentinels = {
      { host = "sentinel-1", port = 26379 },
      { host = "sentinel-2", port = 26379 },
    },
  })
  if not master then
    return nil, err
  end

  local result, err = master:set("foo", "bar")
  if not result then
    return nil, err
  end

  rc:set_keepalive(master)
  return result
end

So my understanding is that every time connect is called it may potentially pick an existing long-lived connection from a pool. It just looks like it is being created every time? Does it always select the current master when you call connect, or do existing long-lived connections pulled from the pool stay connected to the same backend, even during failover?

Sorry for all the questions. I'm just trying to understand how to best use the library

pintsized commented 2 years ago

No worries, yes, you're thinking about it in the right way. You might want to check the result from set_keepalive. For example, if there are bytes left on the wire it cannot place the connection in the pool. Also be aware of the connection options, which inform how the pool works: https://github.com/openresty/lua-nginx-module#tcpsockconnect - there's a max pool size, and also a timeout on the set_keepalive options, for example - this often trips people up when testing and expecting to see connection reuse.
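For example, a minimal sketch (the option names follow this module's defaults; the values and the plain redis:// URL are just placeholders):

-- A sketch of configuring the pool-related options up front.
local rc = require("resty.redis.connector").new({
  connect_timeout = 100,       -- ms
  read_timeout = 1000,         -- ms
  keepalive_timeout = 60000,   -- idle connections are dropped after 60s
  keepalive_poolsize = 30,     -- max idle connections kept per pool
})

local redis, err = rc:connect({ url = "redis://127.0.0.1:6379/0" })
if not redis then
  return nil, err
end

local ok, err = redis:set("foo", "bar")

-- Check the result: if set_keepalive fails (e.g. unread data left on the
-- wire), the connection has not been returned to the pool.
local ok, err = rc:set_keepalive(redis)
if not ok then
  ngx.log(ngx.ERR, "failed to set keepalive: ", err)
end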

Using this connector module does add an extra layer of abstraction. So when you ask for a master node, you'll get back a lua-resty-redis connection to whatever node the quorum currently thinks is the master. If this dies, failover will be initiated, and a replica promoted, but this all happens in the background on the Redis Sentinel end. On our side, it just means when we ask sentinel for the current master, the answer we get back will now be different - and yes, this could be the first time you've connected to that node, so it won't be in the pool.

To answer your question: yes, a connection to a failed master will live in the pool until it is either evicted or reused - there's nothing actively managing the pool, it's just a cache of recently healthy connections. If a host is suddenly no longer available, the connection will obviously not be usable, so you'll get errors, and any attempt to call set_keepalive on it again will fail. This is the same whether the connection came straight from the pool, or the host died midway through some commands you were running. One moment you were connected, the next you were not. The nice thing is you can simply keep calling connect and set_keepalive - you don't have to manage these failures in a special way.
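For example, a minimal sketch of that pattern, assuming a retry-once policy (which is purely illustrative):

-- A sketch of "just keep calling connect and set_keepalive": retry once on
-- failure, since after a failover the sentinels will hand back the new
-- master. The retry-once policy and hostnames are illustrative.
local function set_with_retry(rc, key, value)
  local last_err
  for attempt = 1, 2 do
    local redis, err = rc:connect({
      url = "sentinel://mymaster:m",
      sentinels = {
        { host = "sentinel-1", port = 26379 },
        { host = "sentinel-2", port = 26379 },
      },
    })
    if redis then
      local ok, cmd_err = redis:set(key, value)
      if ok then
        rc:set_keepalive(redis)  -- a failure here just skips pooling
        return ok
      end
      last_err = cmd_err  -- e.g. a stale pooled connection to a dead master
    else
      last_err = err
    end
  end
  return nil, last_err
end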

esatterwhite commented 2 years ago

So this snippet I've got here is basically it? Maybe just a little more / smarter error handling?

Do I need to explicitly close a connection if set_keepalive errors?

esatterwhite commented 2 years ago

Is a function call returning an error the only way to know that something is off with the connection? Or are there any state properties one can introspect?

esatterwhite commented 2 years ago

@pintsized is it safe to use the pool settings when using sentinel? It looks to me like, if the pool name is set, it will use the same values when connecting to a sentinel to look up the master, as well as when connecting directly to the master node.