Using Redis clustering - GitHub issues

jwoertink commented 2 years ago

We have Redis on AWS in cluster mode, and it's been fine while using Cable-cr. However, we just tried to use this shard directly to store some data, and ran into some issues:

REDIS.lrange("thing", 0, -1)
# MOVED 12273 127.0.0.1:6379

This ends up throwing an exception. Is there already a way to handle this, or is it something that would have to be added to support cluster mode?

stefanwille commented 2 years ago

I have never tried this shard with Redis in cluster mode, and I don't know if anybody has done so, or what it would take to get it to work.

jwoertink commented 2 years ago

I briefly looked at how the Ruby client does it. I think it runs the command and, if that fails with a "MOVED" error, rescues the error, grabs the host that holds the data, and reruns the command against that host.

So in a pseudo-code way:

cluster = Redis.new(cluster: ["redis://1.2.3.4:6379", "redis://4.5.6.7:6379"])

cluster.lrange("thing", 0, -1)

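# somewhere deep inside
private def make_call
  run
rescue e : RedisFailed
  if e.message =~ /MOVED/
    # The error names the node that owns the key's slot, so retry there.
    host = get_host_from_error(e)
    Redis.current_host = host
    make_call
  else
    # Anything else is not a redirect, so let it propagate.
    raise e
  end
end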

I'm sure it's a lot more complicated than that, but that was the general idea of what I saw the Ruby client doing.
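
To make the redirect idea a bit more concrete, here is a minimal Ruby sketch of parsing a MOVED error and retrying against the node it names. It is an illustration under assumptions, not how any existing client or this shard does it: `Redirect`, `parse_moved`, and `with_redirects` are hypothetical names, the block passed to `with_redirects` stands in for "run the command against this node", and the sketch assumes the error message contains the raw `MOVED <slot> <host>:<port>` text.

```ruby
# Hypothetical helpers for illustration only; not part of any Redis client.
Redirect = Struct.new(:slot, :host, :port)

# Extracts slot, host, and port from a message like
# "MOVED 12273 127.0.0.1:6379". Returns nil for any other message.
def parse_moved(message)
  match = /MOVED (\d+) (\S+):(\d+)/.match(message)
  return nil unless match

  Redirect.new(match[1].to_i, match[2], match[3].to_i)
end

# Yields a "host:port" string to the block. If the block raises a MOVED
# error, the block is retried with the node named in that error. The target
# node is a local variable, so nothing global is mutated.
def with_redirects(node, max_redirects: 5)
  max_redirects.times do
    begin
      return yield(node)
    rescue StandardError => e
      redirect = parse_moved(e.message.to_s)
      raise unless redirect # not a redirect, so re-raise the original error

      node = "#{redirect.host}:#{redirect.port}"
    end
  end
  raise "gave up after #{max_redirects} redirects"
end
```

A caller would wrap each command, for example `with_redirects("1.2.3.4:6379") { |node| connection_for(node).lrange("thing", 0, -1) }`, where `connection_for` is again a made-up helper that returns a connection to the given node.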

It wasn't too hard to reproduce locally using https://github.com/bitnami/bitnami-docker-redis-cluster. Boot the Redis cluster in Docker locally, and then you can run LRANGE thing 0 -1 right from redis-cli and see the same error.

@russ might be able to provide a bit more insight and a docker-compose that worked...

russ commented 2 years ago

Here is the docker-compose.yml file I used to boot a cluster.

version: '2'
services:
  redis-node-0:
    image: docker.io/bitnami/redis-cluster:6.2
    volumes:
      - redis-cluster_data-0:/bitnami/redis/data
    environment:
      - 'ALLOW_EMPTY_PASSWORD=yes'
      - 'REDIS_NODES=redis-node-0 redis-node-1 redis-node-2 redis-node-3 redis-node-4 redis-node-5'

  redis-node-1:
    image: docker.io/bitnami/redis-cluster:6.2
    volumes:
      - redis-cluster_data-1:/bitnami/redis/data
    environment:
      - 'ALLOW_EMPTY_PASSWORD=yes'
      - 'REDIS_NODES=redis-node-0 redis-node-1 redis-node-2 redis-node-3 redis-node-4 redis-node-5'

  redis-node-2:
    image: docker.io/bitnami/redis-cluster:6.2
    volumes:
      - redis-cluster_data-2:/bitnami/redis/data
    environment:
      - 'ALLOW_EMPTY_PASSWORD=yes'
      - 'REDIS_NODES=redis-node-0 redis-node-1 redis-node-2 redis-node-3 redis-node-4 redis-node-5'

  redis-node-3:
    image: docker.io/bitnami/redis-cluster:6.2
    volumes:
      - redis-cluster_data-3:/bitnami/redis/data
    environment:
      - 'ALLOW_EMPTY_PASSWORD=yes'
      - 'REDIS_NODES=redis-node-0 redis-node-1 redis-node-2 redis-node-3 redis-node-4 redis-node-5'

  redis-node-4:
    image: docker.io/bitnami/redis-cluster:6.2
    volumes:
      - redis-cluster_data-4:/bitnami/redis/data
    environment:
      - 'ALLOW_EMPTY_PASSWORD=yes'
      - 'REDIS_NODES=redis-node-0 redis-node-1 redis-node-2 redis-node-3 redis-node-4 redis-node-5'

  redis-node-5:
    image: docker.io/bitnami/redis-cluster:6.2
    volumes:
      - redis-cluster_data-5:/bitnami/redis/data
    ports:
      - "6379:6379"
    depends_on:
      - redis-node-0
      - redis-node-1
      - redis-node-2
      - redis-node-3
      - redis-node-4
    environment:
      - 'ALLOW_EMPTY_PASSWORD=yes'
      - 'REDIS_CLUSTER_REPLICAS=1'
      - 'REDIS_NODES=redis-node-0 redis-node-1 redis-node-2 redis-node-3 redis-node-4 redis-node-5'
      - 'REDIS_CLUSTER_CREATOR=yes'

volumes:
  redis-cluster_data-0:
    driver: local
  redis-cluster_data-1:
    driver: local
  redis-cluster_data-2:
    driver: local
  redis-cluster_data-3:
    driver: local
  redis-cluster_data-4:
    driver: local
  redis-cluster_data-5:
    driver: local

Then just a simple request for data.

❯ redis-cli -h 172.27.0.1
172.27.0.1:6379> lrange thing 0 -1
(error) MOVED 14607 172.27.0.2:6379

My understanding is that the cluster responds with what node the data is actually on. So if I connect to the node given back, then I can fetch the data.

❯ redis-cli -h 172.27.0.2
172.27.0.2:6379> lrange thing 0 -1
1) "foobar"

jgaskins commented 2 years ago

Hat tip to @jwoertink for pointing me to this issue.

The way I implemented this in my own Redis shard was by creating an abstraction for clusters that automatically routes commands to the right servers based on the key name (inferred from the second element in the command array) and whether it's a read or write command. This is cleaner and more performant (and, in the case of Redis.current = ..., more concurrency-safe) than rescuing exceptions and reconnecting to the new server, since that will almost certainly happen a lot.

The server that holds a given key can be derived with server_for_keyslot(crc16(key) % 16_384):

- crc16(key) is the CRC16 checksum algorithm
  - Taken modulo 16,384, this gives you the "keyslot"
  - If the key contains curly braces, only the part inside them is hashed. For example, if your key is {post}:1234, you only hash post. The server enforces this; check the CLUSTER KEYSLOT docs for more info.
  - This lets you ensure certain sets of keys all reside on the same Redis server so you can run multi-key commands, like RPOPLPUSH.
  - Feel free to use my CRC16 implementation and slot derivation function with attribution
- server_for_keyslot(keyslot) is implemented as:
  - Run CLUSTER NODES
  - Parse the output and cache it
  - Find the server that owns the range that the keyslot falls in
  - Ideally, you'd refresh this periodically in case the cluster topology changes, but I haven't implemented this in my own shard
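
The steps above can be sketched roughly as follows in Ruby. This is a hedged illustration, not the implementation from any existing shard: `crc16` and `server_for_keyslot` are the names used in this thread, while `hash_tag`, `keyslot`, `parse_cluster_nodes`, and the sample CLUSTER NODES output are made up for the example. It assumes the CRC16 variant is CCITT/XMODEM (polynomial 0x1021, initial value 0), which is what the Redis Cluster specification describes, and it ignores slots that are in the middle of migrating.

```ruby
# CRC16-CCITT (XMODEM variant): polynomial 0x1021, initial value 0.
def crc16(data)
  crc = 0
  data.each_byte do |byte|
    crc ^= byte << 8
    8.times do
      crc = (crc & 0x8000).zero? ? (crc << 1) : ((crc << 1) ^ 0x1021)
      crc &= 0xFFFF
    end
  end
  crc
end

# Returns the part of the key that gets hashed: the text between the first
# "{" and the next "}" when that text is non-empty, otherwise the whole key.
def hash_tag(key)
  key = key.b
  open = key.index("{")
  return key unless open

  close = key.index("}", open + 1)
  return key if close.nil? || close == open + 1

  key[(open + 1)...close]
end

def keyslot(key)
  crc16(hash_tag(key)) % 16_384
end

# Turns CLUSTER NODES output into [slot_range, "host:port"] pairs.
def parse_cluster_nodes(output)
  output.each_line.flat_map do |line|
    fields = line.split
    next [] if fields.size < 9 # blank lines and nodes that own no slots

    address = fields[1].split("@").first # drop the cluster bus port
    # Entries starting with "[" describe migrating slots; skipped here.
    fields.drop(8).reject { |slot| slot.start_with?("[") }.map do |slot|
      first, last = slot.split("-").map(&:to_i)
      [(first..(last || first)), address]
    end
  end
end

# Looks up the cached slot map; returns nil if no node owns the slot.
def server_for_keyslot(slot_map, slot)
  owner = slot_map.find { |range, _address| range.cover?(slot) }
  owner && owner.last
end

# Made-up CLUSTER NODES output for three masters and one replica.
SAMPLE_NODES = <<~NODES
  07c3 127.0.0.1:6379@16379 myself,master - 0 0 1 connected 0-5460
  a1b2 127.0.0.1:6380@16380 master - 0 0 2 connected 5461-10922
  c3d4 127.0.0.1:6381@16381 master - 0 0 3 connected 10923-16383
  e5f6 127.0.0.1:6382@16382 slave 07c3 0 0 1 connected
NODES

slot_map = parse_cluster_nodes(SAMPLE_NODES)
puts server_for_keyslot(slot_map, keyslot("{post}:1234"))
```

In a real client, `slot_map` would be built once from a live CLUSTER NODES call and cached, then rebuilt whenever the topology changes.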

The above docker-compose config is great, but if you're not developing inside that same Docker network, you'll likely run into issues. The CLUSTER NODES command gives you IP addresses that the Redis server can connect to, but if those are behind a NAT layer (such as a Docker network) that you won't be running your code from, you may not be able to reach them. I couldn't get it to work that way, so I wrote a Ruby script to spin up a cluster to help me develop my own shard. It starts the number of Redis servers of each type given at the top of the script and runs them on consecutive ports starting at 6379. As written, the script spins up 3 masters with 2 replicas each, for a 9-node Redis cluster.


```ruby
#!/usr/bin/env ruby

starting_port = 6379
masters = 3
replicas_per_master = 2

hosts = []
pids = []
(masters * (replicas_per_master + 1)).times do |i|
  port = starting_port + i
  cmd = "redis-server --port #{port} --cluster-enabled yes --cluster-config-file redis-#{port}.conf --appendonly yes --appendfilename redis-#{port}.aof --dbfilename redis-#{port}.rdb > redis-#{port}.log"
  hosts << "127.0.0.1:#{port}"
  pids << spawn(cmd)
end

# Wire up all the Redis servers to each other
puts "Wiring up servers into a cluster..."
cluster_command = "redis-cli --cluster create #{hosts.join(' ')} --cluster-replicas #{replicas_per_master} --cluster-yes"
puts cluster_command
system cluster_command

puts "Press Enter to terminate Redis cluster"
gets

pids.each do |pid|
  Process.kill "TERM", pid
  Process.wait pid
end
```

stefanwille / crystal-redis

Using Redis clustering #127