redis / jedis

Redis Java client
MIT License

Do cluster slot renewal in a background thread #1347

Open · xetorthio opened this issue 8 years ago

xetorthio commented 8 years ago

Instead of doing it as soon as a MOVED was received from redis.

marcosnils commented 8 years ago

Hmm I'm wondering if this makes sense.

The only scenario where I believe this could be useful is if you have little traffic to the cache or a specific node and the background renewal happens before the user gets a MOVED response. I still believe this scenario is pretty rare and I don't really see the benefit of doing this.

IMHO slot migration (because of failures or new node additions) shouldn't be something that happens often, so I don't think it's bad to lock if two queries get a MOVED at the same time and wait until the cluster heals. In addition to this, some questions came to my mind and I'm wondering if you considered them:

- How often will you perform slot renewal by default? Even though you can make this configurable, the default should be good enough to fit "most" cases.
- I guess you'll still be doing slot renewal on MOVED responses as well? How would you coordinate the background renewal with the one triggered by the MOVED response?

A hybrid idea that might work is to perform slot renewal in a separate thread only when receiving a MOVED response. As it is today, Jedis won't return a response to the user until the slot renewal has finished; maybe we can improve this and perform the slot renewal in a separate thread so we can let new queries be processed.

@HeartSaVioR @antirez WDYT?

xetorthio commented 8 years ago

I think you got it wrong. It doesn't need to happen in the background before a MOVED to make sense. The whole idea is not to wait for the cache to catch up when there is a MOVED; there is no need. We already know where to go, so why penalize the command? The idea is to continue and serve the command, and just update the cache in the background. This also avoids unnecessary locks on the cache. So it not only makes the command response latency smoother in general but also probably makes the code easier to follow.
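
A minimal sketch of that idea, assuming a simplified client where the slot table is a ConcurrentHashMap and connectionForSlot/renewSlotCacheFromCluster are placeholders rather than real Jedis internals: on MOVED, the command is served immediately from the redirection target and the cache refresh is merely scheduled in the background.

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

import redis.clients.jedis.HostAndPort;
import redis.clients.jedis.Jedis;
import redis.clients.jedis.exceptions.JedisMovedDataException;

class BackgroundRenewingSlotCache {
    private final ConcurrentHashMap<Integer, HostAndPort> slots = new ConcurrentHashMap<>();
    private final ExecutorService renewer = Executors.newSingleThreadExecutor();

    <T> T execute(int slot, Function<Jedis, T> command) {
        try (Jedis jedis = connectionForSlot(slot)) {
            return command.apply(jedis);
        } catch (JedisMovedDataException moved) {
            // Remember the new owner of this slot and serve the command right away
            // from the redirection target; the full renewal happens in the background.
            slots.put(moved.getSlot(), moved.getTargetNode());
            renewer.submit(this::renewSlotCacheFromCluster); // fire-and-forget refresh
            try (Jedis redirected = new Jedis(moved.getTargetNode().getHost(),
                                              moved.getTargetNode().getPort())) {
                return command.apply(redirected);
            }
        }
    }

    // Placeholders for whatever connection handling the client already has.
    private Jedis connectionForSlot(int slot) { /* look up the slots map */ return null; }
    private void renewSlotCacheFromCluster() { /* CLUSTER SLOTS -> repopulate slots */ }
}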

marcosnils commented 8 years ago

The idea is to continue and serve the command, and just update the cache in the background. This also avoids unnecessary locks on the cache.

Ok, I got it wrong then. This is something similar to what I thought about :smile:

HeartSaVioR commented 8 years ago

The idea is to continue and serve the command, and just update the cache in the background. This also avoids unnecessary locks on the cache.

Please correct me if I'm wrong, but then the slot table will be a shared variable between user threads and the background thread. If we don't set a lock, operations can fail while the background thread reconstructs the slots. We should make operations wait anyway while the slot cache is being reconstructed.

marcosnils commented 8 years ago

If we don't set a lock, operations can fail while the background thread reconstructs the slots

@HeartSaVioR how so? Threads that don't see the change will still get the MOVED response and will be redirected to the new node, while new threads will go to the correct node. The background thread can only process one batch slot update at a time.

This way, you won't make any transactions wait until slot renewal finishes.

HeartSaVioR commented 8 years ago

We are "moving" the slot information, which means we "delete" the slot, and "add" the slot. It can't be atomic, and it means there's race condition between main thread and background thread.

But I like the idea @xetorthio said. Main thread doesn't need to wait for cache to be updated. If we handle the case what I'm saying properly, I'm OK to change.

HeartSaVioR commented 8 years ago

Yeah, I think I made a mistake again. Slots are stored in a ConcurrentHashMap. I'm OK with this. Great idea.

marcosnils commented 8 years ago

Yeah, I think I made a mistake again. Slots are stored in a ConcurrentHashMap. I'm OK with this. Great idea.

I was about to say this :D

xetorthio commented 8 years ago

Also, the current implementation cleans all slots before filling them. We could avoid doing that: just update the slots without doing the cleanup. This way we'll be making better use of the ConcurrentHashMap.
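
A rough sketch of that in-place update, assuming a slot map keyed by slot number and the standard CLUSTER SLOTS reply shape (start slot, end slot, then [host, port, ...] for the master); the parsing here is simplified and the class/method names are illustrative.

import java.util.List;
import java.util.concurrent.ConcurrentHashMap;

import redis.clients.jedis.HostAndPort;

class SlotTableUpdater {
    // Applies a CLUSTER SLOTS reply by overwriting entries in place,
    // instead of clearing the whole map first.
    @SuppressWarnings("unchecked")
    static void applyClusterSlots(ConcurrentHashMap<Integer, HostAndPort> slots,
                                  List<Object> clusterSlotsReply) {
        for (Object slotInfoObj : clusterSlotsReply) {
            List<Object> slotInfo = (List<Object>) slotInfoObj;
            int first = ((Long) slotInfo.get(0)).intValue();
            int last = ((Long) slotInfo.get(1)).intValue();
            List<Object> masterInfo = (List<Object>) slotInfo.get(2);
            HostAndPort master = new HostAndPort(
                new String((byte[]) masterInfo.get(0)),      // host
                ((Long) masterInfo.get(1)).intValue());      // port
            for (int slot = first; slot <= last; slot++) {
                slots.put(slot, master);                     // overwrite; no clear() needed
            }
        }
    }
}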

xetorthio commented 8 years ago

We could use a SingleThreadExecutor to make sure there is a single background thread, and if it crashes for whatever reason it will be re-launched. The only thing missing is a signaling mechanism to tell the thread to "wake up" and renew slots when a MOVED was found. Still thinking about how to do it. Any ideas?
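
One way to sketch that "wake up" signal, assuming a hypothetical trigger class (renewSlotCacheFromCluster stands in for whatever does the actual CLUSTER SLOTS refresh): a single-thread executor plus an AtomicBoolean so that at most one renewal task is ever pending, no matter how many threads hit MOVED at the same time.

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicBoolean;

class SlotRenewalTrigger {
    private final ExecutorService renewer = Executors.newSingleThreadExecutor();
    private final AtomicBoolean renewalPending = new AtomicBoolean(false);

    // Called by any thread that receives a MOVED response.
    void signalMoved() {
        // Only the first caller enqueues a task; concurrent MOVEDs piggyback on it.
        if (renewalPending.compareAndSet(false, true)) {
            renewer.submit(() -> {
                // Clear the flag first so a MOVED seen during renewal re-triggers it.
                renewalPending.set(false);
                renewSlotCacheFromCluster();
            });
        }
    }

    private void renewSlotCacheFromCluster() {
        // Placeholder: query CLUSTER SLOTS and repopulate the slot map.
    }
}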

HeartSaVioR commented 8 years ago

AFAIK, we can just create a new task containing the slot information from the MOVED response and submit it to that ExecutorService whenever we find a MOVED, and the background thread runs every new task.

xetorthio commented 8 years ago

Ok. That means we never ask for slots again, right?

marcosnils commented 8 years ago

@HeartSaVioR but won't tasks queue up if we do this? The idea is that there can only be one task in the queue; it wouldn't make sense to queue up slot renewal tasks.

xetorthio commented 8 years ago

I'm not sure I understood your comment @marcosnils

marcosnils commented 8 years ago

@HeartSaVioR suggests adding a new task to the ExecutorService each time a MOVED response is returned. The SingleThreadExecutor will execute one task at a time, but every time you add a task it will be queued for later execution. What I'm saying is that there shouldn't be any queuing in this process: only one task can be processed at a time and that's it.

xetorthio commented 8 years ago

But how would your suggestion work? Imagine you get a MOVED at the same time from 2 different Jedis instances. Both need to do something. What do they do? With what @HeartSaVioR suggests, both queue a task in the executor, and eventually the background thread will refresh the cache.

What would your suggestion do?

HeartSaVioR commented 8 years ago

@xetorthio @marcosnils I'd like to clarify once more. There are two different strategies for updating slot information when we get a MOVED:

  1. Update one slot, based on the MOVED response. Then I'd like to see how much time a thread waits to write to the concurrent map. If it's fairly tiny, we don't even need to make the slot update fire-and-forget; just update it.
  2. Update all slots (say, all 16384 slots). We need to set a strategy for this case.

What we're currently doing is in fact the slowest but clearest way to resolve this scenario. There's no race condition because it's guarded with a RW lock. Apart from the race condition, there might be some threads hitting MOVED because they read the slot information before the write lock takes effect.

One sketched idea to resolve this is comparing a fetched timestamp (I mean the moment when the thread queries the cache) with a slot-updated timestamp (in the cache), and only updating the slots when the fetched timestamp is greater than the slot-updated timestamp (which means the thread queried Redis Cluster with the latest slot information but Redis still responded MOVED).

If we don't want to maintain the RW lock but do want to update all slots, the sketched idea could be extended to keep a timestamp for each slot (it might be crazy to maintain a timestamp per slot). It's just a sketched idea, so we could have better alternatives or this idea could be improved.

tl;dr: the summary of my sketched idea is: if the borrowed thread got a MOVED response, check whether it was referring to the latest information for that slot. If it was, we need to update the slot(s). If not, we don't need to update the slot(s).
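
A rough sketch of that timestamp comparison, with illustrative names only: each thread remembers when it last read the slot cache, and a MOVED only triggers a renewal if the cache has not already been refreshed since that read.

import java.util.concurrent.atomic.AtomicLong;

class VersionedSlotCache {
    private final AtomicLong lastRenewedNanos = new AtomicLong(System.nanoTime());

    // The cache "version" a caller should remember when it reads slot information.
    long snapshotVersion() {
        return lastRenewedNanos.get();
    }

    // Called on MOVED; renews only if nobody renewed after the caller's read.
    void onMoved(long versionSeenByCaller) {
        long current = lastRenewedNanos.get();
        if (versionSeenByCaller >= current
                && lastRenewedNanos.compareAndSet(current, System.nanoTime())) {
            // The MOVED was based on up-to-date slot information, so the topology
            // really changed and the cache needs a refresh.
            renewSlotCacheFromCluster();
        }
        // Otherwise someone already renewed after our read; just retry on the new topology.
    }

    private void renewSlotCacheFromCluster() { /* placeholder for CLUSTER SLOTS refresh */ }
}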

xetorthio commented 8 years ago

I think that updating only one slot has its benefits, but it could also be bad. This depends on the scenario and what exactly happens in the cluster. I guess that a MOVED shouldn't be something that happens very frequently, so updating all slots when it happens sounds pretty good.

Now, this doesn't need to be too strict. Because if the cache is not up to date, everything still works, only that we'll have extra hops.

So probably the best solution is the one that is simplest to understand, code and maintain.

A background thread that does the update sounds good. What triggers the update is still something I'm not sure of.

mp911de commented 8 years ago

IMHO updating slot-wise on each MOVED redirection isn't necessarily efficient. lettuce follows the concept of having multiple strategies:

  1. Regular refresh (polling at regular time intervals)
  2. Adaptive refresh

The adaptive topology refresh listens for persistent disconnects and MOVED and ASK redirections (configurable). If the cluster connection runs into a -MOVED error, then this is the signal to trigger a topology update (in lettuce, topology updates are async anyway).

Another point to consider is rate limiting. Busy applications can run into thousands of events per second; you don't want to trigger that many updates.

Integrating threads into a framework always pulls in more stuff than expected. You don't want to launch a Thread per connection, as that's expensive, and with shared Threads you quickly end up adding configuration facilities. Just wanted to mention that point. So sharing an ExecutorService with a single/pooled implementation would be good when you want to perform updates using worker threads.
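
For reference, this is roughly how the two strategies above are configured in lettuce (periodic plus adaptive refresh, with a timeout that rate-limits adaptive triggers); the host/port and durations are placeholders, and the exact option names may differ between lettuce versions.

import java.time.Duration;

import io.lettuce.core.RedisURI;
import io.lettuce.core.cluster.ClusterClientOptions;
import io.lettuce.core.cluster.ClusterTopologyRefreshOptions;
import io.lettuce.core.cluster.RedisClusterClient;

public class LettuceRefreshExample {
    public static void main(String[] args) {
        ClusterTopologyRefreshOptions refresh = ClusterTopologyRefreshOptions.builder()
            .enablePeriodicRefresh(Duration.ofSeconds(60))          // 1. regular polling refresh
            .enableAllAdaptiveRefreshTriggers()                     // 2. MOVED/ASK/disconnect triggers
            .adaptiveRefreshTriggersTimeout(Duration.ofSeconds(30)) // rate-limits adaptive refreshes
            .build();

        RedisClusterClient client = RedisClusterClient.create(RedisURI.create("redis://127.0.0.1:7000"));
        client.setOptions(ClusterClientOptions.builder()
            .topologyRefreshOptions(refresh)
            .build());
    }
}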

xetorthio commented 8 years ago

Thanks for your input @mp911de. Checking the Ruby and Node.js clients, they both do what we were thinking at the beginning of this thread: they handle the MOVED and serve the command, and then trigger the refresh. There is no regular refresh. Now, one thing that I like about ioredis is that it does 2 things when it gets a MOVED: it changes the slot and then triggers the refresh (async of course). The good thing about this is that any other command that happens to ask for the same slot, while the background thread refreshes the cache, will go to the right place. So it is a nice approach we should consider.

DanielYWoo commented 7 years ago

IMHO when the client receives a MOVED response, we should not treat it as an exception (JedisMovedDataException); it's quite normal during migration. We can just continue to serve the command and update the slot cache in a background thread. Since Redis is not a CP model in CAP, we can sacrifice consistency for liveness. Actually, we don't even have to do it each time we see a MOVED response: when there is a split-brain, each client will frequently update the slot cache in vain, and when the notorious "too many redirects" error occurs, our clients will spend most of their IO resources updating the slot cache, and updating it wrongly.

jocull commented 1 year ago

This thread is very old, but I don't think the problem has been solved, or at least it has not been solved consistently. I'm not overly familiar with all of the different cluster, connection pool, and configuration options, but I do have a requirement to hold a consistent connection for a particular hash slot for a series of operations. Here is the basic code I worked out, after ruling out JedisCluster:

try (Connection conn = connectionProvider.getConnectionFromSlot(JedisClusterCRC16.getSlot(aStringKey))) {
    final Jedis client = new Jedis(conn);
    final String result = client.get(aStringKey); // ?
}

If the client sees a MOVED error, that might work; I did not try it. But if the client gets a connection exception because the host has gone offline, it never seems to recover from this, even after the cluster has performed a failover and recovered itself. It will be persistently stuck until you take care of the problem.

One solution is something like this - similar to Lettuce's adaptive refresh:

try (Connection conn = connectionProvider.getConnectionFromSlot(JedisClusterCRC16.getSlot(aStringKey))) {
    try {
        final Jedis client = new Jedis(conn);
        final String result = client.get(aStringKey); // ?
    } catch (JedisConnectionException ex) { // Only catches connection errors, will likely miss `MOVED` or others
        connectionProvider.renewSlotCache();
        // enter a retry loop here
    }
}

Another solution, or a companion solution in case you need both, is to do the periodic refresh that was suggested above. Lettuce has this option and it has become absolutely necessary for stability in production.

// Somewhere in your application setup...
final ScheduledExecutorService scheduledExecutorService = Executors.newSingleThreadScheduledExecutor();
final ClusterConnectionProvider connectionProvider = new ClusterConnectionProvider(hosts, jedisClientConfig);
scheduledExecutorService.scheduleAtFixedRate(connectionProvider::renewSlotCache, 0, 15, TimeUnit.SECONDS);

jocull commented 1 year ago

What exception did you see? JedisMovedDataException or something else?

I was primarily testing cluster destruction and failovers where a primary is lost. In that case you see socket exceptions, wrapped under a JedisConnectionException, and the cluster topology needs to be refreshed from another source, since the primary for the slot (at the time of the last slot cache update) is offline.

jocull commented 1 year ago

I did test this with JedisCluster and it works very well on failovers. I think the implementations are different when you take direct control over a Connection. Perhaps I can work around the requirement to hold a specific connection by using a pipeline or some other transaction mechanism.

I was migrating the test from Lettuce, where the equivalent of JedisCluster::waitReplicas does not accept a key and thus cannot be routed properly by the cluster management. I originally missed that in Jedis this is not the case, and it simplifies the implementation greatly! 😄

github-actions[bot] commented 2 months ago

This issue is marked stale. It will be closed in 30 days if it is not updated.