ktheory / dalli-elasticache

A wrapper for Dalli with support for AWS ElastiCache
MIT License

Documentation on how often to refresh nodes #30

Open synth opened 7 years ago

synth commented 7 years ago

The README states how you can refresh the client, but it does not state how often this should occur or by what means.

I couldn't find much documentation on how often or why ElastiCache nodes might drop off or join. The only thing the AWS docs mention is that:

If a node fails, ElastiCache takes down that node and spins up a replacement. The replacement process takes a few minutes. During this time the metadata in all the nodes still shows the endpoint for the failed node, but any attempt to interact with the node will fail. Therefore, your logic should always include retry logic. Reference

What are users of this gem expected to implement in terms of refreshing? I see in the Dalli-ElastiCache docs that it refreshes upon app server restart. However, should we implement a refresh as a cron job? Every 30 minutes? Sample code would be most appreciated!

I've also posted a similar question on Stack Overflow: https://stackoverflow.com/questions/47170376/how-to-refresh-clustered-redis-elasticache-nodes-in-a-rails-app Thanks.

Physium commented 3 years ago

So given that this was asked in 2017... was there any conclusion to this? Aside from how frequently this should be refreshed, in a typical Rails app the config is initialized on restart, so how do I even call .refresh outside of production.rb?

petergoldstein commented 2 years ago

This is a good question and I plan to write up some documentation shortly. Some quick notes:

  1. In real ElastiCache clusters, node failures are infrequent - it is not uncommon for the set of nodes to stay constant for months if the cluster isn't scaled.
  2. Dalli will stop using a node for caching if it can't reach that node. It will distribute requests that would have previously gone to that node to the other nodes in the cluster. So as long as not all of the nodes in your cluster fail, an individual node failure may degrade your caching performance but won't disable caching.
  3. By default Dalli will attempt to reconnect to a node that has previously been marked unreachable after a period. So it should recover from transient node issues in which the set of nodes doesn't change.
  4. As of this writing (1/20/22), Dalli does not refresh the set of nodes after initialization of the Dalli::Client. That is, refreshing the Dalli::Elasticache set will have no impact on caching behavior for existing clients, including those configured for the Rails cache and session stores. Dalli requires enhancements to take advantage of node updates. Those may make it into Dalli 4.0.
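
Given point 4, one possible workaround is to build a new client whenever the cluster configuration changes, instead of expecting an existing client to pick up new nodes. The following is only a rough sketch of that idea, not something Dalli or this gem provides. `RefreshingClient` and `refresh!` are hypothetical names. The `discovery` object is assumed to respond to `refresh`, `version`, and `servers` the way the README describes for `Dalli::ElastiCache`, and the block is assumed to build a client from a server list.

```ruby
# Hypothetical helper (not part of Dalli or dalli-elasticache).
# Rebuilds the cache client when the cluster configuration version changes.
class RefreshingClient
  # discovery:     assumed to respond to #refresh, #version and #servers,
  #                as the README describes for Dalli::ElastiCache.
  # build_client:  block that turns a server list into a client.
  def initialize(discovery, &build_client)
    @discovery = discovery
    @build_client = build_client
    @lock = Mutex.new
    @version = @discovery.version
    @client = @build_client.call(@discovery.servers)
  end

  # Returns the current client. Callers should ask for it on each use
  # rather than holding on to an old instance.
  def client
    @lock.synchronize { @client }
  end

  # Re-reads the configuration and swaps in a new client if the version
  # changed. Returns true when a new client was built, false otherwise.
  def refresh!
    @lock.synchronize do
      # Works whether #refresh returns the same object or a fresh one.
      @discovery = @discovery.refresh
      return false if @discovery.version == @version

      old_client = @client
      @client = @build_client.call(@discovery.servers)
      @version = @discovery.version
      # Assumption: closing the old client here is acceptable; requests
      # still in flight on other threads may need gentler handling.
      old_client.close if old_client.respond_to?(:close)
      true
    end
  end
end

# Possible wiring with the real gems (not run here; the 300-second
# interval is an arbitrary placeholder, not a recommendation):
#   discovery = Dalli::ElastiCache.new(endpoint)
#   cache = RefreshingClient.new(discovery) { |servers| Dalli::Client.new(servers) }
#   Thread.new { loop { sleep 300; cache.refresh! } }
```

Note that this only helps code that fetches the client through the wrapper on each use. The Rails cache and session stores create their own clients at boot, so they would not see the change without a restart or further work.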