Redis is an in-memory database that persists on disk. The data model is key-value, but many different kinds of values are supported: Strings, Lists, Sets, Sorted Sets, Hashes, Streams, HyperLogLogs, Bitmaps.
When an instance with RDB persistence enabled shuts down, it blocks while writing the final snapshot, and any client requests time out.
A dataset of between 1 and 2 GB is enough to keep the instance timing out for about 14 seconds on a setup I have with pretty decent storage.
That's hard to handle from the client side, because the client doesn't know it needs to reconnect to a different replica. A timeout could be genuine, happening during the normal lifetime of the instance, or it could be caused by a shutdown, in which case we know we want to move to another replica.
Health probes on the instance will detect that it's timing out, but not immediately, which means a period of unavailability.
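To illustrate why this is awkward on the client side, here is a minimal sketch of the kind of workaround a client is left with: count consecutive timeouts and only switch replica after a threshold. All names here (failover_state, record_result, threshold) are hypothetical and are not part of Redis or any client library. The point is that the client has to wait out several timeouts before it can guess that the instance is going away.

```c
#include <stddef.h>

/* Hypothetical client-side failover state; not part of any Redis client. */
typedef struct {
    size_t current;            /* index of the replica currently in use */
    size_t replica_count;      /* number of replicas the client knows about */
    int consecutive_timeouts;  /* timeouts seen in a row on the current replica */
    int threshold;             /* timeouts tolerated before switching (assumed tunable) */
} failover_state;

/* Record the outcome of one request.
 * Returns 1 if the client switched to the next replica, 0 otherwise. */
static int record_result(failover_state *st, int timed_out) {
    if (st->replica_count == 0) return 0;   /* nothing to fail over to */
    if (!timed_out) {
        st->consecutive_timeouts = 0;       /* a success resets the streak */
        return 0;
    }
    if (++st->consecutive_timeouts < st->threshold) return 0;
    /* Too many timeouts in a row: assume the instance is unavailable. */
    st->current = (st->current + 1) % st->replica_count;
    st->consecutive_timeouts = 0;
    return 1;
}
```

With a threshold of 2 and a request timeout of a few seconds, the client still loses several seconds before moving on, and a lower threshold risks switching on a single genuine timeout.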
Expected behavior
At least keep serving reads on read-only replicas, or kill all clients (and don't allow reconnection) so that they try to reconnect to a different replica (not sure about connected master/replicas).
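The second option could look roughly like the ordering below. This is only a toy model to show the idea, not a patch: toy_server and the toy_* functions are made up for illustration, and toy_blocking_save() stands in for the real blocking save. It does not touch replication links, since I'm not sure how those should be handled.

```c
#include <stddef.h>

#define MAX_CLIENTS 8

/* Toy model of a server; hypothetical, not Redis's real structures. */
typedef struct {
    int accepting;                 /* 1 while new connections are allowed */
    int client_open[MAX_CLIENTS];  /* 1 for each connected client slot */
    int saved;                     /* 1 once the final snapshot was written */
} toy_server;

/* Stand-in for the slow, blocking final save. Returns 0 on success. */
static int toy_blocking_save(toy_server *s) {
    s->saved = 1;
    return 0;
}

/* Count the clients that are still connected. */
static size_t toy_connected_clients(const toy_server *s) {
    size_t n = 0;
    for (size_t i = 0; i < MAX_CLIENTS; i++)
        if (s->client_open[i]) n++;
    return n;
}

/* Proposed ordering: refuse new connections and drop existing clients
 * first, so they fail fast and reconnect elsewhere, then do the save. */
static int toy_prepare_for_shutdown(toy_server *s) {
    s->accepting = 0;
    for (size_t i = 0; i < MAX_CLIENTS; i++)
        s->client_open[i] = 0;
    return toy_blocking_save(s);
}
```

Clients would then see a closed connection right away instead of a timeout, which is an unambiguous signal to move to another replica.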
Additional information
In server.c, prepareForShutdown() shows this behavior:
    /* Create a new RDB file before exiting. */
    if ((server.saveparamslen > 0 && !nosave) || save) {
        serverLog(LL_NOTICE,"Saving the final RDB snapshot before exiting.");
        if (server.supervised_mode == SUPERVISED_SYSTEMD)
            redisCommunicateSystemd("STATUS=Saving the final RDB snapshot\n");
        /* Snapshotting. Perform a SYNC SAVE and exit */
        rdbSaveInfo rsi, *rsiptr;
        rsiptr = rdbPopulateSaveInfo(&rsi);
        if (rdbSave(server.rdb_filename,rsiptr) != C_OK) {
            /* Ooops.. error saving! The best we can do is to continue
             * operating. Note that if there was a background saving process,
             * in the next cron() Redis will be notified that the background
             * saving aborted, handling special stuff like slaves pending for
             * synchronization... */
            serverLog(LL_WARNING,"Error trying to save the DB, can't exit.");
            if (server.supervised_mode == SUPERVISED_SYSTEMD)
                redisCommunicateSystemd("STATUS=Error trying to save the DB, can't exit.\n");
            return C_ERR;
        }
    }