Open msellinger opened 6 years ago
@msellinger what's the status here, are you waiting for me merging this or are you going to change anything?
@magro This is to inform you that we are currently testing this patch (based on the current MSM version), we will also switch to Tomcat 8.5 in the process. Once everything is tested, I will report back, then I will apply again for including this in mainline.
We have the current master plus this pull request running in production for several weeks now and did not notice any regressions. From my side, this pull request is ready to be merged.
This patch handles failure of Redis operations in a better way and is aimed to especially allow failure of an Amazon Elasticache node to trigger immediate logging and reconnection of all Redis connection MSM currently uses. The current code has multiple deficiencies in this respect, especially
(a) it wouldn't detect the failure in all cases since not all failures lead to a JedisConnectionException (the new code catches Exception instead of JedisConnectionException)
(b) It reconnects all Redis connections even when a single failure occurs to make sure when the failure is detected no old defunct connections are laying around (this might lead to hard-to-track down errors at later)
(c) It does proper INFO / ERROR logging to allow debugging the problem later in logs instead of just doing DEBUG logging which is likely switched off on a live system
Note that this patch assumes that every Redis error is an indication that the node failed, which however should be safe to assume in production systems.