Closed onzag closed 4 years ago
Adding the unsure label because we are not sure whether this is a problem yet.
https://stackoverflow.com/questions/47253429/how-to-detect-client-disconnect-from-redis-in-nodejs
This should give some insight into handling errors with redis connections.
Itemize now offers a redis wrapper that improves its resilience against network issues between clusters.
If redis dies and the connection is lost, this can now be handled. The scenario is far from optimal: data corruption is assumed during the outage, so caches are wiped during any sort of redis outage.
The behaviour also had to change: events are no longer queued during a redis outage. Instead, an error is thrown so that the server doesn't have to wait, which would cause a bunch of queued renders; it is allowed to collapse.
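A minimal sketch of that fail-fast behaviour, under assumptions: `PublishClient`, `FailFastPublisher` and `RedisUnavailableError` are hypothetical names for illustration, not Itemize's actual wrapper nor the redis client's API. The idea is only that calls made while disconnected reject immediately instead of being buffered.

```typescript
// Hypothetical minimal shape of a redis-like client (an assumption for this sketch).
interface PublishClient {
  publish(channel: string, message: string): Promise<number>;
}

// Error used to signal that redis is currently unreachable.
class RedisUnavailableError extends Error {
  constructor() {
    super("Redis connection is down");
    this.name = "RedisUnavailableError";
  }
}

class FailFastPublisher {
  private client: PublishClient;
  private connected = true;

  constructor(client: PublishClient) {
    this.client = client;
  }

  // These would be called from the client's connection lifecycle events.
  markDisconnected(): void {
    this.connected = false;
  }

  markConnected(): void {
    this.connected = true;
  }

  publish(channel: string, message: string): Promise<number> {
    if (!this.connected) {
      // Reject right away rather than queueing the event until redis is back.
      return Promise.reject(new RedisUnavailableError());
    }
    return this.client.publish(channel, message);
  }
}
```

With this shape the caller sees the failure at once and can answer with an error instead of holding the request open for the duration of the outage.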
If a cluster manager loses the connection of its redis clients to the global cache, it should wipe the cache and remove all the listeners. It doesn't know whether it lost events during the outage, so the cache suddenly becomes invalid.
So the whole cache should be treated as invalid: no feedback to check, simply blow it. A log message of error type should also be added, because this shouldn't have happened to start with.
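The wipe described above could look roughly like the following. This is a sketch under assumptions: `LocalClusterCache` and its methods are made-up names, and the in-memory `Map`s stand in for whatever the cluster manager really uses.

```typescript
type Listener = (payload: string) => void;

// Hypothetical stand-in for the cluster manager's local cache state.
class LocalClusterCache {
  private entries = new Map<string, unknown>();
  private listeners = new Map<string, Listener>();

  set(key: string, value: unknown): void {
    this.entries.set(key, value);
  }

  listen(channel: string, listener: Listener): void {
    this.listeners.set(channel, listener);
  }

  entryCount(): number {
    return this.entries.size;
  }

  listenerCount(): number {
    return this.listeners.size;
  }

  // Called when the connection to the global cache is lost.
  // Events may have been missed, so nothing is checked: everything is dropped.
  onGlobalCacheDisconnect(logError: (message: string) => void = console.error): void {
    logError("Lost connection to the global cache; wiping local cache and listeners");
    this.entries.clear();
    this.listeners.clear();
  }
}
```

Passing the logger in as a parameter is just a convenience for this sketch; it defaults to `console.error` so the message is reported as an error.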
Knex should have no issue with this: an outage will cause the endpoints to crash with INTERNAL_SERVER_ERROR, and once it recovers any knex-related functionality should be maintained. That is not the case for the local cluster cache, so that cache should be blown.
Keeping in mind that sometimes the local and global cache are the same, we should make sure not to blow global cache variables. This shouldn't even come up, because in that case there is a single cluster and the global cache is the same one, but it is worth doing for consistency.
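One possible reading of that guard, as a sketch: compare the connection settings of the two caches and skip the wipe when they point at the same redis instance. `RedisTarget` and `shouldWipeLocalCache` are hypothetical names, and comparing host, port and db is an assumption about what "the same cache" means here.

```typescript
// Hypothetical connection settings; the fields are assumptions for this sketch.
interface RedisTarget {
  host: string;
  port: number;
  db: number;
}

// Returns true only when the local cache is a different redis instance
// from the global one, so wiping it cannot touch global cache variables.
function shouldWipeLocalCache(local: RedisTarget, global: RedisTarget): boolean {
  const sameInstance =
    local.host === global.host &&
    local.port === global.port &&
    local.db === global.db;
  return !sameInstance;
}
```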