miguelgrinberg / python-socketio

Python Socket.IO server and client
MIT License
4.02k stars 590 forks

AsyncRedisManager: minimizing the loss of socket events when the Redis instance is temporarily unavailable #1411

Open yaongmeow opened 2 days ago

yaongmeow commented 2 days ago

Is your feature request related to a problem? Please describe. I would like to improve the reliability of the AsyncRedisManager. I am using a Redis pod deployed within a Kubernetes network. To reduce the loss of events when the pod restarts, I came up with a new feature: I customized the original AsyncRedisManager to improve its reliability and used that version in my project.

Describe the solution you’d like I aim to minimize the loss of socket events when the Redis instance becomes unavailable. This enhancement will ensure that temporary Redis outages result in minimal event loss, preserving the integrity of the data flow.

Describe alternatives you’ve considered If the manager fails to connect to Redis, it could place event data into a queue and repeatedly attempt to reconnect. Once the connection is restored, the manager would publish the events in the original order.
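For readers following the thread, the queue-and-retry idea described above could look roughly like the sketch below. It is not the code from the customized manager and not part of python-socketio: `BufferedPublisher` and the `publish` callable are hypothetical names, and the sketch assumes `publish` raises the built-in `ConnectionError` when the backend is unreachable (a real Redis client would likely raise its own exception type).

```python
import asyncio
from collections import deque


class BufferedPublisher:
    """Hypothetical wrapper that buffers events while the backend is down."""

    def __init__(self, publish):
        # `publish` is an assumed coroutine function that sends one event
        # and raises ConnectionError if the backend cannot be reached.
        self._publish = publish
        self._pending = deque()
        self._lock = asyncio.Lock()  # prevents two flushes from interleaving

    async def send(self, event):
        """Queue the event, then try to drain the queue in order."""
        self._pending.append(event)
        return await self.flush()

    async def flush(self):
        """Publish queued events oldest first; stop at the first failure.

        Returns True if the queue was fully drained, False otherwise.
        """
        async with self._lock:
            while self._pending:
                try:
                    await self._publish(self._pending[0])
                except ConnectionError:
                    return False  # keep the event queued for a later retry
                self._pending.popleft()
            return True
```

A background task could call `flush()` on a timer to retry while the backend is unavailable. Because an event is removed from the queue only after it has been published, the original order is preserved once the connection returns.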

English is not my first language, so I used a translation tool to write this issue. I kindly ask for your understanding if there are any inappropriate or awkward expressions. Thank you for your consideration!

miguelgrinberg commented 1 day ago

This should probably be implemented as a generic solution, not just for the AsyncRedisManager. But in any case, my concern is that the queue may grow to a very large size if Redis is offline for long. How do you prevent this in your solution?

yaongmeow commented 1 day ago

To address this concern, I have set a maximum retry duration. If the retry time exceeds the maximum allowed duration, the retry attempt is abandoned, and the queue is cleared. In my service, I have set this duration to 5 minutes. Realistically, I believe it is challenging to ensure reliability if Redis remains unavailable for a long time. My proposal, as described above, is a solution designed to be useful in situations where Redis needs to temporarily restart.
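The maximum retry duration described above might be sketched as follows. This is only an illustration under stated assumptions, not the actual implementation: `ExpiringBuffer`, `on_failure`, and `on_recovery` are hypothetical names, and the 300-second default simply mirrors the 5 minutes mentioned in the comment.

```python
import time
from collections import deque


class ExpiringBuffer:
    """Hypothetical buffer that is discarded once an outage lasts too long."""

    def __init__(self, max_outage=300.0, clock=time.monotonic):
        self._max_outage = max_outage  # seconds; 300 s = the 5 minutes above
        self._clock = clock            # injectable so the timing can be tested
        self._pending = deque()
        self._outage_started = None    # None while the backend is healthy

    def on_failure(self, event):
        """Record an event whose publish failed.

        Returns True if the event was buffered, False if the outage has
        exceeded the limit and the buffer was cleared instead.
        """
        now = self._clock()
        if self._outage_started is None:
            self._outage_started = now
        if now - self._outage_started > self._max_outage:
            # Give up: drop everything and keep dropping until recovery.
            self._pending.clear()
            return False
        self._pending.append(event)
        return True

    def on_recovery(self):
        """Return the buffered events in order and reset the outage timer."""
        events = list(self._pending)
        self._pending.clear()
        self._outage_started = None
        return events
```

This bounds the buffer by time rather than by size. If a hard memory cap is also wanted, `deque(maxlen=n)` is one option, though it discards the oldest events once full.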

miguelgrinberg commented 21 hours ago

The way you ensure Redis reliability is by using a cluster, not a single instance. Then restarting an instance does not affect the availability because you always have additional instances as backup.