heroku / logplex

[DEPRECATED] Heroku log router

Adjust redis writer restart intensity #253

Closed ypaq closed 5 years ago

ypaq commented 5 years ago

Rationale

The redis writers' connectivity appears unreliable.

Changes

Details

The previous restart intensity in the redis writer supervisor configuration allowed up to 1000 restarts within 1 second. This is problematic when redis writers restart in bulk, for example, when there are 100 shards and 10 writers per shard (1000 writers in total). During a network connectivity problem the restart limit is easily exceeded, which forces the redis writer supervisor itself to restart. After a supervisor restart, all previously created connection information is lost, and it is not recovered automatically without manual intervention.
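The arithmetic behind this can be sketched with a small simulation. This is only an approximation of how an OTP supervisor counts restarts (it terminates once more than the allowed number of restarts fall inside the period); it is not logplex code. The names `RestartWindow` and `supervisor_survives`, and the 100 ms spacing between failure rounds, are assumptions made for illustration.

```python
from collections import deque


class RestartWindow:
    """Sliding-window restart counter, loosely modelled on OTP's
    supervisor intensity (max_restarts) and period (period_seconds)."""

    def __init__(self, max_restarts, period_seconds):
        self.max_restarts = max_restarts
        self.period_seconds = period_seconds
        self.restart_times = deque()

    def record_restart(self, now):
        """Record a restart at time `now` (seconds).

        Returns False once more than max_restarts fall inside the period,
        which is the point where a supervisor would give up and terminate.
        """
        self.restart_times.append(now)
        # Drop restarts that are older than the period.
        while now - self.restart_times[0] > self.period_seconds:
            self.restart_times.popleft()
        return len(self.restart_times) <= self.max_restarts


def supervisor_survives(shards, writers_per_shard, failures_per_writer,
                        max_restarts, period_seconds):
    """Simulate every writer failing `failures_per_writer` times in a burst."""
    window = RestartWindow(max_restarts, period_seconds)
    writers = shards * writers_per_shard
    for attempt in range(failures_per_writer):
        now = attempt * 0.1  # hypothetical: failure rounds 100 ms apart
        for _ in range(writers):
            if not window.record_restart(now):
                return False
    return True
```

Under these assumptions, `supervisor_survives(100, 10, 1, 1000, 1)` stays within the limit because exactly 1000 restarts occur, while a second failed reconnect per writer within the same second (`failures_per_writer=2`) pushes the count past 1000 and takes the supervisor down.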

The logplex_queue processes hold a list of workers for bookkeeping. This list serves no purpose other than introspection. Without this change, the list becomes outdated as writer connections to redis go away.

Previously, a redis writer process that hit an unexpected error while fetching messages from its logplex_queue process would exit normally, which prevented the supervisor from restarting it automatically.
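Why a normal exit suppresses the restart can be illustrated with the restart rules OTP supervisors apply per child restart type. The sketch below models those rules in Python. It assumes the redis writers are supervised with a restart type that ignores normal exits (such as `transient`), which the description implies but does not state. `should_restart` is a hypothetical helper, not a logplex or OTP function.

```python
def should_restart(restart_type, exit_reason):
    """Model of OTP supervisor restart rules.

    restart_type: "permanent", "transient", or "temporary".
    exit_reason:  "normal", "shutdown", a ("shutdown", term) tuple,
                  or any other value standing in for an abnormal exit.
    """
    if restart_type == "permanent":
        return True   # always restarted, whatever the exit reason
    if restart_type == "temporary":
        return False  # never restarted
    if restart_type == "transient":
        is_shutdown_tuple = (
            isinstance(exit_reason, tuple)
            and len(exit_reason) == 2
            and exit_reason[0] == "shutdown"
        )
        # Restarted only when the exit is abnormal.
        return not (exit_reason in ("normal", "shutdown") or is_shutdown_tuple)
    raise ValueError("unknown restart type: %r" % (restart_type,))
```

Under this assumption, a writer that swallows an error and exits with `"normal"` is never restarted, whereas the same writer exiting with an error reason would be.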