marcoCasamento / Hangfire.Redis.StackExchange

HangFire Redis storage based on original (and now unsupported) Hangfire.Redis but using lovely StackExchange.Redis client

Jobs not processed when first cluster node in connection string is down #126

Closed Matiszak closed 4 months ago

Matiszak commented 10 months ago

A Redis cluster can tolerate a few nodes being down as long as the cluster as a whole is healthy. However, the code linked below depends solely on the first node in the connection string being up. If that node is down (while the cluster as a whole is still okay), job processing fails. What's worse, only the initial failure is logged as 'Error'. Subsequent messages about processing not working are logged as 'Debug'.

Reproduction steps:
1) Set up a Redis cluster with 3 masters and 3 slaves (that's my test case, but any other configuration in which disabling one node does not bring the whole cluster down is okay).
2) Set up Redis with a connection string like so: server06.local.dev:6129,server05.local.dev:6129,server04.local.dev:6129,server03.local.dev:6129,server02.local.dev:6129,server01.local.dev:6129,password=some_password
3) Bring down server06.
4) Schedule a new job and observe that it stays in the "Scheduled" state.

Issues:
1) Jobs are not processed.
2) Logs do not show anything besides the first 'Error' message, which indicates there will be a retry in X seconds. After that there are no more 'Error' logs. (Like many other users of the library, we record only logs >= Warning.)

https://github.com/marcoCasamento/Hangfire.Redis.StackExchange/blob/806dcabd06c45878f5232ef4fb91a5c13fcda2c0/Hangfire.Redis.StackExchange/RedisStorage.cs#L118
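
One possible way to avoid depending on a single node is to ask each configured endpoint in turn and use the first one that answers. Below is a minimal sketch of that idea, assuming the failing call is the `TIME` command visible in the stack traces. `ClusterClock`, `isConnected`, and `readTime` are hypothetical names used for illustration; they are not part of Hangfire.Redis.StackExchange, and this is not the library's actual fix.

```csharp
using System;
using System.Collections.Generic;
using System.Net;

public static class ClusterClock
{
    // Walks the endpoints in order and returns the time from the first node that answers.
    // `isConnected` and `readTime` are hypothetical hooks supplied by the caller.
    public static DateTime GetUtcDateTime(
        IEnumerable<EndPoint> endpoints,
        Func<EndPoint, bool> isConnected,
        Func<EndPoint, DateTime> readTime)
    {
        Exception lastError = null;

        foreach (var endpoint in endpoints)
        {
            // Skip nodes that are already known to be down.
            if (!isConnected(endpoint)) continue;

            try
            {
                return readTime(endpoint);
            }
            catch (Exception ex)
            {
                // The node may have dropped between the check and the call; try the next one.
                lastError = ex;
            }
        }

        throw new InvalidOperationException(
            "No reachable Redis node could report the time.", lastError);
    }
}
```

With StackExchange.Redis, the hooks could presumably be wired as `mux.GetEndPoints()`, `ep => mux.GetServer(ep).IsConnected`, and `ep => mux.GetServer(ep).Time()`, where `mux` is the shared `ConnectionMultiplexer`. Keeping the selection logic separate from the client calls also makes it easy to unit test without a live cluster.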

StackExchange.Redis version: 2.6.122

First exceptions ('Error'):

2023-09-10 17:28:04.4083|Error|Hangfire.AspNetCore.AspNetCoreLog.Log|Execution DelayedJobScheduler is in the Failed state now due to an exception, execution will be retried no more than in 00:00:09|StackExchange.Redis.RedisConnectionException: The message timed out in the backlog attempting to send because no connection became available (5000ms) - Last Connection Exception: It was not possible to connect to the redis server(s). Error connecting right now. To allow this multiplexer to continue retrying until it's able to connect, use abortConnect=false in your connection string or AbortOnConnectFail=false; in your code. ConnectTimeout, command=TIME, timeout: 5000, inst: 0, qu: 2, qs: 0, aw: False, bw: SpinningDown, last-in: 0, cur-in: 0, sync-ops: 47, async-ops: 1, serverEndpoint: server06.local.dev:6129, conn-sec: n/a, aoc: 1, mc: 1/1/0, mgr: 10 of 10 available, clientName: COMPUTER_NAME(SE.Redis-v2.6.122.0), IOCP: (Busy=2,Free=998,Min=16,Max=1000), WORKER: (Busy=1,Free=32766,Min=16,Max=32767), POOL: (Threads=26,QueuedItems=0,CompletedItems=631,Timers=15), v: 2.6.122.0 (Please take a look at this article for some common client-side issues that can cause timeouts: https://stackexchange.github.io/StackExchange.Redis/Timeouts)
 ---> StackExchange.Redis.RedisConnectionException: It was not possible to connect to the redis server(s). Error connecting right now. To allow this multiplexer to continue retrying until it's able to connect, use abortConnect=false in your connection string or AbortOnConnectFail=false; in your code. ConnectTimeout
   --- End of inner exception stack trace ---
   at StackExchange.Redis.ConnectionMultiplexer.ExecuteSyncImpl[T](Message message, ResultProcessor`1 processor, ServerEndPoint server, T defaultValue) in C:\app\src\StackExchange.Redis\ConnectionMultiplexer.cs:line 2094
   at StackExchange.Redis.RedisServer.Time(CommandFlags flags) in C:\app\src\StackExchange.Redis\RedisServer.cs:line 623
   at Hangfire.Redis.StackExchange.RedisConnection.GetUtcDateTime() in C:\app\src\Hangfire.Redis.StackExchange\RedisConnection.cs:line 112
   at Hangfire.Server.DelayedJobScheduler.<>c__DisplayClass14_0.<EnqueueNextScheduledJobs>b__0(IStorageConnection connection)
   at Hangfire.Server.DelayedJobScheduler.UseConnectionDistributedLock[T](JobStorage storage, Func`2 action)
   at Hangfire.Server.DelayedJobScheduler.Execute(BackgroundProcessContext context)
   at Hangfire.Server.BackgroundProcessDispatcherBuilder.ExecuteProcess(Guid executionId, Object state)
   at Hangfire.Processing.BackgroundExecution.Run(Action`2 callback, Object state)

2023-09-10 17:28:04.4350|Error|Hangfire.AspNetCore.AspNetCoreLog.Log|Execution RecurringJobScheduler is in the Failed state now due to an exception, execution will be retried no more than in 00:00:09|StackExchange.Redis.RedisConnectionException: The message timed out in the backlog attempting to send because no connection became available (5000ms) - Last Connection Exception: It was not possible to connect to the redis server(s). Error connecting right now. To allow this multiplexer to continue retrying until it's able to connect, use abortConnect=false in your connection string or AbortOnConnectFail=false; in your code. ConnectTimeout, command=TIME, timeout: 5000, inst: 0, qu: 2, qs: 0, aw: False, bw: SpinningDown, last-in: 0, cur-in: 0, sync-ops: 47, async-ops: 1, serverEndpoint: server06.local.dev:6129, conn-sec: n/a, aoc: 1, mc: 1/1/0, mgr: 10 of 10 available, clientName: COMPUTER_NAME(SE.Redis-v2.6.122.0), IOCP: (Busy=2,Free=998,Min=16,Max=1000), WORKER: (Busy=1,Free=32766,Min=16,Max=32767), POOL: (Threads=26,QueuedItems=0,CompletedItems=636,Timers=14), v: 2.6.122.0 (Please take a look at this article for some common client-side issues that can cause timeouts: https://stackexchange.github.io/StackExchange.Redis/Timeouts)
 ---> StackExchange.Redis.RedisConnectionException: It was not possible to connect to the redis server(s). Error connecting right now. To allow this multiplexer to continue retrying until it's able to connect, use abortConnect=false in your connection string or AbortOnConnectFail=false; in your code. ConnectTimeout
   --- End of inner exception stack trace ---
   at StackExchange.Redis.ConnectionMultiplexer.ExecuteSyncImpl[T](Message message, ResultProcessor`1 processor, ServerEndPoint server, T defaultValue) in C:\app\src\StackExchange.Redis\ConnectionMultiplexer.cs:line 2094
   at StackExchange.Redis.RedisServer.Time(CommandFlags flags) in C:\app\src\StackExchange.Redis\RedisServer.cs:line 623
   at Hangfire.Redis.StackExchange.RedisConnection.GetUtcDateTime() in C:\app\src\Hangfire.Redis.StackExchange\RedisConnection.cs:line 112
   at Hangfire.Server.RecurringJobScheduler.<>c__DisplayClass18_0.<EnqueueNextRecurringJobs>b__0(IStorageConnection connection)
   at Hangfire.Server.RecurringJobScheduler.UseConnectionDistributedLock[T](JobStorage storage, Func`2 action)
   at Hangfire.Server.RecurringJobScheduler.Execute(BackgroundProcessContext context)
   at Hangfire.Server.BackgroundProcessDispatcherBuilder.ExecuteProcess(Guid executionId, Object state)
   at Hangfire.Processing.BackgroundExecution.Run(Action`2 callback, Object state)

Example of subsequent exceptions ('Debug'):

2023-09-10 17:28:18.6930|Debug|Hangfire.AspNetCore.AspNetCoreLog.Log|Execution loop RecurringJobScheduler:78bb810d caught an exception and will be retried in 00:00:16|StackExchange.Redis.RedisConnectionException: The message timed out in the backlog attempting to send because no connection became available (5000ms) - Last Connection Exception: It was not possible to connect to the redis server(s). Error connecting right now. To allow this multiplexer to continue retrying until it's able to connect, use abortConnect=false in your connection string or AbortOnConnectFail=false; in your code. ConnectTimeout, command=TIME, timeout: 5000, inst: 0, qu: 2, qs: 0, aw: False, bw: SpinningDown, rs: NotStarted, ws: Idle, in: 0, last-in: 0, cur-in: 0, sync-ops: 59, async-ops: 1, serverEndpoint: server06.local.dev:6129, conn-sec: n/a, aoc: 1, mc: 1/1/0, mgr: 10 of 10 available, clientName: COMPUTER_NAME(SE.Redis-v2.6.122.0), IOCP: (Busy=0,Free=1000,Min=16,Max=1000), WORKER: (Busy=1,Free=32766,Min=16,Max=32767), POOL: (Threads=26,QueuedItems=0,CompletedItems=978,Timers=10), v: 2.6.122.0 (Please take a look at this article for some common client-side issues that can cause timeouts: https://stackexchange.github.io/StackExchange.Redis/Timeouts)
 ---> StackExchange.Redis.RedisConnectionException: It was not possible to connect to the redis server(s). Error connecting right now. To allow this multiplexer to continue retrying until it's able to connect, use abortConnect=false in your connection string or AbortOnConnectFail=false; in your code. ConnectTimeout
   --- End of inner exception stack trace ---
   at StackExchange.Redis.ConnectionMultiplexer.ExecuteSyncImpl[T](Message message, ResultProcessor`1 processor, ServerEndPoint server, T defaultValue) in C:\app\src\StackExchange.Redis\ConnectionMultiplexer.cs:line 2094
   at StackExchange.Redis.RedisServer.Time(CommandFlags flags) in C:\app\src\StackExchange.Redis\RedisServer.cs:line 623
   at Hangfire.Redis.StackExchange.RedisConnection.GetUtcDateTime() in C:\app\src\Hangfire.Redis.StackExchange\RedisConnection.cs:line 112
   at Hangfire.Server.RecurringJobScheduler.<>c__DisplayClass18_0.<EnqueueNextRecurringJobs>b__0(IStorageConnection connection)
   at Hangfire.Server.RecurringJobScheduler.UseConnectionDistributedLock[T](JobStorage storage, Func`2 action)
   at Hangfire.Server.RecurringJobScheduler.Execute(BackgroundProcessContext context)
   at Hangfire.Server.BackgroundProcessDispatcherBuilder.ExecuteProcess(Guid executionId, Object state)
   at Hangfire.Processing.BackgroundExecution.Run(Action`2 callback, Object state)