marcoCasamento / Hangfire.Redis.StackExchange

HangFire Redis storage based on original (and now unsupported) Hangfire.Redis but using lovely StackExchange.Redis client
Other
452 stars 108 forks source link

StackExchange.Redis.RedisTimeoutException - Hangfire.Redis.RedisLock.ExpirationTimerTick #88

Closed sergiocharpineljr closed 3 years ago

sergiocharpineljr commented 4 years ago

I'm getting timeouts with latest version of Hangfire.Redis.StackExchange and Hangfire 1.8:

Message: Timeout performing UNWATCH (30000ms), next: EXISTS {hangfire}:job:93df8b81-c97d-42ea-ace1-7d1c435a0e33, inst: 0, qu: 0, qs: 1305, aw: True, rs: ReadAsync, ws: Idle, in: 0, in-pipe: 0, out-pipe: 172, serverEndpoint: Unspecified/MYREDISSERVER:6379, mgr: 10 of 10 available, clientName: MYSERVER, IOCP: (Busy=0,Free=1000,Min=100,Max=1000), WORKER: (Busy=1,Free=32766,Min=100,Max=32767), v: 2.0.601.3402 (Please take a look at this article for some common client-side issues that can cause timeouts: https://stackexchange.github.io/StackExchange.Redis/Timeouts)

I've investigated network problems, server or client hangs, CPU load, number of min threads, but couldn't find any clues.

The problem happens mostly when there are simultaneous scheduled jobs running. But it also seems to happen with hangfire internal validations, even when there isn't any job running.

Another thing I've noticed is that the majority of them have the same stack trace. So maybe something with the locking mechanism?

StackTrace: at StackExchange.Redis.ConnectionMultiplexer.ExecuteSyncImpl[T](Message message, ResultProcessor`1 processor, ServerEndPoint server) at StackExchange.Redis.RedisTransaction.Execute(CommandFlags flags) at StackExchange.Redis.RedisDatabase.LockExtend(RedisKey key, RedisValue value, TimeSpan expiry, CommandFlags flags) at Hangfire.Redis.RedisLock.ExpirationTimerTick(Object state) at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx) at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx) at System.Threading.TimerQueueTimer.CallCallback() at System.Threading.TimerQueueTimer.Fire()

liyuan0420 commented 4 years ago

这个问题解决了吗?我也遇到这个问题

marcoCasamento commented 4 years ago

english please. I don't speak 中国 I've never been able to reproduce it, maybe a slowlog dump could help

marcoCasamento commented 4 years ago

Yeah, is probably that.

I’m going to close the issue, it’s not a issue without a minimal repro

Best

From: Vadzim Adamtsevich notifications@github.com Sent: martedì 7 aprile 2020 22:49 To: marcoCasamento/Hangfire.Redis.StackExchange Hangfire.Redis.StackExchange@noreply.github.com Cc: Marco Casamento marco.casamento@gmail.com; Comment comment@noreply.github.com Subject: Re: [marcoCasamento/Hangfire.Redis.StackExchange] StackExchange.Redis.RedisTimeoutException - Hangfire.Redis.RedisLock.ExpirationTimerTick (#88)

what do you think about? StackExchange/StackExchange.Redis#508 https://github.com/StackExchange/StackExchange.Redis/issues/508

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/marcoCasamento/Hangfire.Redis.StackExchange/issues/88#issuecomment-610612686 , or unsubscribe https://github.com/notifications/unsubscribe-auth/AABW2Y2NQI35RX3RM6RMGRDRLOGTBANCNFSM4IQLJFYA . https://github.com/notifications/beacon/AABW2Y4GLIXO5MIFUERN77LRLOGTBA5CNFSM4IQLJFYKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOERSTLTQ.gif

coolkev commented 3 years ago

@marcoCasamento I have experienced a timeout in this same ExpirationTimerTick method. I don't see anything out of the ordinary that would be causing a timeout. However if a timeout does in fact occur, it takes down the entire app pool because it is an unhandled exception inside of Timer callback (which is running on a separate thread). I dont really see a gracefully way to recover from such an exception, perhaps a couple retries before logging the exception, and then silently ignoring the exception so the app pool does not crash completely.

I'd be happy to submit a PR if you think this would be a good solution.

oguzhantopcu commented 3 years ago

I have same problem with @coolkev, too. Whenever a timeout happens in ExpirationTimerTick method, it takes down the whole application.

IMHO a slow running redis instance should not crash the application. My app crashes 1-2 times per week because of this issue.

oguzhantopcu commented 3 years ago

After redis 6 update, my app started to crash everyday because of this issue.

NepPure commented 3 years ago

The same problem. Down the entire app pool because it.

FATAL 2021-02-04 03:05:47,645 [1113 ] 
StackExchange.Redis.RedisTimeoutException: The timeout was reached before the message could be written to the output buffer, and it was not sent, command=EXEC, timeout: 30000, outbound: 0KiB, inbound: 0KiB, next: RPOPLPUSH {hangfire}:queue:shangqiu, inst: 0, qu: 0, qs: 166, aw: True, rs: ReadAsync, ws: Flushed, in: 0, in-pipe: 0, out-pipe: 0, serverEndpoint: 127.0.0.1:6379, mgr: 10 of 10 available, clientName: WIN-52MDFIBLVGU, IOCP: (Busy=0,Free=1000,Min=300,Max=1000), WORKER: (Busy=19,Free=32748,Min=300,Max=32767), v: 2.0.601.3402 (Please take a look at this article for some common client-side issues that can cause timeouts: https://stackexchange.github.io/StackExchange.Redis/Timeouts)
   at StackExchange.Redis.ConnectionMultiplexer.ExecuteSyncImpl[T](Message message, ResultProcessor`1 processor, ServerEndPoint server) in C:\projects\stackexchange-redis\src\StackExchange.Redis\ConnectionMultiplexer.cs:line 2232
   at StackExchange.Redis.RedisTransaction.Execute(CommandFlags flags) in C:\projects\stackexchange-redis\src\StackExchange.Redis\RedisTransaction.cs:line 53
   at StackExchange.Redis.RedisDatabase.LockExtend(RedisKey key, RedisValue value, TimeSpan expiry, CommandFlags flags) in C:\projects\stackexchange-redis\src\StackExchange.Redis\RedisDatabase.cs:line 1060
   at Hangfire.Redis.RedisLock.ExpirationTimerTick(Object state)
   at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state)
--- End of stack trace from previous location where exception was thrown ---
   at System.Threading.TimerQueueTimer.CallCallback()
   at System.Threading.TimerQueueTimer.Fire()
   at System.Threading.TimerQueue.FireNextTimers()
marcoCasamento commented 3 years ago

That error basically means that it was not possible to get a response from Redis in the configured timeout. I'd questioning what's the point of using Hangfire with Redis if it so slow, but let's move on.

I don't see an easy way to keep the required Lock mechanism and resolve the exception. Swallowing and logging it can lead to hard to debug issues, especially with job continuation and job batches (and whatever code runs "inside" the lock).

I'd rather try to increase the timeout. One point where there is an hardcoded timeSpan is the FetchLockTimeOut. You could try to change: //yield return new FetchedJobsWatcher(this, _options.InvisibilityTimeout); yield return new FetchedJobsWatcher(this, _options.InvisibilityTimeout, new FetchedJobsWatcherOptions() { FetchedLockTimeout = TimeSpan.FromMinutes(5)}); in RedisStorage

But most of the calls that create the Lock object are inside Hangfire main repository or even directly in your code. Just to name that ones that I saw: The DisableConcurrentExecutionAttribute constructor take a timeoutInSeconds parameter that in turn will request a Lock with that timeout. The RecurringJobManager request a lock with a TimeSpan of 15 seconds that appears to be hardcoded. The DelayedJobScheduler request a lock with a TimeSpan of 1 minute that appears to be hardcoded The RecurrnigJobScheduler request a lock with a TimeSpan of 1 minute that appears to be hardcoded

So... as a form of an hack you could try to perhaps multiply by an arbitrary factor the TimeSpan inside the AcquireDistributedLock in RedisConnection

You'll probably see long times before a job get executed, but that would be another point. From my side, that specific error has never happened, but a final note: I once tried the Azure Redis Basic SKU (the most affordable one) and experienced all sort of errors. I've than switched to a simple Linux VM, even a B1 is way faster. Worth to mention that you could also run redis on your windows server 2019 VM thanks to WSL