Open ajvondrak opened 8 months ago
However, it's unclear to me what the interaction will be between two "top level" calls to these transaction commands. I.e., will the futures fire with the desired timing, or do they go out of sync in some way that may cause an issue?
I think I figured this part out by adding some Redis logging to the repro script:
Annoyingly, the log formats aren't the same between versions. But the salient point is that the Redis commands are actually executed a bit differently in Redis 4 vs Redis 5.
With Redis 4, the enqueue in the nested pipeline (from within the resque gem) gets added to the MULTI
/EXEC
already in progress that resque-scheduler is wrapping around each batch. Thus, the queue push happens inside of the transaction:
MULTI
SREM resque:timestamps:{"class":"Job","args":[],"queue":"jobs"} delayed:1703722645
SADD resque:queues jobs
RPUSH resque:queue:jobs {"class":"Job","args":[]}
LTRIM resque:delayed:1703722645 100 -1
EXEC
With Redis 5, the nested pipeline appears to fire eagerly, since it's effectively being called on the top-level like Resque.redis.pipelined { ... }
instead of on the block argument like Resque.redis.multi { |m| m.pipelined { ... } }
. Thus, the queue push happens outside of the transaction:
SADD resque:queues jobs
RPUSH resque:queue:jobs {"class":"Job","args":[]}
MULTI
SREM resque:timestamps:{"class":"Job","args":[],"queue":"jobs"} delayed:1703722657
LTRIM resque:delayed:1703722657 100 -1
EXEC
Note that the SADD
+ RPUSH
are still sent as one unit in a pipelined command, but they're still apparently sent before the MULTI
/EXEC
block, meaning that the enqueue could still go through even if the schedule bookkeeping gets rolled back.
I'm guessing that this is not the desired behavior. I'm not totally sure of the implications, but it seems fairly serious. As such, I'll amend the title to reflect the broader nature of the issue.
Is there any update on how to fix this issue? I've found that setting Redis.silence_deprecations = true
does not resolve it.
PS - using the following:
We've been observing job duplication on our app (with redis-rb 5.x), very probably related to this issue. As mentioned by @ajvondrak this seems like a serious problem indeed.
Fixing this issue would require patching both resque and resque-scheduler to pass down the transaction object, so quite invasive and complex... This would not solve issues for enqueue hooks that use redis though. Perhaps a revert of https://github.com/resque/resque-scheduler/pull/767 should be considered?
When upgrading to resque-scheduler 4.9.0 and above with redis-rb 4.x, we still get the deprecation notices related to https://github.com/resque/resque/issues/1794 and https://github.com/resque/resque-scheduler/issues/745.
Repro:
After enough poking around through stack traces, I believe I've discovered the issue, and it's a subtle interaction between the resque & resque-scheduler gems:
Resque.redis.multi
here: https://github.com/resque/resque-scheduler/blob/462e33e48a799f3bd89c305e1f6fe24996810bb5/lib/resque/scheduler.rb#L253-L259Resque::Job.create
: https://github.com/resque/resque-scheduler/blob/462e33e48a799f3bd89c305e1f6fe24996810bb5/lib/resque/scheduler.rb#L327Job.create
in such a way that we effectively make a call toResque.redis.pipelined
: https://github.com/resque/resque/blob/2f9d080ce86eb2e3f1f3d47599a21c576124c6f3/lib/resque/data_store.rb#L105-L108Copying the shape of the above into a direct repro, we're basically doing
instead of what the redis gem wants, which is
It might be hard to actually achieve the latter, though, since the code is straddling two gems. 😕
We were not getting this warning prior to resque-scheduler 4.9 since the outer
multi
was added by https://github.com/resque/resque-scheduler/pull/767 (which also causes other nested transaction issues, like in https://github.com/resque/resque-scheduler/issues/773).It seems like this still works circa redis 5.x. At least when switching the version in the above repro, I get:
I think the deprecation may just kind of be firing in a wonky way due to how redis 4.x tries to juggle state.
Circa 4.x:
Redis#multi
replaces@client
with aDeprecatedMulti
: https://github.com/redis/redis-rb/blob/0afcf1cad6f4fc74213f2b4c7a20819581c36897/lib/redis.rb#L225Redis#pipelined
replaces@client
with aDeprecatedPipeline
: https://github.com/redis/redis-rb/blob/0afcf1cad6f4fc74213f2b4c7a20819581c36897/lib/redis.rb#L174DeprecatedMulti
andDeprecatedPipeline
are defined such that any call to the underlyingPipeline::Multi
orPipeline
methods will issue deprecation warnings: https://github.com/redis/redis-rb/blob/0afcf1cad6f4fc74213f2b4c7a20819581c36897/lib/redis/pipeline.rb#L208-L236pipelined
call happens onResque.redis
, it winds up delegating to the@client
, which issues the deprecation warningCirca 5.x:
Redis#multi
will just yield a newMultiConnection
instance: https://github.com/redis/redis-rb/blob/667b0007191cb5c9ed6f3b1ea7fc97a40f8f8c93/lib/redis/commands/transactions.rb#L23-L29Redis#pipelined
will just yield a newPipelinedConnection
instance: https://github.com/redis/redis-rb/blob/935f64f18d7c37386e9b01db71683f8f3b868cd2/lib/redis.rb#L102-L108multi
/pipelined
in their respective ways: https://github.com/redis/redis-rb/blob/667b0007191cb5c9ed6f3b1ea7fc97a40f8f8c93/lib/redis/pipeline.rb#L16-L31 + https://github.com/redis/redis-rb/blob/667b0007191cb5c9ed6f3b1ea7fc97a40f8f8c93/lib/redis/pipeline.rb#L59-L61~However, it's unclear to me what the interaction will be between two "top level" calls to these transaction commands. I.e., will the futures fire with the desired timing, or do they go out of sync in some way that may cause an issue?~ EDIT: per investigation below, it seems that the "nested" pipeline actually silently fires outside of the new
MULTI
/EXEC
transaction around the batch; see https://github.com/resque/resque-scheduler/issues/787#issuecomment-1870717786 for details.