Can you give me a sampling of the actual commands? Seeing the key names should tell us the root cause.
Below is a sample of the LMOVE command; it is triggered from job_fetch:

LMOVE queue:xxxx... queue:sq|xxxxx... RIGHT LEFT

It seems massive numbers of LMOVE commands are being triggered by some other Sidekiq component. We are trying to find out the root cause.
Yep, that’s super fetch. Your LMOVE command count will scale linearly with the number of queues you have, which is why I recommend only using a handful of queues per process. How many named queues do you have?
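To illustrate the scaling, here is a simplified sketch of the fetch pattern (not the actual super_fetch source; the method name, private-queue prefix, and sleep interval are assumptions, using redis-rb):

```ruby
require "redis"

# Simplified sketch of a super-fetch-style pass: one non-blocking LMOVE
# per queue, moving the job into a process-private list so it can be
# recovered if the process crashes. Illustrative only.
def retrieve_work(redis, queues, private_prefix)
  queues.each do |q|
    # RIGHT/LEFT reproduces RPOPLPUSH semantics; returns nil when q is empty.
    job = redis.lmove(q, "#{private_prefix}#{q}", "RIGHT", "LEFT")
    return job if job
  end
  sleep 1 # every queue was empty: back off before the next pass
  nil
end
```

An empty pass over five queues issues five LMOVEs where a pass over three issues three, so even an idle process generates LMOVE traffic proportional to its queue count.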
The command I pasted,

LMOVE queue:xxxx... queue:sq|xxxxx... RIGHT LEFT

is from super fetch; that part is valid. But we found a huge discrepancy between the number of job_fetch calls and the number of LMOVE commands. Some other component must be making tons of LMOVE operations, around 30K per second. We only have three queues, and we run one process per queue.
I don’t know what “job_fetch” is.
Super fetch is the only code in Sidekiq and Pro that uses LMOVE. Enterprise rate limiting uses it too, but you aren't on Ent. Do you have any other plugins?
Super fetch does use LMOVE when recovering jobs, not just for fetching. Do you suddenly see thousands of jobs appearing in your queues?
job_fetch is the Datadog span name around super fetch. Below is the stack trace:
from /usr/local/bundle/ruby/3.2.0/gems/sidekiq-pro-7.2.0/lib/sidekiq/pro/super_fetch.rb:300:in `retrieve_work'
from /usr/local/bundle/ruby/3.2.0/gems/sidekiq-7.2.2/lib/sidekiq/processor.rb:87:in `get_one'
from /usr/local/bundle/ruby/3.2.0/gems/sidekiq-7.2.2/lib/sidekiq/processor.rb:99:in `fetch'
from /usr/local/bundle/ruby/3.2.0/gems/ddtrace-1.10.1/lib/datadog/tracing/contrib/sidekiq/server_internal_tracer/job_fetch.rb:26:in `block in fetch'
from /usr/local/bundle/ruby/3.2.0/gems/ddtrace-1.10.1/lib/datadog/tracing/trace_operation.rb:192:in `block in measure'
from /usr/local/bundle/ruby/3.2.0/gems/ddtrace-1.10.1/lib/datadog/tracing/span_operation.rb:150:in `measure'
from /usr/local/bundle/ruby/3.2.0/gems/ddtrace-1.10.1/lib/datadog/tracing/trace_operation.rb:192:in `measure'
from /usr/local/bundle/ruby/3.2.0/gems/ddtrace-1.10.1/lib/datadog/tracing/tracer.rb:380:in `start_span'
from /usr/local/bundle/ruby/3.2.0/gems/ddtrace-1.10.1/lib/datadog/tracing/tracer.rb:160:in `block in trace'
from /usr/local/bundle/ruby/3.2.0/gems/ddtrace-1.10.1/lib/datadog/tracing/context.rb:43:in `activate!'
from /usr/local/bundle/ruby/3.2.0/gems/ddtrace-1.10.1/lib/datadog/tracing/tracer.rb:159:in `trace'
from /usr/local/bundle/ruby/3.2.0/gems/ddtrace-1.10.1/lib/datadog/tracing.rb:16:in `trace'
from /usr/local/bundle/ruby/3.2.0/gems/ddtrace-1.10.1/lib/datadog/tracing/contrib/sidekiq/server_internal_tracer/job_fetch.rb:13:in `fetch'
from /usr/local/bundle/ruby/3.2.0/gems/sidekiq-7.2.2/lib/sidekiq/processor.rb:81:in `process_one'
from /usr/local/bundle/ruby/3.2.0/gems/sidekiq-7.2.2/lib/sidekiq/processor.rb:72:in `run'
from /usr/local/bundle/ruby/3.2.0/gems/sidekiq-7.2.2/lib/sidekiq/component.rb:10:in `watchdog'
from /usr/local/bundle/ruby/3.2.0/gems/sidekiq-7.2.2/lib/sidekiq/component.rb:19:in `block in safe_thread'
We don't see thousands of jobs appearing suddenly. The high number of LMOVE commands started right after we upgraded to Sidekiq 7.2. We are using both Pro and Enterprise, but this service doesn't use rate limiting.
Just to confirm: there are only three places in Sidekiq that call LMOVE?
Correct. Only Ent's concurrent limiter uses LMOVE.
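(For reference, that limiter is used via Sidekiq::Limiter.concurrent, so it is easy to grep for; the limiter name and option values below are illustrative:)

```ruby
# Sidekiq Enterprise's concurrent limiter, the one Ent feature that
# uses LMOVE. If this appears anywhere in your codebase, it could
# contribute to the LMOVE count.
LIMITER = Sidekiq::Limiter.concurrent("erp", 50, wait_timeout: 5, lock_timeout: 30)

LIMITER.within_limit do
  # rate-limited work here
end
```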
I believe 7.2.0 is when I dropped support for earlier Redis versions and migrated from RPOPLPUSH to LMOVE. Why aren't you using 7.2.4? Maybe this was already fixed.
EDIT: Sorry, I realized you were talking about Pro 7.2.0; I assume with Sidekiq 7.2.4.
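For reference, RPOPLPUSH src dst and LMOVE src dst RIGHT LEFT are semantically identical (RPOPLPUSH is deprecated as of Redis 6.2); a quick check from Ruby, assuming redis-rb 5.x and throwaway keys:

```ruby
require "redis"

redis = Redis.new
redis.rpush("src", "job-1")

# Old form, deprecated since Redis 6.2:
#   redis.rpoplpush("src", "dst")
# New form with identical semantics: pop from the RIGHT of src,
# push onto the LEFT of dst, returning the moved element.
redis.lmove("src", "dst", "RIGHT", "LEFT") # => "job-1"
```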
One question regarding one of your previous comments:
> Your LMOVE command count will scale linearly with the number of queues you have, which is why I recommend only using a handful of queues per process. How many named queues do you have?
Let's say my process has [q1, q2, q3]. Will Sidekiq keep fetching jobs from the queues without sleeping even when they are empty? If so, [q1, q2, q3, q4, q5] would produce the same number of LMOVE commands as [q1, q2, q3], since Sidekiq keeps running LMOVE continuously to fetch jobs. Why does the number of queues matter?
Also, did Sidekiq 7 make any improvements that speed up super fetch compared with Sidekiq 6?
Closing this issue. It turned out to be an issue in our service.
Ruby version: 3.2
Rails version: 7.1
Sidekiq / Pro / Enterprise version(s): Sidekiq Pro 7.2.0
We see a massive number of Redis LMOVE operations after upgrading to Sidekiq 7.2. One of our services makes 30K LMOVE requests per second, while Sidekiq's job_fetch runs at only 460 requests per second. The Redis DB is used exclusively by Sidekiq.
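For scale: 30,000 ÷ 460 ≈ 65 LMOVE commands per job_fetch, while with only three queues each fetch pass should account for at most three.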
We understand that some of the LMOVE operations are valid, since they are triggered by the retrieve_work method of super_fetch, but that cannot explain the huge gap between the LMOVE and job_fetch counts. Any idea what could be the root cause of the massive number of LMOVE operations?
Thx!