Closed noahkconley closed 1 year ago
Ok, I was looking for your initializer, which looks fine. "It just disappears" is hard to debug.
A process will disappear from the Busy page if its heartbeat data expires in Redis. The heartbeat data lives for 60 seconds and the heartbeat thread refreshes it every 5 seconds. Are you using any other 3rd party Sidekiq gems or plugins? Have you tried upgrading to see if the problem is fixed in a later version?
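The expiry rule described above can be modelled in a few lines of plain Ruby. This is only a sketch under stated assumptions: the 60-second lifetime and 5-second refresh come from the comment above, while `stale_processes`, `seconds_until_expiry`, and the `max_missed` tolerance are hypothetical names made up for illustration. The input is a list of plain hashes with a `"beat"` timestamp; in a live deployment something similar could plausibly be read from Sidekiq's `Sidekiq::ProcessSet` API, but that wiring is an assumption and is not shown here.

```ruby
# Sketch only: flag processes whose last heartbeat is older than a threshold.
# The two constants mirror the numbers quoted in the comment above.
HEARTBEAT_TTL = 60      # seconds the heartbeat data lives in Redis
HEARTBEAT_INTERVAL = 5  # seconds between refreshes by the heartbeat thread

# processes:  array of hashes with "hostname", "pid" and "beat" (epoch seconds)
# now:        current time as epoch seconds
# max_missed: hypothetical tolerance, number of missed refreshes before flagging
def stale_processes(processes, now:, max_missed: 3)
  threshold = HEARTBEAT_INTERVAL * max_missed
  processes.select { |p| now - p["beat"] > threshold }
end

# Seconds left before a process with this last beat would expire from Redis.
def seconds_until_expiry(last_beat, now:)
  [HEARTBEAT_TTL - (now - last_beat), 0].max
end

# Made-up sample data: worker-b has missed several refreshes.
now = 1_000.0
sample = [
  { "hostname" => "worker-a", "pid" => 11, "beat" => 998.0 },
  { "hostname" => "worker-b", "pid" => 12, "beat" => 970.0 }
]

stale_processes(sample, now: now).each do |p|
  age = (now - p["beat"]).round
  left = seconds_until_expiry(p["beat"], now: now).round
  puts "#{p['hostname']}:#{p['pid']} last beat #{age}s ago, expires in #{left}s"
end
```

A check like this, run from a separate process, would notice a stalled heartbeat during the window between the first missed refreshes and the 60-second expiry, before the entry vanishes from the Busy page.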
BTW, JOB_CONCURRENCY is not something Sidekiq supports. I'm not sure what that does.
I would at least upgrade to the latest 6.5 and see if that helps.
Apologies for the confusion. I've discovered that JOB_CONCURRENCY is an env variable we pass to Sidekiq for the --concurrency option. Sidekiq is started in our Dockerfile as follows:
DB_STATEMENT_TIMEOUT=0 bundle exec sidekiq -v -c $JOB_CONCURRENCY -q default -q accounting -q billing_platform -q reports
We can definitely try upgrading and see if that helps.
We've upgraded sidekiq to 6.5.12, sidekiq-pro to 5.5.8, and sidekiq-ent to 2.5.3, but we're seeing no change in behavior: the jobs get picked up and the worker disappears. I would love to provide you with error logs and a backtrace, but the workers seem to be failing completely silently. You mentioned a "heartbeat" thread. Is there a way to monitor that?
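When a process hangs without logging anything, one generic Ruby technique is to dump the backtrace of every live thread and look for the ones that are blocked. The sketch below uses only core Ruby (`Thread.list`, `Thread#backtrace`, `Thread#status`); `dump_threads` is a hypothetical helper name, not a Sidekiq API. Sidekiq itself logs thread backtraces when it receives the TTIN signal, which may be the more practical route in a container. Either approach relies on the Ruby VM still being able to run Ruby code, so it may produce nothing if a native extension is stuck while holding the interpreter lock.

```ruby
# Sketch only: build a readable dump of all thread backtraces in this process.
def dump_threads(threads = Thread.list)
  threads.map do |t|
    header = "Thread #{t.object_id} status=#{t.status.inspect}"
    # Thread#backtrace returns nil for a dead thread.
    frames = t.backtrace || ["<no backtrace available>"]
    ([header] + frames.map { |f| "  #{f}" }).join("\n")
  end.join("\n\n")
end

# Simulate a stuck worker: a thread blocked forever on an empty queue.
stuck = Thread.new { Queue.new.pop }
sleep 0.1 until stuck.status == "sleep"

puts dump_threads

# Clean up the simulated stuck thread.
stuck.kill
stuck.join
```

In the printed dump, the simulated worker shows up with a sleeping status and a backtrace ending in the blocking queue call, which is the kind of signature to look for in a real hang.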
Sorry to hear that. I'm not sure how to debug it via GitHub comments. Can you reproduce it locally?
It seems Sidekiq is not the issue here. We're using a gem called tiktoken_ruby, which is known to have thread safety issues, so we're getting deadlocks on the worker that aren't being reported in any way. Using Sidekiq::Limiter in the job we're calling seems to do what we need it to do.
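The general idea of restricting concurrent access to a library that is not thread-safe can be illustrated without Sidekiq Enterprise. The sketch below swaps in a plain in-process `Mutex` instead of Sidekiq::Limiter, so it only serializes threads inside a single process and is not what the commenter actually deployed. `SerializedEncoder` and `FakeEncoder` are hypothetical names; `FakeEncoder` stands in for the real tokenizer so the example runs without extra gems. Whether serializing calls like this would avoid the deadlock seen with tiktoken_ruby is an assumption that depends on where that gem is unsafe.

```ruby
# Sketch only: funnel every call to a non-thread-safe object through one Mutex.
class SerializedEncoder
  def initialize(encoder)
    @encoder = encoder
    @lock = Mutex.new
  end

  # Only one thread at a time may be inside the wrapped encoder.
  def encode(text)
    @lock.synchronize { @encoder.encode(text) }
  end
end

# Stand-in for the real tokenizer; records how many calls overlap.
class FakeEncoder
  attr_reader :max_in_flight

  def initialize
    @in_flight = 0
    @max_in_flight = 0
    @counter_lock = Mutex.new
  end

  def encode(text)
    @counter_lock.synchronize do
      @in_flight += 1
      @max_in_flight = [@max_in_flight, @in_flight].max
    end
    sleep 0.01 # pretend to do some work
    text.split
  ensure
    @counter_lock.synchronize { @in_flight -= 1 }
  end
end

fake = FakeEncoder.new
safe = SerializedEncoder.new(fake)

# Three threads, mirroring a concurrency setting of 3.
3.times.map { |i| Thread.new { safe.encode("job number #{i}") } }.each(&:join)

puts "max concurrent encoder calls: #{fake.max_in_flight}"
```

The trade-off is throughput: with every tokenizer call serialized, the three worker threads only run in parallel for the parts of the job outside the wrapped call.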
Ah ok, sounds like you are seeing deadlocks, which would cause the process to silently disappear on the Busy page. Note that the recent versions of Enterprise support a new Kubernetes health check which would detect this problem (obviously only useful if you are using k8s).
https://github.com/sidekiq/sidekiq/wiki/Kubernetes#sidekiq-enterprise
We're running into an issue with Sidekiq and job concurrency. We're using Sidekiq v 6.4.2, running on AWS Elasticache for Redis, v7. The container is running Debian v11. The pods each have 3G of memory.
When we have JOB_CONCURRENCY set to 3 or more, and all 3 threads have picked up jobs, the worker disappears from the dashboard and the job seems to have been dropped.
This doesn't happen when JOB_CONCURRENCY is set to 2 or 1. We've tried increasing the worker's memory to far more than is needed, but the workers still fail. At the point of failure, barely any memory has been consumed. No error messages show in our DataDog logs. Neither the container running the job nor the pod has crashed or shows any other sign of being unhealthy. The worker processes simply "disappear" from the Sidekiq Busy dashboard.
We've scoured the internet for solutions to this issue, but most of the discussions we've seen are years old, and we are now at a loss as to how to continue debugging. Any assistance would be greatly appreciated.
Ruby version: 3.1.4
Rails version: 6.1.7.6
Sidekiq version: 6.4.2
Sidekiq Pro version: 5.3.1
Sidekiq Enterprise version: 2.3.1
Please include your initializer, sidekiq.yml, and any error message with the full backtrace. config/initializers/sidekiq.rb:
If you are using an old version, have you checked the changelogs to see if your issue has been fixed in a later version?
I've looked through the changelogs and don't see anything that seems relevant, but I also don't really know what I'm looking for since we have no error messages.
https://github.com/sidekiq/sidekiq/blob/main/Changes.md
https://github.com/sidekiq/sidekiq/blob/main/Pro-Changes.md
https://github.com/sidekiq/sidekiq/blob/main/Ent-Changes.md