Closed lephyrius closed 6 years ago
Noticing this too
We're seeing these too. Some more information would be great!
Hello, for those of you seeing this message, understand that Sidekiq is not running those jobs. We just got bit by this.
Can you all provide the gem version you are using, what version of Sidekiq and which version of redis? @lephyrius @tmlee @adamof @rezonant
Sidekiq 4.2.9, Sidekiq-unique-jobs 5.0.10, Sidekiq-scheduler 2.1.2, Rails 5.1.6 -- Redis is Elasticache on AWS, engine compat 3.2.4.
Version 6 won't have this problem and it will take care of the upgrade automatically.
Keep an eye out on the README and CHANGELOG for more information about the changes.
This seems to still happen on version 6 when a worker process dies -- is that expected? What's the recommended way to handle?
So there is something I still need to look into, but it is hard to test. I don't think I have integration-test coverage for jobs coming from the scheduled and retry sets, which means I don't want to sound too sure about anything.
The way I intended the gem to work, though, is that when a worker process crashes and Sidekiq (hopefully) puts the job back, the lock remains so that no duplicate jobs are allowed.
When the job is considered dead by sidekiq you could add a death handler to also delete the lock. See: https://github.com/mhenrixon/sidekiq-unique-jobs#cleanup-dead-locks
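A hedged sketch of such a death handler, based on the linked README section. The digest key name and the Digests API differ between gem versions (roughly "unique_digest" in v6 payloads and "lock_digest" in v7+), so treat the exact names as assumptions and check the README for your release:

```ruby
# config/initializers/sidekiq.rb
Sidekiq.configure_server do |config|
  config.death_handlers << lambda do |job, _exception|
    # Key name varies by sidekiq-unique-jobs version
    digest = job["lock_digest"] || job["unique_digest"]
    SidekiqUniqueJobs::Digests.new.delete_by_digest(digest) if digest
  end
end
```

This only fires when Sidekiq considers the job dead (i.e. it lands in the Dead set), not on every failure.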
I do have the handler set up, and it works perfectly for jobs which generate an exception. However, whenever I stop or start workers I see many messages like:
FATAL: the unique_key: "some unique key" needs to be unlocked manually
For further context, I don't retry jobs (but I do have retry set to 0 in sidekiq, not null, which triggers the death handler appropriately on exceptions), so I'm not sure that Sidekiq puts jobs back in this scenario.
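For context, the relevant option looks something like this (my understanding is that 0, rather than false, still sends failed jobs to the Dead set so death handlers fire; worth verifying against the Sidekiq wiki):

```ruby
# retry: 0 => no retries, but the job still dies into the Dead set,
# which is what triggers the death handlers on exceptions
sidekiq_options retry: 0, lock: :until_executing
```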
Thanks for the info @smock I'll add some coverage later today.
Just wanted to check in on this, any updates?
@smock I did add some coverage and couldn't find any problem. The error message is generated from here: https://github.com/mhenrixon/sidekiq-unique-jobs/blob/master/lib/sidekiq_unique_jobs/lock/base_lock.rb#L102
What it means is that while the server is executing the worker's #perform method, it receives the exception Sidekiq::Shutdown, which means the job was not done executing when we had to abort. If the job is lost we could unlock, but if the job is pushed back to the retry queue or the job queue then we should keep the lock to avoid duplicates.
What I have not investigated yet is if the job is put back or not.
@mhenrixon To add some more context to this, somehow the jobs are never re-queued when Sidekiq is shutting down. Hopefully this is helpful.
It seems that my job with lock: :until_and_while_executing doesn't get re-queued correctly if it's in the middle of processing when the server shuts down. The worker that does not have any unique options gets re-queued just fine.
Here is what I have on the MyWorkerClass.
sidekiq_options retry: true, lock: :until_and_while_executing, log_duplicate_payload: true
2018-08-13T22:52:58.274Z 1 TID-gpe208zth INFO: Shutting down
2018-08-13T22:52:58.274Z 1 TID-gpe208zth INFO: Terminating quiet workers
2018-08-13T22:52:58.274Z 1 TID-gpe2y4bwd INFO: Scheduler exiting...
2018-08-13T22:52:58.775Z 1 TID-gpe208zth INFO: Pausing to allow workers to finish...
2018-08-13T22:53:05.778Z 1 TID-gpe208zth WARN: Terminating 3 busy worker threads
2018-08-13T22:53:05.778Z 1 TID-gpe208zth WARN: Work still in progress [
#<struct Sidekiq::BasicFetch::UnitOfWork queue="queue:default", job="...">,
#<struct Sidekiq::BasicFetch::UnitOfWork queue="queue:default", job="...">,
#<struct Sidekiq::BasicFetch::UnitOfWork queue="queue:my_other_queue", job="">]
2018-08-13T22:53:05.780Z 1 TID-gpe208zth INFO: Pushed 3 jobs back to Redis
2018-08-13T22:53:05.781Z 1 TID-gpe2y465p MyWorkerClass JID-7f038e3738663e7bee45c6e1 FATAL: the unique_key: uniquejobs:unique_digest_1:RUN needs to be unlocked manually
2018-08-13T22:53:05.783Z 1 TID-gpe2y465p MyWorkerClass JID-7f038e3738663e7bee45c6e1 INFO: fail: 45.018 sec
2018-08-13T22:53:05.783Z 1 TID-gpe2q7aul MyWorkerClass JID-eaad6b51d0870d8d8488beb5 FATAL: the unique_key: uniquejobs:unique_digest_2:RUN needs to be unlocked manually
2018-08-13T22:53:05.783Z 1 TID-gpe2q7aul MyWorkerClass JID-eaad6b51d0870d8d8488beb5 INFO: fail: 60.75 sec
2018-08-13T22:53:05.785Z 1 TID-gpe208zth INFO: Bye!
Let me know how else I can help or if you can point me in the right direction. This gem saves me a lot of headaches and I am interested in seeing this fixed.
According to all that I can find, the job should be pushed back on shutdown: https://github.com/mperham/sidekiq/blob/master/lib/sidekiq/manager.rb#L128 I haven't confirmed exactly what is happening. I think it might be that the first lock needs removal, but I would have to add some test cases for this.
Will let you know what I can find!
I also had these errors today, when I made some existing queues unique. We are doing a rolling-update. So, while there were still some clients (without the unique configuration) pushing jobs, workers with the unique configuration were already trying to process these jobs. Once all clients and workers were running with the same configuration, the error did not pop up anymore.
I don't know if it's related to the rolling-update having clients/workers with different configurations or to the workers shutting down.
I need to improve the logging...
It isn't an error. I am changing this to warning level because in most cases it is just a warning (sidekiq pushes the job back to the queue).
Just wanted to loop back around on this, as I probably confused some of the options available in this gem with my specific worker requirements.
I have a specific worker that actually needs to be unique until it starts executing (:until_executing) but should only allow one execution at a time. :until_executing alone fails my requirement of only one executing at a time. If my application enqueues another job, I only need one of them to run. Using the strategy below, Sidekiq is able to enqueue the job (even when shutting down), since the conflict strategy doesn't care whether there is a job with the same unique digest in the queues.
My fix was to add a runtime lock using the redis-lock gem. The finished class looks something like this.
class HardWorker
  include Sidekiq::Worker

  sidekiq_options retry: true,
                  lock: :until_executing,
                  on_conflict: :replace,
                  log_duplicate_payload: true

  def perform(params)
    client = Redis.new
    client.lock("my_hard_worker_lock") do |lock|
      puts 'Doing hard work!'
      sleep 5
    end
  rescue Redis::Lock::LockNotAcquired
    # Try again later, replacing any job in the queue
    new_job_id = HardWorker.perform_in(rand(10..1.minutes), params)
    puts "Could not grab the lock. Enqueued again. new_job_id=#{new_job_id}"
  end
end
Hopefully I didn't confuse people more or miss something blatantly obvious.
@adeubank you don't need the extra gem. All you need to do is change your lock type to lock: :until_and_while_executing
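For reference, a minimal sketch of that change, reusing the class and option names from the example above (the lock type is the only substantive difference; the body is illustrative):

```ruby
class HardWorker
  include Sidekiq::Worker

  # :until_and_while_executing holds a queue lock until the job starts,
  # then a separate runtime lock while #perform runs, so no extra
  # redis-lock gem is needed for the "only one executing" requirement
  sidekiq_options retry: true,
                  lock: :until_and_while_executing,
                  log_duplicate_payload: true

  def perform(params)
    puts 'Doing hard work!'
    sleep 5
  end
end
```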
I have started to see these messages in my log:
FATAL: the unique_key: "some unique key" needs to be unlocked manually
What does it mean? How do I unlock the job manually?
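For anyone landing here with the same question: the message means a lock was left behind, typically after a hard shutdown. A sketch of clearing it from a console, assuming a recent sidekiq-unique-jobs release (the Digests API differs between major versions, so verify the method names against your installed gem's README):

```ruby
# Delete the lock for one digest, taken from the log message
SidekiqUniqueJobs::Digests.new.delete_by_digest("uniquejobs:unique_digest_1")
```

Setting up the death handler linked earlier in this thread avoids having to do this by hand for dead jobs.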