mhenrixon / sidekiq-unique-jobs

Prevents duplicate Sidekiq jobs
MIT License

Unique job needs to be unlocked manually? #261

Closed: lephyrius closed this issue 6 years ago

lephyrius commented 6 years ago

I have started to see these messages in my log: FATAL: the unique_key: "some unique key" needs to be unlocked manually

What does it mean? How do I unlock the job manually?

tmlee commented 6 years ago

Noticing this too

adamof commented 6 years ago

We're seeing these too. Some more information would be great!

rezonant commented 6 years ago

Hello, for those of you seeing this message, understand that Sidekiq is not running those jobs. We just got bit by this.

mhenrixon commented 6 years ago

Can you all provide the gem version you are using, what version of Sidekiq and which version of redis? @lephyrius @tmlee @adamof @rezonant

rezonant commented 6 years ago

Sidekiq 4.2.9, Sidekiq-unique-jobs 5.0.10, Sidekiq-scheduler 2.1.2, Rails 5.1.6 -- Redis is Elasticache on AWS, engine compat 3.2.4.

mhenrixon commented 6 years ago

Version 6 won't have this problem and it will take care of the upgrade automatically.

Keep an eye on the README and CHANGELOG for more information about the changes.

smock commented 6 years ago

This still seems to happen on version 6 when a worker process dies -- is that expected? What's the recommended way to handle it?

mhenrixon commented 6 years ago

There is something I still need to look into, but it is hard to test. I don't think I have integration-test coverage for jobs coming from the scheduled and retry sets, so I don't want to sound too sure about anything.

The way I intended the gem to work, though, is that when a worker process crashes and Sidekiq (hopefully) puts the job back, the lock remains so that no duplicate jobs are allowed.

When the job is considered dead by Sidekiq, you could add a death handler to also delete the lock. See: https://github.com/mhenrixon/sidekiq-unique-jobs#cleanup-dead-locks
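
Something along these lines (a sketch only; the digest key and the delete call have changed between gem versions, so check the linked README section for the exact API in your version):

# Sketch of a Sidekiq death handler that clears the leftover lock once a job
# lands in the dead set. The digest key ('unique_digest' here, 'lock_digest'
# in later versions) and the delete call may differ per gem version.
Sidekiq.configure_server do |config|
  config.death_handlers << ->(job, _exception) do
    digest = job['unique_digest']
    SidekiqUniqueJobs::Digests.delete_by_digest(digest) if digest
  end
end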

smock commented 6 years ago

I do have the handler set up, and it works perfectly for jobs which generate an exception. However, whenever I stop or start workers I see many messages like:

FATAL: the unique_key: "some unique key" needs to be unlocked manually

For further context, I don't retry jobs (but I do have retry set to 0 in sidekiq, not null, which triggers the death handler appropriately on exceptions), so I'm not sure that Sidekiq puts jobs back in this scenario.
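
A minimal illustration of the retry setting in question (the worker name is made up):

# Hypothetical worker showing the distinction: retry: 0 sends a failed job
# straight to the dead set, so Sidekiq's death handlers (and the lock cleanup
# above) still fire; retry: false skips the retry/death machinery entirely,
# so death handlers never run.
class NoRetryWorker
  include Sidekiq::Worker
  sidekiq_options retry: 0

  def perform(*args)
    # work that should never be retried
  end
end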

mhenrixon commented 6 years ago

Thanks for the info @smock I'll add some coverage later today.

smock commented 6 years ago

Just wanted to check in on this, any updates?

mhenrixon commented 6 years ago

@smock I did add some coverage and couldn't find any problem. The error message is generated from here: https://github.com/mhenrixon/sidekiq-unique-jobs/blob/master/lib/sidekiq_unique_jobs/lock/base_lock.rb#L102

What it means is that while the server is executing the worker's #perform method, it receives the Sidekiq::Shutdown exception, which means the job was not done executing when we had to abort. If the job is lost we could unlock, but if the job is pushed back to the retry queue or job queue then we should keep the lock to avoid duplicates.

What I have not investigated yet is if the job is put back or not.
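
Roughly, the pattern is something like this (illustrative only, not the gem's actual code; unlock/log_fatal are placeholder helpers, see the link above for the real implementation):

# Illustrative sketch of how a lock wrapper can treat Sidekiq::Shutdown:
# keep the lock when the job is interrupted, since Sidekiq is expected to
# push the job back onto the queue.
def with_lock(job)
  yield                 # run the worker's #perform
  unlock(job)           # completed normally, safe to release the lock
rescue Sidekiq::Shutdown
  # Interrupted mid-run: keep the lock to block duplicates and re-raise so
  # Sidekiq can push the job back; this is where the log line comes from.
  # (unlock/log_fatal are placeholder helpers, not the gem's API.)
  log_fatal("the unique_key: #{job['unique_digest']} needs to be unlocked manually")
  raise
end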

adeubank commented 6 years ago

@mhenrixon To add some more context to this, somehow the jobs are never re-queued when Sidekiq is shutting down. Hopefully this is helpful.

It seems that my job with lock: :until_and_while_executing doesn't get re-queued correctly if it's in the middle of processing and the server shuts down. The worker that does not have any unique options gets re-queued just fine.

Here is what I have on the MyWorkerClass.

sidekiq_options retry: true, lock: :until_and_while_executing, log_duplicate_payload: true

And here is the shutdown log:

2018-08-13T22:52:58.274Z 1 TID-gpe208zth INFO: Shutting down
2018-08-13T22:52:58.274Z 1 TID-gpe208zth INFO: Terminating quiet workers
2018-08-13T22:52:58.274Z 1 TID-gpe2y4bwd INFO: Scheduler exiting...
2018-08-13T22:52:58.775Z 1 TID-gpe208zth INFO: Pausing to allow workers to finish...
2018-08-13T22:53:05.778Z 1 TID-gpe208zth WARN: Terminating 3 busy worker threads
2018-08-13T22:53:05.778Z 1 TID-gpe208zth WARN: Work still in progress [
#<struct Sidekiq::BasicFetch::UnitOfWork queue="queue:default", job="...">,
#<struct Sidekiq::BasicFetch::UnitOfWork queue="queue:default", job="...">,
#<struct Sidekiq::BasicFetch::UnitOfWork queue="queue:my_other_queue", job="">]
2018-08-13T22:53:05.780Z 1 TID-gpe208zth INFO: Pushed 3 jobs back to Redis
2018-08-13T22:53:05.781Z 1 TID-gpe2y465p MyWorkerClass JID-7f038e3738663e7bee45c6e1 FATAL: the unique_key: uniquejobs:unique_digest_1:RUN needs to be unlocked manually
2018-08-13T22:53:05.783Z 1 TID-gpe2y465p MyWorkerClass JID-7f038e3738663e7bee45c6e1 INFO: fail: 45.018 sec
2018-08-13T22:53:05.783Z 1 TID-gpe2q7aul MyWorkerClass JID-eaad6b51d0870d8d8488beb5 FATAL: the unique_key: uniquejobs:unique_digest_2:RUN needs to be unlocked manually
2018-08-13T22:53:05.783Z 1 TID-gpe2q7aul MyWorkerClass JID-eaad6b51d0870d8d8488beb5 INFO: fail: 60.75 sec
2018-08-13T22:53:05.785Z 1 TID-gpe208zth INFO: Bye!

Let me know how else I can help or if you can point me in the right direction. This gem saves me a lot of headaches and I am interested in seeing this fixed.

mhenrixon commented 6 years ago

According to all that I can find, the job should be pushed back on shutdown: https://github.com/mperham/sidekiq/blob/master/lib/sidekiq/manager.rb#L128. I haven't confirmed exactly what is happening. I think it might be that the first lock needs removal, but I would have to add some test cases for this.

Will let you know what I can find!

tsauerwein commented 6 years ago

I also had these errors today when I made some existing queues unique. We do rolling updates, so while there were still some clients (without the unique configuration) pushing jobs, workers with the unique configuration were already trying to process them. Once all clients and workers were running with the same configuration, the error did not pop up anymore.

I don't know if it's related to the rolling-update having clients/workers with different configurations or to the workers shutting down.

mhenrixon commented 6 years ago

I need to improve the logging...

mhenrixon commented 6 years ago

It isn't an error. I am changing this to warning level because in most cases it is just a warning (sidekiq pushes the job back to the queue).

adeubank commented 6 years ago

Just wanted to loop back around to this as I probably confused some of the options that are available in this gem with my specific worker requirements.

I have a specific worker that needs to be unique until it starts executing (:until_executing) but should also only allow one to execute at a time. :until_executing alone fails my requirement of only one executing at a time. If my application enqueues another job, I only need one of them to run. Using the strategy below, Sidekiq is able to enqueue the job (even when shutting down) since the conflict strategy doesn't care whether a job with the same unique digest is already in the queues.

My fix was to add a runtime lock using the redis-lock gem. The finished class looks something like this.


class HardWorker
  include Sidekiq::Worker
  sidekiq_options retry:                 true,
                  lock:                  :until_executing,
                  on_conflict:           :replace,
                  log_duplicate_payload: true

  def perform(params)
    client = Redis.new
    # Redis#lock and Redis::Lock::LockNotAcquired come from the redis-lock gem
    client.lock("my_hard_worker_lock") do |lock|
      puts 'Doing hard work!'
      sleep 5
    end
  rescue Redis::Lock::LockNotAcquired
    # Try again later replacing any job in the queue
    new_job_id = HardWorker.perform_in(rand(10..1.minutes), params)
    puts "Could not grab the lock. Enqueued again. new_job_id=#{new_job_id}"
  end
end

Hopefully I didn't confuse people more or miss something blatantly obvious.

mhenrixon commented 6 years ago

@adeubank you don't need the extra gem. All you need to do is change your lock type to lock: :until_and_while_executing
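
Roughly, your example from above then shrinks to something like this (no extra runtime lock needed):

# Sketch of the suggested setup: :until_and_while_executing keeps duplicates
# out of the queue and also allows only one job to execute at a time, so the
# redis-lock wrapper can go away.
class HardWorker
  include Sidekiq::Worker
  sidekiq_options retry:                 true,
                  lock:                  :until_and_while_executing,
                  log_duplicate_payload: true

  def perform(params)
    puts 'Doing hard work!'
    sleep 5
  end
end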