steinitzu / celery-singleton

Seamlessly prevent duplicate executions of celery tasks
MIT License
237 stars 36 forks source link

Automatic lock release - fixes #21 #22

Open tbrien opened 4 years ago

tbrien commented 4 years ago

Add a configuration for allowing automatic lock release for a task. Decision to release a lock is based on celery worker inspection.

tbrien commented 4 years ago

Thanks for your precious guidelines ! The PR was updated and now apply every suggested changes of previous review (unicity of a task is now based on its lock ID, retrieved from backend). The behaviour was also updated for taking into account scheduled tasks (only active tasks where scanned before) and task scanning (active and scheduled) is only made if a lock is found.

Here is a sum up of the new option : This new option, if set for a given task will check for every submission of this task if a lock is present in the backend. If true, the Celery control API is used for scanning workers for active or scheduled tasks matching the lock ID. If no matching task is found at the moment of task submission, the lock is removed.

Note that there are some edge cases, as you suggested @steinitzu : this may lead to multiple tasks running if a task submitted removes a lock and another task is scheduled before the worker state is updated (task does not appear in scheduled or active tasks). It could also lead to performance issues if a large amount of tasks are running or scheduled and scanned regularly.

This new behaviour was useful for long running tasks, scheduled at relatively low frequency (every 1-15 minutes), in an environnement where workers are ungracefully restarted for exploitation needs. This may not suite every needs.

steinitzu commented 4 years ago

It still seems strange to me to only cover scheduled (ETA) and active tasks.

Cause far as I understand, if your workers are busy and you're queuing up new tasks without scheduling this will always clear the lock and you can add infinite duplicate tasks to the queue. I know if you want to inspect the queue you have to poll the broker, is there some broker agnostic way to do that?

Otherwise I can live with some occasional race conditions and performance issues with this option as long as they're documented and users are aware of them.

tbrien commented 4 years ago

To my knowledge, there is no way to list all submitted tasks in a broker agnostic way.

I updated the documentation with performances consideration and a warning of the incompatibility of the challenge_lock_on_startup in an environnement where tasks marked with that flag are queued while no worker is available for reserving/running them.

Also, the challenge now scans for reserved tasks and the logic is now described in the README

--- here are some details about worker inspection --- after some digging in celery documentation for inspecting workers, there is also a way to retrieve task that have been received but not yet processed. Only a few tasks seems to be "reserved" by a worker. In a local test with 31 tasks queued (running 10s each), at worker startup (with a concurrency=1), 1 task became active and 5 where reserved. 25 tasks where only known to the broker.