romanzipp / Laravel-Queue-Monitor

Monitoring Laravel Jobs with your Database
https://packagist.org/packages/romanzipp/laravel-queue-monitor
MIT License

Avoid duplicate job registration on distributed systems. #68

Closed juanparati closed 1 year ago

juanparati commented 3 years ago

https://github.com/romanzipp/Laravel-Queue-Monitor/issues/48

It seems that when jobs are distributed across multiple Laravel instances, Laravel Queue Monitor registers the same job two or more times. Because only the last job is executed, the older one stays in the "running" state forever, without any error or timeout.

To avoid concurrency issues, Laravel provides the "withoutOverlapping" and "onOneServer" methods, which prevent a job from running more than once per schedule. They work well when a suitable cache driver is used (like Redis); however, Laravel Queue Monitor can still register the same job more than once.
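For reference, a scheduler entry using those two methods might look like the following. This is a minimal sketch; `SyncDataJob` is a hypothetical job name, not something from this issue:

```php
<?php

// app/Console/Kernel.php — hypothetical schedule definition.
// onOneServer() ensures only one server dispatches the job per schedule run,
// and withoutOverlapping() prevents a new run while the previous is active.
// Both rely on a shared cache store that supports locks (e.g. Redis).

protected function schedule(Schedule $schedule): void
{
    $schedule->job(new SyncDataJob)
        ->everyMinute()
        ->onOneServer()
        ->withoutOverlapping();
}
```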

In the example attached to this pull request you can find a screenshot of the queue_monitor table that shows a job registered twice. If you look at "started_at_exact", the same job was registered with a difference of a few milliseconds.

[Screenshot 2021-05-20 at 14:28:07: queue_monitor table showing the same job registered twice]

My pull request deletes the duplicate records, so jobs are not left stuck in the running state.
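To illustrate the idea (this is a rough sketch of the approach, not the exact code in the PR), the cleanup could keep only the newest "running" row per job. The table name and the "started_at_exact" column come from the screenshots above; the `job_id`, `id`, and `finished_at` columns are assumptions about the package's schema:

```php
<?php

// Hypothetical cleanup: for a given queued job, drop all but the most
// recently registered monitor row that is still marked as running.
// Column names other than "started_at_exact" are assumed, not confirmed.

use Illuminate\Support\Facades\DB;

function removeDuplicateMonitorRows(string $jobId): void
{
    // Collect ids of running rows for this job, newest first,
    // then skip the first (newest) entry so it is kept.
    $staleIds = DB::table('queue_monitor')
        ->where('job_id', $jobId)
        ->whereNull('finished_at')          // still "running"
        ->orderByDesc('started_at_exact')
        ->pluck('id')
        ->slice(1);

    if ($staleIds->isNotEmpty()) {
        DB::table('queue_monitor')->whereIn('id', $staleIds)->delete();
    }
}
```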

Oxicode commented 3 years ago

Good

juanparati commented 3 years ago

@romanzipp : Hey Roman, do you think it is feasible to merge this pull request?

romanzipp commented 3 years ago

I'm currently using this package in a production system with distributed queue workers and have not observed any issues like this. Although the fix might work, it looks like a workaround for something that could be solved more cleanly.

Could you share a rough system architecture overview to reproduce the issue?

juanparati commented 3 years ago

Hi Roman.

I am currently running two servers as queue workers connected to the same queue. I also use a Redis server as the cache driver, which supports locks.

I enclose an architecture diagram: [Diagram: basic_distributed]

Both servers have the same schedule for the same jobs, and as I mentioned before, I don't have any concurrency collision issues because I use the "withoutOverlapping" and "onOneServer" methods on the schedules.

juanparati commented 3 years ago

I should add that every day 1 or 2 jobs get stuck in the running state. The stuck jobs are diverse, and I run on average around 3 jobs per minute, so the concurrency level in my distributed system is high.

juanparati commented 3 years ago

I enclose another screenshot of a duplicated job.

[Screenshot: another duplicated job in the queue_monitor table]

As you can observe, the most recently dispatched job is the one that was executed, while the first one was completely ignored, so it stays in the running state forever.

I understand that this pull request can look like a workaround and may not be necessary in many cases. If you agree, I can add a new configuration option so that the duplicate-removal code only runs when the option is set to "true".

We could name the option something like "avoid_concurrency_duplicates", and I can write a comment explaining its purpose.
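The proposed option could be added to the package's config file along these lines. This is a sketch: the key name comes from this discussion, and the file path and default value are assumptions, not part of the released package:

```php
<?php

// config/queue-monitor.php (hypothetical addition)

return [

    // ...existing queue-monitor options...

    /*
     * When enabled, duplicate registrations of the same job created by
     * concurrent schedulers on distributed workers are removed, so stale
     * entries are not left in the "running" state. Disabled by default
     * since single-server setups do not need it.
     */
    'avoid_concurrency_duplicates' => false,

];
```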