Closed: azrulamir closed this issue 5 years ago
As far as I know, it is impossible for a job to be picked up by 2 workers at the same time.
There is a chance that a job takes longer than config('queue.connections.redis.retry_after') seconds and gets retried by another worker.
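As a sketch of where that setting lives (key names from the standard Laravel config files; the number is only illustrative):

```php
// config/queue.php (illustrative value)
'connections' => [
    'redis' => [
        'driver' => 'redis',
        'connection' => 'default',
        'queue' => 'default',
        // A worker that holds a job longer than this many seconds
        // will see the job released back and picked up by another worker.
        'retry_after' => 90,
    ],
],
```

The rule of thumb discussed later in this thread is to keep retry_after comfortably larger than your longest-running job (and larger than any worker timeout), so a slow job is never handed to a second worker while the first is still on it.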
So my question is: is Horizon fully compatible with a horizontally scaled setup?
I don't know, but can you describe your setup? Does each instance have its own redis server? If yes, how are they connected to each other?
There is a chance that a job takes longer than config('queue.connections.redis.retry_after') seconds and gets retried by another worker.
The Recent Jobs section in the Horizon dashboard doesn't show any jobs that take long to process; all of them finish in under 1 second. But after seeing issue #158, I increased the retry_after setting to 900, and the issue is still there.
I don't know, but can you describe your setup? Does each instance have its own redis server? If yes, how are they connected to each other?
Each instance connects to a central/single redis-server instance. I can see from the Horizon dashboard that it detects all of the running instances.
My application is a simple voting API system that collects vote requests from clients and processes them via the Laravel queue with Horizon. Each job has validation logic inside its handle() function, but with the duplicates happening, it somehow processes one single request multiple times simultaneously. Appreciate your help.
Did you set the settings correctly? For example:
```php
'environments' => [
    'local' => [
        'supervisor-1' => [
            'connection' => 'redis',
            'queue' => ['default'],
            'balance' => false,
            'processes' => 3,
            'tries' => 3,
        ],
    ],
],
```
https://laravel.com/docs/5.6/horizon
See the section Configuration
Below is my current configuration:

```php
'production' => [
    'Voting-Engine' => [
        'connection' => 'redis',
        'queue' => ['default'],
        'balance' => 'false',
        'processes' => 1,
        'tries' => 1,
    ],
],
```
So you have multiple PHP servers connecting to a single redis server. Here are 2 causes that come to mind:
Other than that, it seems to me that picking a job twice is impossible.
@azrulamir What is your retry_after value in config/queue.php? If you set a big number, you may also need to set timeout in config/horizon.php to just a few seconds below that number.
I have proposed exposing the timeout value along with middleware in #319.
Are you sure you are not simply dispatching the job multiple times, for example by events or observers?
Why not set retry_after in the horizon.php configuration file?
This is my configuration today:
```php
'production' => [
    'default' => [
        'connection' => 'redis',
        'queue' => ['default'],
        'balance' => 'simple',
        'processes' => 1,
        'tries' => 3,
    ],
    'marketplace' => [
        'connection' => 'redis',
        'queue' => ['marketplace'],
        'balance' => 'simple',
        'processes' => 1,
        'tries' => 3,
    ],
    'autoresponder' => [
        'connection' => 'redis',
        'queue' => ['autoresponder'],
        'balance' => 'simple',
        'processes' => 1,
        'tries' => 3,
    ],
    'clicks' => [
        'connection' => 'redis',
        'queue' => ['clicks'],
        'balance' => 'simple',
        'processes' => 2,
        'tries' => 3,
    ],
],
```
The retry_after should be configurable by queue, right?
As explained by others above, the problem is probably using the same redis queue across multiple apps. It should be impossible for Horizon to process the same job twice. If you still believe this to be a problem, please provide more details and preferably the configuration of your app(s).
What about having multiple worker servers that process the same queue? I believe I have observed jobs being processed multiple times.
My setup: 1 web server, 1 redis server, 3 worker servers that process the same queue.
I know this is closed, but since I started using Horizon (a week ago), from time to time some jobs are processed twice at the same time. I also know that a screenshot isn't evidence, but I can't reproduce it (it only appears from time to time).
I have 2 backends with 1 worker each, listening to the same queue.
Btw, with the normal Laravel workers (which I've used for more than a year) I have never had this problem.
@dpde @sergiq Are the following helpful:
thanks @halaei for your answer, but:
Also, this never happened with the normal Laravel workers (in more than a year of use).
@sergiq Can you explain your screenshot? What are the rows and columns and how does it show jobs are picked twice?
The screenshot is a small part of a job's information (uuid). This picture shows when the job was executed (the first field is just an auto-increment and the last one is who performed the action).
For now I have removed Horizon and started using the plain Laravel queue again in the production environment. I'm going to dig deeper into the code to find out what's going on and why.
Horizon's redis queue driver is just a wrapper for the Laravel redis queue driver; it doesn't actually touch the queues, but fires some events for monitoring. It also changes the job IDs to incrementing integers. I don't see how these changes could possibly cause jobs to be handled twice. To me, the screenshot doesn't show that 2 jobs are being simultaneously handled by 2 workers. It would be very helpful if you could provide more information.
Hi @halaei, thanks for your response :) As I mentioned, a team is going to set up the same environment in dev as we have in production and try to detect when and why. This morning I changed the workers in production to use Laravel's own, and everything is working ok (though it's too early to assume it will stay that way). Let's see if we get the same behavior we've had over the last year, with 0 duplicates.
This is almost definitely happening to me but only on my Homestead instance and not on my production server. I'm going to try updating Homestead, since it's only happening there.
For anyone interested, here's my list of symptoms: I have an

```php
if (! Storage::exists($folder) && ! Storage::makeDirectory($folder))
```

line that results in a "mkdir(): File exists" error. That's a hell of a race condition even if two jobs are being processed at the same time, but I get it about 10 times every night while overnight jobs are processing. I know this isn't the best way to "prove" that two workers are picking up the same job at the same time, but there's not a lot else that makes any kind of sense.
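For what it's worth, a check-then-create sequence like the one above is not atomic, so a second worker can create the directory between the exists() and makeDirectory() calls even when the jobs are legitimately different. A common defensive pattern (a sketch, not taken from this thread; it hides the symptom rather than fixing the duplicate-job cause) is to re-check after a failed mkdir:

```php
// Tolerate a concurrent mkdir: only fail if the directory
// still doesn't exist after our own attempt.
if (! is_dir($path) && ! @mkdir($path, 0755, true) && ! is_dir($path)) {
    throw new \RuntimeException("Unable to create directory: {$path}");
}
```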
As stated above, this is only happening on my Homestead, and it only started about 2 weeks ago after I upgraded from Laravel 5.6 to Laravel 5.8 (and Horizon from 2.x to 3.2.x). My Homestead and production servers are both running off the same composer lock file with Laravel 5.8.18 and Horizon 3.2.1. My Homestead is a little older, so I am going to attempt to upgrade that.
Updating Homestead seems to have corrected the issue for me, whatever it was.
I'm also seeing this issue in my production environment. The queue jobs have additional checks in place to ensure a job isn't processed more than once, but because 2 workers pick up the same job at the same time, the database record that should prevent double processing isn't obeyed, since it hasn't yet been written by either job. This isn't a new issue and has been happening for a long time.
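One way to close that window is to replace the check-then-write with a single atomic operation. A sketch using Laravel's Cache::add(), which succeeds only for the first caller when backed by Redis (the key name and the $this->voteId property are made up for illustration):

```php
// Inside the job's handle() method: Cache::add() is atomic,
// so only one worker can acquire the per-job marker.
if (! Cache::add('processed:'.$this->voteId, true, now()->addMinutes(10))) {
    return; // another worker already handled this job
}
```

Unlike a read-then-insert against the database, there is no gap between the existence check and the write for a second worker to slip through.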
Horizon v5.7.3
Here's some of my queue config:
```php
'balance' => 'auto',
'processes' => 7,
'tries' => 0, // retry indefinitely
// 10 seconds under the queue's retry_after to avoid overlap
'timeout' => (Carbon::SECONDS_PER_MINUTE * 10) - 10, // just under 10 mins
```
@bkuhl We run 300 jobs/second on our stack and have never had this issue. It highly depends on how you have set up retry_after on the connection, timeout, etc. on your jobs.
What is your retry_after setting on the queue connection to redis? This HAS to be slightly longer than the longest-running job on the queue, and also slightly longer than the timeout that is set.
Our queues operate in bursts and may be loaded with 500k jobs with 11 concurrent workers. We only experience this issue when the queue gets initially loaded with jobs. Once there's plenty of jobs on the queue it doesn't happen. Average execution time on the jobs being duplicated is 2-3 seconds.
The connection's retry_after is set to 10 mins atm, which is 10 seconds longer than the timeout.
@bkuhl how sure are you that you're not actually dispatching certain jobs multiple times by mistake? Are the job IDs actually the same or different?
That's a good question, and one that I should have investigated more before posting here. I'll have to do some digging on that and get back to you, I'm only assuming that's the case.
@bkuhl sure, get back to us if you have any 'hard evidence' (which is sometimes extremely hard to obtain!). Happy to help. We do a lot of logging; for example, you can register a callback with Queue::before() and log the ID of the job somewhere in order to see if workers are actually picking up the same job with the same ID (which I doubt, to be honest).
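A minimal version of that logging hook, placed in a service provider's boot() method (the log message format is just an example):

```php
use Illuminate\Queue\Events\JobProcessing;
use Illuminate\Support\Facades\Log;
use Illuminate\Support\Facades\Queue;

// Log every job the workers pick up, so duplicate IDs become visible.
Queue::before(function (JobProcessing $event) {
    Log::info('Processing job', [
        'id'   => $event->job->getJobId(),
        'name' => $event->job->resolveName(),
    ]);
});
```

Grepping the log for repeated IDs then tells you whether the same job is really being picked up twice, or whether two distinct jobs with different IDs were pushed.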
I can confirm that some "duplicate jobs" on my end have a unique ID for each job, suggesting it's not the workers picking up duplicates, but an issue on the pushing side.
If you figure out what's causing the extra pushes, please come back and let us (at least me) know what it was. I've never been able to figure it out, but I would suspect it has something to do with the communication between Laravel and Redis.
Has anyone in this thread experienced this with a different queue provider?
@bkuhl we're one step further! :) I'd also be interested to hear if and when you figure out what is causing it. We have never experienced this and have bulk-pushed up to 500'000 jobs sometimes in one go (parallel), which never led to duplication that we know of.
I'm going to remove my other comments to keep this thread clean, but I've confirmed it was a logic issue on my end :-\
@bkuhl After reading your now-deleted comment yesterday, I gotta know what the logic issue was, lol
Haha, with Laravel 8 I had upgraded to use batches for these jobs. I was supplying some jobs when the batch was created, but also re-adding those same jobs via $batch->add() in certain conditions, leading to a handful of jobs being added twice. Definitely a 🤦 moment, but entirely my fault.
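For anyone hitting the same thing, the bug pattern described above looks roughly like this (a reconstructed sketch; ProcessVote is a made-up job name):

```php
use Illuminate\Support\Facades\Bus;

$jobs = [new ProcessVote(1), new ProcessVote(2)];

// The jobs are supplied when the batch is created...
$batch = Bus::batch($jobs)->dispatch();

// ...but a later code path added the same job instances again,
// so each one was queued and executed twice.
$batch->add($jobs);
```

From the worker's point of view this is indistinguishable from duplication: each push gets its own UUID, which is why checking whether the "duplicate" jobs share an ID is such a useful first step.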
Dangit. You were my big hope for finally diagnosing why this happens to some people. Thanks for explaining.
Ah, sorry. I'm still monitoring things for the next few days, so maybe something will come to light. Here are my troubleshooting steps in case they help folks...
I noticed the UUIDs on my jobs were unique. At first I thought it was retries, but all attempts were 1, so I was able to rule that out. Secondly, I wondered where the UUIDs were coming from, so I ended up adding logging to where the UUID was generated here. This enabled me to confirm that multiple jobs were indeed being fired by my app's logic. To figure out when/how, I modified this code to conditionally throw exceptions to expose the stack trace (you could also use xdebug or more extensive logging), enabling me to see what was causing the events.
Is this issue still happening? I'm about to move to a multi-server setup with multiple worker servers. Once I run Horizon on all the worker servers, how will I be able to access the /horizon UI web page when the app servers are behind a load balancer? The worker servers don't have any HTTP access; they just run jobs.
No, I mentioned in my last comment this was an issue in my app's logic.
No, I mentioned in my last comment this was an issue in my app's logic.
Yes, I've read that. But how are you able to access the Horizon URL when Horizon is running only on the worker servers?
I'm happy to help, but this discussion is completely off-topic for this thread so you should post on StackOverflow.
I am running Horizon in a horizontally scaled setup with multiple servers running the same instance. I found that once in a while there are jobs that get processed multiple times and somehow pass my constraint checking.
I suspect this could be caused by a job being processed by multiple workers at the same time, hence it manages to pass the constraint check I have in the job's handle() function.
So my question is: is Horizon fully compatible with a horizontally scaled setup?