crash13override closed this issue 6 years ago
So the job is running, then marked as failed, then marked as completed?
Yes, exactly! And the weird thing is that after a few minutes they are removed from the failed jobs list (not sure if this happens when I deploy an update and it runs horizon:terminate).
After some further testing, I noticed that the list of failed jobs was emptied after running the same queue job again, this time without errors because it processed a very small file.
After that, for some mysterious reason, the encoding jobs are no longer listed under the "Failed Jobs" section, but I can still see the icon of the job in the "Recent Jobs" section turning red with a cross and then finally green.
Since all the code is running just fine, it's no big deal as long as it keeps my failed jobs list empty. I'll let you know in case I manage to reproduce the error in a more precise way
I attach a few screenshots showing the job that goes from yellow to red and then green
Another weird thing is happening with failed jobs. Previously I said that the list of failed jobs was emptied, but actually it's not. I still receive the failed jobs via AJAX, but for some reason the frontend doesn't want to show them. Here you have a screenshot showing how the list is empty even though I receive a JSON full of failed jobs.
My setup is the following:
Server A: MySQL and Redis DB server
Server B: Webserver
Server C: Encoding server
Server B and Server C are both running "php artisan horizon", each with its own queue (default for the webserver and encode for the encoding server). I run "php artisan horizon:snapshot" just on the webserver.
So the webserver dispatches jobs on both queues (default and encode), but only listens on the default one. The encoding server just listens on the encode queue.
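For reference, the dispatch side on the webserver looks roughly like this (the job class names here are just placeholders):

```php
use App\Jobs\EncodeVideo;      // hypothetical job handled by Server C
use App\Jobs\SendNotification; // hypothetical job handled by Server B

// Webserver code dispatching onto the two queues
EncodeVideo::dispatch($video)->onQueue('encode');
SendNotification::dispatch($user)->onQueue('default');
```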
Am I missing something somewhere? Should my setup be different? Or could it be just a bug of Horizon?
Thanks again for your help!!!
I'm still experiencing the problem with the missing failed jobs. I tried to debug it quickly but didn't manage yet. Does this happen to other people as well or is it just me?
I'm not quite seeing that.
I have a report job which takes a few minutes to run (it does some API calls and DB queries and mails an Excel file); after a little while the job is marked failed with a MaxAttemptsExceededException and stays failed.
Yet after some time, the report pops in my Inbox.
Have moved the reporting queue back to database.
Actually no, after walking away from it for a bit, it has done exactly as you've described.
A job that was marked as failed became green (the earlier one in this screenshot).
The failed jobs page became empty, but the dashboard reports that I had a failed job,
and then the second attempt failed,
but it didn't populate into the failed jobs page. I clicked on the error within recent jobs.
After about 5-10 mins, that second one changed to green.
The dashboard still flags the failed ones that turned green after a while, but the failed jobs page remains empty,
and the job did work; I have the report in my Inbox.
We have exactly the same problems. Have you found a solution?
We also have the same problem for all of our long-running jobs.
I'm also having a similar problem.. might be the same. I have a lot of jobs that fail due to "A queued job has been attempted too many times or run too long. The job may have previously timed out."
Increasing the tries or timeout time has done nothing to help. And the jobs are succeeding, i.e. no other errors are thrown that I can see.
I haven't found a helpful way to get a more specific output of what's causing the jobs to fail
I think it may not be Horizon related. I moved the job to the database driver and see the same behaviour, whereas I have very long-running jobs on the database driver on a 5.1 project which are fine.
May try moving those 5.5 long running jobs to the 5.1 project
@vesper8 Same issue here. Only difference is that this works fine in my valet dev environment, but jobs fail every time when deployed to Forge.
I'm not 100% sure, but I think it could be related to this issue in my case.
After setting a higher value (1800) for config->queue->connections->redis->retry_after everything seems to be running ok and it doesn't run the jobs twice.
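For anyone looking for where that value lives, it's the redis connection block in config/queue.php (a trimmed-down sketch; your other options may differ):

```php
// config/queue.php (excerpt)
'connections' => [

    'redis' => [
        'driver' => 'redis',
        'connection' => 'default',
        'queue' => 'default',
        'retry_after' => 1800, // should be longer than your slowest job
    ],

],
```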
Still testing but so far so good...
I think the word 'retry_after' is a bit misleading because, if it works like @denaje says in his last message, then it behaves more like a secondary "timeout" instead. So it's a bit like overriding the "timeout" value in the horizon config file.
Or am I getting this wrong perhaps @themsaid?
Please give it a try and let me know if it fixes the issue for you as well.
Thanks!
@crash13override I'm testing 1800 on retry_after too. Just to know, what timeout have you set in your config/horizon.php?
I've set 1800 as well
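Just to make the combination explicit, this is roughly what my config looks like now (trimmed down; the supervisor name is just an example):

```php
// config/horizon.php (excerpt)
'environments' => [
    'production' => [
        'supervisor-encode' => [
            'connection' => 'redis',
            'queue' => ['encode'],
            'processes' => 2,
            'tries' => 1,
            'timeout' => 1800, // worker timeout for the long encoding jobs
        ],
    ],
],

// and in config/queue.php the redis connection has 'retry_after' => 1800
// (ideally retry_after should even be a bit longer than this timeout).
```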
ah, the penny drops.
Have created another connection in queue config with longer retry_after to use for my reports queue.
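In case it helps anyone else, something along these lines (the connection name is just illustrative):

```php
// config/queue.php (excerpt): a second redis connection with a longer retry_after
'connections' => [

    'redis' => [
        'driver' => 'redis',
        'connection' => 'default',
        'queue' => 'default',
        'retry_after' => 90,
    ],

    'redis-long' => [
        'driver' => 'redis',
        'connection' => 'default',
        'queue' => 'reports',
        'retry_after' => 3600, // plenty of headroom for the report jobs
    ],

],

// and then dispatch the report job onto it:
// GenerateReport::dispatch($params)->onConnection('redis-long');
```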
Thank you
Exactly the problem I'm having, but I couldn't figure out where to set the retry_after value. Now I know and will give this a shot!
Thanks!
I also am having this issue.
Several people with similar problems are running Windows, which does not support timeouts. Some screenshots in this issue show unix paths, so that's not the issue here. However, what version of PHP are you using? Timeouts require PHP 7.1 or newer. Can you confirm that you've met this requirement, or are you using PHP 7.0?
I notice now that I've jumped repositories. While laravel/framework will accept PHP 7.0 (in composer.json), Horizon will not.
I can confirm that I have PHP 7.1, so the issue is not related to the PHP version, but to the "retry_after" value after all.
Please, everyone, make sure the retry_after value is greater than the time it takes a job to run; this is mentioned in the queue documentation already.
I'm having this same issue still on Laravel 5.7+, PHP 7.1+. I have jobs that run ~6 min, with a $timeout set to 30 min. Not using Horizon. Using SQS + Heroku + Supervisor. Job falls into my failed_jobs table, but I get an email that my report was generated successfully right around that time. It seems to think it exceeded timeout or max tries, but the jobs finished without exceptions.
Is there another issue like this open right now? Can this one be re-opened if necessary?
@zlanich
@sisve I apologize for the post in a laravel/horizon thread, but this was the only thread I could find on the internet where someone else was having this same issue. After looking at the retry_after value, I recall that SQS does not support a retry_after value, so I'm not sure what to do here.
It does not make any sense to me why Laravel/SQS would be allowing/attempting a retry if the job is still running/etc. I'm not sure how you would handle long-running jobs with Laravel/SQS under these circumstances.
If anyone can help, I'd be hugely appreciative, as this application runs our city's entire mobile parking infrastructure! Also, if anyone knows of a non-horizon thread that I missed with someone else having this same issue, please let me know!
Thanks again.
@sisve
Several people with similar problems are running Windows, which does not support timeouts.
Does this only apply to Horizon, or to any job queues in standalone Laravel too? I looked at the Laravel queue documentation and I don't see anything about this limitation on Windows, but I am seeing a related problem on my Windows server.
Thanks.
I was able to adjust my Amazon SQS Visibility Timeout to fix this issue, since the retry_after option is not supported for SQS. This isn't ideal, but it did solve my issue (for all intents & purposes). I feel like Laravel Core should do some sort of coalesce on the retry_after and timeout so it doesn't do funky stuff like this. Am I crazy?
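I changed it in the SQS console, but for reference you could also script it with the AWS SDK for PHP, something like this (queue URL and region are placeholders):

```php
use Aws\Sqs\SqsClient;

$sqs = new SqsClient([
    'region'  => 'us-east-1',
    'version' => 'latest',
]);

// Give the worker 30 minutes before SQS makes the message visible (i.e. retryable) again
$sqs->setQueueAttributes([
    'QueueUrl'   => 'https://sqs.us-east-1.amazonaws.com/123456789012/reports',
    'Attributes' => ['VisibilityTimeout' => '1800'],
]);
```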
@JeremyHargis Timeouts require the pcntl extension for php. This extension isn't available on Windows. (This also implies that timeouts will not work on a *nix-system that uses php without pcntl either.)
This applies to Laravel's queue system and isn't Horizon specific.
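A quick way to check whether the worker host can actually enforce timeouts:

```php
// Run this on the machine where "queue:work" / Horizon runs
// (or just look for "pcntl" in the output of `php -m`).
if (! extension_loaded('pcntl')) {
    echo "pcntl is missing: the --timeout / \$timeout value cannot be enforced here.\n";
}
```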
Yeah, I'd spent an hour trying to debug why my job fails until I found this issue. I think this has to be highlighted in the documentation next to each mention of timeout.
I had the same issue with too low retry_after
Thank you for saving my life ... beer on me ;)
Thanks a lot @themsaid,
It's much clearer now in the docs and it will help a lot of people avoid running into the same problem for sure!
Thanks again, and keep up the awesome work you've been doing on Laravel!
Even though timeout is smaller than retry_after, once a job has failed, subsequent jobs are immediately failed after a very short life span (0 secs). I use Horizon with Supervisor. Basically it is an endless loop: all new jobs, which are scheduled at a regular interval, fail right away with the message "has been attempted too many times or run too long. The job may have previously timed out."
Manually triggered jobs are running properly.
Same here; the solution from the docs (having retry_after greater than timeout) didn't help.
No exception is thrown, when I manually run the job through tinker it's lightning fast and successful.
It would have saved us tons of time if Laravel had added an assertion that retry_after >= timeout. Is there a use case where timeout can be higher than retry_after?
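Something like this at boot time would have caught it for us (a rough sketch; the 1800 stands in for whatever $timeout our slowest job uses):

```php
// e.g. in AppServiceProvider::boot() — purely illustrative
$retryAfter = config('queue.connections.redis.retry_after');
$longestJobTimeout = 1800; // the $timeout of the slowest job we run

if ($retryAfter <= $longestJobTimeout) {
    throw new \RuntimeException(
        'retry_after must be greater than the job timeout, otherwise a '
        . 'still-running job can be picked up again and marked as failed.'
    );
}
```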
Were you able to resolve this @anstap?
In our case there were tons of jobs reading and writing the same table, so when I executed jobs manually they always worked, but when a big batch entered the queue it still failed. In the end we decided to remove the moderation code which was checking records before insert and simply use upsert in combination with unique key constraints. That removes the need for each job to check the data validity. It's still not ideal for us because we've sacrificed a small feature, but in case you have the same issue I don't think there's much you can do.
I have a few queue jobs that need some time to run because they are encoding videos. They typically last a couple of minutes.
When I fire the job, it runs correctly and I can see that it's being executed in the "Recent Jobs" section of Horizon. I can see it running, then it gets a red cross icon as if it had failed, and then immediately after it becomes green, marking it as completed. The problem is that I can also see it in the "Failed" section, where it says that the job has failed throwing an "Illuminate\Queue\MaxAttemptsExceededException".
At the moment I have 2 processes for the queue. If I set it to just 1 process, then the problem is not happening anymore.
I also tried setting the "timeout" property for the job to 1800 seconds, but it looks like it doesn't care.
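For context, a trimmed-down sketch of the job class (the class name and body are placeholders):

```php
<?php

namespace App\Jobs;

use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Queue\SerializesModels;

class EncodeVideo implements ShouldQueue
{
    use Dispatchable, InteractsWithQueue, Queueable, SerializesModels;

    public $timeout = 1800; // encoding can take several minutes
    public $tries = 1;      // don't let the worker retry a long-running encode

    public function handle()
    {
        // ... run the actual video encoding here ...
    }
}
```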
Could it be that if I have 2 processes the second one tries to run the job even if the first one is still running it? Is there something I need to consider for especially long jobs? I have another queue with 10 processes running small tasks and that is not giving any problem at all.
Thanks a lot for your help and congrats for the amazing library!!!