Closed — graemlourens closed this issue 10 months ago
Hi there,
I added the following tests to verify the issue and was unable to replicate the problem; jobs and failed_jobs updated properly. We need to be able to replicate the issue in order to solve the problem.
Hey there, thanks for reporting this issue.
We'll need more info and/or code to debug this further. Can you please create a repository with the command below, commit the code that reproduces the issue as one separate commit on the main/master branch and share the repository here?
Please make sure that you have the latest version of the Laravel installer in order to run this command. Please also make sure you have both Git & the GitHub CLI tool properly set up.
laravel new bug-report --github="--public"
Do not amend and create a separate commit with your custom changes. After you've posted the repository, we'll try to reproduce the issue.
Thanks!
@crynobone Thank you for the time invested.
We'll as a next step try and reproduce this in a fresh laravel project to rule out the fact that this has something to do with our environment and get back to you asap.
@crynobone
We have successfully reproduced the mentioned issue in a fresh standalone laravel project. Attached is a ZIP file with a docker environment ready for testing
Execute the following commands (in the root directory):
docker-compose build
docker-compose up -d
docker-compose exec php bash
php artisan migrate
php artisan queue:work
Open a browser at http://localhost
There will be one job dispatched which times out inside a transaction and one without.
After queue:work dies (because it was killed), you can go to http://localhost:8080 to verify that there is NO job logged in the failed_jobs table.
You can run php artisan queue:work again to process the second job, and verify that this one WILL be logged in the failed_jobs table.
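The two jobs described in these steps presumably look something like the sketch below. This is only an illustration: the class names, timeout, and sleep durations are hypothetical; the key difference is whether the sleep happens inside an open database transaction.

```php
<?php

namespace App\Jobs;

use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Support\Facades\DB;

// Hypothetical job: sleeps past its timeout while a transaction is open.
// According to the report, its failure is NOT logged to failed_jobs.
class TimeoutInsideTransactionJob implements ShouldQueue
{
    use InteractsWithQueue, Queueable;

    public $tries = 1;   // fail permanently on the first timeout
    public $timeout = 5; // seconds before the worker kills the job

    public function handle(): void
    {
        DB::beginTransaction();
        sleep(30); // exceeds $timeout while the transaction is still open
        DB::commit();
    }
}

// Hypothetical job: same sleep, but no transaction.
// Its failure IS logged to failed_jobs as expected.
class TimeoutWithoutTransactionJob implements ShouldQueue
{
    use InteractsWithQueue, Queueable;

    public $tries = 1;
    public $timeout = 5;

    public function handle(): void
    {
        sleep(30);
    }
}
```

(Two classes are shown in one file only for compactness of the sketch; in a real app each job would live in its own file.)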
Thank you for your consideration & time
Can you push the repository to github as suggested above?
@graemlourens It seems that you didn't specify https://laravel.com/docs/10.x/queues#max-job-attempts-and-timeout and I don't see how it would go to failed_jobs without the option, as it would go back to jobs and be ready for another attempt after each failure.
@crynobone We did not specify $tries, as by default it's 1 and therefore the setting was unnecessary, and we did set the $timeout on the job properly.
However, just to rule out any error on our side, we added tries=1 on the job as well as on the worker, and it had no influence, as suspected.
One Job is logged (the one without transaction) and one job is NOT logged (the one with transaction)
Both jobs are configured exactly the same, the only difference in them is that one has the sleep inside of a transaction and the other does not.
I'm not sure how we can show you this bug any better. I highly suspect that you did not execute the steps we provided to reproduce the error, or else you could confirm this behaviour by the proof of one entry in the failed_jobs table.
If you reconsider and reopen this ticket, we'll be glad to push it as a repository as you requested.
@crynobone we've had further success in this matter:
Strangely the bug is fixed if you use 'Batchable' on the Job.
The reason the job is then logged correctly is because of Illuminate\Queue\Jobs\Job. In the 'fail' method there is a check on line 195:
if ($e instanceof TimeoutExceededException &&
$commandName &&
in_array(Batchable::class, class_uses_recursive($commandName))) {
$batchRepository = $this->resolve(BatchRepository::class);
In this case you're running 'rollBack' on the $batchRepository. Even if the job is not attached to a batch, this fixes the problem.
However, I don't think it's correct for us to have to mark all jobs as batchable to resolve this issue. I still believe this should also be handled for jobs without batches.
How do you see this?
Tries doesn't have a default of 1: https://github.com/laravel/framework/blob/047edbc3a370ccf8be58f35f2e262d59210a3852/src/Illuminate/Queue/Jobs/Job.php#L278
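For reference, pinning the attempts explicitly (as the reporter describes doing later in the thread) looks roughly like this; the job name is hypothetical:

```php
<?php

namespace App\Jobs;

use Illuminate\Contracts\Queue\ShouldQueue;

class ExampleJob implements ShouldQueue
{
    // Explicitly fail after a single attempt instead of relying on a default.
    public $tries = 1;

    // Seconds the worker allows before killing the job.
    public $timeout = 5;

    public function handle(): void
    {
        // ...
    }
}
```

Alternatively, attempts can be capped on the worker itself: `php artisan queue:work --tries=1`.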
@crynobone OK. Still, as mentioned, we set tries to 1 and it still happens.
Please also refer to my temporary solution that if you use 'Batchable' on the Job, it works, without it does not.
How do you feel about this?
I cannot replicate the issue and your reproduction code doesn't show what you are claiming is true. This is why we advised sending the repository to GitHub.
@crynobone you are confirming that you downloaded our ZIP, ran the commands we provided, and did NOT see the outcome we described? I do not see how this is possible. I can provide a screen recording if you prefer.
Or are you telling me that you will only reproduce it when we have pushed it to repository on github?
We'll need more info and/or code to debug this further. Can you please create a repository with the command below, commit the code that reproduces the issue as one separate commit on the main/master branch and share the repository here?
Having everything in a single commit allows us to completely eliminate any possibility outside just the affected code, instead of having to verify every file inside the zip file.
@crynobone a coworker of mine has uploaded the repo as requested: https://github.com/GrzegorzMorgas/laravel-bug-report
We were a little confused whether to commit everything as ONE commit, or to make a base commit with the base setup and then a second commit with the changes required to reproduce the issue. The instructions are a little ambiguous to us:
"commit the code ... as one separate commit" but then "Do not amend and create a separate commit with your custom changes"
Please let us know if what we created is OK and you can reproduce it, or if you would like the repository in a different form.
@graemlourens use the command from here: https://github.com/laravel/framework/issues/49389#issuecomment-1862206409
Then commit all custom changes separately.
@driesvints @crynobone Understood. We've done as requested:
https://github.com/GrzegorzMorgas/bug-report
Execute the following commands (in the root directory of the project):
docker-compose build
docker-compose up -d
docker-compose exec php bash
cp .env.example .env
composer install
php artisan migrate
php artisan queue:work
Open a browser at http://localhost
There will be one job dispatched which times out inside a transaction and one without.
After queue:work dies (because it was killed), you can go to http://localhost:8080 to verify that there is NO job logged in the failed_jobs table.
You can run php artisan queue:work again to process the second job, and verify that this one WILL be logged in the failed_jobs table.
If you 'use Batchable' on the Job, then the one timing out inside a transaction suddenly works as well. We have traced this back to
if ($e instanceof TimeoutExceededException &&
$commandName &&
in_array(Batchable::class, class_uses_recursive($commandName))) {
$batchRepository = $this->resolve(BatchRepository::class);
In Illuminate\Queue\Jobs\Job
Because there is a rollback there, it seems to work. However, we believe it should not be necessary to mark all jobs as Batchable to get around this bug.
Thank you for your consideration & time
Thanks @graemlourens. Might be a while before we can dig into this because we're all heading into holidays.
I have reviewed this issue and it may not be an actual bug after all. This happens because of the async signal handler.
The registerTimeoutHandler method in src/Illuminate/Queue/Worker.php (line 167) will kill the current process if the timeout is reached. At this point it will try to log the failed job in the database, but since a transaction was open, the insert will not persist.
Here are two solutions:
1. Add a public function failed(\Throwable $exception): void function inside your job to roll back the transaction. I tested this and it works fine (the failed job is logged).
2. Roll back the transaction inside the registerTimeoutHandler function. I don't think this should fall under the scope of the framework though; it should be handled within the job's failed function instead (point 1).

@gkmk Thank you for your insight and suggestions.
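For illustration, the first suggestion (a failed() method that rolls back any transaction still open when the worker timed out, so the framework's insert into failed_jobs can persist) could be sketched as follows; the job name and timings are hypothetical:

```php
<?php

namespace App\Jobs;

use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Support\Facades\DB;

class HeavyTransactionJob implements ShouldQueue
{
    public $tries = 1;
    public $timeout = 5;

    public function handle(): void
    {
        DB::beginTransaction();
        // ... heavy work that may exceed $timeout ...
        DB::commit();
    }

    public function failed(\Throwable $exception): void
    {
        // Roll back every transaction level left open when the worker
        // killed the job, so the failed-job insert is not swallowed
        // by the still-open transaction.
        while (DB::transactionLevel() > 0) {
            DB::rollBack();
        }
    }
}
```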
The reason I would tend to shy away from option 1 is that behaviour would then be different between jobs that are batchable and ones that are not. This would be an unnecessary inconsistency in my opinion. (See my previous comment.)
On a more subjective note, we have > 100 jobs, most of them have transactions and do not require failed methods. Option 2) would add failed methods just to handle this specific case so as not to lose jobs, which seems to be quite an overhead.
Will be interesting to see what the core of laravel developers have to say.
is because then behaviour would be different between jobs that are batchable, and ones that are not. This would be an unnecessary inconsistency in my opinion.
Yes, this is intended, because Batchable jobs offer catch and finally callbacks, which need to be called within the same process, while normal queued jobs typically rely on n+1 attempts (n = number of retries) to throw the error.
would add failed methods just to handle this specific case so as not to lose jobs, which seems to be quite an overhead.
The documentation already covers this in detail:
@crynobone @driesvints @themsaid we disagree with your assessment and rest our case. We hope that in future there will be a fix for this inconsistency which will reduce the chance of losing jobs, which we find crucial to be able to rely on Laravel queues as a major component of larger applications.
for anybody else worried about this: Our solution will most likely be to declare all Jobs as Batchable, which seems to have no ill effects; even if the job is never dispatched within a batch, the transaction is then handled correctly, so the job is logged properly in case of a timeout within a transaction.
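The Batchable workaround described above would look roughly like this (job name hypothetical). Per the earlier analysis of Illuminate\Queue\Jobs\Job::fail(), the mere presence of the trait triggers the rollback branch on TimeoutExceededException, even when the job is not part of a batch:

```php
<?php

namespace App\Jobs;

use Illuminate\Bus\Batchable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Support\Facades\DB;

class HeavyTransactionJob implements ShouldQueue
{
    // The trait makes Job::fail() roll back via the BatchRepository
    // on TimeoutExceededException, which closes the open transaction.
    use Batchable;

    public $tries = 1;
    public $timeout = 5;

    public function handle(): void
    {
        DB::transaction(function () {
            // ... heavy work that may exceed $timeout ...
        });
    }
}
```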
for anybody else worried about this: Our solution will most likely be to declare all Jobs as Batchable,
Read the documentation: the worker can be restarted using a process manager (which should be the default in any production application anyway), or use queue:listen (which uses more CPU, if you insist on not going with a process manager).
@crynobone we use Laravel Horizon, which handles that automatically. But please be aware: if the job had $tries=1 (which most of our jobs do), the job is NOT retried, and NOT logged, and therefore lost.
Please correct me if I'm wrong. This is the whole point of this ticket: losing jobs.
Laravel Version
10.37.3
PHP Version
8.2.7
Database Driver & Version
MariaDB 10.3.27
Description
We had a 'phantom' problem where jobs that failed were not logged to the failed_jobs table. After some investigation we found that by mistake we were doing a heavy process within a db transaction, which led the specific job to time out.
After further testing we found that the issue only happens if the job times out during an open db transaction.
I reported this in laravel/horizon but it turns out it has nothing to do with horizon but seems to be a framework issue.
Steps To Reproduce
We would expect a job in the failed_jobs table; however, this is not the case, and the JobFailed event is also not dispatched.
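One way to observe the reported behaviour is to listen for the JobFailed event mentioned above; in the transaction case, neither the failed_jobs row nor this event appears. A sketch of such a listener (registered, for example, in a service provider's boot method):

```php
<?php

use Illuminate\Queue\Events\JobFailed;
use Illuminate\Support\Facades\Event;
use Illuminate\Support\Facades\Log;

// E.g. in AppServiceProvider::boot(): log every failed job the worker reports.
// In the bug scenario described here, this listener never fires for the
// job that times out inside an open transaction.
Event::listen(JobFailed::class, function (JobFailed $event) {
    Log::warning('Job failed', [
        'connection' => $event->connectionName,
        'job' => $event->job->resolveName(),
        'exception' => $event->exception->getMessage(),
    ]);
});
```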