laravel / horizon

Dashboard and code-driven configuration for Laravel queues.
https://laravel.com/docs/horizon

Jobs causing memory overflow do not consider $maxExceptions (and are retried endlessly if $tries=0) #1346

Closed: graemlourens closed this issue 5 months ago

graemlourens commented 10 months ago

Horizon Version

5.21.2

Laravel Version

10.29.0

PHP Version

8.2.7

Redis Driver

PhpRedis

Redis Version

6.0.1

Database Driver & Version

No response

Description

We ran into a bug which we believe did not exist in previous versions. We cannot pinpoint when it appeared, and we also cannot tell whether this is a Horizon issue or a framework issue.

A job which causes a memory overflow will be retried endlessly when $tries = 0, despite having $maxExceptions = 1. We also tried jobs that time out; there the behaviour is as expected.

We'd expect a job that dies because of a memory overflow to be marked as failed on the second pickup by the worker, because $maxExceptions is set to 1 (and I'd argue that a memory overflow constitutes an exception, or is our assumption wrong here?).
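
For reference, a minimal sketch of the job configuration being described (the class name, namespace, and empty handle() body are illustrative; $tries = 0 and $maxExceptions = 1 are the values from this report):

<?php

namespace App\Jobs;

use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Queue\InteractsWithQueue;

class MemoryHungryJob implements ShouldQueue
{
    use Dispatchable, InteractsWithQueue, Queueable;

    // 0 means unlimited attempts; the worker may pick the job up again and again.
    public $tries = 0;

    // A single unhandled exception should be enough to mark the job as failed.
    public $maxExceptions = 1;

    public function handle(): void
    {
        // Memory-exhausting body; see the reproduction steps below.
    }
}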

Steps To Reproduce

Create a job as follows:

- Set $tries = 0 and $maxExceptions = 1 (the values described above).
- Set 'retry_after' to something low, so you don't have to wait endlessly until workers pick the job up again (a config sketch follows these steps).
- The handle() method of the job should look like this:

// Lower the memory limit so the worker hits it quickly.
ini_set('memory_limit', '60M');

$leak = [];
$ticks = 0;
while (true) {
    $leak[$ticks] = file_get_contents(__FILE__);
    $ticks++;
    // This unsets a key that was never written (the counter was just
    // incremented), so the array keeps growing until the limit is hit.
    unset($leak[$ticks]);
}

In our stack the initial memory used by a queue worker is approximately 55 MB. You may need to adjust the memory limit to something just above your worker's initial memory usage.
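
For the 'retry_after' step above, a minimal sketch of what we mean, assuming the default Redis connection in config/queue.php (the value 5 is only an example):

// config/queue.php
'connections' => [
    'redis' => [
        'driver' => 'redis',
        'connection' => 'default',
        'queue' => env('REDIS_QUEUE', 'default'),
        'retry_after' => 5, // low, so the worker picks the job up again quickly
        'block_for' => null,
    ],
],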

If you dispatch the job and watch what Horizon does, you will see that the job keeps being picked up and retried endlessly instead of being marked as failed after the first memory overflow, even though $maxExceptions is set to 1.

driesvints commented 10 months ago

Thanks @graemlourens. Would appreciate any help here from a community member.

github-actions[bot] commented 10 months ago

Thank you for reporting this issue!

As Laravel is an open source project, we rely on the community to help us diagnose and fix issues as it is not possible to research and fix every issue reported to us via GitHub.

If possible, please make a pull request fixing the issue you have described, along with corresponding tests. All pull requests are promptly reviewed by the Laravel team.

Thank you!

graemlourens commented 10 months ago

@driesvints thank you. Could you please classify this as a bug? As it's easily reproducible, I guess you have also been able to reproduce it by now.

driesvints commented 10 months ago

I haven't been able to reproduce this, no. I currently don't have time to look into this.

nearueng commented 9 months ago

This one is pretty interesting, and I've been wondering why some of our Horizon workers get into this loop. I recently added $tries = 1 to a few of them and it seems to have stopped, but the jobs that don't have it are still running into this.

I've just added $tries = 1 to these other jobs too and will report back on whether this was indeed the bug that was causing it.

driesvints commented 8 months ago

Moved to https://github.com/laravel/framework/issues/49389

graemlourens commented 8 months ago

@driesvints You referenced the wrong ticket.

The issue here is about memory overflow not considering $maxExceptions.

The issue you say this was moved to is about something else (timeouts and jobs not being logged).

Please re-open, or move this to a new ticket, as these two things are totally separate issues.

driesvints commented 8 months ago

@graemlourens what did you mean by this then:

I reported this in laravel/horizon but it turns out it has nothing to do with horizon but seems to be a framework issue.

graemlourens commented 8 months ago

@driesvints that was https://github.com/laravel/horizon/issues/1200, which I reported over a year ago (I'm a patient person :) ). That issue I have now reported to the Laravel framework directly.

The one you just closed here is still unresolved. However, it could well be that it has nothing to do with Horizon and only with Laravel. I'm happy to look into that, but I ask you to reopen it as long as it is not clear whether this is a Horizon or a Laravel issue. The fact remains that this situation is unhandled.

driesvints commented 8 months ago

Ok re-opened.

driesvints commented 5 months ago

Hi @graemlourens. I'm sorry, but this has been open for quite a while with no activity, and you're the only one experiencing this. That's why I'm going to close this one. If you ever figure out a solution, we'd appreciate a PR. Thanks.

graemlourens commented 5 months ago

@driesvints that's fine. It turns out it's not a Horizon issue but one in the Laravel framework itself, so an issue will have to be created there separately.

It can, however, be reproduced by anyone; I guess we just stumble on these rare edge cases because we have tons of jobs flowing through our systems daily.

I hope to have the time and patience to create the ticket in the laravel/framework repo at some point. However, I'm not very motivated right now, as my last ticket there regarding a serious queue issue was just closed without appropriate investigation, which is really frustrating.

driesvints commented 5 months ago

@graemlourens to be fair, I don't think it's worth it, since you're the only one experiencing this right now. It's highly unlikely we can find the time to deep dive into this soon. I know a fair number of other apps that also push tons of jobs through Horizon and don't experience this at all.

graemlourens commented 5 months ago

@driesvints I understand. Since it is so easily reproducible, it would still have been nice to at least receive an acknowledgment that it is indeed the case, even if you are perhaps not able to deep dive into it.

An acknowledgment that the error can be reproduced, or indeed a report that you are not able to reproduce it at all, is valuable information, and it is what I most often miss when I report issues in Laravel or Horizon.

Thank you for your time

shamil8124 commented 5 months ago

Just FYI: when we were running a few million Horizon jobs a day, my workers would definitely stall and not exit gracefully. This would spike CPU usage.

I had to htop + manually kill the process once every few days.

It's definitely possible this was due to the nature of our app, since I'm sure others are running even higher workloads.

We had a few different servers running horizon processes and a single redis node.

If I run into this issue again, I'll see about creating an issue too.

Just thought I'd chime in before this one is fully closed.