laravel / ideas

Issues board used for Laravel internals discussions.

[Proposal] Allow queue jobs to be released back to the queue without increasing the attempt count #1381

Open matthew-muscat opened 6 years ago

matthew-muscat commented 6 years ago

This idea is related to #735

For the sake of clarity, I'm referring to this as "requeuing".

Background

Currently, when using release() within a job, the job is released back onto the queue with its attempt count incremented.

The idea would be to introduce a method similar to release() that releases the job back onto the queue without incrementing its attempt count.

Why

Our use case is that jobs handle API interactions for an integration service, with queues providing a mechanism for exponential backoff in case of failures (via release() calls with an increasing delay value), as well as a guarantee that the job is attempted at least X times before failing permanently. This is currently handled via release() calls when the job fails, together with a maximum attempt count in the job specification (i.e. via the tries property).
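For illustration, this is roughly the pattern in use today (the class name and delays are made up for the example): each failed attempt calls release() with a growing delay, and each release() consumes one of the allowed tries.

<?php

namespace App\Jobs;

use Exception;
use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Queue\InteractsWithQueue;

class ProcessApiCall implements ShouldQueue
{
    use InteractsWithQueue, Queueable;

    // Fail permanently after 5 attempts
    public $tries = 5;

    public function handle()
    {
        try {
            // API logic
        } catch (Exception $e) {
            // Back off exponentially: 10s, 20s, 40s, ...;
            // each release() still counts as one of the 5 attempts.
            $this->release(10 * (2 ** ($this->attempts() - 1)));
        }
    }
}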

To improve the queue processing logic in relation to API calls, we want to be able to avoid a job run being counted as an attempt when it fails only because of throttling, since no API attempt was actually made.

Example

Redis::throttle('key')->allow(10)->every(60)->then(function () {
    try {
        // API logic
    }
    catch (Exception $e) {
        // Release the job back into the queue
        return $this->release(10);
    }
}, function () {
    // Could not obtain lock...
    return $this->requeue(10);
});

Approach / Naming

Some ideas on the approach and naming of this method:

New method added to the InteractsWithQueue trait: Create a new method, similar in functionality to release(), that releases the job back into the queue without increasing the attempt counter.

The method's signature will be similar to that of release(), with a single delay parameter being available.
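A minimal sketch of what that could look like, mirroring the existing release() method on the InteractsWithQueue trait (the requeue name and the matching method on the underlying job are assumptions, not existing APIs):

/**
 * Release the job back into the queue without counting an attempt.
 *
 * @param  int  $delay
 * @return mixed
 */
public function requeue($delay = 0)
{
    if ($this->job) {
        // Would delegate to a matching requeue() on the driver-specific job class.
        return $this->job->requeue($delay);
    }
}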

Adjust release() to accept an additional parameter: Adjust the release() method to accept a parameter that determines whether the release should count towards the job's attempt count, e.g.:

/**
 * Release the job back into the queue.
 *
 * @param  int    $delay
 * @param  bool   $reattempt
 * @return void
 */
public function release($delay = 0, $reattempt = false);

Additional notes

Attempt count increments: Attempt counts on jobs are incremented when they are "reserved" in the queue. This is good practice, but it does mean that we would likely need to decrease the attempt count when requeuing the job.

This is because whether a given run should count as an attempt is only known from within the job handler logic.

Attempt limit + time expiry: If introduced, it would be recommended that its usage is constrained by both an "attempt limit" and a "time expiry", ensuring the job does not remain in the queue indefinitely if it is unable to obtain a lock for a long period of time.

matthew-muscat commented 6 years ago

Approach

Abstract Classes, Contract and Trait Changes

Job Contract

https://github.com/laravel/framework/blob/master/src/Illuminate/Contracts/Queue/Job.php#L34

InteractsWithQueue Trait

https://github.com/laravel/framework/blob/master/src/Illuminate/Queue/InteractsWithQueue.php#L57

Job Abstract Class

https://github.com/laravel/framework/blob/master/src/Illuminate/Queue/Jobs/Job.php#L112

Queue Implementations

Database

Job Class: https://github.com/laravel/framework/blob/master/src/Illuminate/Queue/Jobs/DatabaseJob.php#L50

Job Record: https://github.com/laravel/framework/blob/master/src/Illuminate/Queue/Jobs/DatabaseJobRecord.php#L34

Queue Implementation: https://github.com/laravel/framework/blob/master/src/Illuminate/Queue/DatabaseQueue.php#L144

Redis

Job Class: https://github.com/laravel/framework/blob/master/src/Illuminate/Queue/Jobs/RedisJob.php#L93

Queue Implementation: https://github.com/laravel/framework/blob/master/src/Illuminate/Queue/RedisQueue.php#L274

Sync

Job Class: https://github.com/laravel/framework/blob/master/src/Illuminate/Queue/Jobs/SyncJob.php#L47

Others (SQS, Beanstalkd): I'm not too sure how this would be handled in the SQS and Beanstalkd implementations. I'll need to take a closer look at how they implement attempt counters and whether those can be manipulated.
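Purely as illustration of the files listed above, the contract and abstract base-class pieces might look roughly like this (the names and the reuse of the released flag are assumptions, not a settled design):

// Illuminate\Contracts\Queue\Job: proposed contract addition (hypothetical)
public function requeue($delay = 0);

// Illuminate\Queue\Jobs\Job: base implementation mirroring release(), which
// today just flags the job as released; each driver would override this to
// push the payload back without bumping the stored attempt count.
public function requeue($delay = 0)
{
    $this->released = true;
}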

mfn commented 6 years ago

but releases the job back into the queue without an attempt counter increase. … Avoid job processing being considered as an attempt when it fails due to throttling, as the API attempt was not made … If introduced, it would be recommended that its usage is constrained by both an "attempt limit" and "time expiry"

This would add quite some complexity to a system already having lots of variables.

One thing is for sure: you can't introduce such a feature without additional safeguard limits against this, so I understand that.

But: can't you achieve the same with a high number of retries (i.e. 100) ?

matthew-muscat commented 6 years ago

This would add quite some complexity to a system already having lots of variables.

We're not really adding complexity here, as those safeguards are already available within the framework.

For example, allowing jobs to fail on both max attempts and a time expiry would look like the following in the job class:

<?php

namespace App\Jobs;

use Illuminate\Contracts\Queue\ShouldQueue;

class ProcessItem implements ShouldQueue
{
    // Limit the job to 5 attempts
    public $tries = 5;

    // ...

    public function retryUntil()
    {
        // Allow the job to retry until two hours have elapsed
        return now()->addHours(2);
    }
}

But: can't you achieve the same with a high number of retries (i.e. 100) ?

In our use case, we want to avoid making 100+ API calls on retry if the call keeps failing for some reason, while still guaranteeing that at least 5 actual attempts at the API call are made.

ghost commented 5 years ago

I was looking for the same thing. For now I am deleting the job and dispatching it again from inside the job.
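A rough sketch of that workaround, assuming a self-dispatchable job (the class details and the 10-second delay are illustrative); the freshly dispatched copy starts with a clean attempt count, but any link to the original job is lost:

public function handle()
{
    Redis::throttle('key')->allow(10)->every(60)->then(function () {
        // API logic
    }, function () {
        // Could not obtain the lock: delete this job and push a new copy.
        $this->delete();

        static::dispatch(/* same constructor arguments */)->delay(10);
    });
}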

sebastianwebb commented 5 years ago

@matthew-muscat I would love to see your suggestion added. I posted in the Laravel forum requesting advice on a workaround for this, along with my review of other solutions that aren't quite there (such as using a database queue driver or re-dispatching jobs).

I've just reread your posts in more detail, and I think you've touched on the main issue I'm currently struggling with:

need to consider differences of reserved job and payload data, if any (currently assumed to include the adjusted attempt counts)

I'm having trouble figuring out how to decrement the attempt count on my Horizon/Redis managed job payload. I tried creating a function to update the payload string in Redis directly, but it is getting overwritten / not working for some reason.

public static function adjustRedisJobAttempts($change, $job_id)
{
    // Horizon job hash key
    $key = 'horizon:'.$job_id;

    // get the Redis job hash
    $job_meta = Redis::hgetall($key);

    // convert the payload JSON string to a PHP object
    $payload = json_decode($job_meta['payload']);

    // increment/decrement attempts
    $payload->attempts = Helper::adjustNumber($change, $payload->attempts);

    // write the updated payload (with the new attempt count) back to the hash
    Redis::hmset($key, 'payload', json_encode($payload));

    // return the revised attempts number
    return $payload->attempts;
}

I'm feeling around in the dark here; I'm new to Laravel. Did you manage to figure out a way to decrement the attempts? I also don't want to make far more API calls than necessary, and I feel your solution is the only one with no downsides.

Thanks, Sebastian

StefanPrintezis commented 5 years ago

As stated earlier in the topic, we would need some kind of new function that doesn't increase the attempt count. Since the attempt is recorded when popping the job off the queue, we would need to decrement it. Decrementing conditionally usually means we shouldn't be incrementing automatically in the first place, but I'm not sure that's what we want here.

In the DurationLimiterBuilder, a LimiterTimeoutException is thrown if the maximum number of locks is reached within the decay period. The $failure closure passed to then() is then called, but it isn't really a failure, it's a throttle: the job didn't fail, it just didn't run because it was throttled.

Somehow I'm thinking this functionality should be added at the queue level instead of being implemented in the job's handle() method. It's a different approach, but right now it's a somewhat wacky integration.

If a queue were throttle-aware, it could hand out a job only if it is within the limit at the moment getNextJob is requested.

I could make a pull request, but it feels like an entirely different approach from adding an alternative to the release function. Any thoughts on the "throttle-aware queue" approach?

taylorotwell commented 5 years ago

I'm OK with this concept in general and would be OK with it being PR'd. As noted, you would definitely want to time constrain the re-attempts using retryUntil. I'm fine with a new method like requeue() or something I think?
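Putting those pieces together, a sketch of how the proposed requeue() (not yet part of the framework) might be used alongside the retryUntil() time constraint:

class ProcessItem implements ShouldQueue
{
    // At most 5 counted attempts...
    public $tries = 5;

    // ...and a hard time limit, so "free" requeues cannot keep the job alive forever.
    public function retryUntil()
    {
        return now()->addHours(2);
    }

    public function handle()
    {
        Redis::throttle('key')->allow(10)->every(60)->then(function () {
            // API logic; a genuine failure still uses release() and consumes an attempt.
        }, function () {
            // Throttled: put the job back without consuming an attempt.
            return $this->requeue(10);
        });
    }
}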

shawnhind commented 5 years ago

@matthew-muscat I was looking into testing your proposed implementation by overriding the queues with my own custom ones. One problem I've come across is that the Lua scripts match based on the payload. For example, the release Lua script seems to delete the job and then add the same payload onto another queue. If you simply pass $job->getPayload instead of $job->getReserved during the release process, as you suggested, it seems it won't delete the reserved job. I'm still seeing jobs being tried multiple times even after failing, and so far my best guess is that this is what's causing it.

I could be very wrong, but it seems to me that an extra Lua script call would be needed to specifically requeue, rather than release, as well.

judgej commented 4 years ago

With Laravel 6 job middleware, it makes sense for the job attempt count to be incremented only when the job actually runs. Rate limiting implemented in the middleware should not count an attempt if it decides no attempt is to be performed. However, I understand the increment happens when the job is popped from the queue, so the attempt count has already increased before the middleware even decides whether the job can run.

Is that how it works? Is the middleware considered to be a part of the job, rather than a precursor to the job? Or have I got my assumptions wrong?
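For reference, the Redis-throttling job middleware from the framework documentation looks roughly like the sketch below; the question above is whether the attempt was already counted (when the job was reserved) before handle() here ever runs.

use Illuminate\Support\Facades\Redis;

class RateLimited
{
    public function handle($job, $next)
    {
        Redis::throttle('key')
            ->block(0)->allow(1)->every(5)
            ->then(function () use ($job, $next) {
                // Lock obtained, run the job.
                $next($job);
            }, function () use ($job) {
                // Could not obtain lock; release back onto the queue.
                $job->release(5);
            });
    }
}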

shawnhind commented 4 years ago

@judgej Good question. I'd also like to know this. Did you end up finding this out?

After thinking about this more, I'm fairly sure the same problem will still exist with the middleware. The attempts are increased when the job is marked as reserved, and I think the job would still have to be marked as reserved while the middleware is running so that nothing else picks it up. This means that even if it never makes it through the middleware, the attempts would still be increased unless something decrements them afterwards.

judgej commented 4 years ago

I have not done any experiments to find out yet. Possibly if pushing the job back onto the queue in middleware, it is possible to reset or decrement the run count explicitly? The run count is there somewhere in the job object, so hopefully there is a way to access it. Maybe a decorator could do that for now, while seeing if this will work. Don't know, and won't have any time to find out for some weeks yet.