laravel / ideas

Issues board used for Laravel internals discussions.

Bulk queue: handle jobs in batch #410

Open halaei opened 7 years ago

halaei commented 7 years ago

See bulk queue wiki. Sometimes it would be nice to handle queued jobs in batches, even when they were pushed one by one. Examples I have in mind:

  1. Sending events to webhook URLs via asynchronous HTTP requests: webhook URLs are provided by users, so they are neither reliable nor fast enough for each call to occupy a dedicated worker.
  2. Sending SMS to a list of contacts: SMS providers usually make it possible to send up to 100 SMS in a single HTTP request. So, if there are 60 SMS in the queue, sending them via 60 HTTP requests is a waste.

Sample code I have in mind:

queue.php file:

        'batch-queues' => [
            'webhook' => [
                'connection'     => 'redis',
                'queue'          => 'webhook',
                'handler'        => BatchWebhookCaller::class,
                'max_batch_size' => 50,
            ],
            'sms' => [
                'connection'     => 'redis',
                'queue'          => 'sms',
                'handler'        => BatchSmsSender::class,
                'max_batch_size' => 100,
            ],
        ],

Batch worker classes:

use Illuminate\Support\Collection;

class BatchWebhookCaller
{
    /**
     * @param Collection|WebhookEvent[] $events
     */
    public function handle(Collection $events)
    {
        // Send async requests via Guzzle or anything similar.
        // Wait for the responses.
        // Mark events whose webhook call did not succeed as failed, so they can be retried later.
    }
}

class BatchSmsSender
{
    /**
     * @param Collection|Sms[] $messages
     */
    public function handle(Collection $messages)
    {
        // Send the batch of SMS via a single SOAP request to the SMS provider.
    }
}

Console commands:

$ php artisan queue:batch webhook
$ php artisan queue:batch sms

tomschlick commented 7 years ago

There is nothing stopping you from doing this with the existing jobs system. You just create a new BatchSMSJob and pass it the list of contacts to send to...

halaei commented 7 years ago

@tomschlick Only if you can create a BatchSmsJob beforehand. When jobs are queued one by one, you cannot do this. For example, if an SMS is sent to a user whenever he/she logs in, a high-traffic application will need to batch process multiple SMS in the queue that were pushed one by one.

tomschlick commented 7 years ago

Ah so you're looking to batch the same type of jobs together that are not submitted to the queue together...

That's a much more complicated operation, as the queue would have to read all of the pending items to know which ones to batch together.

halaei commented 7 years ago

To keep it simple, my suggestion is to have a specific type of queue with a predefined batch handler. So everything pushed to the 'webhook' queue will be processed by BatchWebhookCaller::handle(Collection $events). The code that pushes to the queue is responsible for not pushing jobs that BatchWebhookCaller can't handle.
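
For illustration, here is a minimal sketch of that idea in plain PHP: jobs are pushed one by one, and a loop hands them to a single handler in batches of at most a configured size. `InMemoryQueue` and `runBatchQueue` are hypothetical names standing in for a real queue driver and worker; they are not Laravel APIs.

```php
<?php
declare(strict_types=1);

// Hypothetical in-memory stand-in for a queue connection (not a Laravel API).
final class InMemoryQueue
{
    /** @var list<mixed> */
    private array $items = [];

    public function push(mixed $payload): void
    {
        $this->items[] = $payload;
    }

    /**
     * Remove and return up to $max payloads in FIFO order.
     *
     * @return list<mixed>
     */
    public function popBatch(int $max): array
    {
        return array_splice($this->items, 0, $max);
    }

    public function size(): int
    {
        return count($this->items);
    }
}

/**
 * Drain the queue, passing at most $maxBatchSize payloads to the handler per call.
 * Returns the number of batches that were handled.
 */
function runBatchQueue(InMemoryQueue $queue, callable $handler, int $maxBatchSize): int
{
    $batches = 0;
    while (($batch = $queue->popBatch($maxBatchSize)) !== []) {
        $handler($batch);
        $batches++;
    }

    return $batches;
}

// 60 messages pushed one by one, as in the SMS example.
$queue = new InMemoryQueue();
for ($i = 1; $i <= 60; $i++) {
    $queue->push("sms-$i");
}

// Record the size of each batch the handler receives.
$sizes = [];
$batchCount = runBatchQueue($queue, function (array $batch) use (&$sizes): void {
    $sizes[] = count($batch);
}, 100);
```

With a limit of 100, the 60 queued messages reach the handler in a single call instead of 60 separate ones.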

sisve commented 7 years ago

But can this be built generically enough to be provided as framework functionality, instead of, let's say, a simple sms_to_send table that you fetch SMS from yourself to send them in batches?

tomschlick commented 7 years ago

I agree with what @sisve said.

If you need to do this, it's better done with some intermediary step, like storing them in the database and having a cron run every minute to throw them into the queue in batches.
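
A rough sketch of that intermediary step, assuming the pending rows have already been fetched from a table by the cron task. `splitIntoBatches` is a hypothetical helper, and the row shape is made up for the example.

```php
<?php
declare(strict_types=1);

/**
 * Split pending rows into chunks of at most $batchSize.
 * Each chunk would become the payload of one queued batch job.
 *
 * @param list<array{id: int, to: string}> $pendingRows
 * @return list<list<array{id: int, to: string}>>
 */
function splitIntoBatches(array $pendingRows, int $batchSize): array
{
    return array_chunk($pendingRows, $batchSize);
}

// 250 pending rows, as they might be read from an sms_to_send table.
$pending = [];
for ($id = 1; $id <= 250; $id++) {
    $pending[] = ['id' => $id, 'to' => "user-$id"];
}

$batches = splitIntoBatches($pending, 100);
$batchSizes = array_map('count', $batches);
```

The trade-off is latency: a row can wait up to one cron interval before it is picked up.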

halaei commented 7 years ago

@sisve SMS and webhooks were just examples. I think we can come up with a general solution that can be added to Laravel 5.5 (illuminate\queue). It should be driver-based, as the Laravel queue currently supports multiple drivers. So maybe I prefer Redis, you like the database driver, and someone else goes for SQS or a custom driver.

halaei commented 7 years ago

I have needed this feature, and I think many others would like to have it in the framework too.

tomschlick commented 7 years ago

It does present some new problems though, like what if one of the items in the batch causes an exception?

Does the whole thing fail? On retry do you exclude the one that previously failed or do you try it again?

These are just a lot easier to handle on a case by case basis (single jobs) vs a batch operation.

halaei commented 7 years ago

@tomschlick If an item causes an exception but the rest of the items are handled successfully, it is the responsibility of the batch handler to catch that exception and do something about it (marking the item as failed is one option). If the batch handler itself raises an exception, it means that all the jobs have failed. Whether, how, and when to retry the failed jobs is the responsibility of the Laravel queue system, just as it already is in Laravel <= 5.4.

I agree that the code of a batch handler is more complicated. For one thing, it must iterate over the items. This is a cost I am willing to pay to resolve performance issues and avoid waiting on a single slow network/IO operation when I can do 100 of them at once.
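
One possible shape for the per-item failure handling described above, as a plain PHP sketch. `BatchResult` and `handleBatch` are hypothetical names; how failed items would be reported back to the queue for retry is left open here.

```php
<?php
declare(strict_types=1);

// Hypothetical result type: which items succeeded and which failed.
final class BatchResult
{
    /** @var list<string> */
    public array $succeeded = [];

    /** @var list<array{0: string, 1: string}> pairs of [item, error message] */
    public array $failed = [];
}

/**
 * Process every item, catching per-item exceptions so that one bad item
 * does not fail the rest of the batch.
 *
 * @param list<string> $items
 */
function handleBatch(array $items, callable $sendOne): BatchResult
{
    $result = new BatchResult();

    foreach ($items as $item) {
        try {
            $sendOne($item);
            $result->succeeded[] = $item;
        } catch (Throwable $e) {
            // Mark only this item as failed and keep going.
            $result->failed[] = [$item, $e->getMessage()];
        }
    }

    return $result;
}

$urls = ['https://a.example', 'https://b.example', 'https://c.example'];

// Simulated sender: the second webhook times out.
$result = handleBatch($urls, function (string $url): void {
    if ($url === 'https://b.example') {
        throw new RuntimeException('timeout');
    }
});
```

An exception thrown outside the `try` block (for example, while building the request) would still propagate, which matches the "all jobs failed" case.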

expertcoder commented 7 years ago

This suggested feature might also be useful when updating an Elasticsearch index. With Elasticsearch, it is better to use the bulk API for updates. https://www.elastic.co/blog/found-keeping-elasticsearch-in-sync#the-bulk-api-a-must-for-most-applications

fedeisas commented 7 years ago

I've been working on implementing this in my project.

I ended up creating a BatchWorker that extends Worker. In the while (true) loop I store my jobs in a protected $batchJobs = [] property. When the conditions are met (that is, the batch size is reached or there have been enough loops without new jobs), I wrap all these jobs into a SyncJob and execute runJob(). It's a super nasty implementation, but enough to start tinkering.

lk77 commented 7 years ago

I think it would be useful to have a function or closure, called before handle, with a condition such as Sms::all()->count() > 100, so that the job executes only when needed. A delay would define how often the condition is checked, like the scheduler. It would be more generic, and the dynamic payload would be generated in handle, from an sms table for example.

Another solution could be to update the payload of the job, adding new SMS until the limit is reached.

sisve commented 7 years ago

An important decision: are we batching anything on a given queue, so that a batch can contain jobs of different types, or are we limiting batching to a single job type?

A simple scenario: the default queue with the following content, in this order:

  1. SendSmsJob
  2. SendSmsJob
  3. SendEmailJob
  4. SendSmsJob
  5. SendEmailJob

Should this result in ...

  1. One batched job containing all five jobs?
  2. Two batched jobs, one containing three SendSmsJob and one containing two SendEmailJob?
  3. Four batched jobs; one contains the first two SendSmsJob, one SendEmailJob, one SendSmsJob, one SendEmailJob?
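
To make the three options concrete, here is a small plain PHP sketch that applies each grouping strategy to the queue above. The function names are hypothetical, and jobs are represented only by their type name.

```php
<?php
declare(strict_types=1);

$queue = ['SendSmsJob', 'SendSmsJob', 'SendEmailJob', 'SendSmsJob', 'SendEmailJob'];

// Option 1: a single batch containing everything on the queue.
function batchAll(array $jobs): array
{
    return $jobs === [] ? [] : [$jobs];
}

// Option 2: one batch per job type, regardless of position in the queue.
function batchByType(array $jobs): array
{
    $groups = [];
    foreach ($jobs as $job) {
        $groups[$job][] = $job;
    }

    return array_values($groups);
}

// Option 3: one batch per run of consecutive jobs of the same type.
function batchByRuns(array $jobs): array
{
    $batches = [];
    foreach ($jobs as $job) {
        $last = count($batches) - 1;
        if ($last >= 0 && $batches[$last][0] === $job) {
            $batches[$last][] = $job;   // extend the current run
        } else {
            $batches[] = [$job];        // start a new run
        }
    }

    return $batches;
}

$option1 = array_map('count', batchAll($queue));
$option2 = array_map('count', batchByType($queue));
$option3 = array_map('count', batchByRuns($queue));
```

Option 2 changes the relative order of jobs of different types, while option 3 preserves queue order at the cost of smaller batches.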

halaei commented 7 years ago

@sisve I think it will be simpler and cleaner to allow only jobs of the same type on each queue.

fedeisas commented 7 years ago

Yes, I agree that you have to push only one type of job to a special queue that is processed in batches.

Also, in my handler, I can check whether the payload has a batch key, so I can process either batch records or single ones.

My use case is Elasticsearch _bulk indexing.

I added two options to my command: max-ticks-idle and batch-size. That way I can trigger processing whenever the batch-size is reached, or when a number of ticks have passed and no new messages have been produced (i.e. my batch-size is 50, but I have 49 records waiting for indexing).
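
The flush rule described above could be sketched roughly like this in plain PHP. `BatchBuffer` is a hypothetical class, not the actual BatchWorker code; it only models the decision of when to release a batch, with one `tick()` call per worker loop iteration.

```php
<?php
declare(strict_types=1);

// Hypothetical buffer that decides when accumulated jobs should be processed.
final class BatchBuffer
{
    /** @var list<mixed> */
    private array $jobs = [];
    private int $idleTicks = 0;

    public function __construct(
        private int $batchSize,
        private int $maxTicksIdle,
    ) {
    }

    /**
     * One worker-loop tick. Pass null when the queue had nothing to pop.
     * Returns the batch to process, or null to keep waiting.
     */
    public function tick(mixed $job): ?array
    {
        if ($job !== null) {
            $this->jobs[] = $job;
            $this->idleTicks = 0;
        } else {
            $this->idleTicks++;
        }

        $full = count($this->jobs) >= $this->batchSize;
        $stale = $this->jobs !== [] && $this->idleTicks >= $this->maxTicksIdle;

        if (!$full && !$stale) {
            return null;
        }

        // Release the batch and start over.
        $batch = $this->jobs;
        $this->jobs = [];
        $this->idleTicks = 0;

        return $batch;
    }
}

// batch-size 50, max-ticks-idle 3, and only 49 records arrive.
$buffer = new BatchBuffer(50, 3);
$releasedEarly = false;
for ($i = 1; $i <= 49; $i++) {
    if ($buffer->tick("doc-$i") !== null) {
        $releasedEarly = true;
    }
}

// The queue is now empty: three idle ticks in a row.
$idle1 = $buffer->tick(null);
$idle2 = $buffer->tick(null);
$idle3 = $buffer->tick(null);
```

Here the 49 waiting records are released on the third idle tick rather than waiting indefinitely for a 50th.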

cyrrill commented 4 years ago

@halaei this is now available in Laravel 8

halaei commented 4 years ago

@cyrrill no. This is actually a different concept with a similar name.

cyrrill commented 4 years ago

@halaei OK, thanks for clarifying; I landed here from a Google search. Out of curiosity, three years after this ticket was opened, is there any code for this feature to look at?

I'm looking into sending batches of message jobs, and would be interested in seeing various existing implementations.

paras-malhotra commented 4 years ago

See also https://github.com/gzigzigzeo/sidekiq-grouping