laravel / framework

The Laravel Framework.
https://laravel.com
MIT License

Adding extra jobs to batch doesn't use specified queue #34296

Krunch closed this issue 4 years ago

Krunch commented 4 years ago

Description:

I have a single job (Job1) within a batch that can add extra jobs (Job2) to the same batch based on some internal logic. For some reason, the added jobs do not use the queue I specified for them. Say Job1 was added to queue "first" and I want to spawn a number of Job2 jobs on a different queue called "second" within the same batch: instead, all Job2 jobs execute on the "first" queue. I inspected the payload of each job, and the proper queues are defined in the payload, but the rows added to the MySQL table reference the wrong queue in the queue column.

Steps To Reproduce:

Job1 is called this way:

Bus::batch([
    new Job1
])->finally(function (Batch $batch) {
    ...
})->name('My job')->onQueue('first')->allowFailures()->dispatch();

Job2 are added this way (Inside Job1):

$this->batch()->add([
    (new Job2)->onQueue('second')
]);

In both Job1 and Job2 I made sure to add the use Batchable; trait.
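For reference, a minimal sketch of what such a job class might look like (class name and body are illustrative; the relevant part is the Batchable trait and the cancellation check):

```php
<?php

namespace App\Jobs;

use Illuminate\Bus\Batchable;
use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Queue\SerializesModels;

class Job2 implements ShouldQueue
{
    use Batchable, Dispatchable, InteractsWithQueue, Queueable, SerializesModels;

    public function handle(): void
    {
        // Bail out if the batch this job belongs to was cancelled.
        if ($this->batch()->cancelled()) {
            return;
        }

        // ... job logic ...
    }
}
```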

I believe this is a bug that needs to be addressed as you should be able to add new jobs to other queues within the same batch.

Other observations

If I instead try to call Job 1 this way:

Bus::batch([
    (new Job1)->onQueue('first')
])->finally(function (Batch $batch) {
    ...
})->name('My job')->allowFailures()->dispatch();

It will not, in fact, push Job1 onto the "first" queue, but onto the "default" queue. Again, the payload in the database shows the queue as "first", but the actual queue column in the table shows "default".

taylorotwell commented 4 years ago

All jobs within a batch must run within the same queue by design. It didn't seem to make sense to allow jobs to run across queues in the same batch, which usually is handling the same overall unit of work. Do you have a specific need for this?

Krunch commented 4 years ago

Yes @taylorotwell. We are running a pull/push system where, in this case, queue "first" holds our pull jobs, each of which can create any number of push jobs on the "second" queue. This gives us parallel execution for both pull and push, since a pull job can take a very long time to complete; having the push jobs on a second queue lets us push data in parallel as it is produced. The two queues are also served by different numbers of workers. As an example, one pull job could create thousands of push jobs in our case.

I believe this is an artificial limitation that should not exist. Let developers decide which queues they want to put their batched jobs on. I don't think there is any technical barrier to removing the one-queue restriction other than that it was decided that way. Removing it would also bring batches in line with chained jobs, where each job can define which queue it runs on without this limitation.

This would have fit our use case perfectly if it let us use whatever queues we define. With this limitation, batched jobs are unusable for us.

Thank you for considering our use case and I hope this change can be added.

taylorotwell commented 4 years ago

The thing is batched jobs are pushed in bulk. There is no way to bulk push to multiple queues. For example if you add 1,000 jobs to a batch we want to queue that all at once not make 1,000 calls to the queue system.

Krunch commented 4 years ago

Wouldn't it be possible to group the jobs within a batch by queue and bulk-add them that way? For example, if you batch 500 jobs with half going to one queue and half to another, that would be two bulk inserts.
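A rough sketch of the grouping idea, in plain PHP (the job array shape and the bulk-push call are illustrative assumptions, not framework code):

```php
<?php

// Illustrative only: group a batch's jobs by their target queue so that
// each queue receives a single bulk push, instead of one call per job.

$jobs = [
    ['job' => 'Job2', 'queue' => 'first'],
    ['job' => 'Job2', 'queue' => 'second'],
    ['job' => 'Job2', 'queue' => 'second'],
];

$grouped = [];
foreach ($jobs as $job) {
    $grouped[$job['queue']][] = $job;
}

// One bulk call per queue: two calls here, not three.
foreach ($grouped as $queue => $queueJobs) {
    // e.g. $connection->bulk($queueJobs, '', $queue);
    printf("bulk push of %d job(s) to '%s'\n", count($queueJobs), $queue);
}
```

So the number of queue system calls grows with the number of distinct queues, not the number of jobs.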

Krunch commented 4 years ago

@taylorotwell I added a PR with the required changes (#34334). It also minimizes the number of queue system calls required for large numbers of jobs. I hope this change can be considered, thank you.

murrant commented 3 years ago

Hmm, we also would like this functionality. We have a batch of work that is dispatched periodically, but some of the work needs to be pinned to specific workers. We use separate queues to achieve the pinning. We also want to know when the entire batch is complete. Using one batch per worker kind of works, but it is very messy to determine when all those batches are complete.

IllyaMoskvin commented 2 years ago

We'd like this functionality, too. We do scheduled imports of paginated API data in batches using queues. We keep our imports grouped by batches, so that we can e.g. cancel the whole batch when we detect that something is majorly wrong. We'd also like to prioritize the importing of already-downloaded data over the downloading of new data.

In our application, DownloadPage adds ImportData to its batch.

For a partial work-around, we are currently doing this in DownloadPage:

$batch = $this->batch();
$batch->options['queue'] = 'high';
$batch->add(new ImportData);

Now, we can queue many DownloadPage jobs at once, and ImportData jobs will be executed as soon as they become available. Previously, in that scenario, all DownloadPage jobs would be executed first, before any ImportData jobs.

rosenD00M commented 1 year ago

> All jobs within a batch must run within the same queue by design. It didn't seem to make sense to allow jobs to run across queues in the same batch, which usually is handling the same overall unit of work. Do you have a specific need for this?

@taylorotwell Not sure if this will be seen, as I know this issue is a few years old and has been closed for some time, but I have a use case for this that I think makes sense.

Let's say you have a retail company with hundreds of stores, and you're building jobs to pull in all the UNIQUE items the company sells, import each store's current inventory, and import all of its recent sales history. My current use case is as follows:

  1. JobA imports the unique company items
  2. JobA adds chained jobs B & C to the batch from within itself
  3. JobB imports inventory
  4. JobC imports sales history

In my use case the batch represents the company as a whole. I do this because I'm able to get notified when the whole company is imported (i.e., when the batch has completed). The issue is that since each store can sell slightly different inventory, I have hundreds of JobA instances in the batch, which means jobs B & C don't start until there are fewer JobA jobs remaining than workers. Because of this, and the amount of data jobs B & C insert/update, I'm getting a lot of deadlocks. What I was looking to do was push the initial batch, containing just the JobA jobs, onto a low-priority queue; then, when the chained jobs B & C are added to the batch from within JobA, they'd be dispatched, in BULK, onto a higher-priority queue so they can be processed as they arrive rather than waiting for all the JobA jobs to finish. My hope is that this would reduce how many of the B & C inserts/updates run at one time.

> The thing is batched jobs are pushed in bulk. There is no way to bulk push to multiple queues. For example if you add 1,000 jobs to a batch we want to queue that all at once not make 1,000 calls to the queue system.

In my use case I wouldn't expect each individual job to be added to its own queue; rather, something like the call below, where a single queue is specified for all the jobs being added, would fit the current bulk-dispatch architecture. My thinking is that the previously pushed jobs already exist in the queue, and at that point, does the batch really care which queue they were pushed to? I'd imagine the only things that actually care about queues are the workers themselves.

$this->batch()->add($jobs)->onQueue('someQueue')

If you or anyone else happens to see this, let me know your thoughts on something like this, or perhaps suggest a workaround.

cvsouth commented 1 week ago

+1 for being able to run a batch of jobs across multiple queues