tarantool / queue

Create task queues, add and take jobs, monitor failed tasks
Other
237 stars 52 forks source link

Add tube:put_many() #124

Open rybakit opened 4 years ago

rybakit commented 4 years ago

Consider adding a new method to insert multiple tasks atomically (all or none). The method's signature may look like the following:

tube:put_many({ {task_data1 [, options1]} [, {task_data2 [, options2]} [, ...]] })
Totktonada commented 3 years ago

Why not define a function for this purpose? Whether bulk task putting is so common scenario?

rybakit commented 3 years ago

Defining a function is certainly an option, and that's what I do in my internal projects. But this becomes a problem for an open-source library (like this one) that depends on tarantool/queue, because as a maintainer I have to ask every user of my library to copy-paste that function into their Lua script or let the library provide that function, and make sure it is compatible with the tarantool/queue releases. It makes the developer experience of library consumers more complicated than it can be. And also creates an extra burden for me as the maintainer - I have to write tests for this function, I have to know all the nuances of how it works on memtx/vinyl (if there are any), I may not be good enough in Lua, etc.

In terms of API design, calling this function from outside the queue interface looks more like a workaround than a thoughtful decision. For example, I maintain a PHP queue whose interface is identical to the one in Lua. With your approach, I can't just pass an instance of a queue to a method if I want to put two tasks in the queue atomically:

public function foo(Queue $queue) {
    ....

    return $queue->putMany([$task1, $task2]);
}

Instead, I have to keep the reference to the "connector" instance and do something like:

public function foo(Queue $queue, Client $client) {
    ....

    $queueName = $queue->getName();

    return $client->call('put_many', $queueName, $task1, $task2);
}

Or expose the details of the queue implementation by giving access to the underlying connector instance:

public function foo(Queue $queue) {
    ....

    $queueName = $queue->getName();
    $client = $queue->getClient();

    return $client->call('put_many', $queueName, $task1, $task2);
}

Regarding whether this is a common scenario, I can't speak for others, but in all my projects that were using the queue, I had to have the ability to insert multiple tasks atomically - either all or none, which seems like a very common use case to me.

Totktonada commented 3 years ago

Thanks for the detailed response!

What worries me: whether this API would look lopsided? Should other operations (say, deletion of a task) support batching?

Personally I had a case, when I had a need to delete a bunch of tasks that are collected using a specific condition. But it was from inside the same tarantool instance, so there were no problems neither with performance, nor with atomicity.

Let's look the queue API in context of using it from a connector. Since we want to have a general and solid API on each abstraction level, batching and atomicity of a set of operations fit better to the protocol level.

Tarantool's binary protocol is asynchronous, so when the question is about performance, a client may send a bunch of requests and wait them all afterwards. Atomicity is more complex question. We still have no interactive transactions support in the protocol, but we surely should. It is under design now.

So, if we're about an abstract flavor of the API, then put_many() would not look perfect. However we can discuss it as workaround for the absense of interactive transactions support. At least this feature will not land to the current LTS release series (1.10).

The question is the following. Whether the problem is so common for applications that it deserves a workaround right within the queue API? I would look at some use cases (at least sketchily described) prior making any decision.

rybakit commented 3 years ago

What worries me: whether this API would look lopsided? Should other operations (say, deletion of a task) support batching?

So, if we're about an abstract flavor of the API, then put_many() would not look perfect.

I do agree with you. A possible solution could be to introduce a generic function (eg queue.atomic()) that can process any queue operations atomically in a batch (even from different queues), or create a batchQueue driver decorator?

I would look at some use cases (at least sketchily described) prior making any decision.

I can't recall use cases completely as I no longer work on those projects, but in one case it was something about sending confirmation/notification emails after a successful booking (about 3 different emails for the same event). Without atomicity we had to check if every next push() failed and delete the tasks that have already been pushed and pray that they are not already taken. Of course, we could come up with workarounds to bypass these problems, or write and maintain a Lua function that does the trick, but again, all of these complications can be avoided if the queue module provides the ability to handle multiple operations atomically. Maybe interactive transactions are the best way to go to keep the API clean, idk. So please feel free to close this ticket if you think so. Btw, do you know if begin/commit transaction requests can also be put into a batch or should they be sent separately? In other words, will it be 3 calls or 1?

Totktonada commented 3 years ago

Your case looks valid for me. A simple workaround in the queue implementation looks okay for me. I'll add the issue into our backlog.

Btw, do you know if begin/commit transaction requests can also be put into a batch or should they be sent separately? In other words, will it be 3 calls or 1?

@mary3000 working on https://github.com/tarantool/tarantool/issues/2016, so maybe she may reveal details about the future implementation.