Add batch insert/upsert/insert_objects/upsert_objects

no1seman commented 3 years ago

When performing a large cold data load, it would be great to be able to insert/upsert data in batches (a list of tuples/objects). These functions will be called directly from cartridge-java or any other client.

olegrok commented 3 years ago

Totktonada commented 3 years ago

@unera Please clarify: what is the priority of this feature?

unera commented 3 years ago

both inserts.

no1seman commented 3 years ago

I should say that this task is not as easy as it may seem. First of all, @olegrok mentioned some additional functionality we need in vshard. We also need to support this feature in some popular language connectors: Java/Go/Python ... The main problem is how to report succeeded and failed operations from the batch to the caller. It seems this feature needs additional triage.

Totktonada commented 3 years ago

We can implement batching without cluster-wide consistency guarantees for now (those would require 2PC, distributed transactions or something of this kind), but with detailed error reporting. Is there a need for such an intermediate step?

Totktonada commented 3 years ago

@dsharonov agreed on that and highlighted that we should return an array of errors.

denesterov commented 3 years ago

There is another use case, much more common (and, I think, important) than bulk operations.

If you have a slightly complex data structure, not just Key/Document, you are in trouble with crud.

For example, we have a customer record in one space and his orders in a second space, both sharded identically. Let's say we need to close one order and create another in one move, or just store a customer record with all his orders at once, without interleaving with other changes / read operations.

CRUD cannot do this.

akudiyar commented 2 years ago

@denesterov

For example, we have a customer record in one space and his orders in a second space, both sharded identically. Let's say we need to close one order and create another in one move, or just store a customer record with all his orders at once, without interleaving with other changes / read operations.

I have a proposal for a function registration API as a basis for implementing such cases: tarantool/cartridge#1799

Totktonada commented 2 years ago

Well, it seems we missed the moment to discuss the task. I should clarify what we're going to implement here.

insert_many() and insert_object_many()
replace_many() and replace_object_many()
upsert_many() and upsert_object_many()

All failed or not performed operations will be reported in an array of errors. The presence of errors means failure or partial success. Each error will contain the tuple (or object, or upsert operation) it refers to, so it will be possible to proceed after a partial success (say, retry the failed operations).
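To illustrate the reporting idea, here is a minimal Python model. It is not the crud API: the `BatchError` shape, its `operation_data` field and the toy `insert_many()` over a dict are all assumptions made for the sketch. It only shows how a caller could pull the failed tuples out of the error array after a partial success.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple


@dataclass
class BatchError:
    """One failed operation (hypothetical shape, not the real crud error)."""
    message: str
    operation_data: Any  # the tuple the error refers to


def insert_many(storage: Dict[Any, tuple],
                tuples: List[tuple]) -> Tuple[List[tuple], List[BatchError]]:
    """Toy batch insert: the first field of each tuple is the primary key."""
    inserted: List[tuple] = []
    errors: List[BatchError] = []
    for t in tuples:
        key = t[0]
        if key in storage:
            # Keep going after a failure and remember which tuple failed.
            errors.append(BatchError(f"Duplicate key {key!r}", t))
        else:
            storage[key] = t
            inserted.append(t)
    return inserted, errors


# Key 2 already exists, so the batch succeeds only partially.
storage = {2: (2, "old")}
inserted, errors = insert_many(storage, [(1, "a"), (2, "b"), (3, "c")])

# The caller can rebuild the list of operations to retry from the errors.
to_retry = [e.operation_data for e in errors]
```

Because every error carries its operation data, the caller does not need to diff the input against the result to find out what to retry.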

We'll add options to control behaviour of operations on storages:

stop_on_error (default: false). By default we'll continue with the next operations after an error and report all errors in the result. If the option is set, we'll stop on the first error and report errors for the failed operation and for all operations that were not performed.
rollback_on_error (default: false). By default all succeeded operations will be committed (even if some operations failed). If the option is set, any failed operation will lead to a rollback on the storage where the operation failed. In that case all operations on that storage will be reported as failed.
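The behaviour of the two options on a single storage can be modelled roughly as follows. This is a Python sketch under assumptions, not the actual implementation: the storage is a plain dict, a dict copy stands in for a transaction, and the error reasons ("duplicate key", "not performed", "rolled back") are made up for illustration.

```python
from typing import Dict, List, Tuple

Row = tuple
Error = Tuple[str, Row]  # (reason, tuple the error refers to)


def insert_batch(storage: Dict[object, Row], rows: List[Row],
                 stop_on_error: bool = False,
                 rollback_on_error: bool = False) -> Tuple[List[Row], List[Error]]:
    """Model of a batch insert on one storage; row[0] is the primary key."""
    snapshot = dict(storage)  # stands in for a transaction savepoint
    done: List[Row] = []
    errors: List[Error] = []
    for i, row in enumerate(rows):
        key = row[0]
        if key in storage:
            errors.append(("duplicate key", row))
            if stop_on_error:
                # Report the rest of the batch as not performed and stop.
                errors.extend(("not performed", r) for r in rows[i + 1:])
                break
            continue
        storage[key] = row
        done.append(row)
    if errors and rollback_on_error:
        # Undo everything on this storage and report all operations as failed.
        storage.clear()
        storage.update(snapshot)
        errors.extend(("rolled back", r) for r in done)
        done = []
    return done, errors


rows = [(1, "a"), (2, "b"), (3, "c")]  # key 2 already exists below

default_done, default_errs = insert_batch({2: (2, "old")}, rows)
stop_done, stop_errs = insert_batch({2: (2, "old")}, rows, stop_on_error=True)

rb_storage = {2: (2, "old")}
rb_done, rb_errs = insert_batch(rb_storage, rows, rollback_on_error=True)
```

With the defaults, tuples 1 and 3 are committed and only tuple 2 is reported. With stop_on_error, tuple 3 is reported as not performed. With rollback_on_error, the storage is left untouched and all three tuples end up in the error list.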

(Maybe we'll change option names and defaults. This message is to give the idea.)

Insert and replace are enough to implement importing data into an empty sharded tarantool cluster.

We didn't take into account the CDC usage scenario, which requires JDBC-style batching with the ability to mix different operations in a batch. We can work on it in a separate issue.

Sorry if you expected a wider scope of work.

tarantool / crud

Add batch insert/upsert/insert_objects/upsert_objects #193