ostafen / clover

A lightweight document-oriented NoSQL database written in pure Golang.
MIT License
Implement batch insert #120

Closed: letterbeezps closed this 1 year ago

letterbeezps commented 1 year ago

Use Badger's WriteBatch API to implement batch insertion. Note, however, that batch insert does not perform uniqueness checks on the data.

ostafen commented 1 year ago

Hi @letterbeezps, thank you for this. I'm a bit concerned about merging it, because it sounds like a feature that is too specific to the badger engine.

For example, the comment on the InsertBatch method says:

// This provides a way to conveniently do a lot of writes, batching them up as
// tightly as possible in a single transaction. Unlike the Insert operation, this
// operation will not check the uniqueness of the data, it will overwrite the
// data with the same _id.

Reading this, one could ask why, in general, an InsertBatch method should not check uniqueness. The only reason is that badger is used. Since clover is meant to support any key-value store, this assumption doesn't sound like a good option.

letterbeezps commented 1 year ago

@ostafen OK, I see what you mean. This PR of mine lifts some badger-specific features up into clover, which is really not an appropriate approach.

letterbeezps commented 1 year ago

The simplest and most direct solution is to divide a large data set into several subsets, and then perform multiple insert operations. But it's up to the user to do it.

Shane-XB-Qian commented 1 year ago

> The simplest and most direct solution is to divide a large data set into several subsets, and then perform multiple insert operations. But it's up to the user to do it.

Maybe this feature could move to a separate function, e.g. 'LoadCollection()'. In some SQL databases, import and load are implemented with different approaches. But I'm not sure whether, here in NoSQL, this can be implemented or supported by all stores.

-- shane.xb.qian

ostafen commented 1 year ago

@letterbeezps, yes, the problem is not providing the InsertBatch() method in itself, but implementing it in a way that behaves the same on different storage engines. Of course, if we cannot stay transactional, then it's no different from leaving this responsibility to the user, as you said.