ostafen / clover

A lightweight document-oriented NoSQL database written in pure Golang.

using BadgerDB: DropCollection returns a "Txn is too big to fit into one request" #147

Open willie68 opened 4 months ago

willie68 commented 4 months ago

I just did a simple test: importing 1,000,000 simple documents and then calling DropCollection. The error "Txn is too big to fit into one request" is returned. I think the error comes from clover's DropCollection implementation, which simply tries to delete all documents in one BadgerDB transaction. However, BadgerDB transactions are limited in size; in my case the limit is reached at around 35,000 documents, specifically 15% of the table size. For more information: https://github.com/dgraph-io/badger/issues/1325
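For reference, a minimal sketch of the kind of test that triggers this (the store setup and import paths are assumptions based on the clover v2 layout and may need adjusting; depending on the version, selecting the BadgerDB backend may require a different open call):

```go
package main

import (
	"fmt"
	"log"

	c "github.com/ostafen/clover/v2"
	d "github.com/ostafen/clover/v2/document"
)

func main() {
	// Open a clover database; depending on the clover version you may need
	// to explicitly select the BadgerDB store instead of the default one.
	db, err := c.Open("clover-db")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	const collection = "items"
	if err := db.CreateCollection(collection); err != nil {
		log.Fatal(err)
	}

	// Import 1,000,000 simple documents.
	for i := 0; i < 1_000_000; i++ {
		doc := d.NewDocument()
		doc.Set("n", i)
		if _, err := db.InsertOne(collection, doc); err != nil {
			log.Fatal(err)
		}
	}

	// With the BadgerDB store this fails with
	// "Txn is too big to fit into one request", because all deletions
	// are issued inside a single transaction.
	if err := db.DropCollection(collection); err != nil {
		fmt.Println("DropCollection failed:", err)
	}
}
```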

ostafen commented 4 months ago

Hey, @willie68, the quickest option I see is to delete documents in batches inside DropCollection(). However, such an approach would make clover a bit tied to a specific storage engine, in this case BadgerDB, so I'm not sure about the benefit of implementing it: in general, different storage engines may have very different characteristics and limitations. Since you can achieve the same result by calling clover's Delete() method and selecting documents in batches (using offset and limit), I would suggest you do so. Does this help you?
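For example, a batched delete along these lines should stay under the transaction limit (just a sketch, assuming the usual `c`/`q` import aliases for the clover and query packages; the exact Count/Delete signatures may differ across versions, and the batch size is arbitrary):

```go
// deleteAllInBatches removes every document of a collection in chunks small
// enough to fit into a single BadgerDB transaction.
// Sketch only: batchSize and the Count/Delete helpers are assumptions.
func deleteAllInBatches(db *c.DB, collection string, batchSize int) error {
	for {
		n, err := db.Count(q.NewQuery(collection))
		if err != nil {
			return err
		}
		if n == 0 {
			return nil // nothing left to delete
		}
		// Delete at most batchSize documents per call, so the underlying
		// storage transaction stays within its size limit.
		if err := db.Delete(q.NewQuery(collection).Limit(batchSize)); err != nil {
			return err
		}
	}
}
```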

willie68 commented 4 months ago

Thank you for the fast feedback.

First I tried bbolt, but I already failed at a query. Even a simple search like results, err := db.FindAll(q.NewQuery(dbTable).Where(q.Field("datatime").Lt(queryTime))) failed (there was an index on datatime). In addition, the import performance was not sufficient for 1,000,000 records. That's why I'm trying BadgerDB (I have already used it successfully in other projects).

DropCollection: of course I could do this manually. Since I only have one collection at the moment, it's easier for unit tests to simply delete the files on disk. In the main application, DropCollection will never be executed.

However, I think it is important that the functions offered work with all storage options. Maybe you could extend the store interface and put the special implementation into store/badger/badger.go.
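On the badger side, one possible shape for such a store-specific implementation (a sketch only; the package name is a placeholder and how clover actually prefixes collection keys is an assumption) would be to use badger's WriteBatch, which transparently splits the work across as many transactions as needed:

```go
package badgerutil

import (
	badger "github.com/dgraph-io/badger/v4"
)

// dropCollectionBatched deletes every key under the given collection prefix
// using a WriteBatch, avoiding "Txn is too big to fit into one request".
func dropCollectionBatched(db *badger.DB, prefix []byte) error {
	wb := db.NewWriteBatch()
	defer wb.Cancel()

	// Iterate over keys only (values are not needed) and queue deletions;
	// the WriteBatch commits in as many transactions as required.
	err := db.View(func(txn *badger.Txn) error {
		opts := badger.DefaultIteratorOptions
		opts.PrefetchValues = false
		it := txn.NewIterator(opts)
		defer it.Close()

		for it.Seek(prefix); it.ValidForPrefix(prefix); it.Next() {
			if err := wb.Delete(it.Item().KeyCopy(nil)); err != nil {
				return err
			}
		}
		return nil
	})
	if err != nil {
		return err
	}
	return wb.Flush()
}
```

Badger also exposes DB.DropPrefix, which could serve the same purpose, though it blocks writes while it runs.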

PS: I just found the problem with the time queries in bbolt. But the performance issue remains. Inserting 100,000 simple records takes bbolt: 4m22.2910021s, badger: 2.9658109s.