Closed Sora233 closed 3 months ago
Thanks, but I'm not sure it's necessary: even though it gives a small performance improvement, it also takes more memory to hold the pending map.
If you increase 1e5 to 1e7 in the example, the program will likely run for more than one hour. Maybe a better trade-off is to use the map only when there are more than 100 (or any reasonable number of) pending writes.
I have rethought the issue. We can keep the map to improve performance, but we do not need to record the full key in the map, because if the keys are large that costs more memory. Instead we can record the key hash, like map[hash(key)]index. The hash function does not need to be persistent: we only need it while the batch exists, and once the batch commits or rolls back it is no longer used. So we can use an in-memory hash algorithm for this; you can see the usage in badger: https://github.com/dgraph-io/badger/blob/main/txn.go#L396
I have implemented this hash mechanism by looking into the code you provided.
For now, Batch.pendingWrites gets in the way of large batch puts: Put in a batch is actually O(N^2). (Maybe not only Put; I haven't checked the other operations yet.) https://github.com/rosedblabs/rosedb/blob/a776163adb0c23b8b2cf1982464a659602fbf435/batch.go#L121-L126

Try the example: it takes 22+ seconds on my PC.
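The quadratic behavior can be reproduced outside rosedb with a small standalone program. This is a hedged reconstruction, not the reporter's original example: it only mimics the pattern at the linked lines (each Put linearly scans pendingWrites before appending) and compares it against a map lookup.

```go
package main

import (
	"fmt"
	"time"
)

// linearPut mimics the quadratic pattern: each insert scans every
// pending record before appending, as the linked batch.go lines do.
func linearPut(n int) time.Duration {
	type rec struct{ key string }
	pending := make([]rec, 0, n)
	start := time.Now()
	for i := 0; i < n; i++ {
		key := fmt.Sprintf("key-%d", i)
		found := false
		for j := range pending { // O(len(pending)) scan per insert
			if pending[j].key == key {
				found = true
				break
			}
		}
		if !found {
			pending = append(pending, rec{key})
		}
	}
	return time.Since(start)
}

// mapPut does the same work with a map: amortized O(1) per insert.
func mapPut(n int) time.Duration {
	pending := make(map[string]struct{}, n)
	start := time.Now()
	for i := 0; i < n; i++ {
		pending[fmt.Sprintf("key-%d", i)] = struct{}{}
	}
	return time.Since(start)
}

func main() {
	const n = 20000 // 1e5 keys, as in the report, takes far longer
	fmt.Println("linear scan:", linearPut(n))
	fmt.Println("map lookup: ", mapPut(n))
}
```

With all-distinct keys the scan always runs to the end, so total work grows as N^2/2, which is why 1e5 puts already take tens of seconds.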
Solution:
I added a pendingWritesMap to help find keys in pendingWrites. I also added a benchmark for the batch.

Before:
After: