ideawu / ssdb

SSDB - A fast NoSQL database, an alternative to Redis
http://ssdb.io/
BSD 3-Clause "New" or "Revised" License

dbsize grows instead of reduce when removing large amount of data! #1332

Open saveriocastellano opened 4 years ago

saveriocastellano commented 4 years ago

I'm experiencing very weird behaviour in SSDB. I have a large SSDB database (about 100 GB) that contains millions of keys, hashes, and sorted sets. Because I have embedded a timestamp in the key names, I can tell from a key's name whether the key/hash/sorted-set needs to be removed (because the data is too old and can be expired).

So I created a script that cycles through the key names (using the scan, hlist, and zlist commands) and determines from each name whether the key has to be removed.
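The post doesn't show the actual naming scheme, so as a rough sketch assume keys look like `<prefix>:<epoch-seconds>`; the expiry check driving such a script might then be:

```python
import time

# Hypothetical key layout: assume keys look like "<prefix>:<epoch-seconds>".
# The real scheme in the original post is not shown.
def is_expired(key, ttl_seconds, now=None):
    """Return True if the epoch timestamp embedded in the key name
    is older than ttl_seconds."""
    now = now if now is not None else time.time()
    try:
        ts = int(key.rsplit(":", 1)[1])
    except (IndexError, ValueError):
        return False  # key doesn't follow the expected layout; keep it
    return now - ts > ttl_seconds
```

The script would then iterate over names returned by scan/hlist/zlist and issue del/hclear/zclear for every key where this check returns true.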

I ran the script and it removed millions of keys, hashes, and sorted sets; however, very surprisingly, the size of the database increased instead of decreasing!

I do understand that SSDB triggers compaction every now and then, and that SSDB doesn't write things to disk immediately... but still, after running my script for so long and removing so much data, I really expected the db size to decrease.

Can you please suggest what could be the cause?

saveriocastellano commented 4 years ago

Could this be related to the leveldb issue described here: https://github.com/google/leveldb/issues/603?

saveriocastellano commented 4 years ago

I also saw this issue and test case:

https://github.com/ideawu/ssdb/blob/master/deps/leveldb-1.20/issues/issue178_test.cc

So I wonder, is this related to the problem I described?

ideawu commented 4 years ago

Hi, this issue is related to leveldb. As far as I know, deleted keys are not actually removed from disk immediately; they may be removed after a compaction, or may not. It depends on leveldb's policy.
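The reason a delete can temporarily *grow* the database is that LSM-style stores such as leveldb record a delete by appending a tombstone marker rather than erasing the old data in place; the space is only reclaimed when a compaction rewrites the affected files. A toy model (not leveldb's real code) illustrating this:

```python
# Toy LSM-style store: deletes append tombstones, so the footprint
# grows until a compaction rewrites the data. This is a simplified
# illustration of leveldb's behaviour, not its actual implementation.
class ToyLSM:
    def __init__(self):
        self.log = []  # append-only (key, value-or-None) records

    def put(self, key, value):
        self.log.append((key, value))

    def delete(self, key):
        self.log.append((key, None))  # tombstone: the log GROWS on delete

    def size(self):
        return len(self.log)

    def compact(self):
        latest = {}
        for key, value in self.log:
            latest[key] = value
        # keep only live keys; tombstones and overwritten values drop out
        self.log = [(k, v) for k, v in latest.items() if v is not None]

db = ToyLSM()
db.put("a", 1)
db.put("b", 2)
db.delete("a")   # size is now 3 records even though only "b" is live
db.compact()     # after compaction, only ("b", 2) survives
```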

saveriocastellano commented 4 years ago

While it is acceptable that they are not removed immediately, it is crucial for me to make sure that they will eventually be removed; otherwise an ever-growing database is unusable for me. Could you please tell me how to set leveldb's policy so that deleted keys are removed for sure?

ideawu commented 4 years ago

Manually invoking a compact operation will actually remove deleted keys from disk: execute compact in ssdb-cli. PS: run compact when the server is not busy.
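Besides ssdb-cli, the command can be sent over SSDB's simple line-based wire protocol, where each request is a series of blocks (`<length>\n<data>\n`) terminated by an empty line. A minimal sketch of the encoding (host/port below are assumptions):

```python
# Encode an SSDB request: each argument becomes "<length>\n<data>\n",
# and the whole request ends with an empty line.
def encode_request(*args):
    parts = []
    for arg in args:
        data = str(arg).encode()
        parts.append(str(len(data)).encode() + b"\n" + data + b"\n")
    return b"".join(parts) + b"\n"

# To actually send it to a server (not run here; address is an assumption):
# import socket
# with socket.create_connection(("127.0.0.1", 8888)) as s:
#     s.sendall(encode_request("compact"))
```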

saveriocastellano commented 4 years ago

Unfortunately my server is busy 24 hours a day :(

Actually I'm running 2 masters in "mirror" mode, and my server only actively uses one of them. So I was wondering: would it be better to run "compact" on the second master, the one that is not actively used? Would that immediately affect the performance of the primary master? And after compacting the data on the second master, will the data on the first master also shrink through syncing?

ideawu commented 4 years ago

Compacting on one server will not affect another server.

saveriocastellano commented 4 years ago

Does the "compact" command of SSDB support key ranges and types (kv, hash, sorted set, list)? If it does, I could split the compaction into smaller ranges so as not to affect performance too much.
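The thread doesn't confirm whether SSDB's compact accepts ranges. A related way to limit load, applicable to the deletion pass itself rather than the compaction, is to process keys in small throttled batches so normal traffic can breathe in between. A sketch under that assumption, where `delete_key` stands in for a real client call:

```python
import time

# Throttled batch processing: delete keys in small batches with a pause
# between batches to reduce load on a busy server. `keys` is any iterable
# of key names; `delete_key` is a placeholder for a real SSDB client call.
def throttled_delete(keys, delete_key, batch_size=1000, pause_s=0.5):
    deleted = 0
    batch = []
    for key in keys:
        batch.append(key)
        if len(batch) >= batch_size:
            for k in batch:
                delete_key(k)
            deleted += len(batch)
            batch = []
            time.sleep(pause_s)  # let normal traffic through
    for k in batch:  # flush the final partial batch
        delete_key(k)
    deleted += len(batch)
    return deleted
```

This trades total cleanup time for steadier latency on the serving path; tune `batch_size` and `pause_s` to the server's headroom.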

saveriocastellano commented 4 years ago

@ideawu Just in case it is useful for others: since I'm running a master-master configuration with sync=mirror, I was able to solve this by doing the following:

After doing the above, the size of my database shrank, reclaiming the space from all the data I had deleted.