Closed qrdlgit closed 1 year ago
Not sure why this happens.
Try compacting the db, perhaps?
This isn't a plyvel behaviour, btw, but a leveldb thing, so maybe you can find other avenues with more knowledgeable people… curious to learn more though 🙃
```python
>>> import plyvel
>>> db = plyvel.DB("./db_copy.lvl")
>>> db.compact_range()
```
Just keeps growing and growing... sigh. It's already gone from 51 files to 705, and `du` reports 7603876 -> 9256684.
'Compacting'... perhaps they should rename this process to make it clearer.
Odd that I can't find a lot of discussion about this problem. Does leveldb not get a lot of usage? It was easier to install than rocksdb.
Note: I re-ran this on a separate system with more storage space, and at the end of the compaction it ended up OK (it did grow by 600MB, but is probably much more performant). 3704 files, however.
Note that it blew through all the storage space on the limited system: I had 19GB free there, and compaction temporarily used it all up, starting from a 7GB database with 51 files.
So, lesson learned: make sure you compact before copying to a system with limited storage.
My guess is that my massive batch writes were creating really suboptimal .ldb files: very fast to write, but not great for gets. Compaction fixes that, but it doesn't use temporary space in a very controllable or friendly way.
You could also try increasing the block size. See the notes about that here, which mention bulk scans explicitly:
I have a large leveldb database created by plyvel using reasonably large batch writes.
I've copied it over to a new system for get-only purposes - no writes/puts.
I use code like this:
While iterating in this manner, it has grown from 7GB to over 14GB, adding hundreds of new files in the process. Maybe this is 'compacting', but it's threatening to use up all the limited space I have in that particular location, and it seems a bit unreasonable considering I'm only doing get calls.
Is it because I'm enumerating the data?
My workaround is to process it in batches, deleting everything and copying over the entire database again after each batch. Crazy, but I see no alternative.