google / leveldb

LevelDB is a fast key-value storage library written at Google that provides an ordered mapping from string keys to string values.

leveldb created from large batch writes temporarily grows too much from 'compacting': lesson - compact before moving to system with low storage #1145

Closed: qrdlgit closed 10 months ago

qrdlgit commented 10 months ago

I have a large leveldb database created by plyvel using reasonably large batch writes (~100K keys per batch, resulting in ~1.2GB per .ldb file).

I've copied the DB over to a new system for get-only purposes: no writes or puts.

I use code like this:

import json
import plyvel

db = plyvel.DB('./db.lvl', create_if_missing=True)
for enum_i, (k, v) in enumerate(db):
    # Keys and values are utf-8 text; values hold JSON documents.
    kd = k.decode("utf-8")
    vd = json.loads(v.decode("utf-8"))

While iterating in this manner, the db.lvl directory has grown from 7GB to over 14GB, adding hundreds of new .ldb files in the process. Maybe this is 'compacting', but it's threatening to use up all the limited space I have in that particular location, and it seems unreasonable considering I'm only doing get calls.
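For what it's worth, the growth is easy to watch from outside the DB. A minimal sketch (the './db.lvl' path matches the snippet above; nothing here is plyvel-specific):

import os

def ldb_dir_stats(path):
    # Total on-disk size and number of .ldb table files in the DB directory.
    total_bytes = 0
    ldb_files = 0
    for entry in os.scandir(path):
        if entry.is_file():
            total_bytes += entry.stat().st_size
            if entry.name.endswith('.ldb'):
                ldb_files += 1
    return total_bytes, ldb_files

size, n = ldb_dir_stats('./db.lvl')
print(f"{size / 2**30:.2f} GiB across {n} .ldb files")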

Is it because I'm enumerating the data?

It would be great to be able to open the DB with compaction turned off. I only need the above enumeration for a specific task; after that I will be doing more infrequent gets, where I can open and close the DB quickly to avoid compaction.
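As far as I can tell there is no public LevelDB option to turn compaction off, so the open/close pattern for those later infrequent gets would look something like this sketch (the key is hypothetical):

import plyvel

def get_once(key):
    # Keep the DB handle open only for the duration of one lookup, so any
    # background compaction scheduled by the read has little time to run.
    db = plyvel.DB('./db.lvl')
    try:
        return db.get(key)
    finally:
        db.close()

value = get_once(b'some-key')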

My workaround is to create another line-by-line file, which is basically a duplicate of the entire 10GB DB; fortunately I have more read-only storage available.
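A sketch of that workaround, dumping the DB to one JSON record per line (the db_dump.jsonl name is made up):

import json
import plyvel

db = plyvel.DB('./db.lvl')
with open('db_dump.jsonl', 'w', encoding='utf-8') as out:
    # One line per key, mirroring the decode/loads steps from the loop above.
    for k, v in db:
        record = {'key': k.decode('utf-8'), 'value': json.loads(v.decode('utf-8'))}
        out.write(json.dumps(record) + '\n')
db.close()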

qrdlgit commented 10 months ago
>>> import plyvel
>>> db = plyvel.DB("./db_copy.lvl")
>>> db.compact_range()

It just keeps growing and growing... it has already gone from 51 files to 1899 files, and 'du' reports 7603876 -> 11213796.
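The spike makes some sense once you know that compaction rewrites the overlapping tables into new files before the old ones are deleted, so old and new copies of the data coexist on disk for a while. One way to bound that transient overhead might be to compact narrow key ranges one at a time instead of the whole DB at once; the single-byte prefix split below is just for illustration and assumes keys are spread across byte prefixes:

import plyvel

db = plyvel.DB('./db_copy.lvl')
for b in range(256):
    # Compact one key-prefix range at a time; each call only rewrites the
    # tables overlapping that range, so the temporary old+new duplication
    # stays roughly proportional to the range, not the whole DB.
    start = bytes([b])
    stop = bytes([b + 1]) if b < 255 else None
    db.compact_range(start=start, stop=stop)
db.close()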

'Compacting'... perhaps the process could be renamed to make it clearer?

The pre-compacted form is sufficiently performant for my purposes. It'd be great to be able to disable this and trade speed for disk space.

qrdlgit commented 10 months ago

OK... this was blowing up my system with limited storage space (it used all 19GB, starting from 7GB!), so I copied the DB to a system with more storage space and ran the compaction there. In the end that worked: the DB only grew by 600MB overall (3704 files), which shouldn't be an issue.
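For completeness, that copy-then-compact flow as a sketch (paths are the same hypothetical ones as above; shutil.copytree requires that the destination not exist yet):

import shutil
import plyvel

# On the machine with spare disk: copy the DB, compact the copy, then
# move the compacted copy back to the storage-constrained host.
shutil.copytree('./db.lvl', './db_copy.lvl')
db = plyvel.DB('./db_copy.lvl')
db.compact_range()  # full-DB compaction; needs transient headroom
db.close()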

Lesson learned: compact before moving to a system with limited storage space.