wbolster / plyvel

Plyvel, a fast and feature-rich Python interface to LevelDB
https://plyvel.readthedocs.io/

Question: Tiny amount of data backed up by 1000x filesystem contents? #153

Closed. dspicher closed this issue 1 year ago

dspicher commented 1 year ago

Sorry for posting this general LevelDB question here.

I have a tiny LevelDB:

import plyvel

db = plyvel.DB('.')
print(sum(len(key) + len(value) for key, value in db))  # sum of key + value lengths
# prints 25148
db.close()

These roughly 25,000 bytes of keys and values are backed by 44 .ldb files totalling 69 MB.
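To put the two numbers from this comment side by side, here is a minimal sketch that computes the live key/value payload and the total size of the table files in the database directory. The helper names are hypothetical. It assumes table files end in .ldb (older LevelDB releases wrote .sst, so both are counted) and deliberately leaves .log and MANIFEST files out.

```python
from pathlib import Path
from typing import Iterable, Tuple

# Hypothetical helpers: compare live key/value bytes with on-disk table bytes.


def payload_bytes(pairs: Iterable[Tuple[bytes, bytes]]) -> int:
    """Logical data size: sum of key and value lengths."""
    return sum(len(key) + len(value) for key, value in pairs)


def table_bytes(path: str) -> int:
    """Size of the table files in a LevelDB directory.

    Assumes .ldb names (older LevelDB versions wrote .sst); .log and
    MANIFEST files are not counted.
    """
    root = Path(path)
    return sum(
        f.stat().st_size
        for pattern in ("*.ldb", "*.sst")
        for f in root.glob(pattern)
    )


def overhead_ratio(pairs: Iterable[Tuple[bytes, bytes]], path: str) -> float:
    """On-disk table bytes per byte of live data (inf for an empty DB)."""
    payload = payload_bytes(pairs)
    return table_bytes(path) / payload if payload else float("inf")
```

Iterating a plyvel DB yields (key, value) pairs, so something like `overhead_ratio(plyvel.DB('.'), '.')` should report the ratio for the database in this issue.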

Is this enormous overhead in any way expected?

wbolster commented 1 year ago

that is a bit large indeed 🙃

i suspect that database has seen a lot of writes earlier on?

can you make a backup copy, then try this on the copy?

import plyvel

db = plyvel.DB('.')
db.compact_range()  # no start/stop: compact the whole key range
db.close()

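The backup-then-compact check suggested here might look like the sketch below. It assumes nothing else has the database open while the directory is copied, and the helper names are hypothetical. `compact_range()` called without `start`/`stop` asks LevelDB to compact the whole key range.

```python
import shutil
from pathlib import Path


def table_bytes(path) -> int:
    """Total size of LevelDB table files (.ldb, or .sst in older versions)."""
    root = Path(path)
    return sum(
        f.stat().st_size
        for pattern in ("*.ldb", "*.sst")
        for f in root.glob(pattern)
    )


def backup(src, dst) -> Path:
    """Copy the whole DB directory; dst must not exist yet."""
    return Path(shutil.copytree(src, dst))


def compact_copy(src, dst):
    """Back up src to dst, fully compact the copy, return (before, after) sizes."""
    import plyvel  # lazy import: only this helper needs the LevelDB binding

    copy = backup(src, dst)
    before = table_bytes(copy)
    db = plyvel.DB(str(copy))
    try:
        db.compact_range()  # compact everything in the copy, never the original
    finally:
        db.close()
    return before, table_bytes(copy)
```

`compact_copy('.', '../db-copy')` returns the table-file size before and after compaction, which makes a "no big change" result like the one reported below easy to quantify.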
dspicher commented 1 year ago

Yeah, I already tried this; I should have mentioned it. It causes a small diff in the .log and MANIFEST files, but no big change in the on-disk size.

wbolster commented 1 year ago

what happens if you manually copy the contents to another database, e.g. something like this?

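import plyvel

db1 = plyvel.DB('.')
# error_if_exists guards against writing into an existing database
db2 = plyvel.DB('../new', create_if_missing=True, error_if_exists=True)
for key, value in db1:  # copy every live key/value pair
    db2.put(key, value)
db1.close()
db2.close()
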
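A variant of this copy loop groups the puts into write batches instead of issuing one write per key. This is a sketch assuming plyvel's `DB.write_batch()` context manager, which writes the batch when the `with` block exits. `copy_pairs` and `batch_size` are hypothetical names, and the chunk size is an arbitrary default.

```python
from itertools import islice
from typing import Callable, Iterable, Tuple


def copy_pairs(
    pairs: Iterable[Tuple[bytes, bytes]],
    write_batch: Callable,
    batch_size: int = 10_000,
) -> int:
    """Copy (key, value) pairs in chunks of batch_size; return the count copied."""
    it = iter(pairs)
    copied = 0
    while True:
        chunk = list(islice(it, batch_size))
        if not chunk:
            return copied
        # Assumption: write_batch() is a context manager (as plyvel's is)
        # that applies the buffered puts when the block exits.
        with write_batch() as wb:
            for key, value in chunk:
                wb.put(key, value)
        copied += len(chunk)
```

`copy_pairs(db1, db2.write_batch)` would replace the `for` loop above. For a database this small a single batch would also do; chunking just keeps memory bounded for larger ones.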
dspicher commented 1 year ago

This did the trick; we are now down to 50 KB 🥳

Thank you so much for the help! 👍

wbolster commented 1 year ago

cool, still weird. don't think it's a plyvel issue though.

dspicher commented 1 year ago

Absolutely not: the DB was not created with plyvel, hence my apology in the first message 😅