twmht / python-rocksdb

Python bindings for RocksDB
BSD 3-Clause "New" or "Revised" License

Disable Caching / Memory Leak? #95

Open quantology opened 3 years ago

quantology commented 3 years ago

I've been using python-rocksdb for a large kv store (~100 million records). I'd like to disable any caching by rocksdb, since access to those records is pretty random and I'd like to minimize the memory footprint. I've found that, as the db is accessed more over time, the memory usage continues to grow (either because of a memory leak or due to caching). My current solution is to restart the server whenever memory usage reaches some threshold, but obviously that is not ideal.

What is the recommended configuration (e.g. rocksdb.Options) for running python-rocksdb with the minimal memory footprint possible? Is there a way to entirely disable caching, so I can determine if there is a deeper memory leak at the root of this issue?

I'm currently using:

import rocksdb

opts = rocksdb.Options()
# Ask the block-based table format not to use a block cache at all.
opts.table_factory = rocksdb.BlockBasedTableFactory(
    block_cache=None,
    no_block_cache=True,
)

Thanks for any advice!
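
To help separate a leak from cache growth, one option is to log the process's resident memory while replaying the same kind of random reads. The following is only a minimal, stdlib-only sketch: `current_rss_bytes` and `rss_samples` are hypothetical helper names (not part of python-rocksdb), the `/proc/self/statm` path assumes Linux, and the fallback reports peak rather than current RSS.

```python
import os
import resource
import sys


def current_rss_bytes():
    """Return this process's resident set size in bytes (best effort)."""
    try:
        # Linux: second field of /proc/self/statm is resident pages.
        with open("/proc/self/statm") as f:
            resident_pages = int(f.read().split()[1])
        return resident_pages * os.sysconf("SC_PAGE_SIZE")
    except (OSError, ValueError, IndexError):
        # Fallback: *peak* RSS. Units are bytes on macOS, kilobytes on Linux.
        peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
        return peak if sys.platform == "darwin" else peak * 1024


def rss_samples(workload, rounds=5):
    """Call workload() `rounds` times and record RSS after each call."""
    samples = []
    for _ in range(rounds):
        workload()  # e.g. a batch of random db.get() calls (assumed)
        samples.append(current_rss_bytes())
    return samples
```

If the samples keep climbing across many rounds of an identical workload with caching disabled, that would point toward a leak rather than a cache filling up, although allocator fragmentation can also produce slow growth.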

iFA88 commented 3 years ago

Hey. My biggest DB takes 900 GB of storage, with more than 100 billion keys and many CFs. I have also disabled block_cache for most of the CFs where I don't need any cached data. The process has now been running for 7 days and uses 5 GB RES with a lot of iteration/puts/gets. I also recommend setting the following database options: