You probably need to install the snappy lib/packages on your machine.
Thanks @aureliar8. I just needed to specify the path separately for each folder, and it worked.
@aureliar8 @yihuang @kingster I am facing one problem. I am using RocksDB to cluster strings for similarity/dedupe purposes. When I start the clustering process, memory consumption is near zero, but as the process proceeds, memory usage increases slowly and steadily until it reaches the maximum limit, at which point the OS kills the process. You can see the logs and profiling details in the attached zip file: RocksDB Details.zip. Can you suggest something to resolve this issue?
What's the max limit value in your case?
According to the Go profiles you sent, your pure Go program seems to make a lot of short-lived allocations (high alloc_space, low inuse_space), so this shouldn't negatively impact the memory footprint of the process. This indicates that most of the memory footprint comes from cgo code that the Go profiler can't observe, so probably RocksDB itself.
I can see in the RocksDB logs that you use an LRU block cache with a capacity of 3 GB. I can't comment on whether this is a correct value, but it can explain a memory increase of 3 GB after the start of the process.
Maybe there are lots of SST files; some amount of memory is needed for each opened SST file. You can set the max open files option.
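For illustration, with the grocksdb options already used in this thread, that knob looks roughly like this (a sketch; the 256 value is only a placeholder, not a recommendation, and it assumes the binding exposes SetCacheIndexAndFilterBlocks, which mirrors RocksDB's cache_index_and_filter_blocks option):
```go
opts := grocksdb.NewDefaultOptions()
// Bound how many SST files RocksDB keeps open at once; each open file
// pins some index/filter memory. -1 means "no limit".
opts.SetMaxOpenFiles(256)

bbto := grocksdb.NewDefaultBlockBasedTableOptions()
// Optionally charge index/filter blocks to the block cache so this memory
// is bounded and evictable rather than tied to the number of open files.
bbto.SetCacheIndexAndFilterBlocks(true)
opts.SetBlockBasedTableFactory(bbto)
```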
Thanks @aureliar8 @yihuang
What's the max limit value in your case?
Around 62 GB of memory is free. One process can use 100% of the memory available.
I am using the below-mentioned configuration to create the RocksDB instances:
bbto := grocksdb.NewDefaultBlockBasedTableOptions()
// TODO: check out the value for LRUCache and options
bbto.SetBlockCache(grocksdb.NewLRUCache(31457280)) // 30 MiB
filter := grocksdb.NewBloomFilter(10)
bbto.SetFilterPolicy(filter)
opts := grocksdb.NewDefaultOptions()
opts.SetBlockBasedTableFactory(bbto)
opts.SetCreateIfMissing(true)
opts.EnableBlobFiles(true)
opts.EnableBlobGC(true)
opts.IncreaseParallelism(4)
opts.SetMaxWriteBufferNumber(4)
opts.SetMinWriteBufferNumberToMerge(1)
opts.SetRecycleLogFileNum(4)
opts.SetWriteBufferSize(134217728) // 128 MiB
opts.SetWritableFileMaxBufferSize(0)
opts.CompactionReadaheadSize(2097152) // 2 MiB
opts.SetMaxBackgroundJobs(2)
opts.SetMaxTotalWalSize(1073741824) // 1 GiB
opts.SetBlobCompactionReadaheadSize(2097152) // 2 MiB
opts.SetDbLogDir(dataDir + "/" + name)
opts.SetInfoLogLevel(grocksdb.InfoInfoLogLevel)
opts.SetStatsDumpPeriodSec(180)
opts.EnableStatistics()
opts.SetLevelCompactionDynamicLevelBytes(false)
opts.SetMaxOpenFiles(5)
for i := 0; i <= 99; i++ {
	idx := strconv.Itoa(i) // assumed: string form of the table index
	db, _ := NewRocksDB(BasePath+"/table_"+idx, "Pentagram") // error handling elided here
	PentagramDB[i] = db
	db1, _ := NewRocksDB(BasePath+"/table_"+idx, "Cluster")
	ClusterDB[i] = db1
}
It would be very helpful if you could recommend optimal values for these options, considering 62 GB of free memory.
In the documentation, it is mentioned that:
This fork contains no defer in codebase (my side project requires as less overhead as possible). This introduces a loose convention of how/when to free c-mem, thus breaking the rule of [tecbot/gorocksdb](https://github.com/tecbot/gorocksdb).
Does this affect memory consumption in any way? If yes, what would be the alternative?
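For context on that convention: every C-allocated object returned by the binding has to be freed explicitly, and a forgotten Free/Destroy shows up exactly as cgo memory growth that the Go profiler cannot see. A minimal sketch of the usual cleanup, assuming the standard grocksdb Get/Iterator calls (the key name is illustrative):
```go
ro := grocksdb.NewDefaultReadOptions()
defer ro.Destroy() // options wrap C memory too

// Each Slice returned by Get holds C memory until Free is called.
val, err := db.Get(ro, []byte("some-key"))
if err == nil {
	_ = val.Data() // use the value before freeing it
	val.Free()
}

// Iterators also wrap C resources and must be closed explicitly.
it := db.NewIterator(ro)
for it.SeekToFirst(); it.Valid(); it.Next() {
	// it.Key()/it.Value() are only valid until the iterator moves
}
it.Close()
```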
I find it hard to believe that this Go code creates a RocksDB instance that generates the logs you previously sent.
In the Go code:
bbto.SetBlockCache(grocksdb.NewLRUCache(31457280)) // 30 MiB
In the RocksDB logs:
Block cache LRUCache@0x24cd180#2984523 capacity: 3.00 GB ...
If each RocksDB instance indeed has a cache of 3.00 GB, then the total memory needed by these LRU caches is 200 × 3 GB = 600 GB.
You could try rerunning your experiment with a single RocksDB instance and see where the memory usage stops. Then you'll need to get that figure low enough that it can be multiplied by 200.
Alternatively, you can change the architecture of your code a bit by having fewer RocksDB instances. The column family feature might help you create disjoint "tables" in a single RocksDB instance.
Or I think it's also possible to make these 200 RocksDB instances share their resources (caches, buffers), but you'd have to look at the documentation; a sketch of the shared-cache idea follows below.
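For example, a single cache object can be created once and handed to the table options of every instance. This is only a sketch, assuming all DBs are opened with options built by the same helper; the 1 GiB figure is purely illustrative:
```go
// One shared LRU cache for all instances instead of one cache per instance.
sharedCache := grocksdb.NewLRUCache(1 << 30) // 1 GiB total, illustrative

newOpts := func() *grocksdb.Options {
	bbto := grocksdb.NewDefaultBlockBasedTableOptions()
	bbto.SetBlockCache(sharedCache) // every DB reuses the same cache
	opts := grocksdb.NewDefaultOptions()
	opts.SetBlockBasedTableFactory(bbto)
	opts.SetCreateIfMissing(true)
	return opts
}
// Open each of the ~200 DBs with newOpts(); block-cache memory is then
// bounded by the single shared cache rather than multiplied per instance.
```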
@aureliar8 Based on your comments, I changed the configuration. The logs I shared previously were produced with different configs, as mentioned in the RocksDB log file.
Plus, I am setting this in the read options:
ro := grocksdb.NewDefaultReadOptions()
ro.SetFillCache(false)
- In the documentation, it is mentioned that [...] Is this in any way affecting memory consumption? If yes, what will be the alternative to this?
This should have no significant impact.
I have experimented with 5 different approaches for a single table (one table creates two RocksDB instances) that has a large number of records; a sketch of the periodic flush is shown after the list:
- Part 1: flush after all records are processed. Quick, but memory increases rapidly.
- Part 2: flush after every 1000 records. Quick, but memory increases rapidly.
- Part 3: flush after every insert and after every 1000 records. Too slow.
- Part 4: flush after every insert. Slow, and memory still increases steadily.
- Part 5: no manual flush. Quick, but memory increases rapidly.
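For reference, the periodic flush in these experiments can be expressed roughly like this with grocksdb's FlushOptions (a sketch; `db`, `records`, and the 1000-record interval are illustrative):
```go
fo := grocksdb.NewDefaultFlushOptions()
defer fo.Destroy()

for i := range records {
	// ... write records[i] into db ...
	if (i+1)%1000 == 0 {
		// Force the current memtable to be written out as an SST file.
		if err := db.Flush(fo); err != nil {
			log.Fatal(err)
		}
	}
}
```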
You can see the logs in the attached file: RocksDB Details.zip
RocksDB configuration:
bbto := grocksdb.NewDefaultBlockBasedTableOptions()
// TODO: check out the value for LRUCache and options
bbto.SetBlockCache(grocksdb.NewLRUCache(31457280)) // 30 MiB
filter := grocksdb.NewBloomFilter(10)
bbto.SetFilterPolicy(filter)
opts := grocksdb.NewDefaultOptions()
opts.SetBlockBasedTableFactory(bbto)
opts.SetCreateIfMissing(true)
opts.EnableBlobFiles(true)
opts.EnableBlobGC(true)
opts.IncreaseParallelism(4)
opts.SetMaxWriteBufferNumber(4)
opts.SetMinWriteBufferNumberToMerge(1)
opts.SetRecycleLogFileNum(4)
opts.SetWriteBufferSize(64 << 20) // 64 MiB
opts.SetWritableFileMaxBufferSize(0)
opts.CompactionReadaheadSize(2097152) // 2 MiB
opts.SetMaxBackgroundJobs(2)
opts.SetMaxTotalWalSize(1073741824) // 1 GiB
opts.SetBlobCompactionReadaheadSize(2097152) // 2 MiB
opts.SetDbLogDir(dataDir + "/" + name)
opts.SetInfoLogLevel(grocksdb.InfoInfoLogLevel)
opts.SetStatsDumpPeriodSec(180)
opts.EnableStatistics()
opts.SetLevelCompactionDynamicLevelBytes(false)
opts.SetMaxOpenFiles(5)
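One way to see where the resident memory is actually going is to poll RocksDB's own memory-related properties on each instance and sum them across all DBs; that helps separate cache/memtable growth from allocations RocksDB does not account for. A sketch (the property names are standard RocksDB properties, `db` is one opened instance, and the usual fmt import is assumed):
```go
for _, prop := range []string{
	"rocksdb.block-cache-usage",          // data blocks held in the LRU cache
	"rocksdb.cur-size-all-mem-tables",    // active + immutable memtables
	"rocksdb.estimate-table-readers-mem", // index/filter memory for open SSTs
} {
	fmt.Printf("%s = %s\n", prop, db.GetProperty(prop))
}
```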
We are also experiencing a suspected memory leak with our RocksDB-based app, though we haven't investigated deeply yet.
@Akhilesh53
Please use Column Family instead of creating 100 instances of RocksDB like this:
for i := 0; i <= 99; i++ {
	idx := strconv.Itoa(i) // assumed: string form of the table index
	db, _ := NewRocksDB(BasePath+"/table_"+idx, "Pentagram") // error handling elided here
	PentagramDB[i] = db
	db1, _ := NewRocksDB(BasePath+"/table_"+idx, "Cluster")
	ClusterDB[i] = db1
}
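A minimal sketch of that suggestion, assuming grocksdb's OpenDbColumnFamilies/PutCF API and reusing the `opts` built above (the CF names mirror the two per-table instances; error handling is abbreviated and the path is illustrative):
```go
// One DB per table directory, with two column families instead of two DBs.
cfNames := []string{"default", "Pentagram", "Cluster"}
cfOpts := []*grocksdb.Options{opts, opts, opts}

db, cfHandles, err := grocksdb.OpenDbColumnFamilies(opts, BasePath+"/table_0", cfNames, cfOpts)
if err != nil {
	log.Fatal(err)
}
pentagramCF, clusterCF := cfHandles[1], cfHandles[2]

// Reads and writes then go through the CF-aware methods, e.g.:
wo := grocksdb.NewDefaultWriteOptions()
_ = db.PutCF(wo, pentagramCF, []byte("key"), []byte("value"))
_ = clusterCF // used the same way for the Cluster "table"
```
Memtables, WAL, and caches are then shared within a single DB instead of being duplicated across hundreds of separate instances.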
I am running this command as per the documentation:
But getting this error:
What is the alternative to this?