linxGnu / grocksdb

RocksDB wrapper for Go. Supports 9.x, 8.x, 7.x, 6.x, etc.
MIT License

Error: Library Not Found #127

Closed · Akhilesh53 closed 11 months ago

Akhilesh53 commented 1 year ago

I am running this command as per the documentation:

CGO_CFLAGS="-I/opt/homebrew/Cellar/rocksdb/8.5.4/include" \
CGO_LDFLAGS="-L/opt/homebrew/Cellar/rocksdb/8.5.4 -lrocksdb -lstdc++ -lm -lz -lsnappy -llz4 -lzstd" \
  go build

But I am getting this error:

/usr/local/go/pkg/tool/darwin_arm64/link: running clang failed: exit status 1
ld: library not found for -lsnappy
clang: error: linker command failed with exit code 1 (use -v to see invocation)

What is the alternative to this?

aureliar8 commented 1 year ago

You probably need to install the snappy lib/packages on your machine.

Akhilesh53 commented 1 year ago

Thanks @aureliar8. I just needed to specify the path separately for each library's folder. It worked then.
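
For anyone else hitting this, the command that worked for me looked roughly like this (a sketch using brew --prefix to resolve each Homebrew-installed library's folder, so it is independent of the installed versions; adjust for your setup):

    brew install snappy lz4 zstd
    CGO_CFLAGS="-I$(brew --prefix rocksdb)/include" \
    CGO_LDFLAGS="-L$(brew --prefix rocksdb)/lib -L$(brew --prefix snappy)/lib -L$(brew --prefix lz4)/lib -L$(brew --prefix zstd)/lib -lrocksdb -lstdc++ -lm -lz -lsnappy -llz4 -lzstd" \
      go build

Note that my original -L pointed at the Cellar directory itself rather than its lib subdirectory.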

Akhilesh53 commented 1 year ago

@aureliar8 @yihuang @kingster I am facing one problem. I am using RocksDB for clustering strings for similarity/dedupe purposes. When I start the clustering process, memory consumption is near zero, but as the process proceeds, memory usage increases slowly and steadily until it reaches the maximum limit, at which point the OS automatically kills the process. You can see the logs and profiling details in the attached RocksDB Details.zip. Can you suggest something to resolve this issue?

aureliar8 commented 1 year ago

What's the max limit value in your case?

According to the Go profiles you sent, your pure Go code seems to make a lot of short-lived allocations (high alloc_space, low inuse_space), which shouldn't negatively impact the memory footprint of the process. This indicates that most of the memory footprint comes from cgo code that the Go profiler can't observe, so probably rocksdb itself.

I can see in the rocksdb logs that you use an LRU block cache with a capacity of 3 GB. I can't say whether this is a correct value, but it can explain a memory increase of up to 3 GB after the start of the process.
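
If you want to confirm where the memory goes, rocksdb exposes its own accounting through properties that you can log from Go. A rough sketch (the property names are standard rocksdb ones; dumpRocksMem is just an illustrative helper, and it needs the log package):

    // Log rocksdb's C-heap memory usage, which the Go profiler cannot see.
    func dumpRocksMem(db *grocksdb.DB) {
        for _, p := range []string{
            "rocksdb.block-cache-usage",          // bytes pinned in the block cache
            "rocksdb.cur-size-all-mem-tables",    // active + unflushed memtables
            "rocksdb.estimate-table-readers-mem", // index/filter memory for open SSTs
        } {
            log.Printf("%s = %s", p, db.GetProperty(p))
        }
    }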

yihuang commented 1 year ago

Maybe it's due to lots of SST files; some memory is needed for each opened SST file. You can set the max open files option.

Akhilesh53 commented 1 year ago

Thanks @aureliar8 @yihuang

  1. What's the max limit value in your case? Around 62 GB of memory is free. One process can use 100% of the available memory.

  2. I am using the below-mentioned configuration to create the RocksDB instances:

    bbto := grocksdb.NewDefaultBlockBasedTableOptions()
    // TODO: revisit the LRUCache size and other option values
    bbto.SetBlockCache(grocksdb.NewLRUCache(31457280)) // 30 MiB block cache
    filter := grocksdb.NewBloomFilter(10)
    bbto.SetFilterPolicy(filter)

    opts := grocksdb.NewDefaultOptions()
    opts.SetBlockBasedTableFactory(bbto)
    opts.SetCreateIfMissing(true)
    opts.EnableBlobFiles(true)
    opts.EnableBlobGC(true)
    opts.IncreaseParallelism(4)
    opts.SetMaxWriteBufferNumber(4)
    opts.SetMinWriteBufferNumberToMerge(1)
    opts.SetRecycleLogFileNum(4)
    opts.SetWriteBufferSize(134217728) // 128 MiB per memtable
    opts.SetWritableFileMaxBufferSize(0)
    opts.CompactionReadaheadSize(2097152) // 2 MiB
    opts.SetMaxBackgroundJobs(2)
    opts.SetMaxTotalWalSize(1073741824) // 1 GiB
    opts.SetBlobCompactionReadaheadSize(2097152) // 2 MiB
    opts.SetDbLogDir(dataDir + "/" + name)
    opts.SetInfoLogLevel(grocksdb.InfoInfoLogLevel)
    opts.SetStatsDumpPeriodSec(180)
    opts.EnableStatistics()
    opts.SetLevelCompactionDynamicLevelBytes(false)
    opts.SetMaxOpenFiles(5)
  3. Also, I forgot to mention that we are creating around 200 tables with this configuration:
    for i := 0; i <= 99; i++ {
        idx := strconv.Itoa(i) // path suffix for this table
        db, err := NewRocksDB(BasePath+"/table_"+idx, "Pentagram")
        if err != nil {
            panic(err)
        }
        PentagramDB[i] = db

        db1, err := NewRocksDB(BasePath+"/table_"+idx, "Cluster")
        if err != nil {
            panic(err)
        }
        ClusterDB[i] = db1
    }

It would be insightful if you could recommend optimal values for these options, considering the 62 GB of free memory.

  4. In the documentation, it is mentioned that "This fork contains no defer in codebase (my side project requires as less overhead as possible). This introduces a loose convention of how/when to free c-mem, thus breaking the rule of [tecbot/gorocksdb](https://github.com/tecbot/gorocksdb)."

    Is this in any way affecting memory consumption? If yes, what will be the alternative to this?

aureliar8 commented 1 year ago

I find it hard to believe that this Go code creates the rocksdb instance that generated the logs you previously sent.

In the go code

bbto.SetBlockCache(grocksdb.NewLRUCache(31457280)) // 30MiB

In the rocksdb logs

Block cache LRUCache@0x24cd180#2984523 capacity: 3.00 GB ...

If each rocksdb instance indeed has a cache of 3.00 GB, then the total memory needed by these LRU caches is 200 × 3 GB = 600 GB.

You could try to rerun your experiment with a single rocksdb instance and see where the memory usage stops. Then you'll need to get this low enough that it can be multiplied by 200.
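
As a back-of-the-envelope check using the options you posted: each instance can hold up to 4 write buffers × 128 MiB = 512 MiB of memtables on top of its 30 MiB block cache, so 200 instances could legitimately grow to roughly 200 × 542 MiB ≈ 106 GiB before anything even counts as a leak, and that is before blobs, indexes and filters.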

Alternatively, you can change the architecture of your code a bit by having fewer rocksdb instances. The column family feature might help you create disjoint "tables" in a single rocksdb instance.

Or, I think it's also possible to make these 200 rocksdb instances share their resources (caches, buffers), but you'd have to look at the documentation.
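
For the shared-cache route, something like this should work (untested sketch; the 3 GiB figure is just an example):

    // One cache object shared by every instance instead of one cache each.
    sharedCache := grocksdb.NewLRUCache(3 << 30) // 3 GiB total, example value

    newOptions := func() *grocksdb.Options {
        bbto := grocksdb.NewDefaultBlockBasedTableOptions()
        bbto.SetBlockCache(sharedCache) // all instances compete for the same budget
        opts := grocksdb.NewDefaultOptions()
        opts.SetBlockBasedTableFactory(bbto)
        opts.SetCreateIfMissing(true)
        return opts
    }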

Akhilesh53 commented 1 year ago

@aureliar8 Based on the comments received from your side, I changed the configuration. The logs I shared previously had different configs, as noted in the rocksdb log file.

Plus, I am setting this in the read options:

    ro := grocksdb.NewDefaultReadOptions()
    ro.SetFillCache(false)

aureliar8 commented 1 year ago

> In the documentation, it is mentioned that [...] Is this in any way affecting memory consumption? If yes, what will be the alternative to this?

This should have no significant impact.
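
The no-defer convention only means that you are responsible for freeing C-allocated values explicitly. A typical read path looks something like this (sketch; process is a placeholder for your own logic):

    ro := grocksdb.NewDefaultReadOptions()
    it := db.NewIterator(ro)
    for it.SeekToFirst(); it.Valid(); it.Next() {
        k, v := it.Key(), it.Value()
        process(k.Data(), v.Data()) // placeholder
        k.Free()                    // release the C-allocated slices
        v.Free()
    }
    it.Close()
    ro.Destroy()

As long as everything is freed like this, the lack of defer itself does not leak.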

Akhilesh53 commented 1 year ago

I have experimented with 5 different approaches for a single table (one table creates two rocksdb instances) which holds a large number of records:

Part 1: Flush after all records are processed. Quick, but memory also increases rapidly.
Part 2: Flush after every 1000 records. Quick, but memory also increases rapidly.
Part 3: Flush after every insert and after every 1000 records. Too slow.
Part 4: Flush after every insert. Slow, and memory increases steadily.
Part 5: No manual flush. Quick, but memory also increases rapidly.

You can see the logs in the attached RocksDB Details.zip.
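
For reference, the Part 2 variant of my insert loop looks roughly like this (sketch inside my processing function; records, wo and db come from my code):

    fo := grocksdb.NewDefaultFlushOptions()
    for i, rec := range records {
        if err := db.Put(wo, rec.Key, rec.Value); err != nil {
            return err
        }
        if (i+1)%1000 == 0 {
            if err := db.Flush(fo); err != nil { // force memtables to SST files
                return err
            }
        }
    }
    fo.Destroy()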

RocksDB configuration:

    bbto := grocksdb.NewDefaultBlockBasedTableOptions()
    // TODO: revisit the LRUCache size and other option values
    bbto.SetBlockCache(grocksdb.NewLRUCache(31457280)) // 30 MiB block cache
    filter := grocksdb.NewBloomFilter(10)
    bbto.SetFilterPolicy(filter)

    opts := grocksdb.NewDefaultOptions()
    opts.SetBlockBasedTableFactory(bbto)
    opts.SetCreateIfMissing(true)
    opts.EnableBlobFiles(true)
    opts.EnableBlobGC(true)
    opts.IncreaseParallelism(4)
    opts.SetMaxWriteBufferNumber(4)
    opts.SetMinWriteBufferNumberToMerge(1)
    opts.SetRecycleLogFileNum(4)
    opts.SetWriteBufferSize(64 << 20) // 64 MiB per memtable
    opts.SetWritableFileMaxBufferSize(0)
    opts.CompactionReadaheadSize(2097152) // 2 MiB
    opts.SetMaxBackgroundJobs(2)
    opts.SetMaxTotalWalSize(1073741824) // 1 GiB
    opts.SetBlobCompactionReadaheadSize(2097152) // 2 MiB
    opts.SetDbLogDir(dataDir + "/" + name)
    opts.SetInfoLogLevel(grocksdb.InfoInfoLogLevel)
    opts.SetStatsDumpPeriodSec(180)
    opts.EnableStatistics()
    opts.SetLevelCompactionDynamicLevelBytes(false)
    opts.SetMaxOpenFiles(5)

yihuang commented 1 year ago

We are also experiencing suspected memory leaks with our rocksdb-based app; we haven't investigated deeply yet, though.

linxGnu commented 1 year ago

@Akhilesh53

Please use Column Families instead of creating 200 instances of RocksDB like this:

    for i := 0; i <= 99; i++ {
        idx := strconv.Itoa(i)
        db, err := NewRocksDB(BasePath+"/table_"+idx, "Pentagram")
        PentagramDB[i] = db

        db1, err := NewRocksDB(BasePath+"/table_"+idx, "Cluster")
        ClusterDB[i] = db1
    }
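
Something along these lines (untested sketch; the path and column family names are just examples):

    opts := grocksdb.NewDefaultOptions()
    opts.SetCreateIfMissing(true)
    opts.SetCreateIfMissingColumnFamilies(true)

    // One column family per logical table; "default" must always be present.
    cfNames := []string{"default"}
    for i := 0; i <= 99; i++ {
        cfNames = append(cfNames,
            fmt.Sprintf("pentagram_%d", i),
            fmt.Sprintf("cluster_%d", i))
    }
    cfOpts := make([]*grocksdb.Options, len(cfNames))
    for i := range cfOpts {
        cfOpts[i] = opts // one shared Options; per-CF tuning is also possible
    }

    db, cfHandles, err := grocksdb.OpenDbColumnFamilies(opts, BasePath+"/clusters", cfNames, cfOpts)
    if err != nil {
        panic(err)
    }

    // Writes target a column family instead of a separate DB instance.
    wo := grocksdb.NewDefaultWriteOptions()
    err = db.PutCF(wo, cfHandles[1], []byte("key"), []byte("value"))
    // Remember to Destroy wo and Close db when done.

All 200 "tables" then share one memtable budget, one block cache, and one set of background threads.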