rosedblabs / rosedb

Lightweight, fast and reliable key/value storage engine based on Bitcask.
https://rosedblabs.github.io
Apache License 2.0
4.58k stars 632 forks source link

Ascend's execution efficiency is suboptimal #265

Closed taroim closed 1 year ago

taroim commented 1 year ago

Utilizing the Iterator to iterate through all Keys based on the v2.2.2

package main

import (
    "fmt"
    "time"

    "github.com/rosedblabs/rosedb/v2"
)

func main() {
    options := rosedb.DefaultOptions
    options.DirPath = "./tmp/rosedb"

    db, err := rosedb.Open(options)
    if err != nil {
        panic(err)
    }
    defer db.Close()

    for i := 0; i < 100000; i++ {
        _ = db.Put(utils.GetTestKey(i), utils.RandomValue(100*1024))
    }

    t := time.Now()
    iter := db.NewIterator(rosedb.DefaultIteratorOptions)
    for ; iter.Valid(); iter.Next() {
        fmt.Printf("k=%s\n", iter.Key())
    }
    fmt.Printf("completed in %d ms\n",
        time.Now().Sub(t).Milliseconds(),
    )
}
// execution result: completed in 2524 ms

Utilizing the db.Ascend to iterate through all Keys based on the v2.3.0

package main

import (
    "fmt"
    "time"

    "github.com/rosedblabs/rosedb/v2"
)

func main() {
    options := rosedb.DefaultOptions
    options.DirPath = "./tmp/rosedb"

    db, err := rosedb.Open(options)
    if err != nil {
        panic(err)
    }
    defer db.Close()

    for i := 0; i < 100000; i++ {
        _ = db.Put(utils.GetTestKey(i), utils.RandomValue(100*1024))
    }

    t := time.Now()
    db.Ascend(func(k []byte, v []byte) (bool, error) {
        fmt.Printf("k=%s\n", k)
        return true, nil
    })
    fmt.Printf("completed in %d ms\n",
        time.Now().Sub(t).Milliseconds(),
    )
}
// execution result: completed in 78560 ms

Why does Ascend take 31 times more time than Iterator? Could it be because the Ascend method by default returns the V? In most cases, obtaining only the Key is sufficient.

roseduan commented 1 year ago

You can add a BlockCache in open options.

options.BlockCache = rosedb.MB * 64

because you iterate all data in wal, block cache fits for the case(sequential reading).

amityahav commented 1 year ago

@taroim it makes sense that Ascend will take more time since it also fetches values from disk where the iterator solely iterate over the in-memory index (i didnt really read the code so i assume it). some possible optimizations that i can think of is that it worth checking the implementation of the BTREE and see if it all leaf nodes (the ones that contain the KV pairs) are chained so it easy to iterate over them and not going back to their parents. also reading from disk already optimized by the kernel's page cache where it fetches disk pages and by that we gain alot of cache hits when reading sequentially. and as @roseduan it wont harm if we add another applicative layer of caching that is doing the same. BTW in the current version the functionallity to solely iterate over keys still exists?

EDIT: ok so after giving it a second thought this is not actually the case. There may be a lot of random io going on when ascending the btree which is a performance penalty. I'll explain, so basically the keys are ordered by bytes.compare function in the btree and not by their insertion time. This means that fresh keys can be placed adjacent to old keys which their values are deep in the wal files. So when ascending the btree each key may point to a different block in the wal file thus creating a lot of random io. So I'm not sure enabling the block cache will help since it will cause trashing as well. @roseduan please correct me if I'm wrong

taroim commented 1 year ago

@roseduan Add the BlockCache option will lead to increased memory overhead. Could you consider implementing a method similar to Redis' "KEYS," which allows searching for keys matching a provided pattern?

roseduan commented 1 year ago

@roseduan Add the BlockCache option will lead to increased memory overhead. Could you consider implementing a method similar to Redis' "KEYS," which allows searching for keys matching a provided pattern?

OK, search keys are easy to support. I will open an issue for that.