ostafen / clover

A lightweight document-oriented NoSQL database written in pure Golang.
MIT License

About the Count() function #58

Closed: jinzhongjia closed this issue 2 years ago.

jinzhongjia commented 2 years ago

Bro, the Count() function takes a long time to return when there is a lot of data. Looking at the source code, it is implemented by calling FindAll() and then len() on the result, which takes a long time when the data set is large. For example, I now have about 40,000 documents, and Count() takes more than a second. Maybe we should change the function's implementation.

func (here *Db) SearchContent(names []string, num int, pg int) ([]*clover.Document, int) {
    // Build an alternation such as "(.*foo.*)|(.*bar.*)", escaping regex metacharacters in each name.
    parts := make([]string, len(names))
    for i, v := range names {
        parts[i] = "(.*" + regexp.QuoteMeta(v) + ".*)"
    }
    name := strings.Join(parts, "|")

    query := here.content.Where(clover.Field("name").Like(name))

    startT := time.Now()

    // Fetch one page: skip the previous pages, then take num documents.
    docs, _ := query.Skip(num * pg).Limit(num).FindAll()
    fmt.Printf("time.Since(startT): %v\n", time.Since(startT))
    startU := time.Now()

    // Count all matching documents (this is the slow call).
    pgCount, _ := query.Count()
    fmt.Printf("time.Since(startU): %v\n", time.Since(startU))

    // Round up so that a partial last page is included in the page count.
    return docs, (pgCount + num - 1) / num
}

It prints this:

time.Since(startT): 204.9991ms
time.Since(startU): 1.40169s
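A side note on the snippet's second return value: assuming it is meant to be the total number of pages, it has to round up, otherwise a partial last page is dropped (41 matches at 10 per page is 5 pages, not 4). A minimal sketch using integer ceiling division; `pageCount` is a hypothetical helper name, not part of clover:

```go
package main

import "fmt"

// pageCount is a hypothetical helper: the number of pages of size perPage
// needed to hold total items, rounding up so a partial last page counts.
// perPage is assumed to be greater than zero.
func pageCount(total, perPage int) int {
	return (total + perPage - 1) / perPage
}

func main() {
	fmt.Println(pageCount(40000, 10)) // 4000
	fmt.Println(pageCount(41, 10))    // 5
	fmt.Println(pageCount(0, 10))     // 0
}
```

Integer ceiling division avoids the float conversion entirely, so there is no rounding error to worry about for large counts.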
ostafen commented 2 years ago

Yes, I was aware of this, and it should definitely be optimized. Do you want to provide a PR? The Count() function should be added to the StorageEngine interface and implemented for the two storage engines currently available.
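To illustrate the suggestion, here is a minimal sketch of a storage interface that has its own Count method. `StorageEngine`, `Criteria`, `Document`, and `memEngine` below are hypothetical simplifications, not clover's actual types or signatures. The point is that FindAll() followed by len() materializes every matching document just to measure the slice, whereas an engine-level Count can evaluate the same criteria and only increment a counter:

```go
package main

import (
	"errors"
	"fmt"
)

// Document and Criteria are hypothetical stand-ins for clover's types.
type Document map[string]interface{}
type Criteria func(Document) bool

// StorageEngine is a trimmed-down, hypothetical interface; it is NOT
// clover's real one, only an illustration of adding Count to it.
type StorageEngine interface {
	FindAll(collection string, c Criteria) ([]Document, error)
	Count(collection string, c Criteria) (int, error)
}

var errNoCollection = errors.New("no such collection")

// memEngine is a toy in-memory engine used only for this sketch.
type memEngine struct {
	collections map[string][]Document
}

// FindAll builds a slice of every matching document (nil criteria matches all).
func (e *memEngine) FindAll(collection string, c Criteria) ([]Document, error) {
	docs, ok := e.collections[collection]
	if !ok {
		return nil, errNoCollection
	}
	var out []Document
	for _, d := range docs {
		if c == nil || c(d) {
			out = append(out, d)
		}
	}
	return out, nil
}

// Count applies the same filter inside the engine but never builds a result slice.
func (e *memEngine) Count(collection string, c Criteria) (int, error) {
	docs, ok := e.collections[collection]
	if !ok {
		return 0, errNoCollection
	}
	n := 0
	for _, d := range docs {
		if c == nil || c(d) {
			n++
		}
	}
	return n, nil
}

func main() {
	var engine StorageEngine = &memEngine{collections: map[string][]Document{
		"content": {{"name": "a"}, {"name": "b"}, {"name": "a"}},
	}}
	n, err := engine.Count("content", func(d Document) bool { return d["name"] == "a" })
	fmt.Println(n, err) // 2 <nil>
}
```

Both methods still visit every document, so this mainly saves allocating and copying the result slice; it does not avoid the scan itself.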

jinzhongjia commented 2 years ago

Of course I want to provide a PR, but I need to read the code first. By the way, is there a cache for query results? I noticed that the first query takes much longer than later ones.

jinzhongjia commented 2 years ago

For example, my first query took 190ms, and subsequent ones took only 120ms.

ostafen commented 2 years ago

Badger caches data internally.

jinzhongjia commented 2 years ago

For some reason, Skip() doesn't seem to work. The code is as follows:

func (here *Db) SearchContent(names []string, num int, pg int) []*clover.Document {
    // Same "(.*foo.*)|(.*bar.*)" alternation as before, with each name escaped.
    parts := make([]string, len(names))
    for i, v := range names {
        parts[i] = "(.*" + regexp.QuoteMeta(v) + ".*)"
    }
    name := strings.Join(parts, "|")
    fmt.Println(num, pg)
    query := here.content.Where(clover.Field("name").Like(name))

    startT := time.Now()

    // No Sort() is applied, so which documents Skip drops depends on the order the engine returns.
    docs, _ := query.Skip(num * pg).Limit(num).FindAll()
    fmt.Printf("time.Since(startT): %v\n", time.Since(startT))

    return docs
}
ostafen commented 2 years ago

What do you mean by "doesn't seem to work"? What is your expected output and what are you getting?

jinzhongjia commented 2 years ago

I expected it to skip a few documents, but it didn't. Also, does FindAll() return documents in random order?

jinzhongjia commented 2 years ago

I see my problem. I didn't sort.
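Skip/limit pagination only yields consistent pages if the result order is deterministic. A minimal sketch under that assumption, with a hypothetical `Doc` type and `page` helper (in clover itself the ordering would come from sorting the query rather than from a manual sort like this):

```go
package main

import (
	"fmt"
	"sort"
)

// Doc is a hypothetical document with a stable sort key.
type Doc struct {
	ID   string
	Name string
}

// page sorts a copy of docs by ID so every call sees the same order, then
// applies skip/limit. skip and limit are assumed to be non-negative.
func page(docs []Doc, skip, limit int) []Doc {
	sorted := make([]Doc, len(docs))
	copy(sorted, docs) // leave the caller's slice untouched
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].ID < sorted[j].ID })
	if skip >= len(sorted) {
		return nil
	}
	end := skip + limit
	if end > len(sorted) {
		end = len(sorted)
	}
	return sorted[skip:end]
}

func main() {
	docs := []Doc{{"c", "x"}, {"a", "y"}, {"d", "z"}, {"b", "w"}}
	for p := 0; p < 2; p++ {
		var ids []string
		for _, d := range page(docs, p*2, 2) {
			ids = append(ids, d.ID)
		}
		fmt.Println("page", p, ids)
	}
}
```

With a fixed sort key, page 0 is always `[a b]` and page 1 is always `[c d]`, regardless of the order the documents were stored in.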

jinzhongjia commented 2 years ago

But when I sort, the query slows down again.

ostafen commented 2 years ago

Naturally, you have to take the cost of sorting into account.

jinzhongjia commented 2 years ago

Okay, I'll start looking at the source code tomorrow. It takes 50ms without sorting and 1.3s with sorting. 😔
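One general way to reduce the cost of sort-then-paginate, sketched here as an assumption and not as what clover does: to return one page you only need the smallest skip+limit keys, which a bounded max-heap can collect in O(n log k) time instead of the O(n log n) of a full sort. `topKPage` and `maxHeap` are hypothetical names, and keys are plain strings for brevity:

```go
package main

import (
	"container/heap"
	"fmt"
	"sort"
)

// maxHeap holds the k smallest keys seen so far; its root is the largest of them.
type maxHeap []string

func (h maxHeap) Len() int            { return len(h) }
func (h maxHeap) Less(i, j int) bool  { return h[i] > h[j] }
func (h maxHeap) Swap(i, j int)       { h[i], h[j] = h[j], h[i] }
func (h *maxHeap) Push(x interface{}) { *h = append(*h, x.(string)) }
func (h *maxHeap) Pop() interface{} {
	old := *h
	n := len(old)
	x := old[n-1]
	*h = old[:n-1]
	return x
}

// topKPage returns the keys at positions [skip, skip+limit) of the ascending
// order without sorting every key: only skip+limit keys are kept in the heap.
func topKPage(keys []string, skip, limit int) []string {
	k := skip + limit
	if k <= 0 {
		return nil
	}
	h := &maxHeap{}
	for _, key := range keys {
		if h.Len() < k {
			heap.Push(h, key)
		} else if key < (*h)[0] {
			// Replace the current largest of the kept keys and restore heap order.
			(*h)[0] = key
			heap.Fix(h, 0)
		}
	}
	best := []string(*h)
	sort.Strings(best) // sort only the k survivors
	if skip >= len(best) {
		return nil
	}
	return best[skip:]
}

func main() {
	keys := []string{"m", "c", "x", "a", "q", "b", "k"}
	fmt.Println(topKPage(keys, 2, 2)) // [c k]
}
```

Every document is still visited once, so this helps most when skip+limit is much smaller than the collection; deep pages approach the cost of a full sort.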

ostafen commented 2 years ago

I found the problem. I'll release a fix for it soon.

jinzhongjia commented 2 years ago

Thanks, bro!

ostafen commented 2 years ago

Can you post the snippet before and after sorting?

jinzhongjia commented 2 years ago

Do you mean screenshots of the timings with and without sorting, or something else? The code I want to sort is the snippet above: I need to sort the data and then fetch a specific page.

jinzhongjia commented 2 years ago

But without sorting, Skip() doesn't return the next page as I expected.

jinzhongjia commented 2 years ago

I pulled the version that was just committed, tested how long Count() takes, and found that it didn't drop much.

ostafen commented 2 years ago

Actually, this is the best we can do. Without indexes (which are not currently supported), there is no faster way than iterating over every record in the collection.
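To make the point about indexes concrete, here is a minimal sketch of what an index would buy, using a hypothetical `indexedCollection` that keeps a value-to-count map for one field up to date on every insert and delete. Total counts and equality counts are then answered without visiting documents. An equality index like this would not help a regex `Like` query such as the one above, which still has to test each name:

```go
package main

import "fmt"

// indexedCollection is hypothetical: it keeps a field-value -> count index
// in sync with its documents. Field values are assumed to be hashable
// (strings, numbers); unhashable values would panic as map keys.
type indexedCollection struct {
	field string
	docs  map[string]map[string]interface{} // id -> document
	index map[interface{}]int                // field value -> number of docs
}

func newIndexedCollection(field string) *indexedCollection {
	return &indexedCollection{
		field: field,
		docs:  map[string]map[string]interface{}{},
		index: map[interface{}]int{},
	}
}

// Delete removes a document and decrements its index entry.
func (c *indexedCollection) Delete(id string) {
	doc, ok := c.docs[id]
	if !ok {
		return
	}
	if v, has := doc[c.field]; has {
		c.index[v]--
		if c.index[v] == 0 {
			delete(c.index, v)
		}
	}
	delete(c.docs, id)
}

// Insert adds or replaces a document; replacing first undoes the old index entry.
func (c *indexedCollection) Insert(id string, doc map[string]interface{}) {
	c.Delete(id)
	c.docs[id] = doc
	if v, has := doc[c.field]; has {
		c.index[v]++
	}
}

// Len counts all documents without a scan.
func (c *indexedCollection) Len() int { return len(c.docs) }

// CountEq answers "field == v" from the index without visiting documents.
func (c *indexedCollection) CountEq(v interface{}) int { return c.index[v] }

func main() {
	c := newIndexedCollection("name")
	c.Insert("1", map[string]interface{}{"name": "a"})
	c.Insert("2", map[string]interface{}{"name": "b"})
	c.Insert("3", map[string]interface{}{"name": "a"})
	c.Delete("1")
	fmt.Println(c.Len(), c.CountEq("a"), c.CountEq("b")) // 2 1 1
}
```

The sketch also assumes documents are not mutated after insertion; otherwise the index would drift out of sync, which is part of why real indexes need engine support.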