timshannon / badgerhold

BadgerHold is an embeddable NoSQL store for querying Go types built on Badger
MIT License
523 stars 51 forks source link

Optimisation of 'badgerhold.And' query #93

Open kaustubh-murumkar opened 1 year ago

kaustubh-murumkar commented 1 year ago

Hello guys, I am using badgerhold as an in-memory database. The size of the database is around 200 records. The use case involves filtering based on multiple fields as an and operation. Consider the following sample code that demonstrates the issue:


type SampleRecord struct {
    Name        string
    Tag         []string
    FeatureFlag string
}

func main() {
    options := badgerhold.DefaultOptions
    options.InMemory = true
    store, err := badgerhold.Open(options)
    if err != nil {
        panic(err)
    }
    fmt.Println(store)

    for i := 0; i < 10; i++ {
        newRecord := SampleRecord{
            Name:        fmt.Sprintf("sample%d", i),
            Tag:         []string{"tag1", "tag2"},
            FeatureFlag: fmt.Sprintf("featureFlag%d", i),
        }
        store.Insert(newRecord.Name, newRecord)
    }

    var result SampleRecord
    count, err := store.Count(&result,
        badgerhold.Where("Name").MatchFunc(func(ra *badgerhold.RecordAccess) (bool, error) {
            record := ra.Record().(*SampleRecord)
            fmt.Println("comparing ", record.Name, "with", "sample5")
            return strings.EqualFold(record.Name, "sample5"), nil
        }).
            And("FeatureFlag").MatchFunc(func(ra *badgerhold.RecordAccess) (bool, error) {
            record := ra.Record().(*SampleRecord)
            // check feature flag
            fmt.Println("checking feature flag for ", record.Name)

            return true, nil
        }),
    )
    fmt.Println(count, err)
}

the output of the above code shows the pattern that the query applied on both fields is being run, for an and operation. This is sub optimal since the second query does not need to be executed if the first query has failed. Especially since the second query it quite computationally expensive in our case. This behaviour has caused latency increase in our API.

Is there an alternative to this or a quick fix that we can do on our side? If not, please provide a fix/udpate for this.

Badgerhold version: v4.0.2

Thanks, kaustubh-murumkar

timshannon commented 1 year ago

Yep, you're right, it doesn't short circuit the logic like you'd expect. That can definitely be done, but it may end up being a little tricky. The current logic stores the query criteria in a map fieldCriteria map[string][]*Criterion which isn't in a guaranteed order.

I'll look into this more, but as a work around you can use a function in your query to run the logic exactly how you want: Where("field").MatchFunc(func(ra *RecordAccess) (bool, error))

tauraamui commented 1 year ago

@kaustubh-murumkar if you would like to use something which already has this very simple/basic short circuit implemented already and has the exact same backing DB, check out https://github.com/tauraamui/kvs