timshannon / badgerhold

BadgerHold is an embeddable NoSQL store for querying Go types built on Badger
MIT License
519 stars 51 forks source link

UpdateMatching does not support custom StorerIndexFunc #110

Open ljfuyuan opened 5 months ago

ljfuyuan commented 5 months ago

When using an index, UpdateMatching creates an Iterator to query all index keys and uses matchsAllCriteria to compare whether the index meets the query criteria. Then, decode the value key[len(prefix):] for matching

// indexed field, get keys from index
    prefix = indexKeyPrefix(typeName, query.index)
    i.iter.Seek(prefix)
    i.nextKeys = func(iter *badger.Iterator) ([][]byte, error) {
        var nKeys [][]byte

        for len(nKeys) < iteratorKeyMinCacheSize {
            if !iter.ValidForPrefix(prefix) {
                return nKeys, nil
            }

            item := iter.Item()
            key := item.KeyCopy(nil)
            // no currentRow on indexes as it refers to multiple rows
            // remove index prefix for matching
            ok, err := s.matchesAllCriteria(criteria, key[len(prefix):], true, "", nil)
            if err != nil {
                return nil, err
            }

But Criterion.test always uses Default Decode.

            // used with keys
            if keyType != "" {
                err := s.decodeKey(testValue.([]byte), recordValue, keyType)
                if err != nil {
                    return false, err
                }
            } else {
                err := s.decode(testValue.([]byte), recordValue)
                if err != nil {
                    return false, err
                }
            }

While indexUpdate uses a custom Storer IndexFunc.

indexKey, err := index.IndexFunc(indexName, value)
    if err != nil {
        return err
    }
    if indexKey == nil {
        return nil
    }

    indexValue := make(KeyList, 0)

    indexKey = append(indexKeyPrefix(typeName, indexName), indexKey...)
timshannon commented 4 months ago

Do you have an actual issue you are running into with a replicatible code snippet?

The Storer interface doesn't actually define any decoding or encoding, it just deals with returning bytes from a type so you can define anything that describes the atomic instance of your type; i.e. what makes this item unique.

The assumption is that the same decoder and encoder you specify in your DB connection options would be used in your custom Storer implementation using the default AnonStorer as an example as to how to implement it:

storer.indexes[indexName] = Index{
                IndexFunc: func(name string, value interface{}) ([]byte, error) {
                    tp := reflect.ValueOf(value)
                    for tp.Kind() == reflect.Ptr {
                        tp = tp.Elem()
                    }

                    return s.encode(tp.FieldByName(name).Interface())
                },
                Unique: unique,
            }

So I'm trying to understand what scenario you're trying to solve where you are using a different encoder / decoder for your indexes then for the rest of your data?

ljfuyuan commented 4 months ago

Hi , I try to customize the Storer to construct a specific Index, but it seems that UpdateMatching is not working correctly, the following is the pseudo code

type Sample struct {
    Id        uint32
}

func (t *Sample) Type() string {
    return "Sample"
}

func (t *Sample) Indexes() map[string]badgerhold.Index {
    return sampleIndex
}

sampleIndex := make(map[string]badgerhold.Index)
sampleIndex["Id"] = badgerhold.Index{
    IndexFunc: func(name string, value interface{}) ([]byte, error) {
        if t, ok := value.(*Sample); ok {
            return badgerhold.DefaultEncode(t.Id)             // <- works fine

            // id := make([]byte, 4)                          // <- did not work here
            // binary.LittleEndian.PutUint32(id, t.Id)
            // return id, nil
        }

        return nil, errors.New("error")
    },
}
timshannon commented 4 months ago

Yep because you are just encoding it as bytes which is very different from an encoder / decoder that handles generic types:

image

ljfuyuan commented 4 months ago

Sorry, I don't get you, Since the Storer interface is provided, and indexUpdate use IndexFunc to build indexKey, why the method fetchIndexValues use DefaultEncode to build indexKey ?

If I understand Storer.Index.IndexFunc incorrectly, what is the purpose of Storer's existence?

timshannon commented 4 months ago

From https://pkg.go.dev/github.com/timshannon/badgerhold#Storer

image

... why the method fetchIndexValues use DefaultEncode to build indexKey ?

Because it needs to match the same encoding as the rest of the database so that values can be properly compared.

It's kind of like saying "why can't I compare the word four and the number 4. They are the same value."

They may both represent the same value, but they are encoding that value in different ways. In order for the database to store things in order for quick searching, all values need to be encoded the same.

binary.LittleEndian.PutUint32(id, t.Id) and Gob encode return two very different byte arrays.

ljfuyuan commented 4 months ago

Yes, I understand what you said.

But, what I want to express is that when indexFunc is used by indexUpdate to adding/updating an index, the same indexFunc should also be used during fetchIndexValues , instead of using the DefaultEncode, it seems they cannot generate the same result, so they never match.

Did I miss something?

timshannon commented 4 months ago

I may be missing something too, or missunderstanding what you're trying to accomplish.

the default / example Storer interface (i.e. the example off of which to build your own interface) encodes the index values before returning them. The fetchIndexValues function assumes that the values will be encoded, and decodes them properly.

As long as you decode your values before returning them in your custom Storer type everything should be fine. However I don't believe have any tests confirming that to be true. If you have a code snippet that shows otherwise, please share it. The previous one you sent was not encoding the index values correctly, it was using an entirely different encoding, and thus will not work.

I do acknowledge, however, that the Storer interface is a bit of a leaky abstraction, and relies on too much required behavior to be implemented to work properly. If I were re-doing it now, I would completely change it.

However doing so now would be backwards incompatible, and I'm not sure such a huge breaking change is worth it to simply clarify the interface.