timshannon / badgerhold

BadgerHold is an embeddable NoSQL store for querying Go types built on Badger
MIT License
514 stars 52 forks source link

Keys prefixes collision during iteration #70

Closed b-tarczynski closed 2 years ago

b-tarczynski commented 2 years ago

Inserting values of a type whose name has a prefix of another type name causes problems when iterating over keys. It tries to include a type with a longer name as well. One way to resolve it would be to store the key with an additional separator : after the type name in badger.

Code that reproduces issue:

type TestStruct struct {
    Value int
}

type TestStructCollision struct {
    Value int
}

func TestBadgerKeyPrefixCollision(t *testing.T) {
    options := bh.DefaultOptions
    options.InMemory = true
    store, err := bh.Open(options)
    require.NoError(t, err)
    defer func() {
        err = store.Close()
        require.NoError(t, err)
    }()

    for i := 0; i < 5; i++ {
        err = store.Insert(i, TestStruct{Value: i})
        require.NoError(t, err)

        err = store.Insert(i, TestStructCollision{Value: i})
        require.NoError(t, err)
    }

    query := bh.Where(bh.Key).In(0, 1, 2, 3, 4)
    results := make([]TestStruct, 0, 5)
    err = store.Find(
        &results,
        query,
    )
    require.NoError(t, err)
}
timshannon commented 2 years ago

The indexes are prefixed and split on ":" already (https://github.com/timshannon/badgerhold/blob/master/index.go#L107) however I'm betting there needs to be better handling when iterating through records to account for this scenario.

Thanks for the repro. Hopefully I can make a backwards compatible change.

timshannon commented 2 years ago

So we have a couple options here. The fix doesn't require a backwards incompatible API change, but it does create a backwards incompatible storage change as users of the library will have this prefix stored.

I can 1) Rev the major version of the library to v4 or 2) default to a new prefix, and if a valid entry isn't found for a new prefix, try the old prefix.

I'm leaning towards 2 right now.

b-tarczynski commented 2 years ago

I think the first option is better, because the second would slow down all reads from badger. Also the second one would make code more complex and could lead to unexpected behaviours. Another way to handle this can be done by introducing a config option that chooses keys without or with additional separator.

I prefer your first approach the most.