scylladb / scylla-go-driver

Experimental, high performance Scylla Driver, University of Warsaw students' project
Apache License 2.0
152 stars 7 forks source link

session: query parameter bindings #121

Open mmatczuk opened 2 years ago

mmatczuk commented 2 years ago

Investigate and implement efficient query parameter binding. Using Bind(...interface{}) is no go due to performance inefficiencies in Go.

mmatczuk commented 2 years ago

We should support primitive types at this point.

Kulezi commented 2 years ago

Binding should check if it matches column count and column types in case of queries for which we know column metadata (prepared)

Kulezi commented 2 years ago

Current state

Currently the driver doesn't provide a way to bind values to a query, with one exception:

   func (q *Query) BindInt64(pos int, value int64) *Query

The idea behind this was that the user would use it for chaining like this:

    q := session.Query("INSERT INTO table (pk, v1, v2) VALUES (?,?,?)")
    res, err := q.BindInt64(0, 1).BindInt64(1, 2).BindInt64(2, 3).Exec()
    ...

Also what's important, is that the query struct can be reused between synchronous executions across the same goroutine. In case of fixed-size types this allows to reuse the slices to which the values were serialized in the last query, avoiding some allocations.

Binding primitive values is easy and can be done with dedicated methods, problems arise when we consider binding collection types, one could implement all possible variants of BindStringList, BindIntList, etc. however it's tedious and stupid, not to think about nested lists.

Binding any type

One could also want methods like:

   func (q *Query) BindAny(pos int, value any) *Query
   func (q *Query) BindRow(values ...any) error

However using the any type goes along with a lot of additional allocations, type switches, and use of reflections to make it work for nested collection types.

General binding using generics

I also tried another way, go does not permit parametrized struct methods, however types can be parametrized. Consider the following code:

Code ```go type Bindable interface { Type() frame.Option // CompareType returns true if this type can be bound to a marker of given type CompareType(*frame.Option) bool // Raw returns a slice with the binary representation of the bindable value. // See: https://github.com/apache/cassandra/blob/afa7dfb5a48ecb56abc2d8bbb1de0fc8f1ca77b9/doc/native_protocol_v5.spec#L1050 Raw() []byte // AppendTo returns a slice with appended [bytes] representing the binary representation of the bindable value. AppendTo(dst []byte) []byte } func (q *Query) BindAny(pos int, value Bindable) *Query type List[T Bindable] []T func (v List[T]) Bytes() []byte { if v == nil { return nil } l := len(v) res := []byte{byte(l << 24), byte(l << 16), byte(l << 8), byte(l)} for _, elem := range v { res = elem.AppendBytes(res) } return res } func (v List[T]) AppendBytes(dst []byte) []byte { return append(dst, v.Bytes()...) } func (v List[T]) CompareType(typ *frame.Option) bool { if typ.ID != frame.ListID || typ.List == nil { return false } if len(v) == 0 { var elem T return elem.CompareType(&typ.List.Element) } return v[0].CompareType(&typ.List.Element) } ```

This way binding could be done by the end user like this:

    q.Bind(scylla.Int(2), scylla.Text("abcd"), scylla.List[scylla.List[scylla.Text]]{"ab", "cd"})

This removes the need for reflections, and allows binding of any cql type to non-prepared queries as the syntax supplies the metadata needed to serialize the type.

AppendBytes and CompareType methods are there to reduce the number of allocations when serializing collection types and type-checking in case of prepared queries.

goos: linux
goarch: amd64
pkg: github.com/mmatczuk/scylla-go-driver
cpu: 11th Gen Intel(R) Core(TM) i7-11850H @ 2.50GHz
BenchmarkBindStringListConcrete-16      10305943               126.9 ns/op           128 B/op             5 allocs/op
BenchmarkBindStringListGeneric-16        8219995               143.1 ns/op           152 B/op          6 allocs/op
BenchmarkBindStringListAny-16            1941020               672.7 ns/op           304 B/op         15 allocs/op

BenchmarkBindInt64Concrete-16           74998750                14.82 ns/op            8 B/op          1 allocs/op
BenchmarkBindInt64ConcreteReuse-16      237995461                4.955 ns/op           0 B/op          0 allocs/op
BenchmarkBindInt64Bindable-16           42766458                27.84 ns/op           16 B/op          1 allocs/op
BenchmarkBindInt64Any-16                30776538                37.53 ns/op           15 B/op          1 allocs/op

I think the typed approach could be the right way to go as it's not much slower than the concrete binding and it has the ability to cover all cql type bindings I think.

The any binding could be a bit better if it would be rewritten to not use gocql and use 1-1 mapping of types, as preparing gocql.TypeInfo from frame.Option adds overhead.

Code used for benchmarks is here: https://github.com/scylladb/scylla-go-driver/tree/pp/bindings

martin-sucha commented 2 years ago

Do you want to support named query parameters (e.g. SELECT * FROM table where a = :a and b = :b) or do you want to ignore the named parameters in favor of positional arguments because of performance? (For the record, I don't need named arguments, we use only positional markers)

Kulezi commented 2 years ago

Do you want to support named query parameters (e.g. SELECT * FROM table where a = :a and b = :b) or do you want to ignore the named parameters in favor of positional arguments because of performance? (For the record, I don't need named arguments, we use only positional markers)

There are plans to integrate the new driver with GocqlX which provides this kind of features. I don't know yet if it will be part of the driver or a separate package like ScyllaX, but it will be supported one way or the other.