scylladb / scylla-rust-driver

Async CQL driver for Rust, optimized for ScyllaDB!
Apache License 2.0
583 stars 102 forks source link

Proper handling for named serialized values in batches #711

Closed piodul closed 10 months ago

piodul commented 1 year ago

Currently, named serialized values provided in batches are not handled properly. They are reported to ScyllaDB as unnamed serialized values, which have slightly different encoding. In the best case, this results in a parse error on the Scylla side.

It's not really possible to implement support for it properly due to the limitations of the protocol. In QUERY and EXECUTE, there is a <flags> field which contains a bit that determines whether serialized values that follow are named or not. However, for BATCH, the <flags> field is placed after all serialized values. There are some problems with that:

  1. Due to how the protocol is organized, the elements of the serialized frame must be parsed in order. Therefore, serialized values are parsed before the <flags> field, even though its "with names for values" bit controls how the serialized values are supposed to be parsed. Effectively, the bit is ignored because it is impossible to support it. This is the case both in ScyllaDB and Cassandra - see https://issues.apache.org/jira/browse/CASSANDRA-10246.
  2. Even if the bit were supported, it is supposed to affect serialized values of all statements in the batch, so this causes problems if named/unnamed serialized values are mixed.

There are some things that we could do:

Keshi commented 1 year ago

I just ran into this issue in production. We are using named serialized values for correctness and maintainability reasons, and some of the queries need to go into a batch, and have been resulting in a fairly unhelpful truncated frame error. I didn't recall seeing any documentation of these limitations in the rust scylla docs, and would probably be a helpful addition since it is not enforced at compile time.

I think that if #2 not super hard to do, even if it is not ideal, it would still be a very welcome addition until there is a better/real solution. Given its limitation, perhaps #1 for unprepared queries, and #2 for prepared queries would be welcome.

Lorak-mmk commented 10 months ago

After serialization refactor, the driver no longer sends named values. It also no longer sends unprepared queries with any values. Basically, solution 2 was implemented, but with a change: unprepared queries with values are silently prepared by a driver, so this is a proper fix on the driver side, not a temporary workaround.