Closed muzarski closed 4 months ago
@Lorak-mmk @wprzytula Any ideas why the unit tests using scylla-proxy fail? In both tests, cass_session_execute_batch
seems to return CASS_ERROR_LIB_UNABLE_TO_CONNECT
for some weird reason.
For some reason, I could not add a driver's dependency with a standard version = "0.13" notation. I had to use a VCS path to a specific commit. The reason is that, otherwise, rust compiler would complain about scylla-rust-wrapper and scylla-proxy using different versions of DbError enum - even though, I changed the version of scylla-proxy, and it points to the commit of 0.13 release (https://github.com/scylladb/scylla-rust-driver/commit/e68adf5dcf4fb4667d37a42a628b106a028240d4).
My idea why this happens is: if you choose a specific commit for scylla-proxy, then scylla-proxy
(together with scylla-cql
it depends on) is fetched from GH. Choosing version
for scylla
instead of a commit, makes scylla
(together with scylla-cql
it depends on) come from crates.io, this way making those DB errors different (coming from two different source code definitions - one in GH scylla-cql
and another in crates.io scylla-cql
).
@Lorak-mmk @wprzytula Any ideas why the unit tests using scylla-proxy fail? In both tests,
cass_session_execute_batch
seems to returnCASS_ERROR_LIB_UNABLE_TO_CONNECT
for some weird reason.
The culprit is this: https://github.com/scylladb/scylla-rust-driver/pull/770. Logs show (I recommend running one of those failing tests in two terminals side by side, one before the commit that updates the driver and one after) that the updated driver, having failed the initial metadata fetch, starts the re-resolution procedure. This leads to the problem. Further investigation TODO.
OK, a solution is to alter the proxy rules in the test code not to drop the control connection, but instead return a DB error on attempt to fetch metadata. This way the driver does not suffer from lack of connections in the pool.
Just change this function in a following way:
fn generic_drop_queries_rules() -> impl IntoIterator<Item = RequestRule> {
[RequestRule(
Condition::RequestOpcode(RequestOpcode::Query),
// We won't respond to any queries (including metadata fetch),
// but the driver will manage to continue with dummy metadata.
// RequestReaction::drop_connection(), <--- this is removed
RequestReaction::forge().server_error(), // <---- this is added instead
)]
}
One integration test failed:
/home/runner/work/cpp-rust-driver/cpp-rust-driver/tests/src/integration/objects/future.hpp:152: Failure
Expected: CASS_OK
To be equal to: wait_code
Which is: CASS_ERROR_SERVER_INVALID_QUERY [Invalid query]
Invalid query: Database returned an error: The query is syntactically correct but invalid, Error message: Cannot include a counter statement in a logged batch
/home/runner/work/cpp-rust-driver/cpp-rust-driver/tests/src/integration/tests/test_batch.cpp:316: Failure
Expected: number_of_rows
Which is: 100
To be equal to: result.row_count()
Which is: 0
[ FAILED ] BatchCounterThreeNodeClusterTests.Integration_Cassandra_MixedPreparedAndSimple (13988 ms)
Isn't it weird that we get a server error that never occured before, after bumping the driver's version? I'll try to investigate it further.
One integration test failed:
/home/runner/work/cpp-rust-driver/cpp-rust-driver/tests/src/integration/objects/future.hpp:152: Failure Expected: CASS_OK To be equal to: wait_code Which is: CASS_ERROR_SERVER_INVALID_QUERY [Invalid query] Invalid query: Database returned an error: The query is syntactically correct but invalid, Error message: Cannot include a counter statement in a logged batch /home/runner/work/cpp-rust-driver/cpp-rust-driver/tests/src/integration/tests/test_batch.cpp:316: Failure Expected: number_of_rows Which is: 100 To be equal to: result.row_count() Which is: 0 [ FAILED ] BatchCounterThreeNodeClusterTests.Integration_Cassandra_MixedPreparedAndSimple (13988 ms)
Isn't it weird that we get a server error that never occured before, after bumping the driver's version? I'll try to investigate it further.
The issue can be reproduced in rust driver:
#[tokio::test]
async fn test_counter_batch() {
setup_tracing();
let session = Arc::new(create_new_session_builder().build().await.unwrap());
let ks = unique_keyspace_name();
session.query(format!("CREATE KEYSPACE IF NOT EXISTS {} WITH REPLICATION = {{'class' : 'NetworkTopologyStrategy', 'replication_factor' : 1}}", ks), &[]).await.unwrap();
session
.query(
format!(
"CREATE TABLE IF NOT EXISTS {}.t_batch (key int PRIMARY KEY, value counter)",
ks
),
&[],
)
.await
.unwrap();
let q = format!("UPDATE {}.t_batch SET value = value + ? WHERE key = ?", ks);
let mut my_batch = Batch::new(BatchType::Counter);
assert_eq!(BatchType::Counter, my_batch.get_type());
my_batch.append_statement(&q[..]);
my_batch.append_statement(&q[..]);
my_batch.append_statement(&q[..]);
session
.batch(
&my_batch,
((Counter(1), 2), (Counter(1), 2), (Counter(1), 2)),
)
.await
.unwrap();
}
The error:
---- transport::session_test::test_my_batch stdout ----
Unique name: test_rust_1720524333_0
thread 'transport::session_test::test_counter_batch' panicked at scylla/src/transport/session_test.rs:404:10:
called `Result::unwrap()` on an `Err` value: DbError(Invalid, "Cannot include a counter statement in a logged batch")
This seems like a rust-driver issue, since you can clearly see that I've set the BatchType
to BatchType::Counter
. Previous version of the driver does not have this issue.
cc: @Lorak-mmk @wprzytula
One integration test failed:
/home/runner/work/cpp-rust-driver/cpp-rust-driver/tests/src/integration/objects/future.hpp:152: Failure Expected: CASS_OK To be equal to: wait_code Which is: CASS_ERROR_SERVER_INVALID_QUERY [Invalid query] Invalid query: Database returned an error: The query is syntactically correct but invalid, Error message: Cannot include a counter statement in a logged batch /home/runner/work/cpp-rust-driver/cpp-rust-driver/tests/src/integration/tests/test_batch.cpp:316: Failure Expected: number_of_rows Which is: 100 To be equal to: result.row_count() Which is: 0 [ FAILED ] BatchCounterThreeNodeClusterTests.Integration_Cassandra_MixedPreparedAndSimple (13988 ms)
Isn't it weird that we get a server error that never occured before, after bumping the driver's version? I'll try to investigate it further.
The issue can be reproduced in rust driver:
#[tokio::test] async fn test_counter_batch() { setup_tracing(); let session = Arc::new(create_new_session_builder().build().await.unwrap()); let ks = unique_keyspace_name(); session.query(format!("CREATE KEYSPACE IF NOT EXISTS {} WITH REPLICATION = {{'class' : 'NetworkTopologyStrategy', 'replication_factor' : 1}}", ks), &[]).await.unwrap(); session .query( format!( "CREATE TABLE IF NOT EXISTS {}.t_batch (key int PRIMARY KEY, value counter)", ks ), &[], ) .await .unwrap(); let q = format!("UPDATE {}.t_batch SET value = value + ? WHERE key = ?", ks); let mut my_batch = Batch::new(BatchType::Counter); assert_eq!(BatchType::Counter, my_batch.get_type()); my_batch.append_statement(&q[..]); my_batch.append_statement(&q[..]); my_batch.append_statement(&q[..]); session .batch( &my_batch, ((Counter(1), 2), (Counter(1), 2), (Counter(1), 2)), ) .await .unwrap(); }
The error:
---- transport::session_test::test_my_batch stdout ---- Unique name: test_rust_1720524333_0 thread 'transport::session_test::test_counter_batch' panicked at scylla/src/transport/session_test.rs:404:10: called `Result::unwrap()` on an `Err` value: DbError(Invalid, "Cannot include a counter statement in a logged batch")
This seems like a rust-driver issue, since you can clearly see that I've set the
BatchType
toBatchType::Counter
. Previous version of the driver does not have this issue.cc: @Lorak-mmk @wprzytula
Please open an issue in Rust driver repo. Also, specify what version of the driver behaves correctly and what version misbehaves. If you feel like, perform git bisect
to find the culprit.
@wprzytula I opened a PR in rust-driver that will fix the failing test: https://github.com/scylladb/scylla-rust-driver/pull/1038. I think that we should filter out the failing test for now, and include it again when new release of rust-driver with a fix is out (unless we can wait for new release if it happens anytime soon, but I don't think so).
v2: addressed review comments
v2.1: bumped rust driver's version to 0.13.1
v3: as discussed with @Lorak-mmk offline, we decided to follow the same approach as cpp-driver - we don't do any typechecks during serialization. The type checks are supposed to take place during binding (e.g. binding value to tuple or statement). This approach is better, since we can separate typecheck and serialization logic - this is really handful for untyped tuples and collections (cpp-driver allows users to define such objects).
In this PR, we only introduce a naive serialization logic (custom SerializeValue
implementation for CassCqlValue
). During serialization step, we assume that type checks were already done earlier (as of now, this is obviously not true - typecheck logic will be implemented in a follow-up PR).
It's worth noting that we do not lose anything, since type checks were not performed by previous version of rust driver anyway.
Opened an issue on what should be implemented in follow-up PR with typechecks: https://github.com/scylladb/cpp-rust-driver/issues/142.
Fix: https://github.com/scylladb/cpp-rust-driver/issues/126
This PR updates the rust driver's version to 0.13.1
API breaking changes
First commit fixes some minor issues caused by API breaking changes in rust driver.
Serialization
Following commits focus on serialization. This version of rust driver already uses a new serialization framework which validates the types. In result, some integration tests are failing, since cpp-driver uses a bit different rust-to-CQL type mapping. A simple example is a
i64
value which can be mapped to 4 CQL types:bigint/counter/time/timestamp
.In this PR we address and fix this issue. We introduce a
CassCqlValue
enum which is a narrower version of rust driver'sCqlValue
with a customSerializeValue
(still calledSerializeCql
in 0.13) implementation. The serialization DOES NOT contain any typechecks. The typechecks will be introduced in a follow-up PR. We want to follow the same approach as cpp-driver, and do the typechecks during bindings (i.e. binding value to collection, tuple, UDT or statement).Pre-review checklist
.github/workflows/build.yml
ingtest_filter
.~.github/workflows/cassandra.yml
ingtest_filter
.~