scylladb / cpp-rust-driver

API-compatible rewrite of https://github.com/scylladb/cpp-driver as a wrapper for Rust driver.
GNU Lesser General Public License v2.1
16 stars 11 forks source link

cargo: update rust driver's version to 0.13.1 #138

Closed muzarski closed 1 month ago

muzarski commented 1 month ago

Fix: https://github.com/scylladb/cpp-rust-driver/issues/126

This PR updates the rust driver's version to 0.13.1

API breaking changes

First commit fixes some minor issues caused by API breaking changes in rust driver.

Serialization

Following commits focus on serialization. This version of rust driver already uses a new serialization framework which validates the types. In result, some integration tests are failing, since cpp-driver uses a bit different rust-to-CQL type mapping. A simple example is a i64 value which can be mapped to 4 CQL types: bigint/counter/time/timestamp.

In this PR we address and fix this issue. We introduce a CassCqlValue enum which is a narrower version of rust driver's CqlValue with a custom SerializeValue (still called SerializeCql in 0.13) implementation. The serialization DOES NOT contain any typechecks. The typechecks will be introduced in a follow-up PR. We want to follow the same approach as cpp-driver, and do the typechecks during bindings (i.e. binding value to collection, tuple, UDT or statement).

Pre-review checklist

muzarski commented 1 month ago

@Lorak-mmk @wprzytula Any ideas why the unit tests using scylla-proxy fail? In both tests, cass_session_execute_batch seems to return CASS_ERROR_LIB_UNABLE_TO_CONNECT for some weird reason.

wprzytula commented 1 month ago

For some reason, I could not add a driver's dependency with a standard version = "0.13" notation. I had to use a VCS path to a specific commit. The reason is that, otherwise, rust compiler would complain about scylla-rust-wrapper and scylla-proxy using different versions of DbError enum - even though, I changed the version of scylla-proxy, and it points to the commit of 0.13 release (https://github.com/scylladb/scylla-rust-driver/commit/e68adf5dcf4fb4667d37a42a628b106a028240d4).

My idea why this happens is: if you choose a specific commit for scylla-proxy, then scylla-proxy (together with scylla-cql it depends on) is fetched from GH. Choosing version for scylla instead of a commit, makes scylla (together with scylla-cql it depends on) come from crates.io, this way making those DB errors different (coming from two different source code definitions - one in GH scylla-cql and another in crates.io scylla-cql).

wprzytula commented 1 month ago

@Lorak-mmk @wprzytula Any ideas why the unit tests using scylla-proxy fail? In both tests, cass_session_execute_batch seems to return CASS_ERROR_LIB_UNABLE_TO_CONNECT for some weird reason.

The culprit is this: https://github.com/scylladb/scylla-rust-driver/pull/770. Logs show (I recommend running one of those failing tests in two terminals side by side, one before the commit that updates the driver and one after) that the updated driver, having failed the initial metadata fetch, starts the re-resolution procedure. This leads to the problem. Further investigation TODO.

wprzytula commented 1 month ago

OK, a solution is to alter the proxy rules in the test code not to drop the control connection, but instead return a DB error on attempt to fetch metadata. This way the driver does not suffer from lack of connections in the pool.

Just change this function in a following way:

    fn generic_drop_queries_rules() -> impl IntoIterator<Item = RequestRule> {
        [RequestRule(
            Condition::RequestOpcode(RequestOpcode::Query),
            // We won't respond to any queries (including metadata fetch),
            // but the driver will manage to continue with dummy metadata.
            // RequestReaction::drop_connection(), <--- this is removed
            RequestReaction::forge().server_error(), // <---- this is added instead
        )]
    }
muzarski commented 1 month ago

One integration test failed:

/home/runner/work/cpp-rust-driver/cpp-rust-driver/tests/src/integration/objects/future.hpp:152: Failure
      Expected: CASS_OK
To be equal to: wait_code
      Which is: CASS_ERROR_SERVER_INVALID_QUERY [Invalid query]
Invalid query: Database returned an error: The query is syntactically correct but invalid, Error message: Cannot include a counter statement in a logged batch
/home/runner/work/cpp-rust-driver/cpp-rust-driver/tests/src/integration/tests/test_batch.cpp:316: Failure
      Expected: number_of_rows
      Which is: 100
To be equal to: result.row_count()
      Which is: 0
[  FAILED  ] BatchCounterThreeNodeClusterTests.Integration_Cassandra_MixedPreparedAndSimple (13988 ms)

Isn't it weird that we get a server error that never occured before, after bumping the driver's version? I'll try to investigate it further.

muzarski commented 1 month ago

One integration test failed:

/home/runner/work/cpp-rust-driver/cpp-rust-driver/tests/src/integration/objects/future.hpp:152: Failure
      Expected: CASS_OK
To be equal to: wait_code
      Which is: CASS_ERROR_SERVER_INVALID_QUERY [Invalid query]
Invalid query: Database returned an error: The query is syntactically correct but invalid, Error message: Cannot include a counter statement in a logged batch
/home/runner/work/cpp-rust-driver/cpp-rust-driver/tests/src/integration/tests/test_batch.cpp:316: Failure
      Expected: number_of_rows
      Which is: 100
To be equal to: result.row_count()
      Which is: 0
[  FAILED  ] BatchCounterThreeNodeClusterTests.Integration_Cassandra_MixedPreparedAndSimple (13988 ms)

Isn't it weird that we get a server error that never occured before, after bumping the driver's version? I'll try to investigate it further.

The issue can be reproduced in rust driver:

#[tokio::test]
async fn test_counter_batch() {
    setup_tracing();
    let session = Arc::new(create_new_session_builder().build().await.unwrap());
    let ks = unique_keyspace_name();

    session.query(format!("CREATE KEYSPACE IF NOT EXISTS {} WITH REPLICATION = {{'class' : 'NetworkTopologyStrategy', 'replication_factor' : 1}}", ks), &[]).await.unwrap();
    session
        .query(
            format!(
                "CREATE TABLE IF NOT EXISTS {}.t_batch (key int PRIMARY KEY, value counter)",
                ks
            ),
            &[],
        )
        .await
        .unwrap();

    let q = format!("UPDATE {}.t_batch SET value = value + ? WHERE key = ?", ks);

    let mut my_batch = Batch::new(BatchType::Counter);
    assert_eq!(BatchType::Counter, my_batch.get_type());
    my_batch.append_statement(&q[..]);
    my_batch.append_statement(&q[..]);
    my_batch.append_statement(&q[..]);

    session
        .batch(
            &my_batch,
            ((Counter(1), 2), (Counter(1), 2), (Counter(1), 2)),
        )
        .await
        .unwrap();
}

The error:

---- transport::session_test::test_my_batch stdout ----
Unique name: test_rust_1720524333_0
thread 'transport::session_test::test_counter_batch' panicked at scylla/src/transport/session_test.rs:404:10:
called `Result::unwrap()` on an `Err` value: DbError(Invalid, "Cannot include a counter statement in a logged batch")

This seems like a rust-driver issue, since you can clearly see that I've set the BatchType to BatchType::Counter. Previous version of the driver does not have this issue.

cc: @Lorak-mmk @wprzytula

wprzytula commented 1 month ago

One integration test failed:

/home/runner/work/cpp-rust-driver/cpp-rust-driver/tests/src/integration/objects/future.hpp:152: Failure
      Expected: CASS_OK
To be equal to: wait_code
      Which is: CASS_ERROR_SERVER_INVALID_QUERY [Invalid query]
Invalid query: Database returned an error: The query is syntactically correct but invalid, Error message: Cannot include a counter statement in a logged batch
/home/runner/work/cpp-rust-driver/cpp-rust-driver/tests/src/integration/tests/test_batch.cpp:316: Failure
      Expected: number_of_rows
      Which is: 100
To be equal to: result.row_count()
      Which is: 0
[  FAILED  ] BatchCounterThreeNodeClusterTests.Integration_Cassandra_MixedPreparedAndSimple (13988 ms)

Isn't it weird that we get a server error that never occured before, after bumping the driver's version? I'll try to investigate it further.

The issue can be reproduced in rust driver:

#[tokio::test]
async fn test_counter_batch() {
    setup_tracing();
    let session = Arc::new(create_new_session_builder().build().await.unwrap());
    let ks = unique_keyspace_name();

    session.query(format!("CREATE KEYSPACE IF NOT EXISTS {} WITH REPLICATION = {{'class' : 'NetworkTopologyStrategy', 'replication_factor' : 1}}", ks), &[]).await.unwrap();
    session
        .query(
            format!(
                "CREATE TABLE IF NOT EXISTS {}.t_batch (key int PRIMARY KEY, value counter)",
                ks
            ),
            &[],
        )
        .await
        .unwrap();

    let q = format!("UPDATE {}.t_batch SET value = value + ? WHERE key = ?", ks);

    let mut my_batch = Batch::new(BatchType::Counter);
    assert_eq!(BatchType::Counter, my_batch.get_type());
    my_batch.append_statement(&q[..]);
    my_batch.append_statement(&q[..]);
    my_batch.append_statement(&q[..]);

    session
        .batch(
            &my_batch,
            ((Counter(1), 2), (Counter(1), 2), (Counter(1), 2)),
        )
        .await
        .unwrap();
}

The error:

---- transport::session_test::test_my_batch stdout ----
Unique name: test_rust_1720524333_0
thread 'transport::session_test::test_counter_batch' panicked at scylla/src/transport/session_test.rs:404:10:
called `Result::unwrap()` on an `Err` value: DbError(Invalid, "Cannot include a counter statement in a logged batch")

This seems like a rust-driver issue, since you can clearly see that I've set the BatchType to BatchType::Counter. Previous version of the driver does not have this issue.

cc: @Lorak-mmk @wprzytula

Please open an issue in Rust driver repo. Also, specify what version of the driver behaves correctly and what version misbehaves. If you feel like, perform git bisect to find the culprit.

muzarski commented 1 month ago

@wprzytula I opened a PR in rust-driver that will fix the failing test: https://github.com/scylladb/scylla-rust-driver/pull/1038. I think that we should filter out the failing test for now, and include it again when new release of rust-driver with a fix is out (unless we can wait for new release if it happens anytime soon, but I don't think so).

muzarski commented 1 month ago

v2: addressed review comments

muzarski commented 1 month ago

v2.1: bumped rust driver's version to 0.13.1

muzarski commented 1 month ago

v3: as discussed with @Lorak-mmk offline, we decided to follow the same approach as cpp-driver - we don't do any typechecks during serialization. The type checks are supposed to take place during binding (e.g. binding value to tuple or statement). This approach is better, since we can separate typecheck and serialization logic - this is really handful for untyped tuples and collections (cpp-driver allows users to define such objects).

In this PR, we only introduce a naive serialization logic (custom SerializeValue implementation for CassCqlValue). During serialization step, we assume that type checks were already done earlier (as of now, this is obviously not true - typecheck logic will be implemented in a follow-up PR).

It's worth noting that we do not lose anything, since type checks were not performed by previous version of rust driver anyway.

muzarski commented 1 month ago

Opened an issue on what should be implemented in follow-up PR with typechecks: https://github.com/scylladb/cpp-rust-driver/issues/142.