scylladb / scylla-rust-driver

Async CQL driver for Rust, optimized for ScyllaDB!
Apache License 2.0
585 stars 108 forks source link

Move set_timestamp out of Query/PreparedStatement #262

Open piodul opened 3 years ago

piodul commented 3 years ago

The driver lacks the ability to use client-side timestamps.

While, of course, it is possible to set a custom timestamp for a statement via the CQL USING TIMESTAMP clause, the Datastax Java driver also allows to:

Source: https://docs.datastax.com/en/developer/java-driver/4.10/manual/core/query_timestamps/

krzysztofgal commented 2 years ago

I have issue with set_timestamp. When constructing batch statements and each statement must have timestamp to force correct order of queries.

When i do query without USING TIMESTAMP UPDATE lesser_jank SET version = ?, config = ? WHERE id = ?" And setup timestamp like:

self.timestamp = AtomicI64::new(chrono::Local::now().timestamp_nanos() / 1000);
...

pub fn add_batch_stmt(&mut self, mut prepared_statement: PreparedStatement) {
        let timestamp = self.timestamp.fetch_add(1, Ordering::SeqCst);
        prepared_statement.set_timestamp(Some(timestamp));
        self.batch.append_statement(prepared_statement);
}

Then I get invalid end state which resembles timestamp collision - according to scylla documentation "If it happens, e.g. two INSERTS have the same timestamp, conflicting cell values are compared and the cells with the lexicographically bigger value prevail."

When i do query with USING TIMESTAMP: UPDATE lesser_jank USING TIMESTAMP ? SET version = ?, config = ? WHERE id = ?" Using same timestamps as above then i get correct end state.

So I am doing something completely wrong, or driver have some bug and set_timestamp() does not set timestamp.

psarna commented 2 years ago

@krzysztofgal I think our API is a little inconsistent here. What will most likely work for you (please try) is calling set_timestamp() directly on self.batch, not on individual statements. Based on code inspection, I think that our code for batches does not check whether the statements inside the batch have a timestamp set (which is a bug), but instead we only check if somebody called set_timestamp() on the batch itself. USING TIMESTAMP, on the other hand, will work regardless because it's treated with higher priority than timestamps set with set_timestamp().

krzysztofgal commented 2 years ago

Thanks for quick reply. I just tried batch.set_timestamp and it produces same invalid state. That is event sourcing system so correct order of queries inside batch is crucial (there can be multiple same queries on same PK in one BATCH). So I will have to do USING TIMESTAMP on each query, which is more room for mistakes :sweat_smile: or change my api to be able to detect duplicate queries and add to BATCH just last of it what actually makes sense anyway :smiley:.

psarna commented 2 years ago

Oh wait, if you need multiple timestamp values inside a batch statement, then I think that CQL does not really allow expressing it without using USING TIMESTAMP in each query. I read through https://github.com/apache/cassandra/blob/c7e7984008062aa2cf2de5a1fc080674bb2139ff/doc/native_protocol_v4.spec#L414-L474 and the frame only accepts a single timestamp - as if all queries inside the batch use a single timestamp. When you call execute() on a single prepared statement, then it also allows setting the timestamp via query_parameters: https://github.com/apache/cassandra/blob/c7e7984008062aa2cf2de5a1fc080674bb2139ff/doc/native_protocol_v4.spec#L403-L409 https://github.com/apache/cassandra/blob/c7e7984008062aa2cf2de5a1fc080674bb2139ff/doc/native_protocol_v4.spec#L337-L343 , but these parameters are part of the EXECUTE (or QUERY) frame, and the BATCH frame does not have any room for per-query timeouts.

So unless I misunderstand the docs, USING TIMESTAMP ? is the correct way of expressing what you want to do, according to the CQL standard. /cc @piodul, I'd appreciate a second opinion.

If I'm right, what we should do in the driver is emitting a warning if somebody set a timestamp via set_timestamp() on a query that exists only inside of a batch - because it will not work as expected and it's a pitfall.

piodul commented 2 years ago

@psarna Your explanation looks correct. I don't see anything about per-query timestamps in a batch in the protocol spec either - so, USING TIMESTAMP ? seems to be the only option.

As a side note, I think we should consider separating the timestamp parameter from the Query/PreparedStatement. This sounds like a parameter you would choose every time you issue a query - like e.g. paging state for which we already have special versions of Session::execute/query. On the other hand, query/prepared statement is a long-lived object so it doesn't make sense to add such parameters to it. Such separation would also reduce confusion like in @krzysztofgal's case as the queries/stmts you add to a batch wouldn't have timestamps.

Perhaps we should tidy up our interface a little - we should consider doing it along with https://github.com/scylladb/scylla-rust-driver/issues/354.

avelanarius commented 9 months ago

Renaming the issue so that it better describes the current state of this issue, which is:

I think we should consider separating the timestamp parameter from the Query/PreparedStatement. This sounds like a parameter you would choose every time you issue a query - like e.g. paging state for which we already have special versions of Session::execute/query. On the other hand, query/prepared statement is a long-lived object so it doesn't make sense to add such parameters to it. Such separation would also reduce confusion like in @krzysztofgal's case as the queries/stmts you add to a batch wouldn't have timestamps.

wprzytula commented 7 months ago

The option to specify timestamp for a single execution only could be included in the Execute trait design (#713). As that option should be available for any kind of statement (Unprepared, Prepared, Bound, Batch), it could be part of the trait, as in:

trait Executable {
    async fn execute(...) { ...}

    async fn execute_with_timestamp(...) { ... }
}