scylladb / java-driver

ScyllaDB Java Driver for ScyllaDB and Apache Cassandra, based on the DataStax Java Driver
Apache License 2.0
62 stars 37 forks source link

Prefer primary replica for the token for LWT queries #24

Closed kostja closed 1 year ago

kostja commented 4 years ago

The way Paxos protocol works is that if two queries attempt to update the same key from different coordinators, they will start two independent Paxos rounds. Each round will be assigned a timestamp, and the coordinator who has the highest timestamp will win. The issue is, the loser only queues up after the winner (using a semaphore) if both rounds are coordinated by the same node. If rounds are started at different nodes, the only option for the user is to sleep increasingly random interval and retry (this is what our implementation does).

If the key is contended and the driver is neither shard nor token aware, this leads to a lot of retries to make an update. If the key is contended and the driver is token and shard aware, it will send the query to one of the replicas for the partition, but will round-robin between replicas.

This still means there will be at least 50% of loser queries, which will retry before they can commit. Retries against a contended key in multi-DC setup lead to increasingly growing delays and time outs. If it takes 100 milliseconds to do a round due to network topology, and the key is contended, we get as few as 1 query per second for such a key. If all queries queue up on the same coordinator, we can get to up to 10 QPS per key.

This is why for LWT queries the driver should choose replicas in a pre-defined order, so that in case of contention they will queue up at the same replica, rather than compete: choose the primary replica first, then, if the primary is known to be down, the first secondary, then the second secondary, and so on.

This will reduce contention over hot keys and thus increase LWT performance.

If LOCAL_SERIAL serial consistency is used, we should prefer the primary in the local DC, because only local DCs endpoints will participate in the Paxos round. If SERIAL consistency is specified in a multi-DC setup, we should use any DCs primary, but consistently use the same primary for all queries on all clients. The key to avoiding contention is in all clients consistently choosing the same replica for the same key, if it's available/alive.

See also https://github.com/scylladb/gocql/issues/40

kostja commented 4 years ago

Cassandra issue: https://issues.apache.org/jira/browse/CASSANDRA-15746

avelanarius commented 2 years ago

Support for this optimization is already implemented in Java Driver 3.x, via those commits (no PR was made, it was directly committed by @haaawk):

https://github.com/scylladb/java-driver/commit/47552b75bf51102f5c01f5c509a97fb887d8239a https://github.com/scylladb/java-driver/commit/d2cfc3000bb3ed5760ea1e205e023508f34eadb2 https://github.com/scylladb/java-driver/commit/f724fdcb3128465cb7e383a3de12d3b874b49698 https://github.com/scylladb/java-driver/commit/674f33f2df5fc5e9eebefe78b5973d2ab83dfbd0 https://github.com/scylladb/java-driver/commit/9aebd649878da24c8c9ed2543af4adef8f595c96 https://github.com/scylladb/java-driver/commit/f91f3b19f34a4142382f3616808b879eb4d0f283 https://github.com/scylladb/java-driver/commit/cd5faa9ec2c0afcf5f79a668b432ba3359491069 https://github.com/scylladb/java-driver/commit/5be42b69e29e2ddad767212d7f931055abea403d https://github.com/scylladb/java-driver/commit/f4aeff60d277504f7763c16c0c4138dd325ecc47

Support for this feature is missing in Java Driver 4.x. It seems that the work can be split into a few smaller subtasks (parsing LwtInfo, parsing the response of prepared statements whether a query is LWT, picking a correct replica when doing a query).

Gor027 commented 1 year ago

@avelanarius It seems that this PR https://github.com/scylladb/java-driver/pull/125 has resolved the issue, so can this issue be closed now?