Closed pdbossman closed 1 week ago
@mykaul what's "high" here? To issue an ERROR message when the DC name isn't correct?
@sylwiaszunejko let's start by understanding what other drivers do and why gocql swallows it or fallback to this kind of behavior (could be that someone asked for it).
For my tests I created a 3-node cluster in a DC named datacenter1
.
Here's the reproducer:
package main
import (
"context"
"fmt"
"log"
"github.com/gocql/gocql"
)
func main() {
cluster := gocql.NewCluster("localhost:9042")
// Correct datacenter name, should work as expected - routes LWT to the primary replica.
//cluster.PoolConfig.HostSelectionPolicy = gocql.TokenAwareHostPolicy(gocql.DCAwareRoundRobinPolicy("datacenter1"))
// Incorrect case, will fallback to route LWTs to all replicas.
cluster.PoolConfig.HostSelectionPolicy = gocql.TokenAwareHostPolicy(gocql.DCAwareRoundRobinPolicy("DaTaCeNtEr1"))
session, err := cluster.CreateSession()
defer session.Close()
ctx := context.Background()
session.Query("create keyspace example with replication = { 'class' : 'SimpleStrategy', 'replication_factor' : 3 }").WithContext(ctx).Exec()
session.Query("create table example.my_lwt_table(pk int, ck int, value text, PRIMARY KEY(pk,ck))").WithContext(ctx).Exec()
for i := 1; i <= 10000; i++ { // 10k is enough to see impact on metrics over several seconds
m := make(map[string]interface{})
// Send all writes to partition "1", but different CKs.
applied, err := session.Query("INSERT INTO example.my_lwt_table (pk, ck, value) VALUES (?, ?, ?) IF NOT EXISTS", 1, i, "a").WithContext(ctx).MapScanCAS(m)
if err != nil {
log.Fatal(err)
}
fmt.Println(i, applied, m)
}
}
@pdbossman do I understand correctly that you want the driver to check id the dc passed to the policy exists in the cluster? I checked in other drivers (python-driver, rust driver, java-driver 4.x) and I can't find any checking like that. I guess every driver assumes that the user will pass the dc in the correct case.
In the original issue, someone claimed that with Python driver it immediately fails with an error.
@roydahan Could you provide error message? I cannot find this part of the code
No one provided the error message. Best would be to try quickly reproduce it with Python driver and see it.
OK, I see. In python-driver it will fail, but not because the dc name provided is wrong, but because in python-driver there is an additional field in the DCAwareRoundRobinPolicy
named used_hosts_per_remote_dc
and it is 0 by default, so if there is no hosts in the local dc (it is because the name is wrong) the driver cannot contact any node so the request will fail. But if we use different number e.g. TokenAwarePolicy(DCAwareRoundRobinPolicy("WRONG_DC", used_hosts_per_remote_dc=3))
everything will work as in gocql
.
In java-driver 4.x there is also similar mechanism as in python-driver, there is a field usedHostsPerRemoteDc
so if it is set to 0, request will fail but not because of the wrong dc.
In rust driver there is permit_dc_failover
option in the DefaultPolicy
. If it is set to true driver can use replicas in dcs other than local. By default it is false.
So rather than checking if the dc is correct we need some kind of control over the use of remote dcs.
The behavior that was most undesirable was the loss of lwt optimized routing. This resulted in extremely high contention and it was very difficult to identify the cause. In addition, monitoring was displaying the dc as upper case data center, while the data center was defined as lower case in the schema.
@sylwiaszunejko - the regretful default behavior should be not to connect to the DC, if the DC name is different than entered. Let's make sure this is the case.
@sylwiaszunejko - the regretful default behavior should be not to connect to the DC, if the DC name is different than entered. Let's make sure this is the case.
After your comments here https://github.com/scylladb/gocql/pull/202 I thought that you agree that by default the driver should use remote hosts if there is no host available in the DC provided, now I am confused to be honest.
Or do you mean we should add checking if the DC name provided is present in the cluster even though there is no such check in any other driver?
@sylwiaszunejko - the regretful default behavior should be not to connect to the DC, if the DC name is different than entered. Let's make sure this is the case.
After your comments here #202 I thought that you agree that by default the driver should use remote hosts if there is no host available in the DC provided, now I am confused to be honest.
Or do you mean we should add checking if the DC name provided is present in the cluster even though there is no such check in any other driver?
Indeed, the latter. It's a configuration error that should be caught immediately and fixed by the user.
Other drivers might need fixes as well.
Even if we choose to do this enhancement of checking the DC name, it should be a new RFE, cross drivers that should start a discussion and identify possible corners we don't think of.
I think that the current PR should be merged as is, without changing the default and possibly breaking availability for current users.
I think this was fixed by https://github.com/scylladb/gocql/pull/206
Please answer these questions before submitting your issue. Thanks!
ScyllaDB 2024.1.4 goCQL v1.14.1
When data center in schema is lower case and goCQL is passed an uppercase dc, lwt routing optimization is lost. Eg. cluster.PoolConfig.HostSelectionPolicy = gocql.TokenAwareHostPolicy(gocql.DCAwareRoundRobinPolicy("DC1"))
I recommend you set it as this: cluster.PoolConfig.HostSelectionPolicy = gocql.TokenAwareHostPolicy(gocql.DCAwareRoundRobinPolicy("dc1"))
goCQL should error with incorrect data center name.
cc: @hopugop can you attach the reproducer please.