Closed ouamer-dahmani closed 3 weeks ago
Could you please provide your ClusterConfig, including HostSelectionPolicy and retry policy?
Hello!
It is equivalent to the following. I used high values to see if it would help pass through the potential instability.
cluster := gocql.NewCluster(cfg.Hosts...)
cluster.Keyspace = cfg.Keyspace
cluster.Timeout = 5 * time.Second
cluster.RetryPolicy = &gocql.ExponentialBackoffRetryPolicy{
	Min:        500 * time.Millisecond,
	Max:        5 * time.Second,
	NumRetries: 5,
}
cluster.Consistency = gocql.LocalQuorum
cluster.Authenticator = cfg.Authenticator
cluster.PoolConfig.HostSelectionPolicy = gocql.RoundRobinHostPolicy()
cluster.DisableInitialHostLookup = false
cluster.DisableShardAwarePort = true
@ouamer-dahmani, what most likely happens is this:
The driver iterates over the hosts provided by RoundRobinHostPolicy to find one that has connections to it and could be used to execute the query. Since it finds no such host, it ends up returning &Iter{err: ErrNoConnections}
It works the same way on modern versions as well, so you can't fix it by upgrading the driver. I would suggest manually retrying on this error until we fix the retry logic.
I am closing this issue in favor of https://github.com/scylladb/gocql/issues/326, but feel free to continue the discussion here if it is related to this case.
Hello,
I am encountering issues where queries are not being retried despite a retry policy being configured when creating a new Cluster object. Reads and writes work fine, but at some point some of them fail with:
gocql: no hosts available in the pool
Delving into the code, I see that it should indeed retry the queries (I forced a query execution error in the debugger). I then added logging to the cluster:
The logger gets called for queries that succeed but never for those that fail. I wonder if it is because the queries are not even run once due to no hosts being in the connection pool?
I sometimes see connection events before the failures (can be a few milliseconds or minutes) but that is not always the case and they are not error logs either.
Connect: Dial Duration: 5.383348ms, Host: 10.173.92.242
I know that the network on my Kubernetes cluster is sometimes a bit flaky, but I assume this should be handled gracefully by reconnections in the connection pool and retries on the queries.
I am running version v1.13.0 of the driver. I see that the v1.14.X releases have changes around connections, but I am unsure whether they are related to the issues I am having, and I have held off on updating due to lack of time to test it out.