Closed mikliapko closed 1 month ago
From SM logs:
May 13 09:51:39 {"L":"INFO","T":"2024-05-13T09:51:39.603Z","M":"Schema up to date","keyspace":"scylla_manager","_trace_id":"GY1OgBL-SwWHDGF5bnfDhw"}
May 13 09:51:39 {"L":"INFO","T":"2024-05-13T09:51:39.627Z","M":"Initializing the clusters configuration cache","_trace_id":"GY1OgBL-SwWHDGF5bnfDhw"}
May 13 09:51:39 {"L":"INFO","T":"2024-05-13T09:51:39.632Z","N":"cluster","M":"Creating new Scylla HTTP client","cluster_id":"c0e3ca23-81c5-4275-a1d6-9e8a94f305b2","_trace_id":"GY1OgBL-SwWHDGF5bnfDhw"}
May 13 09:51:39 {"L":"INFO","T":"2024-05-13T09:51:39.639Z","N":"cluster.client","M":"HTTP retry backoff","operation":"StorageServiceHostIdGet","wait":"927.211008ms","error":"dial tcp :10001: connect: connection refused","_trace_id":"GY1OgBL-SwWHDGF5bnfDhw"}
May 13 09:51:40 {"L":"INFO","T":"2024-05-13T09:51:40.567Z","N":"cluster.client","M":"HTTP retry backoff","operation":"StorageServiceHostIdGet","wait":"2.000883071s","error":"dial tcp :10001: connect: connection refused","_trace_id":"GY1OgBL-SwWHDGF5bnfDhw"}
May 13 09:51:42 {"L":"INFO","T":"2024-05-13T09:51:42.569Z","N":"cluster.client","M":"HTTP retry backoff","operation":"StorageServiceHostIdGet","wait":"4.495824842s","error":"dial tcp :10001: connect: connection refused","_trace_id":"GY1OgBL-SwWHDGF5bnfDhw"}
May 13 09:51:47 {"L":"INFO","T":"2024-05-13T09:51:47.066Z","N":"cluster.client","M":"HTTP retry backoff","operation":"StorageServiceHostIdGet","wait":"7.07442966s","error":"dial tcp :10001: connect: connection refused","_trace_id":"GY1OgBL-SwWHDGF5bnfDhw"}
May 13 09:51:54 {"L":"INFO","T":"2024-05-13T09:51:54.142Z","N":"cluster.client","M":"HTTP retry backoff","operation":"StorageServiceHostIdGet","wait":"13.149685425s","error":"dial tcp :10001: connect: connection refused","_trace_id":"GY1OgBL-SwWHDGF5bnfDhw"}
May 13 09:52:07 {"L":"INFO","T":"2024-05-13T09:52:07.293Z","N":"cluster.client","M":"HTTP retry backoff","operation":"StorageServiceHostIdGet","wait":"32.521097873s","error":"dial tcp :10001: connect: connection refused","_trace_id":"GY1OgBL-SwWHDGF5bnfDhw"}
May 13 09:52:39 {"L":"INFO","T":"2024-05-13T09:52:39.816Z","N":"cluster.client","M":"HTTP retry backoff","operation":"StorageServiceHostIdGet","wait":"35.929166916s","error":"dial tcp :10001: connect: connection refused","_trace_id":"GY1OgBL-SwWHDGF5bnfDhw"}
May 13 09:53:15 {"L":"INFO","T":"2024-05-13T09:53:15.745Z","N":"cluster.client","M":"HTTP retry backoff","operation":"StorageServiceHostIdGet","wait":"31.28088056s","error":"dial tcp :10001: connect: connection refused","_trace_id":"GY1OgBL-SwWHDGF5bnfDhw"}
May 13 09:53:47 {"L":"INFO","T":"2024-05-13T09:53:47.027Z","N":"cluster.client","M":"HTTP retry backoff","operation":"StorageServiceHostIdGet","wait":"27.507559245s","error":"dial tcp :10001: connect: connection refused","_trace_id":"GY1OgBL-SwWHDGF5bnfDhw"}
May 13 09:54:14 {"L":"ERROR","T":"2024-05-13T09:54:14.537Z","N":"cluster","M":"Cannot find known hosts using coordinator host","host":"","error":"giving up after 10 attempts: dial tcp :10001: connect: connection refused","_trace_id":"GY1OgBL-SwWHDGF5bnfDhw","errorStack":"github.com/scylladb/scylla-manager/v3/pkg/scyllaclient.(*retryableOperation).submit\n\tgithub.com/scylladb/scylla-manager/v3/pkg/scyllaclient/retry.go:177\ngithub.com/scylladb/scylla-manager/v3/pkg/scyllaclient.retryableTransport.Submit\n\tgithub.com/scylladb/scylla-manager/v3/pkg/scyllaclient/retry.go:155\ngithub.com/scylladb/scylla-manager/v3/swagger/gen/scylla/v1/client/operations.(*Client).StorageServiceHostIDGet\n\tgithub.com/scylladb/scylla-manager/v3/swagger/gen/scylla/v1/client/operations/operations_client.go:11791\ngithub.com/scylladb/scylla-manager/v3/pkg/scyllaclient.(*Client).Datacenters\n\tgithub.com/scylladb/scylla-manager/v3/pkg/scyllaclient/client_scylla.go:161\ngithub.com/scylladb/scylla-manager/v3/pkg/service/cluster.(*Service).discoverHosts\n\tgithub.com/scylladb/scylla-manager/v3/pkg/service/cluster/service.go:188\ngithub.com/scylladb/scylla-manager/v3/pkg/service/cluster.(*Service).CreateClientNoCache\n\tgithub.com/scylladb/scylla-manager/v3/pkg/service/cluster/service.go:162\ngithub.com/scylladb/scylla-manager/v3/pkg/service/configcache.(*Service).updateSingle\n\tgithub.com/scylladb/scylla-manager/v3/pkg/service/configcache/service.go:131\ngithub.com/scylladb/scylla-manager/v3/pkg/service/configcache.(*Service).updateAll.func1\n\tgithub.com/scylladb/scylla-manager/v3/pkg/service/configcache/service.go:182\nruntime.goexit\n\truntime/asm_amd64.s:1695\n","S":"github.com/scylladb/go-log.Logger.log\n\tgithub.com/scylladb/go-log@v0.0.7/logger.go:101\ngithub.com/scylladb/go-log.Logger.Error\n\tgithub.com/scylladb/go-log@v0.0.7/logger.go:84\ngithub.com/scylladb/scylla-manager/v3/pkg/service/cluster.(*Service).CreateClientNoCache\n\tgithub.com/scylladb/scylla-manager/v3/pkg/service/cluster/service.go:164\ngithub.com/scylladb/scylla-manager/v3/pkg/service/configcache.(*Service).updateSingle\n\tgithub.com/scylladb/scylla-manager/v3/pkg/service/configcache/service.go:131\ngithub.com/scylladb/scylla-manager/v3/pkg/service/configcache.(*Service).updateAll.func1\n\tgithub.com/scylladb/scylla-manager/v3/pkg/service/configcache/service.go:182"}
May 13 09:54:14 {"L":"INFO","T":"2024-05-13T09:54:14.541Z","N":"cluster.client","M":"Measuring datacenter latencies","dcs":["us-east"],"_trace_id":"GY1OgBL-SwWHDGF5bnfDhw"}
May 13 09:54:14 {"L":"INFO","T":"2024-05-13T09:54:14.560Z","M":"Clusters config cache initialized","_trace_id":"GY1OgBL-SwWHDGF5bnfDhw"}
As it can be seen above, SM tries to connect to the host represented by an empty string ""
. It comes from:
func (s *Service) CreateClientNoCache(ctx context.Context, clusterID uuid.UUID) (*scyllaclient.Client, error) {
s.logger.Info(ctx, "Creating new Scylla HTTP client", "cluster_id", clusterID)
c, err := s.GetClusterByID(ctx, clusterID)
if err != nil {
return nil, err
}
config := scyllaclient.DefaultConfigWithTimeout(s.timeoutConfig)
if c.Port != 0 {
config.Port = fmt.Sprint(c.Port)
}
config.AuthToken = c.AuthToken
config.Hosts = append([]string{c.Host}, c.KnownHosts...) // <- bug
This can't happen with clusters added to SM 3.2, as it persists the --host
flag value in DB. That's not the case for SM 3.1, where only obtained KnownHosts
were saved and --host
discarded. This results in SM trying to connect to an empty host first, which slows down the process of creating scylla client and creates lots of error logs. The client is eventually created when SM moves to connecting to KnowHosts
, so this is not a correctness issue.
Steps:
systemctl start scylla-manager
sctool version
Actual result: Error
Error: Get "http://127.0.0.1:12345/api/v1/version": dial tcp 127.0.0.1:12345: connect: connection refused
while getting the version of client.Expected result: In older versions it was enough to sleep 30 seconds before running
sctool version
to get the versions. So, expected the same behavior here as well.Additional info:
3.2.7-0.20240509.769fc4f1a
;