scylladb / scylla-manager

The Scylla Manager
https://manager.docs.scylladb.com/stable/

scylla manager not working adding the cluster #3902

Open vignesh-v3 opened 1 week ago

vignesh-v3 commented 1 week ago

I have created the auth token on the Scylla Manager node and then added it to all of the Scylla Manager Agent nodes. Once that was done, I ran the command below on the Scylla Manager node:

sctool cluster add --host <scylla manager agent ip> --name scylla-cluster --auth-token "my auth token"

I assume this should register a new Scylla cluster, using that agent host as the contact point. Unfortunately, the command fails:

(screenshot: sctool cluster add error output)

The Scylla Manager node is provisioned at a cloud provider, and the Scylla Manager Agents are provisioned on bare-metal servers. I have also opened inbound connections on ports 10001 and 9042 on all of the agent nodes so that Scylla Manager can reach them. It would be great if I could get some help. TIA

karol-kokoszka commented 1 week ago

@vignesh-v3 any chance to attach scylla-manager logs from the time when the cluster was being added ?

vignesh-v3 commented 1 week ago

> @vignesh-v3 any chance to attach scylla-manager logs from the time when the cluster was being added?

scylla-manager[30951]: {"L":"INFO","T":"2024-06-25T14:49:51.988Z","N":"cluster.client","M":"HTTP retry backoff","operation":"StorageServiceScyllaReleaseVersionGet","wait":"1s","error":"after 30s: context deadline exceeded","_trace_id":"zqRjmuVPS_OuDHgU1olaeg"}
scylla-manager[30951]: {"L":"INFO","T":"2024-06-25T14:49:51.988Z","N":"cluster.client","M":"HTTP retry backoff","operation":"StorageServiceScyllaReleaseVersionGet","wait":"1s","error":"after 30s: context deadline exceeded","_trace_id":"zqRjmuVPS_OuDHgU1olaeg"}
scylla-manager[30951]: {"L":"INFO","T":"2024-06-25T14:49:51.988Z","N":"cluster.client","M":"HTTP retry backoff","operation":"StorageServiceScyllaReleaseVersionGet","wait":"1s","error":"dial tcp x.x.x.x8:10001: i/o timeout","_trace_id":"zqRjmuVPS_OuDHgU1olaeg"}
scylla-manager[30951]: {"L":"INFO","T":"2024-06-25T14:49:51.988Z","N":"cluster.client","M":"HTTP retry backoff","operation":"StorageServiceScyllaReleaseVersionGet","wait":"1s","error":"dial tcp x.x.x.x:10001: i/o timeout","_trace_id":"zqRjmuVPS_OuDHgU1olaeg"}
scylla-manager[30951]: {"L":"INFO","T":"2024-06-25T14:49:51.988Z","N":"cluster.client","M":"HTTP retry backoff","operation":"StorageServiceScyllaReleaseVersionGet","wait":"1s","error":"after 30s: context deadline exceeded","_trace_id":"zqRjmuVPS_OuDHgU1olaeg"}
scylla-manager[30951]: {"L":"INFO","T":"2024-06-25T14:49:51.988Z","N":"cluster.client","M":"HTTP retry backoff","operation":"StorageServiceScyllaReleaseVersionGet","wait":"1s","error":"dial tcp x.x.x.66:10001: i/o timeout","_trace_id":"zqRjmuVPS_OuDHgU1olaeg"}
scylla-manager[30951]: {"L":"INFO","T":"2024-06-25T14:49:51.988Z","N":"cluster.client","M":"HTTP retry backoff","operation":"StorageServiceScyllaReleaseVersionGet","wait":"1s","error":"dial tcp x.x.x.70:10001: i/o timeout","_trace_id":"zqRjmuVPS_OuDHgU1olaeg"}
scylla-manager[30951]: {"L":"INFO","T":"2024-06-25T14:49:51.988Z","N":"cluster.client","M":"HTTP retry backoff","operation":"StorageServiceScyllaReleaseVersionGet","wait":"1s","error":"after 30s: context deadline exceeded","_trace_id":"zqRjmuVPS_OuDHgU1olaeg"}
scylla-manager[30951]: {"L":"INFO","T":"2024-06-25T14:49:51.988Z","N":"cluster.client","M":"HTTP retry backoff","operation":"StorageServiceScyllaReleaseVersionGet","wait":"1s","error":"after 30s: context deadline exceeded","_trace_id":"zqRjmuVPS_OuDHgU1olaeg"}
scylla-manager[30951]: {"L":"INFO","T":"2024-06-25T14:49:51.988Z","N":"cluster.client","M":"HTTP retry backoff","operation":"StorageServiceScyllaReleaseVersionGet","wait":"1s","error":"after 30s: context deadline exceeded","_trace_id":"zqRjmuVPS_OuDHgU1olaeg"}
scylla-manager[30951]: {"L":"INFO","T":"2024-06-25T14:50:22.989Z","N":"cluster.client","M":"Host check FAILED","hosts":"x.x.x.x4","err":"giving up after 2 attempts: after 30s: context deadline exceeded","_trace_id":"zqRjmuVPS_OuDHgU1olaeg"}
scylla-manager[30951]: {"L":"INFO","T":"2024-06-25T14:50:22.989Z","N":"cluster.client","M":"Host check FAILED","hosts":"x.x.x.x6","err":"giving up after 2 attempts: after 30s: context deadline exceeded","_trace_id":"zqRjmuVPS_OuDHgU1olaeg"}
scylla-manager[30951]: {"L":"INFO","T":"2024-06-25T14:50:22.989Z","N":"cluster.client","M":"Host check FAILED","hosts":"x.x.x.x","err":"giving up after 2 attempts: after 30s: context deadline exceeded","_trace_id":"zqRjmuVPS_OuDHgU1olaeg"}
scylla-manager[30951]: {"L":"INFO","T":"2024-06-25T14:50:22.989Z","N":"cluster.client","M":"Host check FAILED","hosts":"x.x.x.x2","err":"giving up after 2 attempts: after 30s: context deadline exceeded","_trace_id":"zqRjmuVPS_OuDHgU1olaeg"}
scylla-manager[30951]: {"L":"INFO","T":"2024-06-25T14:50:22.989Z","N":"cluster.client","M":"Host check FAILED","hosts":"x.x.x.x0","err":"giving up after 2 attempts: after 30s: context deadline exceeded","_trace_id":"zqRjmuVPS_OuDHgU1olaeg"}
scylla-manager[30951]: {"L":"INFO","T":"2024-06-25T14:50:22.989Z","N":"cluster.client","M":"Host check FAILED","hosts":"x.x.x.x","err":"giving up after 2 attempts: after 30s: context deadline exceeded","_trace_id":"zqRjmuVPS_OuDHgU1olaeg"}
scylla-manager[30951]: {"L":"INFO","T":"2024-06-25T14:50:22.989Z","N":"cluster.client","M":"Host check FAILED","hosts":"x.x.x.x","err":"giving up after 2 attempts: after 30s: context deadline exceeded","_trace_id":"zqRjmuVPS_OuDHgU1olaeg"}
scylla-manager[30951]: {"L":"INFO","T":"2024-06-25T14:50:22.989Z","N":"cluster.client","M":"Host check FAILED","hosts":"x.x.x.x0","err":"giving up after 2 attempts: dial tcp x.x.x.x:10001: i/o timeout","_trace_id":"zqRjmuVPS_OuDHgU1olaeg"}
scylla-manager[30951]: {"L":"INFO","T":"2024-06-25T14:50:22.989Z","N":"cluster.client","M":"Host check FAILED","hosts":"x.x.x.x6","err":"giving up after 2 attempts: after 30s: context deadline exceeded","_trace_id":"zqRjmuVPS_OuDHgU1olaeg"}
scylla-manager[30951]: {"L":"INFO","T":"2024-06-25T14:50:22.989Z","N":"cluster.client","M":"Host check FAILED","hosts":"x.x.x.x","err":"giving up after 2 attempts: after 30s: context deadline exceeded","_trace_id":"zqRjmuVPS_OuDHgU1olaeg"}
scylla-manager[30951]: {"L":"INFO","T":"2024-06-25T14:50:22.989Z","N":"cluster.client","M":"Host check FAILED","hosts":"x.x.x.x8","err":"giving up after 2 attempts: after 30s: context deadline exceeded","_trace_id":"zqRjmuVPS_OuDHgU1olaeg"}
scylla-manager[30951]: {"L":"INFO","T":"2024-06-25T14:50:22.989Z","N":"cluster.client","M":"Host check FAILED","hosts":"x.x.x.x2","err":"giving up after 2 attempts: after 30s: context deadline exceeded","_trace_id":"zqRjmuVPS_OuDHgU1olaeg"}
scylla-manager[30951]: {"L":"INFO","T":"2024-06-25T14:50:22.989Z","N":"cluster.client","M":"Done checking hosts connectivity","_trace_id":"zqRjmuVPS_OuDHgU1olaeg"}
karol-kokoszka commented 1 week ago

This is just a suspicion, but I feel there is a networking problem in your setup. As I understand it, `nc -vz ...` is just a kind of port scan: it checks whether the port is open and, if it is, closes the connection without sending anything. When a TCP connection is created from a Go app, it not only checks that the port is open but also exchanges data.

You can try running the following from the same host where the Manager is running (save it as main.go and run it with `go run main.go`):

package main

import (
    "fmt"
    "net"
    "time"
)

func main() {
    timeout := 10 * time.Second // Increase as needed
    conn, err := net.DialTimeout("tcp", "x.x.x.x:10001", timeout)
    if err != nil {
        fmt.Println("Error:", err)
        return
    }
    defer conn.Close()
    fmt.Println("Connection successful")
}
vignesh-v3 commented 1 week ago

> This is just a suspicion, but I feel there is some networking problem in your setup. [...]
>
> You can try running from the same host where the manager is running [...]

(screenshot: output of the Go connectivity test, IPs blurred)
karol-kokoszka commented 5 days ago

@vignesh-v3 are these blurred IPs the correct ones? Do they belong to the cluster you want to add?

vignesh-v3 commented 3 days ago

@karol-kokoszka yes, they are correct; they are public IPs.

> Do they belong to the cluster you want to add?

What do you mean by that? I intend to add the nodes to the cluster; that is what is failing.

karol-kokoszka commented 1 day ago

> @karol-kokoszka yes, they are correct; they are public IPs.
>
> What do you mean by that? I intend to add the nodes to the cluster; that is what is failing.

Whenever you want to add a cluster to Manager, you provide just one of the hosts, which is used as the coordinator: https://manager.docs.scylladb.com/stable/sctool/cluster.html#host

Given this host, Manager calls it to discover the remaining nodes, and then probes all of those nodes to check that the Manager Agent is reachable on them, here:

https://github.com/scylladb/scylla-manager/blob/8d9190b5a0e12e0ec0ef611aa9295e703b30d741/pkg/service/cluster/service.go#L464-L502

vignesh-v3 commented 1 day ago

@karol-kokoszka how do we configure a coordinator Scylla host?

karol-kokoszka commented 1 day ago

This is the coordinator https://manager.docs.scylladb.com/stable/sctool/cluster.html#host