scylladb / scylla-cluster-tests

Tests for Scylla Clusters
GNU Affero General Public License v3.0

Can't connect via SSH to the monitoring node based on Rocky 9 image #7678

Open mikliapko opened 2 months ago

mikliapko commented 2 months ago

Issue description

Setting up a monitoring node based on the Rocky 9 distribution (ami_id: ami-09fb459fad4613d55, official Rocky 9 image) fails because the SSH connection to the node never comes up.

12:52:46  Traceback (most recent call last):
12:52:46    File "/home/ubuntu/scylla-cluster-tests/sdcm/tester.py", line 182, in wrapper
12:52:46      return method(*args, **kwargs)
12:52:46    File "/home/ubuntu/scylla-cluster-tests/sdcm/utils/decorators.py", line 119, in inner
12:52:46      res = func(*args, **kwargs)
12:52:46    File "/home/ubuntu/scylla-cluster-tests/sdcm/tester.py", line 909, in setUp
12:52:46      self.init_resources()
12:52:46    File "/home/ubuntu/scylla-cluster-tests/sdcm/tester.py", line 1861, in init_resources
12:52:46      self.get_cluster_aws(loader_info=loader_info, db_info=db_info,
12:52:46    File "/home/ubuntu/scylla-cluster-tests/sdcm/tester.py", line 1462, in get_cluster_aws
12:52:46      self.monitors = MonitorSetAWS(
12:52:46    File "/home/ubuntu/scylla-cluster-tests/sdcm/cluster_aws.py", line 1067, in __init__
12:52:46      AWSCluster.__init__(self,
12:52:46    File "/home/ubuntu/scylla-cluster-tests/sdcm/cluster_aws.py", line 112, in __init__
12:52:46      super().__init__(cluster_uuid=cluster_uuid,
12:52:46    File "/home/ubuntu/scylla-cluster-tests/sdcm/cluster.py", line 3204, in __init__
12:52:46      self.add_nodes(nodes_per_az[az_index], rack=rack, enable_auto_bootstrap=self.auto_bootstrap)
12:52:46    File "/home/ubuntu/scylla-cluster-tests/sdcm/cluster_aws.py", line 397, in add_nodes
12:52:46      node = self._create_node(instance, self._ec2_ami_username, self.node_prefix,
12:52:46    File "/home/ubuntu/scylla-cluster-tests/sdcm/cluster_aws.py", line 416, in _create_node
12:52:46      node.init()
12:52:46    File "/home/ubuntu/scylla-cluster-tests/sdcm/cluster_aws.py", line 507, in init
12:52:46      super().init()
12:52:46    File "/home/ubuntu/scylla-cluster-tests/sdcm/cluster.py", line 363, in init
12:52:46      self.wait_ssh_up(verbose=True)
12:52:46    File "/home/ubuntu/scylla-cluster-tests/sdcm/cluster.py", line 1208, in wait_ssh_up
12:52:46      wait.wait_for(func=self.remoter.is_up, step=10, text=text, timeout=timeout, throw_exc=True)
12:52:46    File "/home/ubuntu/scylla-cluster-tests/sdcm/wait.py", line 86, in wait_for
12:52:46      raise raising_exc from ex
12:52:46  sdcm.exceptions.WaitForTimeoutError: Wait for: manager-installation-manager--monitor-node-0e867e7a-1: Waiting for SSH to be up: timeout - 500 seconds 

The call at https://github.com/scylladb/scylla-cluster-tests/blob/master/sdcm/remote/libssh2_client/session.py#L56 keeps returning LIBSSH2_ERROR_EAGAIN for the entire 500 seconds of waiting.
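For readers unfamiliar with the polling pattern in the traceback, here is a minimal sketch of how a check that never succeeds turns into a timeout error. This is not the actual `sdcm.wait.wait_for` implementation; `WaitTimeoutError` and the `sleep`/`clock` parameters are hypothetical names added for illustration, and only `step`, `timeout` and `text` mirror the arguments visible in the traceback.

```python
import time


class WaitTimeoutError(Exception):
    """Hypothetical stand-in for sdcm.exceptions.WaitForTimeoutError."""


def wait_for(func, step=10, timeout=500, text="", sleep=time.sleep, clock=time.monotonic):
    """Call func() every `step` seconds until it returns a truthy value.

    Raises WaitTimeoutError once `timeout` seconds have passed without success.
    `sleep` and `clock` are injectable (an assumption of this sketch) so the
    loop can be exercised without really waiting.
    """
    deadline = clock() + timeout
    while True:
        if func():
            return True
        if clock() >= deadline:
            raise WaitTimeoutError(f"Wait for: {text}: timeout - {timeout} seconds")
        sleep(step)
```

If the check passed in as `func` keeps failing the way the EAGAIN loop does here, it is polled every 10 seconds and the error is raised after 500 seconds, which matches the message at the end of the traceback.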

How frequently does it reproduce?

100% reproducibility.

Installation details

Rocky 9 image ami: 'ami-09fb459fad4613d55'

fruch commented 2 months ago

Rocky 9 ships an SSH version that rejects RSA keys by default.

Some client-side configuration can override that, but the option doesn't work with the libssh we are using.

So one can either use the same key the Rocky 9 artifact test is using, or wait for https://github.com/scylladb/scylla-cluster-tests/pull/7478, which clears out all usages of the RSA keys.
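In case it helps when checking which key a test is configured with, here is a small sketch that reads the key type out of an OpenSSH public key line (the contents of a `.pub` file). It relies only on the standard public key encoding, where the base64 blob starts with a length-prefixed type string; the helper names `public_key_type` and `is_rsa_public_key` are made up for this example and are not part of scylla-cluster-tests.

```python
import base64
import struct


def public_key_type(pubkey_line: str) -> str:
    """Return the key type embedded in an OpenSSH public key line.

    Expects the usual "<type> <base64-blob> [comment]" layout and reads the
    type from the blob itself (4-byte big-endian length, then the name).
    """
    fields = pubkey_line.split()
    if len(fields) < 2:
        raise ValueError("expected '<type> <base64-blob> [comment]'")
    blob = base64.b64decode(fields[1])
    if len(blob) < 4:
        raise ValueError("key blob is too short")
    (length,) = struct.unpack(">I", blob[:4])
    return blob[4:4 + length].decode("ascii")


def is_rsa_public_key(pubkey_line: str) -> bool:
    """True if the key is an RSA key ("ssh-rsa" type string)."""
    return public_key_type(pubkey_line) == "ssh-rsa"
```

Running `is_rsa_public_key(open(path).read())` on the `.pub` file next to a private key would show whether that key is one of the RSA keys the Rocky 9 image rejects, assuming the `.pub` file is available.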

mikliapko commented 2 months ago

Thanks Israel!

Good to know that we already have a PR to fix the issue. I will just wait for it to be merged.

mikliapko commented 2 months ago

Keeping the issue open until I retest Rocky 9 after https://github.com/scylladb/scylla-cluster-tests/pull/7478 is merged.