scylladb / scylla-cluster-tests

Tests for Scylla Clusters
GNU Affero General Public License v3.0

[Manager tests] Azure image for RHEL or Rocky #8226

Open mikliapko opened 3 months ago

mikliapko commented 3 months ago

We are struggling to find a working RHEL or Rocky image that does not require a plan, to replace the deprecated CentOS image for Manager testing on Azure.

Two problems we faced:

  1. Most RHEL and Rocky images require plan information to be sent in the instance creation request. SCT does not support this yet, so we can currently create instances only from images that do not require a plan.
  2. Images in the second category (which do not require plan info) fail during OS-specific configuration steps: -- image - RedHat:RHEL:9_4:latest; -- job link;
    13:25:11  ERROR: test_backup_feature (mgmt_cli_test.MgmtCliTest)
    13:25:11  ----------------------------------------------------------------------
    13:25:11  Traceback (most recent call last):
    13:25:11    File "/home/ubuntu/scylla-cluster-tests/sdcm/tester.py", line 185, in wrapper
    13:25:11      return method(*args, **kwargs)
    13:25:11    File "/home/ubuntu/scylla-cluster-tests/sdcm/utils/decorators.py", line 119, in inner
    13:25:11      res = func(*args, **kwargs)
    13:25:11    File "/home/ubuntu/scylla-cluster-tests/sdcm/tester.py", line 958, in setUp
    13:25:11      self.init_resources()
    13:25:11    File "/home/ubuntu/scylla-cluster-tests/sdcm/tester.py", line 1926, in init_resources
    13:25:11      self.get_cluster_azure(loader_info=loader_info, db_info=db_info,
    13:25:11    File "/home/ubuntu/scylla-cluster-tests/sdcm/tester.py", line 1350, in get_cluster_azure
    13:25:11      self.monitors = MonitorSetAzure(image_id=azure_image_monitor,
    13:25:11    File "/home/ubuntu/scylla-cluster-tests/sdcm/cluster_azure.py", line 336, in __init__
    13:25:11      AzureCluster.__init__(self,
    13:25:11    File "/home/ubuntu/scylla-cluster-tests/sdcm/cluster_azure.py", line 191, in __init__
    13:25:11      super().__init__(cluster_uuid=cluster_uuid,
    13:25:11    File "/home/ubuntu/scylla-cluster-tests/sdcm/cluster.py", line 3227, in __init__
    13:25:11      self.add_nodes(nodes_per_az[az_index], rack=rack, enable_auto_bootstrap=self.auto_bootstrap)
    13:25:11    File "/home/ubuntu/scylla-cluster-tests/sdcm/cluster_azure.py", line 205, in add_nodes
    13:25:11      instances = self._create_instances(count, instance_dc, instance_type=instance_type)
    13:25:11    File "/home/ubuntu/scylla-cluster-tests/sdcm/cluster_azure.py", line 246, in _create_instances
    13:25:11      return provision_instances_with_fallback(self.provisioners[dc_idx], definitions=definitions, pricing_model=pricing_model,
    13:25:11    File "/home/ubuntu/scylla-cluster-tests/sdcm/sct_provision/instances_provider.py", line 53, in provision_instances_with_fallback
    13:25:11      wait_cloud_init_completes(remoter=remoter, instance=v_m)
    13:25:11    File "/home/ubuntu/scylla-cluster-tests/sdcm/provision/helpers/cloud_init.py", line 55, in wait_cloud_init_completes
    13:25:11      raise CloudInitError("Errors during cloud-init provisioning phase. See logs for errors.")
    13:25:11  sdcm.provision.helpers.cloud_init.CloudInitError: Errors during cloud-init provisioning phase. See logs for errors.

    or -- image - resf:rockylinux-x86_64:9-base:9.3.20231113; -- job link

    22:45:39  Command: 'sudo cloud-init --version 2>&1'
    22:45:39  
    22:45:39  Stdout:
    22:45:39  
    22:45:39  
    22:45:39  
    22:45:39  Stderr:
    22:45:39  
    22:45:39  
    22:45:39  
    22:45:39  Exception:  File "/home/ubuntu/scylla-cluster-tests/sdcm/remote/libssh2_client/__init__.py", line 588, in run
    22:45:39      self.connect()
    22:45:39    File "/home/ubuntu/scylla-cluster-tests/sdcm/remote/libssh2_client/__init__.py", line 524, in connect
    22:45:39      raise ConnectTimeout(ex_msg) from exc
    22:45:39  
    22:45:39  Failed to connect in 60 seconds, last error: (ConnectError)Error connecting to host '20.42.94.33:22' - timed out
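
For problem 1 above, here is a minimal sketch of what sending plan information could look like. It only builds the image-related part of a VM creation request as a plain dict. The helper names (`parse_image_urn`, `build_vm_image_params`) are hypothetical and are not part of SCT. The mapping of plan name/product/publisher to the image's sku/offer/publisher is an assumption that holds for many Marketplace images but should be checked per image, and Marketplace terms may also need to be accepted for the subscription.

```python
def parse_image_urn(urn: str) -> dict:
    """Split an Azure image URN 'publisher:offer:sku:version' into its parts."""
    parts = urn.split(":")
    if len(parts) != 4:
        raise ValueError(f"expected publisher:offer:sku:version, got {urn!r}")
    publisher, offer, sku, version = parts
    return {"publisher": publisher, "offer": offer, "sku": sku, "version": version}


def build_vm_image_params(urn: str, requires_plan: bool = False) -> dict:
    """Return the image-related part of a VM creation request body.

    Assumption: when a plan is required, its name/product/publisher equal the
    image's sku/offer/publisher. Verify this for each image before relying on it.
    """
    image = parse_image_urn(urn)
    params = {"storage_profile": {"image_reference": image}}
    if requires_plan:
        params["plan"] = {
            "name": image["sku"],
            "product": image["offer"],
            "publisher": image["publisher"],
        }
    return params
```

The resulting dict would be merged into the rest of the VM definition before the creation call; images that do not need a plan simply get no `plan` key.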

mikliapko commented 3 months ago

@fruch I wonder if it's something the SCT infrastructure team may help us with?

cc: @rayakurl

fruch commented 3 months ago

@soyacz

Azure and cloud-init, your favorite cup of tea :), can you help @mikliapko with this one?

soyacz commented 3 months ago

Sure, I'll try to look at it later today (I'll run the job and peek into the instance to get an idea of what's wrong). @mikliapko please remind me tomorrow if I don't do it today.

rayakurl commented 3 months ago

Thanks @soyacz, @mikliapko is on PTO today, so he can contact you only tomorrow anyway :)

soyacz commented 3 months ago

syslog-ng cannot be installed out of the box with the current EPEL 7 repo setup; we need to switch to a newer one. Update sdcm.provision.user_data.UserDataBuilder.yum_repos to use https://dl.fedoraproject.org/pub/epel/epel{,-next}-release-latest-9.noarch.rpm and set the EPEL key to https://dl.fedoraproject.org/pub/epel/RPM-GPG-KEY-EPEL-9

(We may need to use http, but I'm not sure. Version 8 might also work, but I didn't test it. I installed the above manually and managed to get syslog-ng there.)

Maybe something like this will work:

"yum_repos":
    {
        "epel-release": {
            "baseurl": "https://dl.fedoraproject.org/pub/epel/9/Everything/$basearch",
            "enabled": True,
            "failovermethod": "priority",
            "gpgcheck": True,
            "gpgkey": "https://dl.fedoraproject.org/pub/epel/RPM-GPG-KEY-EPEL-9",
            "name": "Extra Packages for Enterprise Linux 9 - Everything"
        }
    }

@fruch this is one thing that will surely fail when using syslog-ng in artifact tests
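
To make the suggestion above less tied to one release, here is a rough sketch that derives the same `yum_repos` entry from the OS major version and renders it as cloud-init user data. The function names (`epel_yum_repos`, `render_cloud_config`) are hypothetical and do not exist in SCT. It assumes the `<version>/Everything/$basearch` URL layout from the snippet above, which was only tried for version 9 in this thread.

```python
import json


def epel_yum_repos(major_version: int) -> dict:
    """Build a cloud-init `yum_repos` mapping for the given EL major version.

    Assumption: the EPEL mirror layout matches the one used for version 9 above.
    """
    base = "https://dl.fedoraproject.org/pub/epel"
    return {
        "epel-release": {
            "baseurl": f"{base}/{major_version}/Everything/$basearch",
            "enabled": True,
            "failovermethod": "priority",
            "gpgcheck": True,
            "gpgkey": f"{base}/RPM-GPG-KEY-EPEL-{major_version}",
            "name": f"Extra Packages for Enterprise Linux {major_version} - Everything",
        }
    }


def render_cloud_config(major_version: int) -> str:
    """Render a #cloud-config document that adds EPEL and installs syslog-ng.

    JSON is valid YAML, so json.dumps is enough and avoids a YAML dependency.
    """
    config = {"yum_repos": epel_yum_repos(major_version), "packages": ["syslog-ng"]}
    return "#cloud-config\n" + json.dumps(config, indent=2)
```

Calling `render_cloud_config(9)` gives a document that could be passed as user data; how the real UserDataBuilder would pick the version is left open here.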

fruch commented 3 months ago

Why do we need to define it like that? Why can't we just install epel and assume it's available?

soyacz commented 3 months ago

I tried installing epel with sudo dnf install epel-release and it didn't help. I think the syslog-ng version offered there on rocky9 by default is old and requires ssl/crypto libraries that don't exist on that system:

Problem: conflicting requests
  - nothing provides libcrypto.so.10()(64bit) needed by syslog-ng-3.5.6-3.el7.x86_64 from epel-release
  - nothing provides libcrypto.so.10(libcrypto.so.10)(64bit) needed by syslog-ng-3.5.6-3.el7.x86_64 from epel-release
  - nothing provides libssl.so.10()(64bit) needed by syslog-ng-3.5.6-3.el7.x86_64 from epel-release
  - nothing provides libssl.so.10(libssl.so.10)(64bit) needed by syslog-ng-3.5.6-3.el7.x86_64 from epel-release
  - nothing provides libwrap.so.0()(64bit) needed by syslog-ng-3.5.6-3.el7.x86_64 from epel-release
(try to add '--skip-broken' to skip uninstallable packages or '--nobest' to use not only best candidate packages)
2024-08-05 11:08:41,233 - util.py[WARNING]: Failed to install packages: ['syslog-ng']

Adding the EPEL 9 repo as in my comment above upgraded the syslog-ng version and allowed it to be installed.
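
The error above names an el7 package on a version 9 host, which is the kind of mismatch a small check could flag early. This is only a sketch: `mismatched_packages` is a hypothetical helper, not an SCT or dnf API, and it relies on the package name carrying an `elN` dist tag as in the log lines above.

```python
import re

# Matches the dist tag in a package name such as "syslog-ng-3.5.6-3.el7.x86_64".
_DIST_TAG = re.compile(r"\.el(\d+)(?:[._]|$)")
# Captures the package named after "needed by" in dnf's dependency errors.
_NEEDED_BY = re.compile(r"needed by (\S+)")


def mismatched_packages(dnf_output: str, host_major: int) -> set:
    """Return packages from 'nothing provides ... needed by <pkg>' lines whose
    elN dist tag differs from the host's major version."""
    found = set()
    for line in dnf_output.splitlines():
        match = _NEEDED_BY.search(line)
        if not match:
            continue
        pkg = match.group(1)
        tag = _DIST_TAG.search(pkg)
        if tag and int(tag.group(1)) != host_major:
            found.add(pkg)
    return found
```

Run against the cloud-init log with `host_major=9`, it would report the el7 syslog-ng build, pointing at the repo definition rather than at the image.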