openshift-metal3 / dev-scripts

Scripts to automate development/test setup for openshift integration with https://github.com/metal3-io/
Apache License 2.0

RHEL9: libvirt-sock file not found error during cluster bringup #1657

Open pperiyasamy opened 2 months ago

pperiyasamy commented 2 months ago

Describe the bug

OCP cluster installation on RHEL 9.4 fails with the following error:

failed to dial libvirt: dial unix /var/run/libvirt/libvirt-sock: connect: no such file or directory
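
One way to confirm the symptom before rerunning the install is to check whether a socket exists at the path named in the error. This is only a sketch: the helper name `check_libvirt_sock` is hypothetical and is not part of dev-scripts.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of dev-scripts): report whether a socket
# exists at the given path. Defaults to the path from the error message.
check_libvirt_sock() {
    local sock="${1:-/var/run/libvirt/libvirt-sock}"
    if [ -S "$sock" ]; then
        # -S is true only if the path exists and is a socket
        echo "present: $sock"
    else
        echo "missing: $sock"
        return 1
    fi
}
```

For example, `check_libvirt_sock || systemctl status libvirtd.service` prints the service state only when the socket is absent.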

To Reproduce

Bring up an OCP cluster (4.15 nightly) with the following steps:

$ git clone https://github.com/openshift-metal3/dev-scripts && \
cd dev-scripts/
# make
$ diff config_example.sh config_peri.sh
12c12
< export CI_TOKEN=''
---
> export CI_TOKEN='xxxxxxxx'
36,37c36,37
< #
< #export OPENSHIFT_RELEASE_STREAM=4.15
---
> 
> export OPENSHIFT_RELEASE_STREAM=4.15
227c227
< #export IP_STACK=v4
---
> export IP_STACK=v4
294c294
< #export NETWORK_TYPE="OVNKubernetes"
---
> export NETWORK_TYPE="OVNKubernetes"
$ cat /etc/os-release 
NAME="Red Hat Enterprise Linux"
VERSION="9.4 (Plow)"
ID="rhel"
ID_LIKE="fedora"
VERSION_ID="9.4"
PLATFORM_ID="platform:el9"
PRETTY_NAME="Red Hat Enterprise Linux 9.4 (Plow)"
ANSI_COLOR="0;31"
LOGO="fedora-logo-icon"
CPE_NAME="cpe:/o:redhat:enterprise_linux:9::baseos"
HOME_URL="https://www.redhat.com/"
DOCUMENTATION_URL="https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/9"
BUG_REPORT_URL="https://bugzilla.redhat.com/"

REDHAT_BUGZILLA_PRODUCT="Red Hat Enterprise Linux 9"
REDHAT_BUGZILLA_PRODUCT_VERSION=9.4
REDHAT_SUPPORT_PRODUCT="Red Hat Enterprise Linux"
REDHAT_SUPPORT_PRODUCT_VERSION="9.

Expected/observed behavior

level=debug msg=[INFO] running Terraform command: /home/peri/dev-scripts/ocp/ostest/terraform/bin/terraform init -no-color -input=false -backend=true -get=true -upgrade=false -plugin-dir=/home/peri/dev-scripts/ocp/ostest/terraform/plugins
level=debug
level=debug msg=Initializing the backend...
level=debug
level=debug msg=Initializing provider plugins...
level=debug msg=- Finding latest version of openshift/local/ironic...
level=debug msg=- Finding latest version of openshift/local/libvirt...
level=debug msg=- Installing openshift/local/ironic v1.0.0...
level=debug msg=- Installed openshift/local/ironic v1.0.0 (unauthenticated)
level=debug msg=- Installing openshift/local/libvirt v1.0.0...
level=debug msg=- Installed openshift/local/libvirt v1.0.0 (unauthenticated)
level=debug
level=debug msg=Terraform has created a lock file .terraform.lock.hcl to record the provider
level=debug msg=selections it made above. Include this file in your version control repository
level=debug msg=so that Terraform can guarantee to make the same selections by default when
level=debug msg=you run "terraform init" in the future.
level=debug
level=debug
level=debug msg=Warning: Incomplete lock file information for providers
level=debug
level=debug msg=Due to your customized provider installation methods, Terraform was forced to
level=debug msg=calculate lock file checksums locally for the following providers:
level=debug msg=  - openshift/local/ironic
level=debug msg=  - openshift/local/libvirt
level=debug
level=debug msg=The current .terraform.lock.hcl file only includes checksums for linux_amd64,
level=debug msg=so Terraform running on another platform will fail to install these
level=debug msg=providers.
level=debug
level=debug msg=To calculate additional checksums for another platform, run:
level=debug msg=  terraform providers lock -platform=linux_amd64
level=debug msg=(where linux_amd64 is the platform to generate)
level=debug
level=debug msg=Terraform has been successfully initialized!
level=debug msg=[INFO] running Terraform command: /home/peri/dev-scripts/ocp/ostest/terraform/bin/terraform apply -no-color -auto-approve -input=false -var-file=/tmp/openshift-install-bootstrap-3795376676/terraform.tfvars.json -var-file=/tmp/openshift-install-bootstrap-3795376676/terraform.platform.auto.tfvars.json -lock=true -parallelism=10 -refresh=true
level=error
level=error msg=Error: failed to dial libvirt: dial unix /var/run/libvirt/libvirt-sock: connect: no such file or directory
level=error
level=error msg=  with provider["openshift/local/libvirt"],
level=error msg=  on main.tf line 1, in provider "libvirt":
level=error msg=   1: provider "libvirt" {
level=error
level=error msg=failed to fetch Cluster: failed to generate asset "Cluster": failed to create cluster: failure applying terraform for "bootstrap" stage: error applying Terraform configs: failed to apply Terraform: exit status 1
level=error
level=error msg=Error: failed to dial libvirt: dial unix /var/run/libvirt/libvirt-sock: connect: no such file or directory
level=error
level=error msg=  with provider["openshift/local/libvirt"],
level=error msg=  on main.tf line 1, in provider "libvirt":
level=error msg=   1: provider "libvirt" {
level=error
level=error
+(utils.sh:1): create_cluster(): auth_template_and_removetmp
+(utils.sh:866): auth_template_and_removetmp(): echo 4
+(utils.sh:867): auth_template_and_removetmp(): generate_auth_template
+(utils.sh:327): generate_auth_template(): set +x
E0502 06:48:12.764378   73376 memcache.go:265] couldn't get current server API group list: Get "https://api.ostest.test.metalkube.org:6443/api?timeout=32s": dial tcp 192.168.111.5:6443: connect: no route to host
E0502 06:48:15.836414   73376 memcache.go:265] couldn't get current server API group list: Get "https://api.ostest.test.metalkube.org:6443/api?timeout=32s": dial tcp 192.168.111.5:6443: connect: no route to host
E0502 06:48:18.908310   73376 memcache.go:265] couldn't get current server API group list: Get "https://api.ostest.test.metalkube.org:6443/api?timeout=32s": dial tcp 192.168.111.5:6443: connect: no route to host
E0502 06:48:21.980273   73376 memcache.go:265] couldn't get current server API group list: Get "https://api.ostest.test.metalkube.org:6443/api?timeout=32s": dial tcp 192.168.111.5:6443: connect: no route to host
E0502 06:48:25.052182   73376 memcache.go:265] couldn't get current server API group list: Get "https://api.ostest.test.metalkube.org:6443/api?timeout=32s": dial tcp 192.168.111.5:6443: connect: no route to host
Unable to connect to the server: dial tcp 192.168.111.5:6443: connect: no route to host

Additional context

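The following change to the configure-host script (02_configure_host.sh) fixes the problem. It restarts libvirtd.service unconditionally at the end of manage_libvirtd():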

$ git diff
diff --git a/02_configure_host.sh b/02_configure_host.sh
index 4f1ef60..f40d14f 100755
--- a/02_configure_host.sh
+++ b/02_configure_host.sh
@@ -31,6 +31,7 @@ manage_libvirtd() {
           sudo systemctl restart libvirtd.service
         ;;
 esac
+sudo systemctl restart libvirtd.service
 }

 # Generate user ssh key
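
The diff above restarts libvirtd on every run. A narrower variant could restart the service only when the socket is actually missing. The sketch below illustrates that idea under stated assumptions: the function name `ensure_libvirt_sock` and the `RESTART_CMD` override are hypothetical, it has not been tested on RHEL 9, and it is not a change proposed in this repository.

```shell
#!/usr/bin/env bash
# Hypothetical variant of the workaround: restart libvirtd only when the
# socket is absent. RESTART_CMD is an illustrative override so the function
# can be exercised without touching the real service.
ensure_libvirt_sock() {
    local sock="${1:-/var/run/libvirt/libvirt-sock}"
    local restart_cmd="${RESTART_CMD:-sudo systemctl restart libvirtd.service}"
    if [ -S "$sock" ]; then
        echo "socket present, nothing to do"
        return 0
    fi
    echo "socket missing, restarting libvirtd"
    # Intentionally unquoted so the command string splits into words
    $restart_cmd
}
```

Called as `ensure_libvirt_sock` at the end of manage_libvirtd(), this would leave a healthy host untouched. The unconditional restart in the diff is simpler, and it is the change that was observed to fix the problem.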