ocp-power-automation / ocp4-upi-kvm

OCP4 on KVM/Power
Apache License 2.0

Issue running the automation after updating to terraform v0.13.2 #47

Closed by mgiessing 4 years ago

mgiessing commented 4 years ago

Hello everyone!

After updating Terraform to v0.13.2 (and, I think, the ignition provider to 2.1.0), the automation fails with the following error message:

[...]
module.install.null_resource.install: Still creating... [43m30s elapsed]
module.install.null_resource.install: Still creating... [43m40s elapsed]
module.install.null_resource.install: Still creating... [43m50s elapsed]
module.install.null_resource.install: Still creating... [44m0s elapsed]
module.install.null_resource.install: Still creating... [44m10s elapsed]
module.install.null_resource.install: Still creating... [44m20s elapsed]
module.install.null_resource.install: Still creating... [44m30s elapsed]
module.install.null_resource.install: Still creating... [44m40s elapsed]
module.install.null_resource.install: Still creating... [44m50s elapsed]
module.install.null_resource.install: Still creating... [45m0s elapsed]
module.install.null_resource.install: Still creating... [45m10s elapsed]
module.install.null_resource.install (remote-exec): fatal: [192.168.88.4]: FAILED! => {"changed": false, "elapsed": 2715, "msg": "timed out waiting for ping module test success: Failed to connect to the host via ssh: ssh: connect to host 192.168.88.4 port 22: Connection refused"}
module.install.null_resource.install: Still creating... [45m20s elapsed]
module.install.null_resource.install (remote-exec): fatal: [192.168.88.3]: FAILED! => {"changed": false, "elapsed": 2716, "msg": "timed out waiting for ping module test success: Failed to connect to the host via ssh: ssh: connect to host 192.168.88.3 port 22: No route to host"}
module.install.null_resource.install (remote-exec): fatal: [192.168.88.5]: FAILED! => {"changed": false, "elapsed": 2716, "msg": "timed out waiting for ping module test success: Failed to connect to the host via ssh: ssh: connect to host 192.168.88.5 port 22: No route to host"}
module.install.null_resource.install (remote-exec): fatal: [192.168.88.6]: FAILED! => {"changed": false, "elapsed": 2716, "msg": "timed out waiting for ping module test success: Failed to connect to the host via ssh: ssh: connect to host 192.168.88.6 port 22: No route to host"}

module.install.null_resource.install (remote-exec): NO MORE HOSTS LEFT *************************************************************

module.install.null_resource.install (remote-exec): PLAY RECAP *********************************************************************
module.install.null_resource.install (remote-exec): 192.168.88.3               : ok=0    changed=0    unreachable=0    failed=1    skipped=0    rescued=0    ignored=0
module.install.null_resource.install (remote-exec): 192.168.88.4               : ok=0    changed=0    unreachable=0    failed=1    skipped=0    rescued=0    ignored=0
module.install.null_resource.install (remote-exec): 192.168.88.5               : ok=0    changed=0    unreachable=0    failed=1    skipped=0    rescued=0    ignored=0
module.install.null_resource.install (remote-exec): 192.168.88.6               : ok=0    changed=0    unreachable=0    failed=1    skipped=0    rescued=0    ignored=0
module.install.null_resource.install (remote-exec): localhost                  : ok=2    changed=0    unreachable=0    failed=0    skipped=12   rescued=0    ignored=0

Error: error executing "/tmp/terraform_1092551690.sh": Process exited with status 2

I occasionally hit a similar error with the "old" installation, but I'd say about 90% of my attempts went through. Now I haven't been able to install OCP even once (in 8-10 tries).

My var.tfvars file looks like this:

### Configure the Libvirt Host values
libvirt_uri     = "qemu+tcp://129.40.94.241/system"
host_address    = "129.40.94.241"
images_path     = "/home/libvirt/openshift-images"

### Configure the Nodes details
bastion_image   = "/root/bastion/rhel-8.2-ppc64le-kvm.qcow2"
rhcos_image     = "/root/rhcos/4.3/rhcos-4.3.18-ppc64le-qemu.ppc64le.qcow2"
bastion         = { memory = 16384, vcpu = 2 }
bootstrap       = { memory = 16384, vcpu = 4, count = 1 }
master          = { memory = 16384, vcpu = 4, count = 3 }
worker          = { memory = 131072, vcpu = 16, count = 4 }
cpu_mode        = ""
network_cidr    = "192.168.88.0/24"
rhel_username   = "root"
rhel_password   = "123456"
public_key_file             = "~/.ssh/id_rsa.pub"
private_key_file            = "~/.ssh/id_rsa"
private_key                 = ""
public_key                  = ""
rhel_subscription_username  = "${RH_USER}"
rhel_subscription_password  = "${RH_PASS}"

### OpenShift variables
openshift_install_tarball   = "https://mirror.openshift.com/pub/openshift-v4/ppc64le/clients/ocp/4.3.18/openshift-install-linux.tar.gz"
openshift_client_tarball    = "https://mirror.openshift.com/pub/openshift-v4/ppc64le/clients/ocp/4.3.18/openshift-client-linux.tar.gz"

#release_image_override     = ""

pull_secret_file            = "data/pull-secret.txt"
cluster_domain              = "cecc.ihost.com"
cluster_id_prefix           = "edb"
cluster_id                  = ""

dns_forwarders              = "1.1.1.1; 9.9.9.9"
installer_log_level         = "info"
ansible_extra_options       = "-v"

#helpernode_tag             = "fddbbc651153ef2966e5cb4d4167990b31c01ceb"
#install_playbook_tag       = "90c7cc478c8751d0b22c163e101a0d49e15e2e08"

storage_type                = "nfs"
volume_size                 = "300" # Value in GB

#upgrade_version = ""
#upgrade_channel = ""  #(stable-4.x, fast-4.x, candidate-4.x) eg. stable-4.5
#upgrade_pause_time = "90"
#upgrade_delay_time = "600"

Thanks for your support :)

yussufsh commented 4 years ago

Could you please share the tail of the bootstrap domain's console messages?

virsh console <bootstrap_domain_name>
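
If the bootstrap domain name is not known, something like the following sketch may help find it before attaching to the console. It assumes the bootstrap VM's libvirt domain name contains the word "bootstrap" and that `virsh` can reach the libvirt host given in var.tfvars; `filter_bootstrap_domains` is a hypothetical helper name, not part of the repository.

```shell
#!/usr/bin/env bash
# Sketch only: narrow a list of libvirt domain names down to bootstrap candidates.
# Assumption: the bootstrap domain's name contains "bootstrap".

# Reads domain names (one per line) on stdin and prints those containing
# "bootstrap", case-insensitively. Prints nothing if there is no match.
filter_bootstrap_domains() {
  grep -i 'bootstrap' || true
}

# Usage (the URI is the libvirt_uri value from var.tfvars above):
#   uri="qemu+tcp://129.40.94.241/system"
#   virsh -c "$uri" list --all --name | filter_bootstrap_domains
#   virsh -c "$uri" console <bootstrap_domain_name>
```

The last few lines printed on the console are usually the ones worth pasting into the issue.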

I assume you are trying OCP 4.6.

yussufsh commented 4 years ago

Now I see you are using OCP 4.3 binaries.

@mgiessing please check out the v4.3 tag and use Terraform 0.12. The latest code is not supported for this setup: it uses the new ignition spec, which cannot boot older RHCOS 4.3 images.
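
A minimal sketch of that advice, assuming `git` and `terraform` are on the PATH and the commands are run inside a clone of ocp4-upi-kvm. The helper names `tf_is_012` and `tf_parse_version` are hypothetical and only illustrate a guard against running the v4.3 tag with a newer Terraform.

```shell
#!/usr/bin/env bash
# Sketch only: check that the installed Terraform is a 0.12.x release
# before running the v4.3 tag of the automation.

# Returns 0 if the given version string (e.g. "0.12.29" or "v0.12.29")
# is a 0.12.x release, non-zero otherwise.
tf_is_012() {
  local ver="${1#v}"   # strip an optional leading "v"
  case "$ver" in
    0.12.*) return 0 ;;
    *)      return 1 ;;
  esac
}

# Reads `terraform version` output on stdin and prints the bare version
# number from its first line, which looks like "Terraform v0.13.2".
tf_parse_version() {
  awk 'NR==1 { sub(/^v/, "", $2); print $2 }'
}

# Usage:
#   git checkout v4.3
#   if ! tf_is_012 "$(terraform version | tf_parse_version)"; then
#     echo "Terraform 0.12.x is required for the v4.3 tag" >&2
#   fi
```

With the v0.13.2 installation described in this issue, the guard would print the warning instead of letting the run proceed.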