ocp-power-automation / ocp4-upi-powervm-hmc

OpenShift on IBM PowerVM servers managed using HMC
Apache License 2.0

Installer failed/timed out at master-0 node config step #5

Open oaomer opened 2 years ago

oaomer commented 2 years ago

I am trying to use this project to install OCP 4.10 on a tech zone/CECC kit. I followed the README instructions, and vars-powervm.yaml looks good to me, yet the installation repeatedly fails with a timeout during TASK [nodes-config : Check connection] for the master-0 node. The timeout is set to 2700 s (45 min) in the vars YAML file; the task waited the full 45 minutes and then failed.

TASK [ocp-config : Skip config if install workdir exist] **************************************************************************
ok: [129.40.126.241]

TASK [ocp-config : meta] **********************************************************************************************************

PLAY [Check and configure bootstrap node] *****************************************************************************************

TASK [nodes-config : Check connection] ********************************************************************************************
ok: [129.40.126.242]

TASK [nodes-config : Configure node] **********************************************************************************************
[WARNING]: Distribution redhat 4.10 on host 129.40.126.242 should use /usr/bin/python, but is using /usr/libexec/platform-python,
since the discovered platform python interpreter was not present. See https://docs.ansible.com/ansible-
core/2.12/reference_appendices/interpreter_discovery.html for more information.
changed: [129.40.126.242]

PLAY [Check and configure control-plane nodes] ************************************************************************************

TASK [nodes-config : Check connection] ********************************************************************************************
fatal: [129.40.126.243]: FAILED! => {"changed": false, "elapsed": 2715, "msg": "timed out waiting for ping module test: Failed to connect to the host via ssh: ssh: connect to host 129.40.126.243 port 22: Connection refused"}

NO MORE HOSTS LEFT ****************************************************************************************************************

PLAY RECAP ************************************************************************************************************************
129.40.126.241             : ok=141  changed=55   unreachable=0    failed=0    skipped=135  rescued=1    ignored=0   
129.40.126.242             : ok=2    changed=1    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0   
129.40.126.243             : ok=0    changed=0    unreachable=0    failed=1    skipped=0    rescued=0    ignored=0   

[root@p664-bastion ocp4-upi-powervm-hmc]#
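
The error above says "Connection refused" rather than a connection timeout, which usually means the address answered but nothing was listening on port 22 at that moment. A minimal sketch for checking this from the bastion, without waiting out the full Ansible timeout, might look like the following. The function name `probe_tcp` and the 5-second default are illustrative assumptions, not part of this project.

```python
import socket


def probe_tcp(host: str, port: int = 22, timeout: float = 5.0) -> str:
    """Classify one TCP connection attempt to host:port.

    Returns "open", "refused", "timeout", or "error: <details>".
    """
    try:
        # create_connection performs the TCP handshake and raises on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The peer actively rejected the connection: no listener on the port.
        return "refused"
    except socket.timeout:
        # No answer within the timeout: host down, filtered, or unroutable.
        return "timeout"
    except OSError as exc:
        # Anything else, e.g. "No route to host" or a name resolution failure.
        return f"error: {exc}"
```

Calling `probe_tcp("129.40.126.243")` repeatedly while the node boots would show whether the result ever changes from "refused" to "open". A result that stays at "refused" suggests the node never reached the point of starting sshd, while "timeout" would point more toward networking.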

There is no useful information in the logs, even after raising the log level to debug. The only thing I see is this block, repeated in /var/log/messages every 30 seconds:

Oct 19 01:21:57 p664-bastion systemd[1]: helper-tftp.service: Succeeded.
Oct 19 01:22:27 p664-bastion systemd[1]: helper-tftp.service: Service RestartSec=30s expired, scheduling restart.
Oct 19 01:22:27 p664-bastion systemd[1]: helper-tftp.service: Scheduled restart job, restart counter is at 16735.
Oct 19 01:22:27 p664-bastion systemd[1]: Stopped Starts TFTP on boot because of reasons.
Oct 19 01:22:27 p664-bastion systemd[1]: Started Starts TFTP on boot because of reasons.
Oct 19 01:22:27 p664-bastion systemd[1]: helper-tftp.service: Succeeded.
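
As a rough reading of the block above, the restart counter multiplied by the RestartSec interval gives an estimate of how long the service has been cycling. The sketch below extracts both values from the log text. It assumes each cycle lasts about RestartSec and that the counter has not been reset; the helper name `restart_loop_seconds` is hypothetical.

```python
import re
from typing import Optional

COUNTER_RE = re.compile(r"restart counter is at (\d+)")
RESTART_SEC_RE = re.compile(r"RestartSec=(\d+)s")


def restart_loop_seconds(log_text: str) -> Optional[int]:
    """Estimate how long a unit has been restart-cycling, in seconds.

    Uses the last restart counter and the last RestartSec value found in
    the text. Returns None if either value is missing.
    """
    counters = COUNTER_RE.findall(log_text)
    intervals = RESTART_SEC_RE.findall(log_text)
    if not counters or not intervals:
        return None
    return int(counters[-1]) * int(intervals[-1])
```

For the lines shown here, this gives 16735 × 30 s = 502050 s, or about 5.8 days. That only indicates how long the unit has been cycling under those assumptions; it does not by itself show whether TFTP is related to the failure.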

marcopain commented 2 years ago

I am facing the exact same issue. Is there a solution for this?