Open rakeshk121 opened 11 months ago
Hey, I'll try reproducing with the same release image and get back to you.
This appears to have worked for me:
[m3@localhost dev-scripts]$ oc get clusterversion
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version 4.13.0-0.okd-2023-08-18-135805 True False 58m Cluster version is 4.13.0-0.okd-2023-08-18-135805
[m3@localhost dev-scripts]$ oc get bmh -A
NAMESPACE NAME STATE CONSUMER ONLINE ERROR AGE
openshift-machine-api ostest-master-0 externally provisioned ostest-fr5ld-master-0 true 92m
openshift-machine-api ostest-master-1 externally provisioned ostest-fr5ld-master-1 true 92m
openshift-machine-api ostest-master-2 externally provisioned ostest-fr5ld-master-2 true 92m
openshift-machine-api ostest-worker-0 provisioned ostest-fr5ld-worker-0-jbq5b true 92m
openshift-machine-api ostest-worker-1 provisioned ostest-fr5ld-worker-0-84t6t true 92m
I'll have to dig into the logs you provided to see if there are any clues about why yours is failing and mine isn't.
I'm setting:
[m3@localhost dev-scripts]$ grep -Ev '^#|^$' config_m3.sh
export OPENSHIFT_RELEASE_IMAGE=registry.ci.openshift.org/origin/release:4.13.0-0.okd-2023-08-18-135805
export PULL_SECRET_FILE=pull_secret.json
export OPENSHIFT_RELEASE_TYPE=okd
export IP_STACK=v4
export NUM_EXTRA_WORKERS=2
So we should be deploying the same thing here. I'm running on a CentOS Stream 9 host:
[m3@localhost dev-scripts]$ cat /etc/redhat-release
CentOS Stream release 9
I see you're running Rocky 8.8:
❯ grep PRETTY_NAME 06_create_cluster-2023-09-20-082531.log
2023-09-20 08:25:31 +++(/etc/os-release:7): source(): PRETTY_NAME='Rocky Linux 8.8 (Green Obsidian)'
It would probably be helpful if you could provide logs from the bootstrap node, since that is where the Ironic container should be running:
https://docs.okd.io/latest/support/troubleshooting/troubleshooting-installations.html#gathering-bootstrap-diagnostic-data_troubleshooting-installations
Check whether Ironic is listening on the bootstrap node:
sudo ss -tpnl | grep 6385
See if there are any restarting containers:
sudo podman ps -a
Check the logs of the Ironic container specifically:
sudo podman logs ironic
That's probably the best place to start trying to narrow things down.
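If it helps, the three checks above could be bundled into one pass on the bootstrap node. This is only a sketch: the helper name `ironic_listening` and the `--run` switch are made up for illustration, and it assumes the Ironic API port is 6385, as in the `ss` command above.

```shell
#!/bin/sh
# Hypothetical helper: reads `ss -tpnl` output on stdin and succeeds if a
# socket is listening on the given port (default 6385, the Ironic API).
ironic_listening() {
    port="${1:-6385}"
    # Field 4 of `ss -tpnl` is Local Address:Port, e.g. *:6385 or [::]:6385.
    awk -v p=":$port" '
        $1 == "LISTEN" && substr($4, length($4) - length(p) + 1) == p { found = 1 }
        END { exit !found }
    '
}

# Only touch the system when explicitly asked, so the helper can be sourced alone.
if [ "${1:-}" = "--run" ]; then
    if sudo ss -tpnl | ironic_listening 6385; then
        echo "ironic: listening on 6385"
    else
        echo "ironic: NOT listening on 6385"
    fi
    sudo podman ps -a                      # look for restarting or exited containers
    sudo podman logs ironic | tail -n 50   # last lines of the Ironic container log
fi
```

Matching on the end of the address field rather than a plain `grep 6385` avoids false hits on PIDs or other ports that happen to contain the same digits.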
Thanks @bshephar.
Yes, I'm setting variables that match your settings:
[core@nodea08 dev-scripts]$ grep -Ev '^#|^$' config_core.sh
export OPENSHIFT_RELEASE_IMAGE=registry.ci.openshift.org/origin/release:4.13.0-0.okd-2023-08-18-135805
export PULL_SECRET_FILE=pull_secret.json
export OPENSHIFT_RELEASE_TYPE=okd
export NUM_EXTRA_WORKERS=2
export IP_STACK=v4
Ironic is listening on the bootstrap node:
[core@localhost ~]$ sudo ss -tpnl | grep 6385
LISTEN 0 128 *:6385 *:* users:(("ironic",pid=6379,fd=5),("ironic",pid=6379,fd=4))
I do not see any restarting containers:
[core@localhost ~]$ sudo podman ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
1c51a4cb99f1 quay.io/openshift/okd-content@sha256:50ec87cbc91ded3b7cd41e54da9a21f0835cdfc36daac0bd1dca65737d70aa9f About an hour ago Up About an hour dnsmasq
5fe7599f4302 quay.io/openshift/okd-content@sha256:a70e232022f49a883e1facb48690d6c16fdbdc79b2ff4fc807bf07825eb7c380 /bin/copy-metal -... About an hour ago Exited (0) About an hour ago coreos-downloader
b17f707d9374 quay.io/openshift/okd-content@sha256:50ec87cbc91ded3b7cd41e54da9a21f0835cdfc36daac0bd1dca65737d70aa9f About an hour ago Up About an hour httpd
ef6ba4a14d4e quay.io/openshift/okd-content@sha256:ad2224900eabbb62bc83b7b356a0491bdb5798b57c2351f5df05e01a3b84ac90 About an hour ago Up About an hour image-customization
a8737f33d92a quay.io/openshift/okd-content@sha256:50ec87cbc91ded3b7cd41e54da9a21f0835cdfc36daac0bd1dca65737d70aa9f About an hour ago Up About an hour ironic
ffa193396b97 quay.io/openshift/okd-content@sha256:50ec87cbc91ded3b7cd41e54da9a21f0835cdfc36daac0bd1dca65737d70aa9f About an hour ago Up About an hour ironic-inspector
d5f382e9cb36 quay.io/openshift/okd-content@sha256:50ec87cbc91ded3b7cd41e54da9a21f0835cdfc36daac0bd1dca65737d70aa9f About an hour ago Up About an hour ironic-ramdisk-logs
f7847bdcf80c quay.io/openshift/okd-content@sha256:1a245dbcc0684c6ca15c9ea67fbfa55073c5d672ea7b48f50c14c371b09de558 start --tear-down... 15 minutes ago Up 15 minutes
Attaching the ironic logs here:
Hey @rakeshk121.
Ok, two thoughts:
1. Was this IP address reachable at all during the bootstrap process? 192.168.111.5
$ curl -s -o /dev/null -w "%{http_code}" https://192.168.111.5:6443 -k
I originally thought that maybe this just happened at the end of the deployment failure, but I think that VIP should still actually be available even if it does fail:
2023-09-20 09:26:01 E0920 09:26:01.513229 161368 memcache.go:238] couldn't get current server API group list: Get "https://api.ostest.test.metalkube.org:6443/api?timeout=32s": dial tcp 192.168.111.5:6443: connect: no route to host
2023-09-20 09:26:04 E0920 09:26:04.585280 161368 memcache.go:238] couldn't get current server API group list: Get "https://api.ostest.test.metalkube.org:6443/api?timeout=32s": dial tcp 192.168.111.5:6443: connect: no route to host
It looks like Ironic is working there. So, assuming that IP address is indeed reachable during the bootstrap process, we might need a must-gather to see if there is anything else happening on that node. If it's not reachable, then that is the first problem we need to solve.
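To answer the reachability question while the deployment is still running, the curl check above could be left polling in a second terminal. A rough sketch, assuming the API VIP is 192.168.111.5 as in the log lines; `vip_status` and `poll_vip` are hypothetical names, and the 10-second interval is arbitrary.

```shell
#!/bin/sh
# Hypothetical helper: interpret the %{http_code} value printed by curl.
# curl prints 000 when it never received an HTTP response (for example on
# "no route to host"); any other code means something answered on the VIP.
vip_status() {
    case "$1" in
        000|"") echo "unreachable" ;;
        *)      echo "reachable (HTTP $1)" ;;
    esac
}

# Poll the API VIP every 10 seconds until interrupted with Ctrl-C.
poll_vip() {
    vip="${1:-192.168.111.5}"
    while :; do
        code=$(curl -k -s -o /dev/null --connect-timeout 5 \
                    -w '%{http_code}' "https://$vip:6443")
        echo "$(date '+%F %T') $vip: $(vip_status "$code")"
        sleep 10
    done
}
```

Running `poll_vip` alongside step 06 would show whether the VIP ever came up, or only disappeared once the deployment had failed.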
I am having what seems to be a similar failure. config parameters:
export OPENSHIFT_RELEASE_IMAGE=quay.io/openshift/okd:4.15.0-0.okd-2024-03-10-010116
export PULL_SECRET_FILE=pull_secret.json
export OPENSHIFT_RELEASE_TYPE=okd
export IP_STACK=v4
export NETWORK_TYPE="OVNKubernetes"
export MASTER_DISK=90
export MASTER_VCPU=4
export NUM_WORKERS=0
export NUM_EXTRA_WORKERS=0
Using WORKING_DIR=/home/dev-scripts
I am running on a fresh install of CentOS Stream 9. After running make, step 06 times out after an hour. The bootstrap node comes up and the bootstrap API comes up.
sudo ss -tpnl | grep 6385 returns nothing.
sudo podman ps does not show restarting containers (inside or outside the bootstrap node).
sudo podman logs ironic returns:
Error: no container with name or ID "ironic" found: no such container
The virtual machines ostest_master_0, ostest_master_1, and ostest_master_2 are shut down. oc get bmh -A shows three machines online. oc get po -n openshift-machine-api shows: No resources found in openshift-machine-api namespace
As I am using a current version of yq (v4.44.2), I had to remove the "y" on line 102 of 01_install_requirements.sh.
Log file: 06_create_cluster-2024-06-18-075053.log
Looking at the use of yq in the bash scripts, I think the ones in utils.sh may not work with yq v4 (needing a period before []). This could be the cause of the issue. However, I am not an expert in yq.
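One thing that may be worth ruling out first is which yq is actually on the PATH. As far as I know there are two unrelated tools with that name: the Go implementation from mikefarah (where v4.44.2 comes from) and a Python wrapper around jq. Whether the dev-scripts calls were written for the Python one is a guess on my part and not something confirmed in this thread. The helper name `yq_flavor` below is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: classify the first line of `yq --version` read from stdin.
yq_flavor() {
    read -r line || true
    case "$line" in
        # The Go implementation names its repository in the version string,
        # e.g. "yq (https://github.com/mikefarah/yq/) version v4.44.2".
        *mikefarah*) echo "go-yq" ;;
        "")          echo "unknown" ;;
        # Anything else is left unverified; it may be the Python jq wrapper.
        *)           echo "other" ;;
    esac
}

# Assumed usage:
#   yq --version 2>&1 | yq_flavor
```

If this prints `go-yq`, the expressions in utils.sh would need checking against that implementation's syntax rather than patched one line at a time.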
On the bootstrap node there are fewer podman containers running than rakeshk121 had:
sudo podman ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
3308f5f6df18 quay.io/openshift/okd-content@sha256:90eb227746e445d6e258d3c9aaccbbdeca517ffb0dcaf5b880c2bde4f74aaae2 /bin/rundnsmasq 11 hours ago Up 11 hours dnsmasq
e4b3a442040a quay.io/openshift/okd-content@sha256:90eb227746e445d6e258d3c9aaccbbdeca517ffb0dcaf5b880c2bde4f74aaae2 /bin/runlogwatch.... 11 hours ago Up 11 hours ironic-ramdisk-logs
80e66f86071b quay.io/openshift/okd-content@sha256:90eb227746e445d6e258d3c9aaccbbdeca517ffb0dcaf5b880c2bde4f74aaae2 /bin/runhttpd 11 hours ago Up 11 hours httpd
49d9ecfa58df quay.io/openshift/okd-content@sha256:9f3f8f11fd743a332f8328b774bed1854c5d5d058663eb122289191bcb0cee73 start --tear-down... 3 minutes ago Up 3 minutes cluster-bootstrap
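To make the comparison with the earlier listing less error-prone, something like the following could print which of those containers are absent. It is only a sketch: `missing_containers` and the `EXPECTED` list are names invented for illustration, and the container names are copied from rakeshk121's `podman ps -a` output earlier in this thread, not from any authoritative list.

```shell
#!/bin/sh
# Container names taken from the earlier `podman ps -a` listing in this thread.
EXPECTED="dnsmasq httpd image-customization ironic ironic-inspector ironic-ramdisk-logs"

# Hypothetical helper: reads container names (one per line) on stdin and
# prints each expected name that does not appear among them.
missing_containers() {
    running=$(cat)
    for name in $EXPECTED; do
        # -x matches whole lines only, -F treats the name as a fixed string.
        printf '%s\n' "$running" | grep -qxF -- "$name" || echo "$name"
    done
}

# Assumed usage on the bootstrap node:
#   sudo podman ps --format '{{.Names}}' | missing_containers
```

Against the listing above, this reports image-customization, ironic, and ironic-inspector as missing, which matches the `podman logs ironic` error.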
Describe the bug
The cluster creation fails with Error:
To Reproduce
I'm trying to set up OKD by following this commit: https://github.com/openshift-metal3/dev-scripts/pull/1578/commits/f9265103273200e2d75fa6c918765433dd85d0d7 (#1578).
I have set the following in config_core.sh:
Expected/observed behavior
The cluster is created and can be accessed.
Additional context
Here is the log file: 06_create_cluster-2023-09-20-082531.log