ozchamo / YAKKO

A single physical server KVM based simple/automatic installer for OpenShift clusters
GNU General Public License v3.0

3 master with no worker fails to install. #8

Closed bayeslearner closed 9 months ago

bayeslearner commented 1 year ago

Stuck at:

Jun 20 21:46:41 bootstrap bootkube.sh[14284]:         Pod Status:openshift-kube-apiserver/kube-apiserver        DoesNotExist
Jun 20 21:46:41 bootstrap bootkube.sh[14284]:         Pod Status:openshift-kube-scheduler/openshift-kube-scheduler        Ready
Jun 20 21:46:41 bootstrap bootkube.sh[14284]:         Pod Status:openshift-kube-controller-manager/kube-controller-manager        Ready
Jun 20 21:46:41 bootstrap bootkube.sh[14284]:         Pod Status:openshift-cluster-version/cluster-version-operator        Ready
Jun 20 21:46:56 bootstrap bootkube.sh[14284]:         Pod Status:openshift-kube-scheduler/openshift-kube-scheduler        Ready
Jun 20 21:46:56 bootstrap bootkube.sh[14284]:         Pod Status:openshift-kube-controller-manager/kube-controller-manager        Ready
Jun 20 21:46:56 bootstrap bootkube.sh[14284]:         Pod Status:openshift-cluster-version/cluster-version-operator        RunningNotReady
Jun 20 21:46:56 bootstrap bootkube.sh[14284]:         Pod Status:openshift-kube-apiserver/kube-apiserver        DoesNotExist
ERROR APIServicesAvailable: apiservices.apiregistration.k8s.io/v1.template.openshift.io: not available: failing or missing response from https://10.130.0.54:8443/apis/template.openshift.io/v1: Get "https://10.130.0.54:8443/apis/template.openshift.io/v1": context deadline exceeded
INFO Cluster operator openshift-controller-manager Progressing is True with _DesiredStateNotYetAchieved: Progressing: deployment/controller-manager: updated replicas is 1, desired replicas is 3
INFO Progressing: deployment/route-controller-manager: updated replicas is 2, desired replicas is 3
ERROR Cluster operator operator-lifecycle-manager-packageserver Available is False with ClusterServiceVersionNotSucceeded: ClusterServiceVersion openshift-operator-lifecycle-manager/packageserver observed in phase Failed with reason: InstallCheckFailed, message: install timeout
INFO Cluster operator operator-lifecycle-manager-packageserver Progressing is True with : Working toward 0.19.0
INFO Use the following commands to gather logs from the cluster
INFO openshift-install gather bootstrap --help
ERROR Bootstrap failed to complete: timed out waiting for the condition
ERROR Failed to wait for bootstrapping to complete. This error usually happens when there is a problem with control plane hosts that prevents the control plane operators from creating the control plane.

The bootstrap process doesn't appear to have completed successfully.
This process downloads a lot of images from quay.io and can take a long time.

Press <ENTER> to re-issue this stage (wait-for bootstrap-complete) and give it some more time, OR...
Press <CTRL-C> to abort this install and examine then rerun install and return to this point
FATAL invalid log-level: not a valid logrus Level: "info/"

INFO Waiting up to 20m0s (until 5:53PM) for the Kubernetes API at https://api.myoc.lan:6443...
INFO API v1.26.5+7a891f0 up
INFO Waiting up to 30m0s (until 6:03PM) for bootstrapping to complete...
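
To see at a glance which control-plane pods the bootstrap is still waiting on, the `Pod Status` lines above can be reduced to the last state reported for each pod. This is a minimal Python sketch, not part of YAKKO or `openshift-install`; the function names and the regular expression are assumptions based only on the line format shown in this log.

```python
import re

# Assumed line format, taken from the bootkube.sh output in this issue:
#   "... bootkube.sh[14284]:   Pod Status:<namespace>/<pod>   <state>"
POD_STATUS = re.compile(r"Pod Status:(?P<pod>\S+)\s+(?P<state>\S+)\s*$")


def latest_pod_states(log_lines):
    """Return {pod: state}, keeping the last state reported for each pod."""
    states = {}
    for line in log_lines:
        match = POD_STATUS.search(line)
        if match:
            states[match.group("pod")] = match.group("state")
    return states


def not_ready(states):
    """Return only the pods whose last reported state is not 'Ready'."""
    return {pod: state for pod, state in states.items() if state != "Ready"}


# The four lines logged at 21:46:56 in the output above.
sample = [
    "Jun 20 21:46:56 bootstrap bootkube.sh[14284]:         Pod Status:openshift-kube-scheduler/openshift-kube-scheduler        Ready",
    "Jun 20 21:46:56 bootstrap bootkube.sh[14284]:         Pod Status:openshift-kube-controller-manager/kube-controller-manager        Ready",
    "Jun 20 21:46:56 bootstrap bootkube.sh[14284]:         Pod Status:openshift-cluster-version/cluster-version-operator        RunningNotReady",
    "Jun 20 21:46:56 bootstrap bootkube.sh[14284]:         Pod Status:openshift-kube-apiserver/kube-apiserver        DoesNotExist",
]

if __name__ == "__main__":
    for pod, state in not_ready(latest_pod_states(sample)).items():
        print(f"{pod}: {state}")
```

Fed the full journal output from the bootstrap node, this would list only the pods that never reached `Ready`, which here points at `kube-apiserver`.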

ozchamo commented 1 year ago

I assume you succeeded in testing a single-node cluster? I often see installs fail when there is not enough CPU power for the cluster to run the install reasonably fast, so it times out. Just recently I tried to stand up a 3-node cluster in a vanilla Fedora VM (so all of Yakko virtualised in a VM!) and inadvertently allocated only 6 vCPUs. It didn't finish. Then I realised my mistake, as this is technically only 3 cores for such a large cluster, so I pushed the number to 12 vCPUs (this is on an 8-core laptop) and it completed successfully.
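
The arithmetic behind that can be sketched in a few lines of Python. This is only an illustration: the 2 threads per core is inferred from "6 vCPUs is technically 3 cores", and the figure of 4 vCPUs per node is a hypothetical threshold derived from the one data point above (12 vCPUs worked for 3 nodes, 6 did not). Neither value is a documented YAKKO or OpenShift requirement.

```python
import os


def physical_core_equivalent(vcpus, threads_per_core=2):
    """Convert a vCPU count into physical cores, assuming SMT/hyper-threading."""
    return vcpus / threads_per_core


def enough_vcpus(allocated_vcpus, nodes, vcpus_per_node=4):
    """Hypothetical rule of thumb: at least `vcpus_per_node` vCPUs for each node."""
    return allocated_vcpus >= nodes * vcpus_per_node


if __name__ == "__main__":
    nodes = 3
    for allocated in (6, 12):
        cores = physical_core_equivalent(allocated)
        verdict = "ok" if enough_vcpus(allocated, nodes) else "likely too slow"
        print(f"{allocated} vCPUs (about {cores:g} cores) for {nodes} nodes: {verdict}")

    # Logical CPUs visible to this host; os.cpu_count() may return None.
    print("logical CPUs on this host:", os.cpu_count())
```

An 8-core laptop with two threads per core exposes 16 logical CPUs, which is why 12 vCPUs still fits on it.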

bayeslearner commented 1 year ago

Yes, for the single-node cluster. I will stick with single node for now. Maybe I will try again with 3 nodes later.

ozchamo commented 1 year ago

Apologies that this issue went unnoticed. I did find that there was a trailing "/" at the end of the rerun of the openshift installer when things fail (line 5397), which would explain the `FATAL invalid log-level: not a valid logrus Level: "info/"` message in your output. Feel free to edit that in your script until you are ready to try a new version.
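
For illustration only, here is how a stray "/" could be guarded against before the value reaches the installer. YAKKO itself is a shell script, so this Python sketch is not its code; `clean_log_level`, `wait_for_bootstrap_command` and the install directory are hypothetical names. The `wait-for bootstrap-complete` subcommand and the `--dir` and `--log-level` flags are real `openshift-install` options.

```python
# Log levels accepted by openshift-install's --log-level flag.
VALID_LEVELS = ("debug", "info", "warn", "error")


def clean_log_level(raw):
    """Strip whitespace and any trailing '/' and validate the result."""
    level = raw.strip().rstrip("/").lower()
    if level not in VALID_LEVELS:
        raise ValueError(f"invalid log level: {raw!r}")
    return level


def wait_for_bootstrap_command(install_dir, raw_level="info"):
    """Build the argument list for re-running the bootstrap wait stage."""
    return [
        "openshift-install",
        "wait-for",
        "bootstrap-complete",
        "--dir",
        install_dir,  # hypothetical path, supplied by the caller
        "--log-level",
        clean_log_level(raw_level),
    ]


if __name__ == "__main__":
    # "info/" is the value reported in the FATAL message in this issue.
    print(" ".join(wait_for_bootstrap_command("./install", "info/")))
```

Building the command as a list rather than a single string also avoids accidental concatenation of the kind that produced `info/`.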

As for your particular issue, I have only seen it when the downloads take a LONG time (and this was through the use of a proxy), so try that in the meantime and see if it cures your ailments.