Closed: thiguetta closed this issue 3 years ago.
Can you include the must-gather? https://docs.openshift.com/container-platform/4.5/support/gathering-cluster-data.html#support_gathering_data_gathering-cluster-data
That should help triage the issue.
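For reference, the basic invocation (assuming a logged-in oc client pointed at the affected cluster):

```sh
# collects cluster diagnostics into a local must-gather directory
oc adm must-gather
```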
Since the compute nodes are getting created, I would start debugging there.
That should help narrow down why no compute nodes joined the cluster.
Check whether you have created correct API VIP and Ingress VIP A records in DNS, e.g. with dig as shown below.
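The cluster name and base domain here are hypothetical placeholders; substitute your own:

```sh
# api.<cluster>.<domain> should resolve to the API VIP
dig +short api.mycluster.example.com
# any name under *.apps.<cluster>.<domain> should resolve to the Ingress VIP
dig +short test.apps.mycluster.example.com
```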
@thiguetta did you solve this? I have the exact same problem.
Same for me. Everything is up, but my cluster displays this message and cannot update.
Conditions

Type | Status | Updated | Reason | Message
-- | -- | -- | -- | --
Available | True | Oct 25, 4:20 am | - | desired and current number of IngressControllers are equal
Progressing | False | Oct 25, 4:20 am | - | desired and current number of IngressControllers are equal
Degraded | True | Oct 25, 2:42 am | IngressControllersDegraded | Some ingresscontrollers are degraded: default
Operand Versions
Name | Version
-- | --
operator | 4.5.15
ingress-controller | quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:01a749bd3a30fb059659309a18a4c9376e24d8044c42cbb893566d49a50036c1
Ingress operator pod logs:
2020-10-26T23:04:03.006Z INFO operator.ingress_controller ingress/controller.go:165 reconciling {"request": "openshift-ingress-operator/default"}
2020-10-26T23:04:03.071Z ERROR operator.ingress_controller ingress/controller.go:232 got retryable error; requeueing {"after": "1m0s", "error": "IngressController is degraded: LoadBalancerReady=False"}
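These lines come from the ingress operator pod; something like the following should fetch them (standard deployment and container names assumed):

```sh
oc -n openshift-ingress-operator logs deployment/ingress-operator -c ingress-operator
```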
Alright, after a lot of reading I found some useful pointers in the documentation for my problem.
Error followed by success:
2020-10-27T02:22:16.035Z INFO operator.ingress_controller ingress/controller.go:165 reconciling {"request": "openshift-ingress-operator/default"}
2020-10-27T02:22:16.105Z ERROR operator.ingress_controller ingress/controller.go:232 got retryable error; requeueing {"after": "1m0s", "error": "IngressController is degraded: LoadBalancerReady=False"}
2020-10-27T02:22:16.105Z INFO operator.status_controller status/controller.go:90 Reconciling {"request": "openshift-ingress-operator/default"}
2020-10-27T02:22:16.105Z INFO operator.ingress_controller ingress/controller.go:165 reconciling {"request": "openshift-ingress-operator/default"}
2020-10-27T02:22:16.114Z DEBUG operator.init.controller-runtime.controller controller/controller.go:282 Successfully Reconciled {"controller": "status_controller", "request": "openshift-ingress-operator/default"}
This page helped me find the <infrastructureID>:
https://docs.openshift.com/container-platform/4.5/machine_management/creating-infrastructure-machinesets.html#machineset-yaml-osp_creating-infrastructure-machinesets
oc get -o jsonpath='{.status.infrastructureName}{"\n"}' infrastructure cluster
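The <infrastructureID> is also the prefix on the worker MachineSet names, so you can cross-check it against what the installer actually created (standard machine-api namespace):

```sh
# MachineSet names (and the worker Machines they own) start with the <infrastructureID>
oc -n openshift-machine-api get machinesets
```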
Now it shows another error:
spec:
  endpointPublishingStrategy:
    loadBalancer:
      scope: Internal
    type: LoadBalancerService
status:
  availableReplicas: 2
  conditions:
  - lastTransitionTime: "2020-10-25T05:41:38Z"
    reason: Valid
    status: "True"
    type: Admitted
  - lastTransitionTime: "2020-10-25T07:20:28Z"
    status: "True"
    type: Available
  - lastTransitionTime: "2020-10-25T07:20:28Z"
    message: The deployment has Available status condition set to True
    reason: DeploymentAvailable
    status: "False"
    type: DeploymentDegraded
  - lastTransitionTime: "2020-10-25T05:41:41Z"
    message: The endpoint publishing strategy supports a managed load balancer
    reason: WantedByEndpointPublishingStrategy
    status: "True"
    type: LoadBalancerManaged
  - lastTransitionTime: "2020-10-25T05:41:41Z"
    message: The LoadBalancer service is pending
    reason: LoadBalancerPending
    status: "False"
    type: LoadBalancerReady
  - lastTransitionTime: "2020-10-27T02:22:16Z"
    message: DNS management is supported and zones are specified in the cluster DNS config.
    reason: Normal
    status: "True"
    type: DNSManaged
  - lastTransitionTime: "2020-10-25T05:46:23Z"
    message: 'One or more other status conditions indicate a degraded state: LoadBalancerReady=False'
    reason: DegradedConditions
    status: "True"
    type: Degraded
  - lastTransitionTime: "2020-10-27T02:22:16Z"
    message: The wildcard record resource was not found.
    reason: RecordNotFound
    status: "False"
    type: DNSReady
  domain: apps.idocp4.domain.com
  endpointPublishingStrategy:
    loadBalancer:
      scope: Internal
    type: LoadBalancerService
  observedGeneration: 3
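Given DNSReady=False (RecordNotFound) and the pending load balancer above, these two checks should show what the operator has (or hasn't) created; the namespaces are the standard ones:

```sh
# the wildcard DNS record the ingress operator manages
oc -n openshift-ingress-operator get dnsrecords
# the LoadBalancer Service that is stuck pending
oc -n openshift-ingress get svc router-default
```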
I had the exact same problem as @thiguetta on 4.5.15. The master nodes come up, the worker VMs get created but never join, and the installer gets to 86% and then fails.
I realized that the worker nodes never managed to get their config from the master nodes: "timeout awaiting response headers" from https://*APIVIP*:22623/config/worker.
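You can reproduce the slow response from any host on the cluster network; this is just a sketch, with <API_VIP> as a placeholder (the Accept header asks for the Ignition spec 2.2 config that 4.5 serves):

```sh
# time how long the machine config server takes to answer
time curl -ks -o /dev/null \
  -H 'Accept: application/vnd.coreos.ignition+json; version=2.2.0' \
  https://<API_VIP>:22623/config/worker
```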
It takes a while to get a response from the URL above, well above 10s, and the worker nodes have a very aggressive timeout. This is my workaround:
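The gist, assuming the ignition.timeouts.httpResponseHeaders field from Ignition spec 2.x (the default is 10 seconds), is to raise the timeout in worker.ign before the installer consumes it, for example with jq:

```sh
# bump the HTTP response-header timeout from the 10s default to 120s
jq '.ignition.timeouts.httpResponseHeaders = 120' worker.ign > worker.ign.new \
  && mv worker.ign.new worker.ign
```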
You have a fair amount of time before the installer times out and crashes.
This will give the worker nodes enough time to receive the configuration and for the installation to go through to the end.
Obviously this is a crude workaround and not a solution; maybe this could be implemented in the default configuration? I don't know how to do that.
@fredrik-furtenbach the following command will generate ignition configs:
./openshift-install create ignition-configs
then you'll have:
bootstrap.ign master.ign worker.ign
That's great, thank you @a1ex-var1amov.
So, if those files are present in the installation directory, the installer will use them when I run create cluster?
The installer will consume them when you run the create cluster command:
> openshift-install create cluster --dir=./ --log-level=info
INFO Consuming Bootstrap Ignition Config from target directory
INFO Consuming Master Ignition Config from target directory
INFO Consuming Worker Ignition Config from target directory
Issues go stale after 90d of inactivity.
Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.
If this issue is safe to close now please do so with /close.
/lifecycle stale
Stale issues rot after 30d of inactivity.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.
If this issue is safe to close now please do so with /close.
/lifecycle rotten
/remove-lifecycle stale
Rotten issues close after 30d of inactivity.
Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.
/close
@openshift-bot: Closing this issue.
Version
Platform:
vSphere IPI
What happened?
Created a cluster using the specifications in https://docs.openshift.com/container-platform/4.5/installing/installing_vsphere/installing-vsphere-installer-provisioned.html#installing-vsphere-installer-provisioned, but only the master nodes came up; the cluster timed out waiting for the compute nodes, which never got created.
What you expected to happen?
It was expected that the cluster (masters and workers) would come up.
How to reproduce it (as minimally and precisely as possible)?
openshift-install create cluster