openshift / openshift-ansible

Install and config an OpenShift 3.x cluster
https://try.openshift.com

openshift installation hung up on openshift_service_catalog install (OKD 3.11) -Wait for Controller Manager rollout success #11282

Closed: lovelife100 closed this issue 4 years ago

lovelife100 commented 5 years ago

Description

OKD 3.11 installation hung up at:

TASK [openshift_service_catalog : Wait for Controller Manager rollout success]

Version


Steps To Reproduce
  1. Deploy a cluster with a single master node and two infra nodes
  2. Run the commands to deploy the cluster:
    ansible-playbook -i hosts playbooks/prerequisites.yml
    ansible-playbook -i hosts playbooks/deploy_cluster.yml
Expected Results

OKD 3.11 installs, the Service Catalog install rolls out successfully, and the pod kube-service-catalog/controller-manager-* becomes Ready:

NAMESPACE               NAME                                           READY     STATUS             RESTARTS   AGE
kube-service-catalog    controller-manager-dxdgr                       1/1       Running   0         4h
Observed Results


  1. ansible hung on TASK [openshift_service_catalog : Wait for Controller Manager rollout success]
  2. found that the pod kube-service-catalog/controller-manager-* is not Ready:
    NAMESPACE               NAME                                           READY     STATUS             RESTARTS   AGE
    kube-service-catalog    controller-manager-dxdgr                       0/1       CrashLoopBackOff   55         4h

    the controller-manager log:

    [root@master ~]# oc logs controller-manager-dxdgr -n kube-service-catalog
    I0304 07:52:46.466900       1 feature_gate.go:194] feature gates: map[OriginatingIdentity:true]
    I0304 07:52:46.467081       1 feature_gate.go:194] feature gates: map[OriginatingIdentity:true AsyncBindingOperations:true]
    I0304 07:52:46.467100       1 feature_gate.go:194] feature gates: map[NamespacedServiceBroker:true OriginatingIdentity:true AsyncBindingOperations:true]
    I0304 07:52:46.467141       1 hyperkube.go:192] Service Catalog version v3.11.0-0.1.35+8d4f895-2;Upstream:v0.1.35 (built 2019-01-08T23:12:26Z)
    I0304 07:52:46.469361       1 leaderelection.go:185] attempting to acquire leader lease  kube-service-catalog/service-catalog-controller-manager...
    I0304 07:52:46.509838       1 leaderelection.go:194] successfully acquired lease kube-service-catalog/service-catalog-controller-manager
    I0304 07:52:46.509914       1 event.go:221] Event(v1.ObjectReference{Kind:"ConfigMap", Namespace:"kube-service-catalog", Name:"service-catalog-controller-manager", UID:"b1603e57-3e2e-11e9-9b77-525400a42c80", APIVersion:"v1", ResourceVersion:"28947", FieldPath:""}): type: 'Normal' reason: 'LeaderElection' controller-manager-dxdgr-external-service-catalog-controller became leader
    F0304 07:52:46.557392       1 controller_manager.go:237] error running controllers: failed to get api versions from server: failed to get supported resources from server: unable to retrieve the complete list of server APIs: servicecatalog.k8s.io/v1beta1: an error on the server ("unable to set dialer for kube-service-catalog/apiserver as rest transport is of type *transport.debuggingRoundTripper") has prevented the request from succeeding
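
A minimal sketch of commands for inspecting the stuck rollout by hand (assuming oc access on the master; the APIService name is inferred from the group/version in the error above and should be verified):

    # List the service catalog pods, including the apiserver the error refers to
    oc get pods -n kube-service-catalog -o wide
    # Inspect the aggregated API registration for servicecatalog.k8s.io/v1beta1
    oc get apiservice v1beta1.servicecatalog.k8s.io -o yaml
    # Fetch logs from the catalog apiserver pod (substitute the pod name from the first command)
    oc logs <apiserver-pod-name> -n kube-service-catalog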
Additional Information

Provide any additional information which may help us diagnose the issue.

OS information: CentOS Linux release 7.4.1708 (Core)

inventory file:
# Create an OSEv3 group that contains the masters, nodes, and etcd groups
[OSEv3:children]
masters
nodes
etcd

# Set variables common for all OSEv3 hosts
[OSEv3:vars]
# SSH user, this user should allow ssh based auth without requiring a password
ansible_ssh_user=root

# If ansible_ssh_user is not root, ansible_become must be set to true
#ansible_become=true

openshift_deployment_type=origin

# uncomment the following to enable htpasswd authentication; defaults to DenyAllPasswordIdentityProvider
#openshift_master_identity_providers=[{'name': 'htpasswd_auth', 'login': 'true', 'challenge': 'true', 'kind': 'HTPasswdPasswordIdentityProvider', 'filename': '/etc/origin/master/htpasswd'}]

debug_level=8
containerized=false
openshift_disable_check=memory_availability,disk_availability,docker_storage,docker_storage_driver,docker_image_availability,package_version,package_availability,package_update
#
# Cert
openshift_hosted_registry_cert_expire_days=36500
openshift_ca_cert_expire_days=36500
openshift_node_cert_expire_days=36500
openshift_master_cert_expire_days=36500

# host group for masters
[masters]
master.example.com

# host group for etcd
[etcd]
master.example.com

# host group for nodes, includes region info
[nodes]
master.example.com openshift_node_group_name="node-config-master"
node1.example.com  openshift_node_group_name="node-config-infra"
node2.example.com  openshift_node_group_name="node-config-infra"
yeganx commented 5 years ago

Same problem here. What is the solution?

yeganx commented 5 years ago

My ansible version (`ansible --version`) is 2.7.7.

spock123 commented 5 years ago

Same here.. arghhh

PYLochou commented 5 years ago

Same here also, with one master-infra plus two worker nodes.

# git describe
openshift-ansible-3.11.107-1-4-gcb41a644f

# ansible --version
ansible 2.7.10

kacperpabian commented 5 years ago

any solution?

ztanaka1971 commented 5 years ago

I had the same issue with OCP. My workaround was to disable the service catalog:

openshift_enable_service_catalog=false
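
For reference, a minimal sketch of where this would go in the inventory from the report above (only the relevant [OSEv3:vars] lines are shown; the broker variables are an assumption about what else may need to be disabled, since the brokers depend on the catalog):

    # In the [OSEv3:vars] section of the inventory:
    # Workaround: skip the service catalog install
    openshift_enable_service_catalog=false
    # Assumption: the service brokers require the catalog, so they may also need disabling
    template_service_broker_install=false
    ansible_service_broker_install=false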

GiVeMeRoOt commented 5 years ago

Facing the same issue while installing on AtomicOS 7.2. @ztanaka1971 would it be possible to manually install the service catalog after setting up the openshift cluster if we disable it as suggested by you?
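
A hedged sketch of how a later, standalone catalog install might be attempted, assuming the component playbook layout of the release-3.11 branch (untested; verify the playbook path in your checkout):

    # Re-enable the catalog in the inventory first:
    #   openshift_enable_service_catalog=true
    # Then run only the service catalog component playbook
    ansible-playbook -i hosts playbooks/openshift-service-catalog/config.yml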

ztanaka1971 commented 5 years ago

I didn't try an additional install of the service catalog, because my installation didn't need it...

GiVeMeRoOt commented 5 years ago

Okay, thanks @ztanaka1971

arndt-s commented 5 years ago

Same here. A possible workaround: check the namespace manually; one pod might not be healthy. Delete it, and another pod will spawn that is healthy.
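
A minimal sketch of that workaround with oc (assuming cluster-admin access; the pod name below is the one from the report and will differ per cluster):

    # Find the unhealthy pod in the service catalog namespace
    oc get pods -n kube-service-catalog
    # Delete it; the owning controller re-creates the pod
    oc delete pod controller-manager-dxdgr -n kube-service-catalog
    # Watch the replacement come up
    oc get pods -n kube-service-catalog -w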

openshift-bot commented 4 years ago

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close. Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

openshift-bot commented 4 years ago

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity. Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten /remove-lifecycle stale

openshift-bot commented 4 years ago

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen. Mark the issue as fresh by commenting /remove-lifecycle rotten. Exclude this issue from closing again by commenting /lifecycle frozen.

/close

openshift-ci-robot commented 4 years ago

@openshift-bot: Closing this issue.

In response to [this](https://github.com/openshift/openshift-ansible/issues/11282#issuecomment-667612467):

> Rotten issues close after 30d of inactivity.
>
> Reopen the issue by commenting `/reopen`. Mark the issue as fresh by commenting `/remove-lifecycle rotten`. Exclude this issue from closing again by commenting `/lifecycle frozen`.
>
> /close

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.