openshift / openshift-ansible

Install and config an OpenShift 3.x cluster
https://try.openshift.com
Apache License 2.0

openshift_control_plane : Wait for control plane pods to appear #10671

Closed swepps1 closed 5 years ago

swepps1 commented 5 years ago

Description

I want to install OpenShift Origin 3.11, but when I run "deploy_cluster.yml" I get:

TASK [openshift_control_plane : Wait for control plane pods to appear] **
Monday 12 November 2018 14:47:58 +0100 (0:00:00.097) 0:03:50.274 ***
FAILED - RETRYING: Wait for control plane pods to appear (60 retries left).
FAILED - RETRYING: Wait for control plane pods to appear (60 retries left).
FAILED - RETRYING: Wait for control plane pods to appear (60 retries left).
FAILED - RETRYING: Wait for control plane pods to appear (59 retries left).
FAILED - RETRYING: Wait for control plane pods to appear (59 retries left).
FAILED - RETRYING: Wait for control plane pods to appear (59 retries left).
FAILED - RETRYING: Wait for control plane pods to appear (58 retries left).
FAILED - RETRYING: Wait for control plane pods to appear (58 retries left).
FAILED - RETRYING: Wait for control plane pods to appear (58 retries left).
FAILED - RETRYING: Wait for control plane pods to appear (57 retries left).
FAILED - RETRYING: Wait for control plane pods to appear (57 retries left).
FAILED - RETRYING: Wait for control plane pods to appear (57 retries left).
FAILED - RETRYING: Wait for control plane pods to appear (56 retries left).
FAILED - RETRYING: Wait for control plane pods to appear (56 retries left).
FAILED - RETRYING: Wait for control plane pods to appear (56 retries left).
FAILED - RETRYING: Wait for control plane pods to appear (55 retries left).
FAILED - RETRYING: Wait for control plane pods to appear (55 retries left).
FAILED - RETRYING: Wait for control plane pods to appear (55 retries left).
FAILED - RETRYING: Wait for control plane pods to appear (54 retries left).
FAILED - RETRYING: Wait for control plane pods to appear (54 retries left).
FAILED - RETRYING: Wait for control plane pods to appear (54 retries left).

Version

Operating system: CentOS Linux release 7.5.1804 (Core)

Inventory file:

[OSEv3:children]
masters
nodes
etcd
lb
nfs

[masters]
kak-tst-openshift-master1.kak-tst.internal
kak-tst-openshift-master2.kak-tst.internal
kak-tst-openshift-master3.kak-tst.internal

[etcd]
kak-tst-openshift-master1.kak-tst.internal
kak-tst-openshift-master2.kak-tst.internal
kak-tst-openshift-master3.kak-tst.internal

[lb]
kak-tst-openshift-lb.kak-tst.internal

[nodes]
kak-tst-openshift-master[1:3].kak-tst.internal openshift_node_group_name='node-config-master'
kak-tst-openshift-infra[1:3].kak-tst.internal openshift_schedulable=true openshift_node_group_name='node-config-infra'
kak-tst-openshift-node[1:4].kak-tst.internal openshift_schedulable=true openshift_node_group_name='node-config-compute'

[nfs]
kak-tst-nfs.kak-tst.internal

[OSEv3:vars]
openshift_additional_repos=[{'id': 'centos-paas', 'name': 'centos-paas', 'baseurl': 'https://buildlogs.centos.org/centos/7/paas/x86_64/openshift-origin311', 'gpgcheck': '0', 'enabled': '1'}]
ansible_ssh_user=root
ansible_become=True
ansible_service_broker_install=False
openshift_disable_check=disk_availability,docker_storage,memory_availability,docker_image_availability
openshift_node_groups=[{'name': 'node-config-master', 'labels': ['node-role.kubernetes.io/master=true']}, {'name': 'node-config-infra', 'labels': ['node-role.kubernetes.io/infra=true']}, {'name': 'node-config-compute', 'labels': ['node-role.kubernetes.io/compute=true']}]
openshift_deployment_type=origin
os_sdn_network_plugin_name='redhat/openshift-ovs-multitenant'
osm_default_node_selector='node-role.kubernetes.io/compute=true'
openshift_hosted_router_selector='node-role.kubernetes.io/infra=true'
openshift_hosted_registry_selector='node-role.kubernetes.io/infra=true'
openshift_hosted_router_replicas=1
openshift_hosted_registry_replicas=1
openshift_master_cluster_method=native
openshift_master_cluster_hostname=osconsole.kak-tst.internal
openshift_master_cluster_public_hostname=osconsole.kak-tst.internal
openshift_master_console_port=443
openshift_master_api_port=443
openshift_metrics_install_metrics=True
openshift_logging_install_logging=True
osm_use_cockpit=true
osm_cockpit_plugins=['cockpit-kubernetes']
openshift_master_metrics_public_url=https://hawkular-metrics.apps.kak-tst.internal
oreg_url=kak-tst-katello.kak-tst.internal:5000/kak-origin_docker_container-openshift_origin-${component}:${version}
openshift_docker_blocked_registries=registry.access.redhat.com,registry.hub.docker.com,github.com,docker.io
openshift_docker_insecure_registries=kak-tst-katello.kak-tst.internal:5000,172.30.0.0/16
openshift_docker_additional_registries=kak-tst-katello.kak-tst.internal:5000
openshift_examples_modify_imagestreams=true
openshift_master_identity_providers=[{'name': 'freeipa', 'challenge': 'true', 'login': 'true', 'kind': 'LDAPPasswordIdentityProvider', 'attributes': {'id': ['dn'], 'email': ['mail'], 'name': ['cn'], 'preferredUsername': ['uid']}, 'bindDN': 'uid=admin,cn=users,cn=accounts,dc=kak-tst,dc=internal', 'bindPassword': '***', 'ca': 'ipa-ca.crt', 'insecure': 'false', 'url': 'ldap://kak-tst-ipa.kak-tst.internal/cn=users,cn=accounts,dc=kak-tst,dc=internal?uid?sub?(memberOf=cn=fejleszto_1,cn=groups,cn=accounts,dc=kak-tst,dc=internal)'}]
openshift_master_default_subdomain=apps.kak-tst.internal

levysantanna commented 5 years ago

It can be almost anything. In one installation that hit this error, the problem turned out to be the identity provider config in my inventory. I found it by looking at the logs of the dead containers, running this on the masters:

docker ps -a | grep -v CONTAINER | awk '{system("docker logs "$1)}'
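
If I remember right, on 3.10/3.11 the control plane runs as static pods, so on a master it is also worth checking those containers directly; a hedged sketch (the master-logs helper path and the exact container names can differ per release, so verify on your hosts):

# list the static control-plane containers and see whether they are crash-looping
docker ps -a | grep -E 'k8s_(api|controllers|etcd)'
# dump their logs via the helper the installer drops into /usr/local/bin
/usr/local/bin/master-logs api api
/usr/local/bin/master-logs controllers controllers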

swepps1 commented 5 years ago

Hello,

Thank you for your help! I found the solution. My problem was the LDAP provider "ca" option: 'ca': 'ipa-ca.crt'

Thank you!
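
For anyone hitting the same thing, a hedged sketch of how the LDAP CA is usually wired up in the inventory (variable name as in the 3.11 advanced-install docs; the path below is only an example, point it at wherever your ipa-ca.crt actually lives on the Ansible control host):

# copied by openshift-ansible to /etc/origin/master/ on the masters
openshift_master_ldap_ca_file=/path/to/ipa-ca.crt
# the provider's 'ca' key then references the file name that ends up there
openshift_master_identity_providers=[{'name': 'freeipa', 'kind': 'LDAPPasswordIdentityProvider', ..., 'ca': 'ipa-ca.crt', 'insecure': 'false', ...}]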

tech-mint commented 5 years ago

I have the same issue, so I did not open a new one.

Description

On my 3-master HA installation I can no longer scale up the cluster with new nodes or masters.

Version
ansible 2.6.5
openshift-ansible-3.11.70-1
Steps To Reproduce
  1. `ansible-playbook playbooks/prerequisites.yml`
  2. `ansible-playbook playbooks/deploy_cluster.yml`
Expected Results

Successful installation

Observed Results

It keeps failing like this until the installation stops:

FAILED - RETRYING: Wait for control plane pods to appear (53 retries left).Result was: {
    "attempts": 8,
    "changed": false,
    "invocation": {
        "module_args": {
            "all_namespaces": null,
            "content": null,
            "debug": false,
            "delete_after": false,
            "field_selector": null,
            "files": null,
            "force": false,
            "kind": "pod",
            "kubeconfig": "/etc/origin/master/admin.kubeconfig",
            "name": "master-etcd-master3.test",
            "namespace": "kube-system",
            "selector": null,
            "state": "list"
        }
    },
    "msg": {
        "cmd": "/usr/bin/oc get pod master-etcd-master3.test -o json -n kube-system",
        "results": [
            {}
        ],
        "returncode": 1,
        "stderr": "Unable to connect to the server: dial tcp: i/o timeout\n",
        "stdout": ""
    },
    "retries": 61
}
Additional Information

CentOS Linux release 7.6.1810 (Core)

inventory file:

[OSEv3:children]
masters
nodes
etcd
nfs

[masters]
master1.test
master2.test
master3.test
[nodes]
master[1:3].test openshift_docker_options="--log-driver json-file --log-opt max-size=1M --log-opt max-file=3" openshift_node_group_name='node-config-all-in-one'
[etcd]
master[1:3].test
[nfs]
master1.test

[OSEv3:vars]
## general cluster variables
openshift_master_identity_providers=[{'name':'htpasswd_auth', 'login':'true', 'challenge':'true', 'kind':'HTPasswdPasswordIdentityProvider',}]
openshift_master_default_subdomain=apps.okd.test
ansible_ssh_user=root
debug_level=6
openshift_clock_enabled=true
openshift_master_cluster_method=native
openshift_release=3.11

## networking variables
os_sdn_network_plugin_name=redhat/openshift-ovs-subnet
osm_cluster_network_cidr=10.128.0.0/14
openshift_portal_net=172.30.0.0/16
osm_host_subnet_length=9
openshift_node_proxy_mode=iptables

# deployment type
openshift_deployment_type=origin

# preinstall checks
openshift_disable_check=memory_availability,disk_availability

# hosted registry
openshift_hosted_registry_storage_kind=nfs
openshift_hosted_registry_storage_access_modes=['ReadWriteMany']
openshift_hosted_registry_storage_nfs_directory=/var/nfs-share
openshift_hosted_registry_storage_nfs_options='*(rw,root_squash)'
openshift_hosted_registry_storage_volume_name=registry
openshift_hosted_registry_storage_volume_size=10Gi

# cluster monitoring
openshift_cluster_monitoring_operator_install=false
##openshift_metrics_install_metrics=true
##openshift_metrics_storage_kind=nfs
##openshift_metrics_storage_access_modes=['ReadWriteOnce']
##openshift_metrics_storage_nfs_directory=/var/nfs-share
##openshift_metrics_storage_nfs_options='*(rw,root_squash)'
##openshift_metrics_storage_volume_name=metrics
##openshift_metrics_storage_volume_size=10Gi

# ansible broker
ansible_service_broker_install=false

# template service broker
template_service_broker_install=false
openshift_service_broker_selector={"node-role.kubernetes.io/infra":"true"}
openshift_template_service_broker_namespace=['openshift','template-service-broker']

# web console
openshift_web_console_install=true
openshift_console_install=true

Just the same as yours.

prithivip commented 5 years ago

Hi Team, I'm getting the exact same error. Could you please help me figure out what is causing this "Unable to connect to the server: dial tcp: i/o timeout" error?
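
A hedged first check for this particular message: it means oc on the master cannot reach the API endpoint at all, so verify basic connectivity before digging further. Host and port below are placeholders; substitute your openshift_master_cluster_hostname (or the master's own hostname) and the API port (8443 by default for Origin, or whatever openshift_master_api_port is set to):

# does the API answer at all?
curl -k https://master1.test:8443/healthz
# can the admin kubeconfig reach it from the master itself?
oc --config=/etc/origin/master/admin.kubeconfig get nodes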

abhiroopghatak commented 5 years ago

TASK [openshift_control_plane : Wait for control plane pods to appear] *****
Monday 11 March 2019 07:18:27 +0000 (0:00:00.229) 0:22:21.282 **
FAILED - RETRYING: Wait for control plane pods to appear (60 retries left).
FAILED - RETRYING: Wait for control plane pods to appear (59 retries left).
FAILED - RETRYING: Wait for control plane pods to appear (58 retries left).
FAILED - RETRYING: Wait for control plane pods to appear (57 retries left).
FAILED - RETRYING: Wait for control plane pods to appear (56 retries left).
FAILED - RETRYING: Wait for control plane pods to appear (60 retries left).
FAILED - RETRYING: Wait for control plane pods to appear (55 retries left).
FAILED - RETRYING: Wait for control plane pods to appear (54 retries left).
FAILED - RETRYING: Wait for control plane pods to appear (53 retries left).
FAILED - RETRYING: Wait for control plane pods to appear (52 retries left).
FAILED - RETRYING: Wait for control plane pods to appear (60 retries left).
FAILED - RETRYING: Wait for control plane pods to appear (51 retries left).
FAILED - RETRYING: Wait for control plane pods to appear (50 retries left).
FAILED - RETRYING: Wait for control plane pods to appear (49 retries left).
FAILED - RETRYING: Wait for control plane pods to appear (59 retries left).
FAILED - RETRYING: Wait for control plane pods to appear (48 retries left).
FAILED - RETRYING: Wait for control plane pods to appear (47 retries left).
FAILED - RETRYING: Wait for control plane pods to appear (46 retries left).
FAILED - RETRYING: Wait for control plane pods to appear (45 retries left).
FAILED - RETRYING: Wait for control plane pods to appear (59 retries left).
FAILED - RETRYING: Wait for control plane pods to appear (44 retries left).

and it goes on. So many folks are facing the same problem. Any solution yet?

tech-mint commented 5 years ago

After it fails, proceed by running each of the Ansible playbooks one by one from where you left off, following the docs.

It usually works
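
For reference, a hedged sketch of what "one by one" means here: the component playbooks shipped in the openshift-ansible 3.11 tree, run in roughly this order (check the "Running Individual Component Playbooks" section of the install docs for the exact list and order in your release):

ansible-playbook -i <inventory> playbooks/openshift-checks/pre-install.yml
ansible-playbook -i <inventory> playbooks/openshift-node/bootstrap.yml
ansible-playbook -i <inventory> playbooks/openshift-etcd/config.yml
ansible-playbook -i <inventory> playbooks/openshift-master/config.yml
ansible-playbook -i <inventory> playbooks/openshift-master/additional_config.yml
ansible-playbook -i <inventory> playbooks/openshift-node/join.yml
ansible-playbook -i <inventory> playbooks/openshift-hosted/config.yml
ansible-playbook -i <inventory> playbooks/openshift-web-console/config.yml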

breeze1974 commented 5 years ago

The below fixed my issue. I use a proxy in my environment, and I had to add the cluster hostnames to no_proxy.

$ cat <<EOF > /etc/environment
http_proxy=http://10.xx.xx.xx:8080
https_proxy=http://10.xx.xx.xx:8080
ftp_proxy=http://10.xx.xx.xx:8080
no_proxy=127.0.0.1,localhost,172.17.240.84,172.17.240.85,172.17.240.86,172.17.240.87,10.96.0.0/12,10.244.0.0/16,v-openshift1-lnx1,v-node01-lnx1,v-node02-lnx1,console,console.inet.co.za
EOF

$ cat <<EOF > /etc/systemd/system/docker.service.d/no-proxy.conf
[Service]
Environment="NO_PROXY=artifactory-za.devel.iress.com.au, 172.30.9.71, 172.17.240.84, 172.17.240.85, 172.17.240.86, 172.17.240.87"
Environment="HTTP_PROXY=http://10.xx.xx.xx:8080/"
Environment="HTTPS_PROXY=http://10.xx.xx.xx:8080/"
EOF
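
One addition (an assumption on my part, but standard systemd behaviour): the docker.service.d drop-in only takes effect after reloading systemd and restarting Docker on each node, after which the playbook can be re-run:

systemctl daemon-reload
systemctl restart docker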