openshift / openshift-ansible

Install and config an OpenShift 3.x cluster
https://try.openshift.com
Apache License 2.0

FAILED - RETRYING: Wait for control plane pods to appear (v3.11/poor etcd listen host) #11942

Closed: alanbchristie closed this issue 3 years ago

alanbchristie commented 4 years ago

Description

Behaviour very similar to issue #9575.

Here I'm deploying 3.11 to bare metal, and the openshift_control_plane : Wait for control plane pods to appear task fails with the same error as #9575:

 The connection to the server XYZ was refused - did you specify the right host or port?
Version

Ansible version

$ ansible --version
ansible 2.7.13
  config file = None
  configured module search path = [u'/home/centos/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
  ansible python module location = /home/centos/.local/lib/python2.7/site-packages/ansible
  executable location = /usr/bin/ansible
  python version = 2.7.5 (default, Aug  7 2019, 00:51:29) [GCC 4.8.5 20150623 (Red Hat 4.8.5-39)]

OpenShift Ansible tag: -

openshift-ansible-3.11.152-1
Steps To Reproduce
  1. Run the deploy_cluster playbook (a typical invocation is sketched just below)
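
For reference, a typical 3.11 run looks roughly like this; inventory.yml is a placeholder name for the YAML inventory shown further down, and the playbook paths assume a checkout of the release-3.11 branch of this repository:

cd openshift-ansible
ansible-playbook -i inventory.yml playbooks/prerequisites.yml   # host preparation
ansible-playbook -i inventory.yml playbooks/deploy_cluster.yml  # the run that fails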
Expected Results

Control plane pods should appear.

Observed Results

The control-plane pods do not appear.

The output is essentially the same as in issue #9575. I lost my output, so that issue's error (which is essentially identical) is repeated here...

TASK [openshift_control_plane : Wait for control plane pods to appear] ************************************************************************************************************************************************************
Tuesday 14 August 2018  16:39:24 +0800 (0:00:00.086)       0:22:42.301 ******** 
FAILED - RETRYING: Wait for control plane pods to appear (60 retries left).
FAILED - RETRYING: Wait for control plane pods to appear (59 retries left).
...............
FAILED - RETRYING: Wait for control plane pods to appear (1 retries left).
failed: [10.10.244.212] (item=__omit_place_holder__5e245b7f796113e2f9ba55e6c4a882ef0471a251) => {"attempts": 60, "changed": false, "item": "__omit_place_holder__5e245b7f796113e2f9ba55e6c4a882ef0471a251", "msg": {"cmd": "/bin/oc get pod master-__omit_place_holder__5e245b7f796113e2f9ba55e6c4a882ef0471a251-10.10.244.212 -o json -n kube-system", "results": [{}], "returncode": 1, "stderr": "The connection to the server 10.10.244.212:8443 was refused - did you specify the right host or port?\n", "stdout": ""}}
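
For anyone hitting the same retry loop, a couple of quick checks on the failing master help narrow it down. These are rough suggestions that assume docker is the container runtime and that the API should be on port 8443, as in the output above; container names may differ on your hosts:

ss -tlnp | grep 8443          # is anything listening on the API port?
docker ps | grep api          # is the master-api static-pod container running?
docker logs <container-id>    # the apiserver log typically shows the failed dial to etcd on 127.0.0.1:2379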
Additional Information

As discussed in the related issue, the etcd pod is listening on a specific IP address, as can be seen by displaying the logs for that pod (i.e. it listens on 134.93.174.200:2379), but the API server is connecting to 127.0.0.1:2379.
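
A quick way to confirm the mismatch from the master itself is to hit etcd's /health endpoint on both addresses. The certificate paths below are the usual /etc/origin/master locations in a 3.11 install and may differ on your hosts:

# Connection refused while etcd only binds the node IP:
curl --cacert /etc/origin/master/master.etcd-ca.crt \
     --cert /etc/origin/master/master.etcd-client.crt \
     --key /etc/origin/master/master.etcd-client.key \
     https://127.0.0.1:2379/health

# Repeating the same request against https://134.93.174.200:2379/health succeeds.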

The work-around, which avoids the installation error, is to override the listen host using etcd_listen_client_urls.

In my YAML-based inventory I add this...

all:
  children:
    OSEv3:
      vars:
        etcd_listen_client_urls: 'https://0.0.0.0:2379'

And it works!
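
After re-running deploy_cluster with that variable set, the new listen address can be sanity-checked on the master. Rough commands, assuming ss is available and the control plane static pods run in kube-system:

ss -tlnp | grep 2379              # etcd should now be bound to the wildcard address, shown as *:2379
/bin/oc get pods -n kube-system   # the etcd and master API static pods should both be Running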

Is it time to ensure that the etcd pod, out of the box, listens on 0.0.0.0:2379?

alanbchristie commented 4 years ago

I used the temporary solution posted by danielkucera on 8th August in issue #6986.

tetsushiawano commented 4 years ago

I am facing the same issue. I will try the approach mentioned here.

tetsushiawano commented 4 years ago

It didn't work T T

openshift-bot commented 4 years ago

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close. Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

alanbchristie commented 4 years ago

/remove-lifecycle stale

openshift-bot commented 3 years ago

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close. Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

openshift-bot commented 3 years ago

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity. Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten /remove-lifecycle stale

openshift-bot commented 3 years ago

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen. Mark the issue as fresh by commenting /remove-lifecycle rotten. Exclude this issue from closing again by commenting /lifecycle frozen.

/close

openshift-ci-robot commented 3 years ago

@openshift-bot: Closing this issue.

In response to [this](https://github.com/openshift/openshift-ansible/issues/11942#issuecomment-745403464):

> Rotten issues close after 30d of inactivity.
>
> Reopen the issue by commenting `/reopen`. Mark the issue as fresh by commenting `/remove-lifecycle rotten`. Exclude this issue from closing again by commenting `/lifecycle frozen`.
>
> /close

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.