openshift / openshift-ansible

Install and config an OpenShift 3.x cluster
https://try.openshift.com
Apache License 2.0

ansible BYO containerized installation failed #3174

Closed Moep90 closed 6 years ago

Moep90 commented 7 years ago

Description

Single Master - Multi Node

Version
ansible 2.2.0.0
  config file = /Users/danny.heinrich/.ansible.cfg
  configured module search path = Default w/o overrides

git describe
openshift-ansible-3.5.1-1-63-g0880fba0
Steps To Reproduce
  1. ansible-playbook -i inventory/byo/hosts playbooks/byo/config.ym
Observed Results
TASK [openshift_master : Start and enable master] ******************************
FAILED - RETRYING: TASK: openshift_master : Start and enable master (1 retries left).
fatal: [master]: FAILED! => {"attempts": 1, "changed": false, "failed": true, "msg": "Unable to start service origin-master: Job for origin-master.service failed because the control process exited with error code. See \"systemctl status origin-master.service\" and \"journalctl -xe\" for details.\n"}

NO MORE HOSTS LEFT *************************************************************

PLAY RECAP *********************************************************************
localhost                  : ok=7    changed=0    unreachable=0    failed=0
master                     : ok=195  changed=41   unreachable=0    failed=1
node1                      : ok=59   changed=8    unreachable=0    failed=0
node2                      : ok=58   changed=8    unreachable=0    failed=0


Additional Information
cat /etc/redhat-release
CentOS Linux release 7.3.1611 (Core)
cat inventory/byo/hosts

# Create an OSEv3 group that contains the masters and nodes groups
[OSEv3:children]
masters
nodes

# Set variables common for all OSEv3 hosts
[OSEv3:vars]
# SSH user, this user should allow ssh based auth without requiring a password
ansible_ssh_user=root

# If ansible_ssh_user is not root, ansible_sudo must be set to true
#ansible_sudo=true

product_type=openshift
deployment_type=origin

# Install with Docker containers
containerized=true
# Select OpenShift Version
openshift_release=v1.4.0

# uncomment the following to enable htpasswd authentication; defaults to DenyAllPasswordIdentityProvider
openshift_master_identity_providers=[{'name': 'htpasswd_auth', 'login': 'true', 'challenge': 'true', 'kind': 'HTPasswdPasswordIdentityProvider', 'filename': '/etc/openshift/openshift-passwd'}]

# host group for masters
[masters]
master ansible_host=10.0.0.10

# host group for nodes, includes region info
[nodes]
master ansible_host=10.0.0.10 openshift_node_labels="{'region': 'infra', 'zone': 'default'}"
node1 ansible_host=10.0.0.11 openshift_node_labels="{'region': 'primary', 'zone': 'east'}"
node2 ansible_host=10.0.0.12 openshift_node_labels="{'region': 'primary', 'zone': 'west'}"
sdodson commented 7 years ago

@Moep90 We need to understand why the master failed to start. Take a look at systemctl status origin-master.service and/or journalctl -lu origin-master.service on the master.
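The two checks suggested above can be collected in one pass. The following is a minimal sketch, assuming Python 3 is available on the master; the helper names `diagnostic_commands` and `run_diagnostics` are hypothetical, and `--no-pager` is added only so the output is not cut off by a pager.

```python
import subprocess


def diagnostic_commands(unit="origin-master.service"):
    """Return the argv lists for the two checks suggested in the comment above."""
    return [
        ["systemctl", "status", "--no-pager", "-l", unit],
        ["journalctl", "--no-pager", "-l", "-u", unit],
    ]


def run_diagnostics(unit="origin-master.service"):
    """Run each command and return its combined stdout/stderr, keyed by command line."""
    results = {}
    for argv in diagnostic_commands(unit):
        key = " ".join(argv)
        try:
            proc = subprocess.run(
                argv,
                stdout=subprocess.PIPE,
                stderr=subprocess.STDOUT,
                universal_newlines=True,
            )
            results[key] = proc.stdout
        except OSError as exc:
            # For example, systemctl is missing on a non-systemd host.
            results[key] = "could not run: {}".format(exc)
    return results


if __name__ == "__main__":
    for cmd, output in run_diagnostics().items():
        print("$ " + cmd)
        print(output)
```

The `-l` flag asks for full, untruncated lines, which matters here because the interesting part of the error is usually at the end of the line.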

bdurrow commented 7 years ago

I have seen this kind of problem if openshift_cloudprovider_kind doesn't get set.

raditv commented 7 years ago

I've got the same problem as above. Below are my inventory and journalctl output.

May 21 23:23:08 localhost.localdomain systemd[1]: origin-master.service: main process exited, code=exited, status=255/n/a
May 21 23:23:08 localhost.localdomain systemd[1]: Failed to start Origin Master Service.
May 21 23:23:08 localhost.localdomain systemd[1]: Unit origin-master.service entered failed state.
May 21 23:23:08 localhost.localdomain systemd[1]: origin-master.service failed.
May 21 23:23:13 localhost.localdomain systemd[1]: origin-master.service holdoff time over, scheduling restart.
May 21 23:23:13 localhost.localdomain systemd[1]: Starting Origin Master Service...
May 21 23:23:13 localhost.localdomain origin-master[3420]: W0521 23:23:13.878542 3420 start_master.go:291] Warning: assetConfig.loggingPublicURL: Invalid value: "": required to view aggregated container logs in the console, maste
May 21 23:23:13 localhost.localdomain origin-master[3420]: W0521 23:23:13.878722 3420 start_master.go:291] Warning: assetConfig.metricsPublicURL: Invalid value: "": required to view cluster metrics in the console, master start wi
May 21 23:23:13 localhost.localdomain origin-master[3420]: W0521 23:23:13.878739 3420 start_master.go:291] Warning: auditConfig.auditFilePath: Required value: audit can now be logged to a separate file, master start will continue
May 21 23:23:13 localhost.localdomain origin-master[3420]: I0521 23:23:13.889494 3420 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
May 21 23:23:13 localhost.localdomain origin-master[3420]: I0521 23:23:13.890936 3420 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
May 21 23:23:13 localhost.localdomain origin-master[3420]: I0521 23:23:13.892251 3420 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
May 21 23:23:13 localhost.localdomain origin-master[3420]: I0521 23:23:13.893476 3420 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
May 21 23:23:13 localhost.localdomain origin-master[3420]: I0521 23:23:13.894731 3420 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
May 21 23:23:13 localhost.localdomain origin-master[3420]: I0521 23:23:13.896221 3420 admission.go:107] Admission plugin ProjectRequestLimit is not enabled. It will not be started.
May 21 23:23:13 localhost.localdomain origin-master[3420]: I0521 23:23:13.896243 3420 admission.go:107] Admission plugin openshift.io/RestrictSubjectBindings is not enabled. It will not be started.
May 21 23:23:13 localhost.localdomain origin-master[3420]: I0521 23:23:13.896254 3420 admission.go:107] Admission plugin PodNodeConstraints is not enabled. It will not be started.
May 21 23:23:13 localhost.localdomain origin-master[3420]: I0521 23:23:13.896290 3420 admission.go:107] Admission plugin RunOnceDuration is not enabled. It will not be started.
May 21 23:23:13 localhost.localdomain origin-master[3420]: I0521 23:23:13.896303 3420 admission.go:107] Admission plugin PodNodeConstraints is not enabled. It will not be started.
May 21 23:23:13 localhost.localdomain origin-master[3420]: I0521 23:23:13.896320 3420 admission.go:107] Admission plugin ClusterResourceOverride is not enabled. It will not be started.
May 21 23:23:13 localhost.localdomain origin-master[3420]: I0521 23:23:13.898864 3420 admission.go:107] Admission plugin ImagePolicyWebhook is not enabled. It will not be started.
May 21 23:23:13 localhost.localdomain origin-master[3420]: I0521 23:23:13.898949 3420 admission.go:107] Admission plugin AlwaysPullImages is not enabled. It will not be started.
May 21 23:23:13 localhost.localdomain origin-master[3420]: I0521 23:23:13.898960 3420 admission.go:107] Admission plugin LimitPodHardAntiAffinityTopology is not enabled. It will not be started.
May 21 23:23:13 localhost.localdomain origin-master[3420]: I0521 23:23:13.905521 3420 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
May 21 23:23:13 localhost.localdomain origin-master[3420]: I0521 23:23:13.906530 3420 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
May 21 23:23:13 localhost.localdomain origin-master[3420]: I0521 23:23:13.907709 3420 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
May 21 23:23:13 localhost.localdomain origin-master[3420]: I0521 23:23:13.908924 3420 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
May 21 23:23:13 localhost.localdomain origin-master[3420]: I0521 23:23:13.909395 3420 plugins.go:94] No cloud provider specified.
May 21 23:23:13 localhost.localdomain origin-master[3420]: I0521 23:23:13.913317 3420 master_config.go:367] Using the lease endpoint reconciler
May 21 23:23:13 localhost.localdomain origin-master[3420]: I0521 23:23:13.914291 3420 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
May 21 23:23:13 localhost.localdomain origin-master[3420]: I0521 23:23:13.915303 3420 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
May 21 23:23:13 localhost.localdomain origin-master[3420]: I0521 23:23:13.915391 3420 start_master.go:410] Starting master on 0.0.0.0:8443 (v1.5.0)
May 21 23:23:13 localhost.localdomain origin-master[3420]: I0521 23:23:13.915410 3420 start_master.go:411] Public master address is https://openshift.app:8443
May 21 23:23:13 localhost.localdomain origin-master[3420]: I0521 23:23:13.915433 3420 start_master.go:415] Using images from "openshift/origin-:v1.5.0"
May 21 23:23:13 localhost.localdomain origin-master[3420]: E0521 23:23:13.920892 3420 cacher.go:260] unexpected ListAndWatch error: pkg/storage/cacher.go:201: Failed to list api.User: client: etcd cluster is unavailable or misco
May 21 23:23:13 localhost.localdomain origin-master[3420]: E0521 23:23:13.920976 3420 cacher.go:260] unexpected ListAndWatch error: pkg/storage/cacher.go:201: Failed to list api.OAuthAccessToken: client: etcd cluster is unavaila
May 21 23:23:13 localhost.localdomain origin-master[3420]: E0521 23:23:13.921070 3420 reflector.go:199] github.com/openshift/origin/vendor/k8s.io/kubernetes/plugin/pkg/admission/resourcequota/resource_access.go:83: Failed to list
May 21 23:23:13 localhost.localdomain origin-master[3420]: E0521 23:23:13.921152 3420 reflector.go:199] github.com/openshift/origin/vendor/k8s.io/kubernetes/plugin/pkg/admission/storageclass/default/admission.go:75: Failed to lis
May 21 23:23:13 localhost.localdomain origin-master[3420]: E0521 23:23:13.926106 3420 reflector.go:199] github.com/openshift/origin/vendor/k8s.io/kubernetes/plugin/pkg/admission/serviceaccount/admission.go:119: Failed to list ap
May 21 23:23:13 localhost.localdomain origin-master[3420]: E0521 23:23:13.926199 3420 reflector.go:199] github.com/openshift/origin/vendor/k8s.io/kubernetes/plugin/pkg/admission/serviceaccount/admission.go:103: Failed to list ap
May 21 23:23:13 localhost.localdomain origin-master[3420]: E0521 23:23:13.926278 3420 cacher.go:260] unexpected ListAndWatch error: pkg/storage/cacher.go:201: Failed to list api.Group: client: etcd cluster is unavailable or misc
May 21 23:23:13 localhost.localdomain origin-master[3420]: E0521 23:23:13.926375 3420 cacher.go:260] unexpected ListAndWatch error: pkg/storage/cacher.go:201: Failed to list api.PolicyBinding: client: etcd cluster is unavailable
May 21 23:23:13 localhost.localdomain origin-master[3420]: E0521 23:23:13.926451 3420 cacher.go:260] unexpected ListAndWatch error: pkg/storage/cacher.go:201: Failed to list api.Policy: client: etcd cluster is unavailable or mis
May 21 23:23:13 localhost.localdomain origin-master[3420]: E0521 23:23:13.926531 3420 cacher.go:260] unexpected ListAndWatch error: pkg/storage/cacher.go:201: Failed to list api.ClusterPolicyBinding: client: etcd cluster is unav
May 21 23:23:13 localhost.localdomain origin-master[3420]: E0521 23:23:13.926636 3420 cacher.go:260] unexpected ListAndWatch error: pkg/storage/cacher.go:201: Failed to list api.ClusterPolicy: client: etcd cluster is unavailable
May 21 23:23:14 localhost.localdomain origin-master[3420]: E0521 23:23:14.923085 3420 cacher.go:260] unexpected ListAndWatch error: pkg/storage/cacher.go:201: Failed to list api.OAuthAccessToken: client: etcd cluster is unavaila
May 21 23:23:14 localhost.localdomain origin-master[3420]: E0521 23:23:14.923159 3420 cacher.go:260] unexpected ListAndWatch error: pkg/storage/cacher.go:201: Failed to list api.User: client: etcd cluster is unavailable or misco
May 21 23:23:14 localhost.localdomain origin-master[3420]: E0521 23:23:14.923230 3420 reflector.go:199] github.com/openshift/origin/vendor/k8s.io/kubernetes/plugin/pkg/admission/resourcequota/resource_access.go:83: Failed to list
May 21 23:23:14 localhost.localdomain origin-master[3420]: E0521 23:23:14.927354 3420 cacher.go:260] unexpected ListAndWatch error: pkg/storage/cacher.go:201: Failed to list api.Policy: client: etcd cluster is unavailable or mis
May 21 23:23:14 localhost.localdomain origin-master[3420]: E0521 23:23:14.927416 3420 cacher.go:260] unexpected ListAndWatch error: pkg/storage/cacher.go:201: Failed to list api.PolicyBinding: client: etcd cluster is unavailable
May 21 23:23:14 localhost.localdomain origin-master[3420]: E0521 23:23:14.927468 3420 cacher.go:260] unexpected ListAndWatch error: pkg/storage/cacher.go:201: Failed to list api.Group: client: etcd cluster is unavailable or misc
May 21 23:23:16 localhost.localdomain origin-master[3420]: E0521 23:23:16.933851 3420 reflector.go:199] github.com/openshift/origin/vendor/k8s.io/kubernetes/plugin/pkg/admission/serviceaccount/admission.go:119: Failed to list ap
May 21 23:23:16 localhost.localdomain origin-master[3420]: E0521 23:23:16.934687 3420 reflector.go:199] github.com/openshift/origin/vendor/k8s.io/kubernetes/plugin/pkg/admission/serviceaccount/admission.go:103: Failed to list ap
May 21 23:23:17 localhost.localdomain origin-master[3420]: E0521 23:23:17.933029 3420 cacher.go:260] unexpected ListAndWatch error: pkg/storage/cacher.go:201: Failed to list api.Group: client: etcd cluster is unavailable or misc
May 21 23:23:17 localhost.localdomain origin-master[3420]: E0521 23:23:17.933141 3420 cacher.go:260] unexpected ListAndWatch error: pkg/storage/cacher.go:201: Failed to list api.PolicyBinding: client: etcd cluster is unavailable
May 21 23:23:17 localhost.localdomain origin-master[3420]: E0521 23:23:17.933203 3420 cacher.go:260] unexpected ListAndWatch error: pkg/storage/cacher.go:201: Failed to list api.Policy: client: etcd cluster is unavailable or mis
May 21 23:23:17 localhost.localdomain origin-master[3420]: E0521 23:23:17.933279 3420 reflector.go:199] github.com/openshift/origin/vendor/k8s.io/kubernetes/plugin/pkg/admission/resourcequota/resource_access.go:83: Failed to list
May 21 23:23:17 localhost.localdomain origin-master[3420]: E0521 23:23:17.933348 3420 cacher.go:260] unexpected ListAndWatch error: pkg/storage/cacher.go:201: Failed to list api.User: client: etcd cluster is unavailable or misco
May 21 23:23:17 localhost.localdomain origin-master[3420]: E0521 23:23:17.933405 3420 cacher.go:260] unexpected ListAndWatch error: pkg/storage/cacher.go:201: Failed to list api.OAuthAccessToken: client: etcd cluster is unavaila
May 21 23:23:17 localhost.localdomain origin-master[3420]: E0521 23:23:17.933457 3420 cacher.go:260] unexpected ListAndWatch error: pkg/storage/cacher.go:201: Failed to list api.ClusterPolicyBinding: client: etcd cluster is unav
May 21 23:23:17 localhost.localdomain origin-master[3420]: E0521 23:23:17.937432 3420 reflector.go:199] github.com/openshift/origin/vendor/k8s.io/kubernetes/plugin/pkg/admission/serviceaccount/admission.go:119: Failed to list ap
May 21 23:23:17 localhost.localdomain origin-master[3420]: E0521 23:23:17.937517 3420 reflector.go:199] github.com/openshift/origin/vendor/k8s.io/kubernetes/plugin/pkg/admission/storageclass/default/admission.go:75: Failed to lis
May 21 23:23:17 localhost.localdomain origin-master[3420]: E0521 23:23:17.937582 3420 cacher.go:260] unexpected ListAndWatch error: pkg/storage/cacher.go:201: Failed to list api.ClusterPolicy: client: etcd cluster is unavailable
May 21 23:23:17 localhost.localdomain origin-master[3420]: E0521 23:23:17.937661 3420 reflector.go:199] github.com/openshift/origin/vendor/k8s.io/kubernetes/plugin/pkg/admission/serviceaccount/admission.go:103: Failed to list ap
May 21 23:23:18 localhost.localdomain origin-master[3420]: E0521 23:23:18.936138 3420 cacher.go:260] unexpected ListAndWatch error: pkg/storage/cacher.go:201: Failed to list api.OAuthAccessToken: client: etcd cluster is unavaila
May 21 23:23:18 localhost.localdomain origin-master[3420]: E0521 23:23:18.936223 3420 cacher.go:260] unexpected ListAndWatch error: pkg/storage/cacher.go:201: Failed to list api.User: client: etcd cluster is unavailable or misco
May 21 23:23:18 localhost.localdomain origin-master[3420]: E0521 23:23:18.936293 3420 reflector.go:199] github.com/openshift/origin/vendor/k8s.io/kubernetes/plugin/pkg/admission/resourcequota/resource_access.go:83: Failed to list
May 21 23:23:18 localhost.localdomain origin-master[3420]: E0521 23:23:18.936347 3420 cacher.go:260] unexpected ListAndWatch error: pkg/storage/cacher.go:201: Failed to list api.Policy: client: etcd cluster is unavailable or mis
May 21 23:23:18 localhost.localdomain origin-master[3420]: E0521 23:23:18.936405 3420 cacher.go:260] unexpected ListAndWatch error: pkg/storage/cacher.go:201: Failed to list api.PolicyBinding: client: etcd cluster is unavailable
May 21 23:23:18 localhost.localdomain origin-master[3420]: E0521 23:23:18.936455 3420 cacher.go:260] unexpected ListAndWatch error: pkg/storage/cacher.go:201: Failed to list api.Group: client: etcd cluster is unavailable or misc
May 21 23:23:18 localhost.localdomain origin-master[3420]: E0521 23:23:18.936505 3420 cacher.go:260] unexpected ListAndWatch error: pkg/storage/cacher.go:201: Failed to list api.ClusterPolicyBinding: client: etcd cluster is unav
May 21 23:23:18 localhost.localdomain origin-master[3420]: E0521 23:23:18.939525 3420 reflector.go:199] github.com/openshift/origin/vendor/k8s.io/kubernetes/plugin/pkg/admission/serviceaccount/admission.go:119: Failed to list ap
May 21 23:23:18 localhost.localdomain origin-master[3420]: E0521 23:23:18.939590 3420 cacher.go:260] unexpected ListAndWatch error: pkg/storage/cacher.go:201: Failed to list api.ClusterPolicy: client: etcd cluster is unavailable
May 21 23:23:18 localhost.localdomain origin-master[3420]: E0521 23:23:18.940850 3420 reflector.go:199] github.com/openshift/origin/vendor/k8s.io/kubernetes/plugin/pkg/admission/storageclass/default/admission.go:75: Failed to lis
May 21 23:23:18 localhost.localdomain origin-master[3420]: E0521 23:23:18.940942 3420 reflector.go:199] github.com/openshift/origin/vendor/k8s.io/kubernetes/plugin/pkg/admission/serviceaccount/admission.go:103: Failed to list ap
May 21 23:23:19 localhost.localdomain origin-master[3420]: F0521 23:23:19.408706 3420 start_master.go:112] could not reach etcd: client: etcd cluster is unavailable or misconfigured; error #0: dial tcp 10.0.2.15:2379: getsockopt:
May 21 23:23:19 localhost.localdomain systemd[1]: origin-master.service: main process exited, code=exited, status=255/n/a
May 21 23:23:19 localhost.localdomain systemd[1]: Failed to start Origin Master Service.

And here is my inventory file:

[OSEv3:children]
masters
nodes
etcd

[OSEv3:vars]
ansible_ssh_user=vagrant
ansible_become=yes
openshift_deployment_type=origin
openshift_public_hostname=openshift.app
openshift_master_default_subdomain=console.openshift.app

[masters]
openshift.app

[etcd]
192.168.1.5

[nodes]
192.168.1.6
192.168.1.7
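Most of the journal output above is informational. The line that explains the exit is the single entry whose glog severity letter is F (fatal). A minimal sketch for pulling such entries out of a saved journal follows; it assumes one journal entry per line, as journalctl prints them, and the function name `glog_entries` is hypothetical.

```python
import re

# glog header as it appears in the journal lines above:
# severity letter (I/W/E/F), MMDD, time, PID, source file:line, then the message.
GLOG = re.compile(
    r"\b([IWEF])(\d{4}) (\d{2}:\d{2}:\d{2}\.\d+)\s+(\d+) ([\w./-]+:\d+)\] (.*)"
)


def glog_entries(text, severities="F"):
    """Return (severity, source, message) for each line whose severity is wanted."""
    found = []
    for line in text.splitlines():
        match = GLOG.search(line)
        if match and match.group(1) in severities:
            found.append((match.group(1), match.group(5), match.group(6)))
    return found


if __name__ == "__main__":
    import sys

    # Usage: journalctl --no-pager -l -u origin-master.service | python this_script.py
    for severity, source, message in glog_entries(sys.stdin.read(), "EF"):
        print(severity, source, message)
```

Passing `"EF"` instead of the default `"F"` also keeps the error-level entries, which is useful when the fatal line itself is truncated.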

vishal-biyani commented 7 years ago

Not sure if anyone has faced this issue again or found a resolution. I am facing this issue intermittently: sometimes the Ansible scripts work fine, and other times they fail. The symptom is that the origin-master Docker container fails, and when I look at its log, I see:

2017-05-25T01:07:55.865806000Z E0525 01:07:55.864022       1 reflector.go:203] github.com/openshift/origin/vendor/k8s.io/kubernetes/plugin/pkg/admission/limitranger/admission.go:154: Failed to list *api.LimitRange: Get https://master.org.com:8443/api/v1/limitranges?resourceVersion=0: dial tcp 10.200.1.68:8443: getsockopt: connection refused
2017-05-25T01:07:55.866014000Z E0525 01:07:55.865541       1 cacher.go:254] unexpected ListAndWatch error: pkg/storage/cacher.go:194: Failed to list *api.User: client: etcd cluster is unavailable or misconfigured; error #0: dial tcp 10.200.1.68:2379: getsockopt: connection refused
2017-05-25T01:07:55.979721000Z F0525 01:07:55.979072       1 start_master.go:108] could not reach etcd: client: etcd cluster is unavailable or misconfigured; error #0: dial tcp 10.200.1.68:2379: getsockopt: connection refused

But I guess the underlying problem is with etcd, as the log for etcd is completely blank, which indicates some sort of problem. So I completely removed the etcd container, and now it starts showing some activity. But the etcd container is still bound to 127.0.0.1, whereas the OpenShift container tries to reach etcd at the IP address of the host (single master + etcd and 2-node topology).
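One way to confirm the bind-address mismatch described above is to try a plain TCP connection to the etcd client port on both the loopback address and the host address. This is only a sketch: the helper names `can_connect` and `probe_etcd` are hypothetical, 2379 and 10.200.1.68 are taken from the log above, and a successful connect shows only that something is listening, not that etcd is healthy.

```python
import socket


def can_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable hosts, and timeouts.
        return False


def probe_etcd(addresses, port=2379):
    """Map each address to whether the etcd client port accepts connections."""
    return {address: can_connect(address, port) for address in addresses}
```

Run on the master, `probe_etcd(["127.0.0.1", "10.200.1.68"])` returning True for loopback and False for the host address would be consistent with etcd listening only on 127.0.0.1 while the master dials the host IP.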

boyanEst commented 7 years ago

@vishal-biyani Thank you for sharing your solution. I had the same problem with 3.6.0; after removing etcd, the problem is gone. Regards