openshift / openshift-ansible

Install and configure an OpenShift 3.x cluster
https://try.openshift.com

openshift_cluster_monitoring_operator : Wait for the ServiceMonitor CRD to be created #11825

Closed: oawties0412 closed this issue 4 years ago

oawties0412 commented 5 years ago

Description

Hello, I'm having a problem deploying a cluster with OpenShift.

I tried everything suggested in the related issues for this error here on GitHub, such as #10969.

Command: ansible-playbook -i /etc/ansible/hosts playbooks/deploy_cluster.yml

TASK [openshift_cluster_monitoring_operator : Wait for the ServiceMonitor CRD to be created] ****
Tuesday 13 August 2019  17:00:34 +0800 (0:00:02.250)       0:14:45.437 ****
FAILED - RETRYING: Wait for the ServiceMonitor CRD to be created (30 retries left).
FAILED - RETRYING: Wait for the ServiceMonitor CRD to be created (29 retries left).
.......
FAILED - RETRYING: Wait for the ServiceMonitor CRD to be created (2 retries left).
FAILED - RETRYING: Wait for the ServiceMonitor CRD to be created (1 retries left).
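
If it helps anyone reproduce this: the failing task is, as far as I can tell, just polling for the ServiceMonitor CustomResourceDefinition that the cluster-monitoring-operator is supposed to register. A manual check of whether it was ever registered (a sketch, assuming cluster-admin access on the master):

# the playbook task retries until this CRD shows up; in my case it never does
oc get crd servicemonitors.monitoring.coreos.com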

Version
ansible 2.6.18
  config file = /etc/ansible/ansible.cfg
  configured module search path = [u'/root/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python2.7/site-packages/ansible
  executable location = /usr/bin/ansible
  python version = 2.7.5 (default, Jun 11 2019, 14:33:56) [GCC 4.8.5 20150623 (Red Hat 4.8.5-39)]

openshift-ansible-3.11.43-1.git.0.fa69a02.el7.noarch
Steps To Reproduce
  1. Run ansible-playbook -i /etc/ansible/hosts playbooks/prerequisites.yml
  2. Wait for prerequisites.yml to finish successfully
  3. Run ansible-playbook -i /etc/ansible/hosts playbooks/deploy_cluster.yml
Expected Results

deploy_cluster.yml will finish without problems.

Observed Results

80-openshift-network.conf is missing in /etc/cni/net.d


NAME                        READY     STATUS              RESTARTS   AGE
docker-registry-1-deploy    0/1       ContainerCreating   0          1h
registry-console-1-deploy   0/1       ContainerCreating   0          1h
router-2-deploy             0/1       ContainerCreating   0          25m
[root@cssbh001d ~]# oc logs docker-registry-1-deploy
Error from server (BadRequest): container "deployment" in pod "docker-registry-1-deploy" is waiting to start: ContainerCreating
[root@cssbh001d ~]# oc logs registry-console-1-deploy
Error from server (BadRequest): container "deployment" in pod "registry-console-1-deploy" is waiting to start: ContainerCreating
[root@cssbh001d ~]# oc logs router-2-deploy
Error from server (BadRequest): container "deployment" in pod "router-2-deploy" is waiting to start: ContainerCreating
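
Since oc logs can't show anything while a pod is still in ContainerCreating, the pod events are probably the better place to look; a sketch of what I mean (assuming the registry and router live in the default project, as in a stock 3.11 install):

# the Events section at the bottom should show why the pod sandbox can't be
# created (e.g. the network plugin not being ready / no config in /etc/cni/net.d)
oc -n default describe pod docker-registry-1-deploy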

logs from journalctl -f can be found here: https://gist.github.com/oawties0412/fa2507bc2f15aabd348cf0fcea4621dc

Additional Information

I'm using Red Hat Enterprise Linux Server release 7.7 (Maipo); oc version: oc v3.11.43

Inventory File (/etc/ansible/hosts):

[OSEv3:children]
masters
nfs
etcd
nodes

[OSEv3:vars]
template_service_broker_install=false
openshift_master_cluster_public_hostname=None
ansible_ssh_user=root
openshift_master_cluster_hostname=None
openshift_deployment_type=openshift-enterprise
os_firewall_use_firewalld=True
openshift_master_identity_providers=[{'name': 'htpasswd_auth', 'login': 'true', 'challenge': 'true', 'kind': 'HTPasswdPasswordIdentityProvider'}]
openshift_master_default_subdomain=cloudapps.10.110.25.147.nip.io
openshift_disable_check=memory_availability,disk_availability,package_version,docker_image_availability,package_availability,package_update,docker_storage

oreg_url=10.110.25.141:8082/openshift3/ose-${component}:${version}
openshift_examples_modify_imagestreams=true
openshift_image_tag=v3.11.43
openshift_pkg_version=-3.11.43

openshift_additional_registry_credentials=[{'host':'10.110.25.141:8082','user':'admin','password':'admin123','test_login':'False', 'tls_verify':'False'}]

[masters]
cssbh001d.openshift.smart.ph openshift_public_ip=10.110.25.147 openshift_ip=10.110.25.147 openshift_public_hostname=cssbh001d.openshift.smart.ph openshift_hostname=cssbh001d.openshift.smart.ph ansible_connection=local

[nfs]
cssbh001d.openshift.smart.ph openshift_public_ip=10.110.25.147 openshift_ip=10.110.25.147 openshift_public_hostname=cssbh001d.openshift.smart.ph openshift_hostname=cssbh001d.openshift.smart.ph ansible_connection=local

[etcd]
cssbh001d.openshift.smart.ph openshift_public_ip=10.110.25.147 openshift_ip=10.110.25.147 openshift_public_hostname=cssbh001d.openshift.smart.ph openshift_hostname=cssbh001d.openshift.smart.ph ansible_connection=local

[nodes]
cssbh001d.openshift.smart.ph openshift_public_ip=10.110.25.147 openshift_ip=10.110.25.147 openshift_public_hostname=cssbh001d.openshift.smart.ph openshift_hostname=cssbh001d.openshift.smart.ph openshift_node_labels="{'region': 'infra'}" openshift_node_group_name='node-config-master-infra' openshift_schedulable=True ansible_connection=local

I tried uninstalling with ansible-playbook -i /etc/ansible/hosts playbooks/adhoc/uninstall.yml, followed by:

yum remove openshift-ansible
yum remove docker-1.13.1

and then rebooted and re-installed.

I also copied 80-openshift-network.conf over from a different environment where OpenShift is working, but that did not help.
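
As far as I can tell, 80-openshift-network.conf is normally written by the sdn pod itself once it comes up, so a hand-copied file won't survive while that pod is failing; checking the SDN pods seemed like the logical next step (a sketch):

oc -n openshift-sdn get pods
# if the sdn-* pod is crash-looping, the CNI config in /etc/cni/net.d is never
# (re)written, and the deploy pods above stay stuck in ContainerCreating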

I really need help; I'm new to this setup (coming from dev), so I'm still trying to figure things out.

Thank you so much! 
oawties0412 commented 5 years ago

[root@cssbh001d ~]# oc -n openshift-sdn get pods
NAME        READY     STATUS             RESTARTS   AGE
ovs-pccsr   1/1       Running            0          3d
sdn-76xhm   0/1       CrashLoopBackOff   1060       3d
[root@cssbh001d ~]# oc -n openshift-sdn logs sdn-76xhm
2019/08/19 02:51:33 socat[26625] E connect(5, AF=1 "/var/run/openshift-sdn/cni-server.sock", 40): No such file or directory
User "sa" set.
Context "default/cssbh001d-openshift-smart-ph:8443/system:admin" modified.
which: no openshift-sdn in (/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin)
I0819 02:51:34.801228   26599 start_network.go:200] Reading node configuration from /etc/origin/node/node-config.yaml
I0819 02:51:34.806833   26599 start_network.go:207] Starting node networking cssbh001d.openshift.smart.ph (v3.11.43)
W0819 02:51:34.807217   26599 server.go:195] WARNING: all flags other than --config, --write-config-to, and --cleanup are deprecated. Please begin using a config file ASAP.
I0819 02:51:34.807351   26599 feature_gate.go:230] feature gates: &{map[]}
I0819 02:51:34.810792   26599 transport.go:160] Refreshing client certificate from store
I0819 02:51:34.810859   26599 certificate_store.go:131] Loading cert/key pair from "/etc/origin/node/certificates/kubelet-client-current.pem".
I0819 02:51:34.860211   26599 node.go:147] Initializing SDN node of type "redhat/openshift-ovs-subnet" with configured hostname "cssbh001d.openshift.smart.ph" (IP ""), iptables sync period "30s"
F0819 02:51:34.864425   26599 start_network.go:106] could not start DNS, unable to read config file: open /etc/origin/node/resolv.conf: no such file or directory

I have another environment where OpenShift Enterprise 3.11.43 is installed, and upon checking, the /etc/origin/node/resolv.conf file exists there but is empty.

/var/run/openshift-sdn/cni-server.sock and config.json are also missing in the current environment I'm working in. In which part of the installation do these files get created, or do I need to add them manually?
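
For what it's worth, my understanding is that /etc/origin/node/resolv.conf is generated by the NetworkManager dispatcher script that openshift-ansible installs (99-origin-dns.sh), which only fires for interfaces managed by NetworkManager, and that cni-server.sock is created by the sdn pod itself at startup, so it disappears while that pod crash-loops. A quick check (a sketch, assuming a default 3.11 node):

# the dispatcher script that should generate /etc/origin/node/resolv.conf
ls -l /etc/NetworkManager/dispatcher.d/99-origin-dns.sh
# the node's interface must be managed by NetworkManager for it to fire
nmcli device status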

openshift-bot commented 4 years ago

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close. Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

MichaelYuLimin commented 4 years ago

Add the NM_CONTROLLED=yes setting to the network script, for example:

echo "NM_CONTROLLED=yes" >> /etc/sysconfig/network-scripts/ifcfg-ens192

then restart the NetworkManager service:

systemctl restart NetworkManager
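
After the restart, the dispatcher script should regenerate the node's DNS config. A quick way to verify (a sketch, assuming the 99-origin-dns.sh dispatcher script from openshift-ansible is in place):

cat /etc/origin/node/resolv.conf
# once this file exists, deleting the crash-looping sdn pod lets the daemonset
# recreate it cleanly:
oc -n openshift-sdn delete pod sdn-76xhm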

openshift-bot commented 4 years ago

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity. Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten /remove-lifecycle stale

openshift-bot commented 4 years ago

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen. Mark the issue as fresh by commenting /remove-lifecycle rotten. Exclude this issue from closing again by commenting /lifecycle frozen.

/close

openshift-ci-robot commented 4 years ago

@openshift-bot: Closing this issue.

In response to [this](https://github.com/openshift/openshift-ansible/issues/11825#issuecomment-683265951):

>Rotten issues close after 30d of inactivity.
>
>Reopen the issue by commenting `/reopen`.
>Mark the issue as fresh by commenting `/remove-lifecycle rotten`.
>Exclude this issue from closing again by commenting `/lifecycle frozen`.
>
>/close

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.