openshift / openshift-ansible

Install and config an OpenShift 3.x cluster
https://try.openshift.com
Apache License 2.0

uninstall playbook didn't remove calico related stuff thoroughly #11953

Closed yu2003w closed 4 years ago

yu2003w commented 5 years ago

Description

The uninstall playbook didn't clean up Calico-related files. As a result, atomic-openshift-node.service brought Calico back up even after openshift-sdn ('redhat/openshift-ovs-multitenant') was enabled on reinstall.
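
For reference, the uninstall playbook referred to here is the repo's ad-hoc uninstall playbook; a typical invocation looks roughly like the following (the inventory path is a placeholder):

    ansible-playbook -i /path/to/inventory playbooks/adhoc/uninstall.yml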

Version

If you're running from playbooks installed via RPM

Steps To Reproduce
  1. Install OCP 3.11 with the following vars set (see the inventory sketch after this list):
    os_sdn_network_plugin_name=cni
    openshift_use_calico=true
    openshift_use_openshift_sdn=false
  2. Run the uninstall playbook to remove OCP 3.11.
  3. Install OCP 3.11 with openshift-sdn as below:
    openshift_use_openshift_sdn=true
    os_sdn_network_plugin_name='redhat/openshift-ovs-multitenant'
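
For reference, a minimal inventory sketch for the two installs might look like the following (the `[OSEv3:vars]` group is the usual openshift-ansible convention; hosts and other required vars are omitted):

    # Step 1: install with Calico (sketch only, not a complete inventory)
    [OSEv3:vars]
    os_sdn_network_plugin_name=cni
    openshift_use_calico=true
    openshift_use_openshift_sdn=false

    # Step 3: reinstall with openshift-sdn multitenant
    [OSEv3:vars]
    openshift_use_openshift_sdn=true
    os_sdn_network_plugin_name='redhat/openshift-ovs-multitenant'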
Expected Results

OCP 3.11 should deploy successfully.
Observed Results

OCP 3.11 installation failed and pods were stuck in 'ContainerCreating' with errors like the following:

Events:
  Type     Reason                  Age                 From                         Message
  ----     ------                  ----                ----                         -------
  Normal   Scheduled               44m                 default-scheduler            Successfully assigned openshift-web-console/webconsole-85494cdb8c-7rs5m to buzz1.test.com
  Warning  FailedCreatePodSandBox  44m                 kubelet, buzz1.test.com  Failed create pod sandbox: rpc error: code = Unknown desc = [failed to set up sandbox container "c6d2dbb19e359a62937869413a01017f221e92b804778063d563d9569cac5721" network for pod "webconsole-85494cdb8c-7rs5m": NetworkPlugin cni failed to set up pod "webconsole-85494cdb8c-7rs5m_openshift-web-console" network: context deadline exceeded, failed to clean up sandbox container "c6d2dbb19e359a62937869413a01017f221e92b804778063d563d9569cac5721" network for pod "webconsole-85494cdb8c-7rs5m": NetworkPlugin cni failed to teardown pod "webconsole-85494cdb8c-7rs5m_openshift-web-console" network: context deadline exceeded]
  Normal   SandboxChanged          4m (x105 over 44m)  kubelet, buzz1.test.com  Pod sandbox changed, it will be killed and re-created.

It seems that the origin-node service brought up Calico, which is not the expected result.

[root@buzz1 openshift-ansible]# systemctl status  atomic-openshift-node.service 
● atomic-openshift-node.service - OpenShift Node
   Loaded: loaded (/etc/systemd/system/atomic-openshift-node.service; enabled; vendor preset: disabled)
   Active: active (running) since Mon 2019-10-14 00:43:08 PDT; 22h ago
     Docs: https://github.com/openshift/origin
 Main PID: 87388 (hyperkube)
   CGroup: /system.slice/atomic-openshift-node.service
           ├─87388 /usr/bin/hyperkube kubelet --v=6 --address=0.0.0.0 --allow-privileged=true --anonymous-auth=true --authentication-toke...
           └─88872 /opt/cni/bin/calico

Oct 14 23:15:48 buzz1.fyre.ibm.com atomic-openshift-node[87388]: I1014 23:15:48.289674   87388 common.go:71] Using namespace "kube-s....yaml
Oct 14 23:15:48 buzz1.fyre.ibm.com atomic-openshift-node[87388]: I1014 23:15:48.289809   87388 file.go:199] Reading config file "/et...yaml"
Oct 14 23:15:48 buzz1.fyre.ibm.com atomic-openshift-node[87388]: I1014 23:15:48.292556   87388 common.go:62] Generated UID "598eab3c....yaml
Oct 14 23:15:48 buzz1.fyre.ibm.com atomic-openshift-node[87388]: I1014 23:15:48.293602   87388 common.go:66] Generated Name "master-....yaml
Oct 14 23:15:48 buzz1.fyre.ibm.com atomic-openshift-node[87388]: I1014 23:15:48.294512   87388 common.go:71] Using namespace "kube-s....yaml
Oct 14 23:15:48 buzz1.fyre.ibm.com atomic-openshift-node[87388]: I1014 23:15:48.295667   87388 file.go:199] Reading config file "/et...yaml"
Oct 14 23:15:48 buzz1.fyre.ibm.com atomic-openshift-node[87388]: I1014 23:15:48.296350   87388 common.go:62] Generated UID "d71dc810....yaml
Oct 14 23:15:48 buzz1.fyre.ibm.com atomic-openshift-node[87388]: I1014 23:15:48.296367   87388 common.go:66] Generated Name "master-....yaml
Oct 14 23:15:48 buzz1.fyre.ibm.com atomic-openshift-node[87388]: I1014 23:15:48.296379   87388 common.go:71] Using namespace "kube-s....yaml
Oct 14 23:15:48 buzz1.fyre.ibm.com atomic-openshift-node[87388]: I1014 23:15:48.300194   87388 config.go:303] Setting pods for source file
Oct 14 23:15:48 buzz1.fyre.ibm.com atomic-openshift-node[87388]: I1014 23:15:48.361625   87388 kubelet.go:1884] SyncLoop (SYNC): 3 p...d33c)
Oct 14 23:15:48 buzz1.fyre.ibm.com atomic-openshift-node[87388]: I1014 23:15:48.361693   87388 config.go:100] Looking for [api file]...e:{}]
Oct 14 23:15:48 buzz1.fyre.ibm.com atomic-openshift-node[87388]: I1014 23:15:48.361716   87388 kubelet.go:1907] SyncLoop (housekeeping)
Hint: Some lines were ellipsized, use -l to show in full.
[root@buzz1 openshift-ansible]# ps -ef | grep calico
root      88872  87388  0 23:15 ?        00:00:00 /opt/cni/bin/calico
root      88975  74601  0 23:15 pts/0    00:00:00 grep --color=auto calico
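
A likely explanation (not verified here): the kubelet's CNI code loads the lexicographically first config file in /etc/cni/net.d, so a leftover 10-calico.conflist takes precedence over the openshift-sdn config (typically 80-openshift-network.conf) and the node keeps invoking /opt/cni/bin/calico. A quick check on an affected node:

    # Hypothetical check; file names can vary per install
    ls -l /etc/cni/net.d/
    # If 10-calico.conflist and calico-kubeconfig are still present alongside the
    # openshift-sdn config, the kubelet will keep loading the Calico CNI plugin.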

Additional Information

Red Hat Enterprise Linux Server release 7.7 (Maipo)
openshift-ci-robot commented 5 years ago

@yu2003w: The label(s) /label networking cannot be applied. These labels are supported: platform/aws, platform/azure, platform/baremetal, platform/google, platform/libvirt, platform/openstack, ga

In response to [this](https://github.com/openshift/openshift-ansible/issues/11953#issuecomment-542142270):

> /label networking

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.
ffroehli commented 4 years ago

Workaround:

    ansible nodes -b -o -m command -a 'rm /etc/cni/net.d/10-calico.conflist'
    ansible nodes -b -o -m command -a 'rm /etc/cni/net.d/calico-kubeconfig'
    ansible nodes -b -o -m command -a 'rm -R /etc/cni/net.d/calico-tls/'
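
A more declarative variant of the same cleanup, as a sketch (assuming the same `nodes` inventory group; the node service may be named origin-node rather than atomic-openshift-node depending on the deployment, and it likely needs a restart so the kubelet re-reads /etc/cni/net.d):

    ansible nodes -b -o -m file -a 'path=/etc/cni/net.d/10-calico.conflist state=absent'
    ansible nodes -b -o -m file -a 'path=/etc/cni/net.d/calico-kubeconfig state=absent'
    ansible nodes -b -o -m file -a 'path=/etc/cni/net.d/calico-tls state=absent'
    ansible nodes -b -o -m service -a 'name=atomic-openshift-node state=restarted'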

openshift-bot commented 4 years ago

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close. Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

openshift-bot commented 4 years ago

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity. Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

openshift-bot commented 4 years ago

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen. Mark the issue as fresh by commenting /remove-lifecycle rotten. Exclude this issue from closing again by commenting /lifecycle frozen.

/close

openshift-ci-robot commented 4 years ago

@openshift-bot: Closing this issue.

In response to [this](https://github.com/openshift/openshift-ansible/issues/11953#issuecomment-667765208):

> Rotten issues close after 30d of inactivity.
>
> Reopen the issue by commenting `/reopen`.
> Mark the issue as fresh by commenting `/remove-lifecycle rotten`.
> Exclude this issue from closing again by commenting `/lifecycle frozen`.
>
> /close

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.