kubernetes-sigs / kubespray

Deploy a Production Ready Kubernetes Cluster
Apache License 2.0

CoreDNS deployment - manifest applied with '--force' flag #7113

Closed · 0x4c6565 closed 8 months ago

0x4c6565 commented 3 years ago

When the CoreDNS deployment manifest is applied to the cluster, the `--force` flag is used, resulting in non-graceful termination of pods rather than a rolling update.


This causes significant DNS disruptions in our clusters.

We can see that the deployment is applied using the custom `kube` module with `state: latest`:

https://github.com/kubernetes-sigs/kubespray/blob/master/roles/kubernetes-apps/ansible/tasks/main.yml#L39-L46

The `latest` state calls the `replace` function with no arguments, so the default `force` value of `true` is used:

https://github.com/kubernetes-sigs/kubespray/blob/master/library/kube.py#L186-L191
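
For illustration, here is a minimal paraphrased sketch of that code path; the helper name `replace_cmd` and the structure are assumptions for readability, not the exact `library/kube.py` source:

```python
# Paraphrased sketch of the kube module's replace path; the helper name and
# structure are illustrative, not the exact library/kube.py source.

def replace_cmd(resource_args, force=True):
    """Build the kubectl argument list used on the module's replace path."""
    cmd = ["kubectl", "replace"]
    if force:
        # --force deletes and recreates the object, so the CoreDNS pods are
        # torn down immediately instead of being rolled over gracefully.
        cmd.append("--force")
    return cmd + list(resource_args)

# state=latest invokes the replace path with no arguments, so force stays True:
print(replace_cmd(["-f", "coredns-deployment.yml"]))
# ['kubectl', 'replace', '--force', '-f', 'coredns-deployment.yml']
```

Because `kubectl replace --force` deletes the object before recreating it, the Deployment's ReplicaSet and pods are removed outright, bypassing the rolling-update strategy entirely.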

Environment:

Cloud provider or hardware configuration: Bare metal VMs

OS (printf "$(uname -srm)\n$(cat /etc/os-release)\n"):

Linux 3.10.0-957.5.1.el7.x86_64 x86_64
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"

Version of Ansible (ansible --version):

2.8.11

Version of Python (python --version):

3.6.6

Kubespray version (commit) (git rev-parse --short HEAD):

eb16986

Network plugin used:

Calico

Command used to invoke ansible:

ansible-playbook cluster.yml -t apps

fejta-bot commented 3 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with `/remove-lifecycle stale`. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with `/close`.

Send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

fejta-bot commented 3 years ago

Stale issues rot after 30d of inactivity. Mark the issue as fresh with `/remove-lifecycle rotten`. Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with `/close`.

Send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

0x4c6565 commented 3 years ago

/remove-lifecycle rotten

floryut commented 3 years ago

@0x4c6565 You may propose a PR to discuss this with the community: `force: false`

k8s-triage-robot commented 3 years ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:

- Mark this issue or PR as fresh with `/remove-lifecycle stale`
- Close this issue or PR with `/close`
- Offer to help out with [Issue Triage](https://www.kubernetes.dev/docs/guide/issue-triage/)

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot commented 3 years ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:

- Mark this issue or PR as fresh with `/remove-lifecycle rotten`
- Close this issue or PR with `/close`
- Offer to help out with [Issue Triage](https://www.kubernetes.dev/docs/guide/issue-triage/)

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

miff2000 commented 3 years ago

> @0x4c6565 You may propose a PR to discuss this with the community: `force: false`

I've run into these issues myself when performing upgrades, @floryut. Shall I put in a PR to switch to `force: false`, as you've suggested?

Any ideas why it was forced in the first place?

floryut commented 3 years ago

> > @0x4c6565 You may propose a PR to discuss this with the community: `force: false`
>
> I've run into these issues myself when performing upgrades, @floryut. Shall I put in a PR to switch to `force: false`, as you've suggested?
>
> Any ideas why it was forced in the first place?

No clue why it was done like that in the first place 🤷
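
To make the `force: false` suggestion concrete, here is a hypothetical sketch of the direction such a PR could take; it is not the actual kubespray patch, and `apply_latest`/`task_params` are illustrative names only:

```python
# Hypothetical sketch only, not the actual kubespray patch: let a
# task-supplied `force` option flow through to the replace path instead of
# relying on a hard-coded default of True.

def replace_cmd(resource_args, force=False):
    """Same builder as in the sketch above, but --force is now opt-in."""
    cmd = ["kubectl", "replace"]
    if force:
        cmd.append("--force")
    return cmd + list(resource_args)

def apply_latest(resource_args, task_params):
    # Honour the task's `force` setting rather than dropping it.
    return replace_cmd(resource_args, force=task_params.get("force", False))

# With `force: false` on the CoreDNS task, kubectl replaces the object in
# place and the Deployment controller performs a normal rolling update:
print(apply_latest(["-f", "coredns-deployment.yml"], {"force": False}))
# ['kubectl', 'replace', '-f', 'coredns-deployment.yml']
```

The trade-off is that a plain `kubectl replace` fails on immutable field changes or missing objects, which may be why `--force` was chosen originally; that caveat is speculation, not something stated in this thread.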

k8s-triage-robot commented 2 years ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:

- Reopen this issue or PR with `/reopen`
- Mark this issue or PR as fresh with `/remove-lifecycle rotten`
- Offer to help out with [Issue Triage](https://www.kubernetes.dev/docs/guide/issue-triage/)

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

k8s-ci-robot commented 2 years ago

@k8s-triage-robot: Closing this issue.

In response to [this](https://github.com/kubernetes-sigs/kubespray/issues/7113#issuecomment-950112480):

> The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
>
> This bot triages issues and PRs according to the following rules:
>
> - After 90d of inactivity, `lifecycle/stale` is applied
> - After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
> - After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed
>
> You can:
>
> - Reopen this issue or PR with `/reopen`
> - Mark this issue or PR as fresh with `/remove-lifecycle rotten`
> - Offer to help out with [Issue Triage][1]
>
> Please send feedback to sig-contributor-experience at [kubernetes/community](https://github.com/kubernetes/community).
>
> /close
>
> [1]: https://www.kubernetes.dev/docs/guide/issue-triage/

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

0x4c6565 commented 9 months ago

/reopen

k8s-ci-robot commented 9 months ago

@0x4c6565: Reopened this issue.

In response to [this](https://github.com/kubernetes-sigs/kubespray/issues/7113#issuecomment-1843456270):

> /reopen

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

0x4c6565 commented 9 months ago

I've now added a WIP PR for this issue

k8s-triage-robot commented 8 months ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:

- Reopen this issue with `/reopen`
- Mark this issue as fresh with `/remove-lifecycle rotten`
- Offer to help out with [Issue Triage](https://www.kubernetes.dev/docs/guide/issue-triage/)

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-ci-robot commented 8 months ago

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to [this](https://github.com/kubernetes-sigs/kubespray/issues/7113#issuecomment-1902067778):

> The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
>
> This bot triages issues according to the following rules:
>
> - After 90d of inactivity, `lifecycle/stale` is applied
> - After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
> - After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed
>
> You can:
>
> - Reopen this issue with `/reopen`
> - Mark this issue as fresh with `/remove-lifecycle rotten`
> - Offer to help out with [Issue Triage][1]
>
> Please send feedback to sig-contributor-experience at [kubernetes/community](https://github.com/kubernetes/community).
>
> /close not-planned
>
> [1]: https://www.kubernetes.dev/docs/guide/issue-triage/

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.