openshift / kuryr-kubernetes

kuryr-kubernetes - CNI plugin using OpenStack Neutron and Octavia to provide networking for pods and services.
Apache License 2.0
21 stars 23 forks source link

Bug OCPBUGS-198: Cleanup KuryrPort when Pod is missing #691

Closed dulek closed 1 year ago

dulek commented 2 years ago

We can easily imagine an user frustrated by his pod not getting deleted and opting to remove the finalizer from the Pod. If the cause of the deletion delay was the kuryr-controller being down, we end up with an orphaned KuryrPort. At the moment this causes crashes, which obviously it shouldn't. Moreover we should figure out how to clean up the Neutron port if that happens. This commit does so as explained below.

  1. KuryrPort on_present() will trigger its deletion when it detects that Pod does not longer exist.
  2. Turns out security_groups parameter passed to release_vif() was never used. I removed it from drivers and got rid of get_security_groups() call from on_finalize() as it's no longer necessary.
  3. When we cannot get the Pod in KuryrPort on_finalize() we attempt to gather info required to cleanup the KuryrPort and "mock" a Pod object. A precaution is added that any error from release_vif() is ignored in that case to make sure failed cleanup is not causing the system to go down.
openshift-ci-robot commented 2 years ago

@dulek: This pull request references Jira Issue OCPBUGS-198, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug * bug is open, matching expected state (open) * bug target version (4.12.0) matches configured target version for branch (4.12.0) * bug is in the state New, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact: /cc @eurijon

The bug has been updated to refer to the pull request using the external bug tracker.

In response to [this](https://github.com/openshift/kuryr-kubernetes/pull/691): >We can easily imagine an user frustrated by his pod not getting deleted >and opting to remove the finalizer from the Pod. If the cause of the >deletion delay was the kuryr-controller being down, we end up with an >orphaned KuryrPort. At the moment this causes crashes, which obviously >it shouldn't. Moreover we should figure out how to clean up the Neutron >port if that happens. This commit does so as explained below. > >1. KuryrPort on_present() will trigger its deletion when it detects that > Pod does not longer exist. >2. Turns out security_groups parameter passed to release_vif() was never > used. I removed it from drivers and got rid of get_security_groups() > call from on_finalize() as it's no longer necessary. >3. When we cannot get the Pod in KuryrPort on_finalize() we attempt to > gather info required to cleanup the KuryrPort and "mock" a Pod > object. A precaution is added that any error from release_vif() is > ignored in that case to make sure failed cleanup is not causing the > system to go down. Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.
dulek commented 2 years ago

Tests look good, only the one breaking CI failed.

MaysaMacedo commented 1 year ago

/lgtm /override ci/prow/e2e-openstack-kuryr

The test failure in unrelated to this PR

openshift-ci[bot] commented 1 year ago

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dulek, MaysaMacedo

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files: - ~~[OWNERS](https://github.com/openshift/kuryr-kubernetes/blob/master/OWNERS)~~ [MaysaMacedo,dulek] Approvers can indicate their approval by writing `/approve` in a comment Approvers can cancel approval by writing `/approve cancel` in a comment
openshift-ci[bot] commented 1 year ago

@MaysaMacedo: /override requires failed status contexts, check run or a prowjob name to operate on. The following unknown contexts/checkruns were given:

Only the following failed contexts/checkruns were expected:

If you are trying to override a checkrun that has a space in it, you must put a double quote on the context.

In response to [this](https://github.com/openshift/kuryr-kubernetes/pull/691#issuecomment-1257813891): >/lgtm >/override ci/prow/e2e-openstack-kuryr > >The test failure in unrelated to this PR Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.
openshift-ci-robot commented 1 year ago

/retest-required

Remaining retests: 0 against base HEAD faaccd338ecbf3cc01814bbb4c872d5194617d95 and 2 for PR HEAD afaeee3c339bd1c8187da8d6ff156a2daedb935d in total

dulek commented 1 year ago

/override ci/prow/e2e-openstack-kuryr

openshift-ci[bot] commented 1 year ago

@dulek: Overrode contexts on behalf of dulek: ci/prow/e2e-openstack-kuryr

In response to [this](https://github.com/openshift/kuryr-kubernetes/pull/691#issuecomment-1257978614): >/override ci/prow/e2e-openstack-kuryr Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.
openshift-ci[bot] commented 1 year ago

@dulek: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository. I understand the commands that are listed [here](https://go.k8s.io/bot-commands).
openshift-ci-robot commented 1 year ago

@dulek: All pull requests linked via external trackers have merged:

Jira Issue OCPBUGS-198 has been moved to the MODIFIED state.

In response to [this](https://github.com/openshift/kuryr-kubernetes/pull/691): >We can easily imagine an user frustrated by his pod not getting deleted >and opting to remove the finalizer from the Pod. If the cause of the >deletion delay was the kuryr-controller being down, we end up with an >orphaned KuryrPort. At the moment this causes crashes, which obviously >it shouldn't. Moreover we should figure out how to clean up the Neutron >port if that happens. This commit does so as explained below. > >1. KuryrPort on_present() will trigger its deletion when it detects that > Pod does not longer exist. >2. Turns out security_groups parameter passed to release_vif() was never > used. I removed it from drivers and got rid of get_security_groups() > call from on_finalize() as it's no longer necessary. >3. When we cannot get the Pod in KuryrPort on_finalize() we attempt to > gather info required to cleanup the KuryrPort and "mock" a Pod > object. A precaution is added that any error from release_vif() is > ignored in that case to make sure failed cleanup is not causing the > system to go down. Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.
dulek commented 1 year ago

/cherry-pick release-4.11

openshift-cherrypick-robot commented 1 year ago

@dulek: new pull request created: #694

In response to [this](https://github.com/openshift/kuryr-kubernetes/pull/691#issuecomment-1257994905): >/cherry-pick release-4.11 Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.