k8snetworkplumbingwg / ovs-cni

Open vSwitch CNI plugin
Apache License 2.0
218 stars 70 forks source link

Fix problem with empty cache after reboot #304

Closed ykulazhenkov closed 4 months ago

ykulazhenkov commented 4 months ago

What this PR does / why we need it: Use /var/lib/cni/ovs-cni/cache dir to store config cache. Old path /tmp/ovscache is cleaned up after reboot on some operating systems. Lack of the cache entries may prevent ovs-cni to do a proper cleanup on CmdDel. Use /var/lib/cni/ovs-cni/cache as a persistent cache dir.

Special notes for your reviewer:

Problem with the current code (before this patch):

  1. /tmp/ovscache is used as a cache path by default https://github.com/k8snetworkplumbingwg/ovs-cni/blob/10d2594d0fbcce45af9c7e819ea600404100c7b8/pkg/utils/cache.go#L29
  2. /tmp dir will be cleaned up during the host reboot on some distros
  3. CmdDel exit without an error if cache entry not found https://github.com/k8snetworkplumbingwg/ovs-cni/blob/10d2594d0fbcce45af9c7e819ea600404100c7b8/pkg/plugin/plugin.go#L485
  4. ovs-cni will not be able to do a cleanup after the reboot, this will cause, the following state in ovs
    d178a096-20dd-4aa1-931a-0a3e82613228
    Bridge mybr
        Port mybr
            Interface mybr
                type: internal
        Port vethd25775b1
            Interface vethd25775b1
                error: "could not open network device vethd25775b1 (No such device)"
    ovs_version: "2.17.9"
  5. on next CmdAdd all port with an error will be removed https://github.com/k8snetworkplumbingwg/ovs-cni/blob/10d2594d0fbcce45af9c7e819ea600404100c7b8/pkg/plugin/plugin.go#L271
  6. for non HW offloading use-case this will fix the issue (symptom, not the root cause)
  7. for HW offloading use-case, Ports will have no error and these ports will not be released by the cleanPorts function. As a result VFs will stuck in the OVS and manual clean-up is required before they can be used again.

Release note:

NONE

cc @e0ne @SchSeba

kubevirt-bot commented 4 months ago

Hi @ykulazhenkov. Thanks for your PR.

I'm waiting for a k8snetworkplumbingwg member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.
ykulazhenkov commented 4 months ago

After discussion with @e0ne we decided that it is better to implement alternative option. I marked PR as draft to apply required changes.

ykulazhenkov commented 4 months ago

After discussion with @e0ne we decided that it is better to implement alternative option. I marked PR as draft to apply required changes.

required changes implemented. The PR is ready for reviews

SchSeba commented 4 months ago

/ok-to-test

SchSeba commented 4 months ago

/lgtm /approve

kubevirt-bot commented 4 months ago

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: SchSeba, ykulazhenkov

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files: - ~~[OWNERS](https://github.com/k8snetworkplumbingwg/ovs-cni/blob/main/OWNERS)~~ [SchSeba] Approvers can indicate their approval by writing `/approve` in a comment Approvers can cancel approval by writing `/approve cancel` in a comment
ykulazhenkov commented 4 months ago

/test pull-e2e-ovs-cni