openshift / ingress-node-firewall

Ingress node firewall implements a Kubernetes operator that provisions stateless, node-level ingress firewall rules; the firewall itself is implemented as an eBPF XDP program in the kernel.
Apache License 2.0

I saw "could not attach XDP program: create link: device or resource busy" once #208

Closed: martinkennelly closed this issue 2 years ago

martinkennelly commented 2 years ago

Describe the bug: The daemon fails to load the XDP program or, more likely, failed to unload it at some earlier stage.

To Reproduce: Steps to reproduce the behaviour are unknown so far. I produced it with the e2e tests in #173 but couldn't replicate it. I opened this issue to track my investigation.

Logs (unfortunately, I failed to get the previous logs when it occurred :( ):

1.6655759345163863e+09  INFO    setup   Version {"version.Version": "361d7226-dirty"}
I1012 11:58:55.567113  172749 request.go:682] Waited for 1.039010097s due to client-side throttling, not priority and fairness, request: GET:https://172.30.0.1:443/apis/k8s.cni.cncf.io/v1?timeout=32s
1.6655759368196218e+09  INFO    controller-runtime.metrics  Metrics server is starting to listen    {"addr": "127.0.0.1:39301"}
1.665575936819802e+09   INFO    setup   starting manager
1.6655759368200016e+09  INFO    Starting server {"path": "/metrics", "kind": "metrics", "addr": "127.0.0.1:39301"}
1.665575936820043e+09   INFO    Starting server {"kind": "health probe", "addr": "127.0.0.1:39300"}
1.6655759368201237e+09  INFO    Starting EventSource    {"controller": "ingressnodefirewallnodestate", "controllerGroup": "ingressnodefirewall.openshift.io", "controllerKind": "IngressNodeFirewallNodeState", "source": "kind source: *v1alpha1.IngressNodeFirewallNodeState"}
1.6655759368201387e+09  INFO    Starting Controller {"controller": "ingressnodefirewallnodestate", "controllerGroup": "ingressnodefirewall.openshift.io", "controllerKind": "IngressNodeFirewallNodeState"}
1.665575936921046e+09   INFO    Starting workers    {"controller": "ingressnodefirewallnodestate", "controllerGroup": "ingressnodefirewall.openshift.io", "controllerKind": "IngressNodeFirewallNodeState", "worker count": 1}
1.6655759568332734e+09  INFO    controllers.IngressNodeFirewall Reconciling resource and programming bpf    {"name": "worker-0.ostest.test.metalkube.org", "namespace": "openshift-ingress-node-firewall"}
1.6655759568333015e+09  INFO    controllers.IngressNodeFirewall.syncIngressNodeFirewallResources    Running sync operation  {"ifaceIngressRules": {"genev_sys_6081":[{"sourceCIDRs":["10.129.2.43/32"],"rules":[{"order":1,"protocolConfig":{"protocol":"TCP","tcp":{"ports":"80"}},"action":"Deny"},{"order":2,"protocolConfig":{"protocol":"UDP","udp":{"ports":"80"}},"action":"Deny"}]},{"sourceCIDRs":["fd01:0:0:6::2b/128"],"rules":[{"order":1,"protocolConfig":{"protocol":"TCP","tcp":{"ports":"80"}},"action":"Deny"},{"order":2,"protocolConfig":{"protocol":"UDP","udp":{"ports":"80"}},"action":"Deny"}]}]}, "isDelete": false}
1.6655759568334153e+09  INFO    controllers.IngressNodeFirewall Creating a new eBPF firewall node controller
I1012 11:59:16.879072  172749 ingress_node_firewall_loader.go:327] Loading interfaces from pinned dir into memory
2022/10/12 11:59:16 Listening for events..
1.6655759568793685e+09  INFO    controllers.IngressNodeFirewall Comparing currently managed interfaces against list of XDP interfaces on system {"e.managedInterfaces": {}}
1.6655759568797479e+09  INFO    controllers.IngressNodeFirewall Attaching firewall interface    {"intf": "genev_sys_6081"}
1.6655759568798752e+09  ERROR   controllers.IngressNodeFirewall Fail to attach ingress firewall prog    {"error": "could not attach XDP program: create link: device or resource busy", "errorCauses": [{"error": "could not attach XDP program: create link: device or resource busy"}]}
github.com/openshift/ingress-node-firewall/pkg/ebpfsyncer.(*ebpfSingleton).attachNewInterfaces.func2
    /go/src/github.com/openshift/ingress-node-firewall/pkg/ebpfsyncer/ebpfsyncer.go:187
k8s.io/client-go/util/retry.OnError.func1
    /go/src/github.com/openshift/ingress-node-firewall/vendor/k8s.io/client-go/util/retry/util.go:51
k8s.io/apimachinery/pkg/util/wait.ConditionFunc.WithContext.func1
    /go/src/github.com/openshift/ingress-node-firewall/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:222
k8s.io/apimachinery/pkg/util/wait.runConditionWithCrashProtectionWithContext
    /go/src/github.com/openshift/ingress-node-firewall/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:235
k8s.io/apimachinery/pkg/util/wait.runConditionWithCrashProtection
    /go/src/github.com/openshift/ingress-node-firewall/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:228
k8s.io/apimachinery/pkg/util/wait.ExponentialBackoff
    /go/src/github.com/openshift/ingress-node-firewall/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:423
k8s.io/client-go/util/retry.OnError
    /go/src/github.com/openshift/ingress-node-firewall/vendor/k8s.io/client-go/util/retry/util.go:50
github.com/openshift/ingress-node-firewall/pkg/ebpfsyncer.(*ebpfSingleton).attachNewInterfaces
    /go/src/github.com/openshift/ingress-node-firewall/pkg/ebpfsyncer/ebpfsyncer.go:179
github.com/openshift/ingress-node-firewall/pkg/ebpfsyncer.(*ebpfSingleton).SyncInterfaceIngressRules
    /go/src/github.com/openshift/ingress-node-firewall/pkg/ebpfsyncer/ebpfsyncer.go:102
github.com/openshift/ingress-node-firewall/controllers.(*IngressNodeFirewallNodeStateReconciler).reconcileResource
    /go/src/github.com/openshift/ingress-node-firewall/controllers/ingressnodefirewallnodestate_controller.go:94
github.com/openshift/ingress-node-firewall/controllers.(*IngressNodeFirewallNodeStateReconciler).Reconcile
    /go/src/github.com/openshift/ingress-node-firewall/controllers/ingressnodefirewallnodestate_controller.go:77
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile
    /go/src/github.com/openshift/ingress-node-firewall/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:121
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
    /go/src/github.com/openshift/ingress-node-firewall/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:320
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
    /go/src/github.com/openshift/ingress-node-firewall/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:273
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
    /go/src/github.com/openshift/ingress-node-firewall/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:234
1.6655759568910933e+09  INFO    controllers.IngressNodeFirewall Attaching firewall interface    {"intf": "genev_sys_6081"}
1.6655759568913753e+09  ERROR   controllers.IngressNodeFirewall Fail to attach ingress firewall prog    {"error": "could not attach XDP program: create link: device or resource busy", "errorCauses": [{"error": "could not attach XDP program: create link: device or resource busy"}]}
github.com/openshift/ingress-node-firewall/pkg/ebpfsyncer.(*ebpfSingleton).attachNewInterfaces.func2
    /go/src/github.com/openshift/ingress-node-firewall/pkg/ebpfsyncer/ebpfsyncer.go:187
k8s.io/client-go/util/retry.OnError.func1
    /go/src/github.com/openshift/ingress-node-firewall/vendor/k8s.io/client-go/util/retry/util.go:51
k8s.io/apimachinery/pkg/util/wait.ConditionFunc.WithContext.func1
    /go/src/github.com/openshift/ingress-node-firewall/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:222
k8s.io/apimachinery/pkg/util/wait.runConditionWithCrashProtectionWithContext
    /go/src/github.com/openshift/ingress-node-firewall/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:235
k8s.io/apimachinery/pkg/util/wait.runConditionWithCrashProtection
    /go/src/github.com/openshift/ingress-node-firewall/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:228
k8s.io/apimachinery/pkg/util/wait.ExponentialBackoff
    /go/src/github.com/openshift/ingress-node-firewall/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:423
k8s.io/client-go/util/retry.OnError
    /go/src/github.com/openshift/ingress-node-firewall/vendor/k8s.io/client-go/util/retry/util.go:50
github.com/openshift/ingress-node-firewall/pkg/ebpfsyncer.(*ebpfSingleton).attachNewInterfaces
    /go/src/github.com/openshift/ingress-node-firewall/pkg/ebpfsyncer/ebpfsyncer.go:179
github.com/openshift/ingress-node-firewall/pkg/ebpfsyncer.(*ebpfSingleton).SyncInterfaceIngressRules
    /go/src/github.com/openshift/ingress-node-firewall/pkg/ebpfsyncer/ebpfsyncer.go:102
github.com/openshift/ingress-node-firewall/controllers.(*IngressNodeFirewallNodeStateReconciler).reconcileResource
    /go/src/github.com/openshift/ingress-node-firewall/controllers/ingressnodefirewallnodestate_controller.go:94
github.com/openshift/ingress-node-firewall/controllers.(*IngressNodeFirewallNodeStateReconciler).Reconcile
    /go/src/github.com/openshift/ingress-node-firewall/controllers/ingressnodefirewallnodestate_controller.go:77
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile
    /go/src/github.com/openshift/ingress-node-firewall/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:121
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
    /go/src/github.com/openshift/ingress-node-firewall/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:320
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
    /go/src/github.com/openshift/ingress-node-firewall/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:273
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
    /go/src/github.com/openshift/ingress-node-firewall/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:234
...
(keeps repeating)
msherif1234 commented 2 years ago

@martinkennelly do you have #204?

martinkennelly commented 2 years ago

I replicated it: it occurs when an INF policy is active and you delete a daemon; the daemon would crash after it came back up. It is fixed by #204. :)
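The log shows what the restarted daemon saw: an empty `e.managedInterfaces` while `genev_sys_6081` presumably still carried the earlier program. The sketch below is one hypothetical way a restart-safe reconcile could split the work, adopting hooks that already carry our program instead of blindly re-attaching. It is not necessarily what #204 implemented. `reconcile` and `plan` are invented names, and `attached` is assumed to list only interfaces carrying this operator's own program (for example, ones discovered through the pinned directory that the log mentions), so another tool's XDP program is never detached.

```go
package main

import (
	"fmt"
	"sort"
)

// plan is a hypothetical reconciliation result.
type plan struct {
	Attach []string // wanted, nothing of ours attached yet
	Adopt  []string // wanted, our program already attached (e.g. left by a previous daemon)
	Detach []string // ours on the system, but no longer wanted
}

// reconcile compares the interfaces the policy wants with the interfaces that
// already carry our XDP program.
func reconcile(desired, attached []string) plan {
	want := map[string]bool{}
	for _, d := range desired {
		want[d] = true
	}
	have := map[string]bool{}
	for _, a := range attached {
		have[a] = true
	}
	var p plan
	for d := range want {
		if have[d] {
			p.Adopt = append(p.Adopt, d)
		} else {
			p.Attach = append(p.Attach, d)
		}
	}
	for a := range have {
		if !want[a] {
			p.Detach = append(p.Detach, a)
		}
	}
	// Map iteration order is random; sort for stable output.
	sort.Strings(p.Attach)
	sort.Strings(p.Adopt)
	sort.Strings(p.Detach)
	return p
}

func main() {
	// After a restart, the in-memory state is empty, but the earlier program is
	// still on genev_sys_6081. "eth1" is a hypothetical interface that the
	// policy no longer selects.
	p := reconcile([]string{"genev_sys_6081"}, []string{"genev_sys_6081", "eth1"})
	fmt.Printf("%+v\n", p)
}
```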