kubernetes-sigs / security-profiles-operator

The Kubernetes Security Profiles Operator
Apache License 2.0

bpf-recorder is not valid for spod pods #1837

Closed shaojini closed 5 months ago

shaojini commented 1 year ago

What happened:

After installing SPO, verifying that the bpf-recorder is up and running fails with the error: container bpf-recorder is not valid for the spod pod. When I try to enable it by patching the spod configuration, all spod pods crash and are unable to restart successfully.

What you expected to happen:

The bpf-recorder is up and running.

How to reproduce it (as minimally and precisely as possible):

root@k8s-master:~# kubectl get pods -n security-profiles-operator
NAME                                                  READY   STATUS    RESTARTS   AGE
security-profiles-operator-8588b78997-4p2zv           1/1     Running   0          59s
security-profiles-operator-8588b78997-nlvxf           1/1     Running   0          59s
security-profiles-operator-8588b78997-wctnn           1/1     Running   0          59s
security-profiles-operator-webhook-8476cd6f8c-d7qqb   1/1     Running   0          56s
security-profiles-operator-webhook-8476cd6f8c-f2zqs   1/1     Running   0          56s
security-profiles-operator-webhook-8476cd6f8c-vq6z2   1/1     Running   0          56s
spod-2vkz6                                            2/2     Running   0          56s
spod-g9d2m                                            2/2     Running   0          56s
spod-kd4b6                                            2/2     Running   0          56s

root@k8s-master:~# kubectl -n security-profiles-operator logs --selector name=spod -c bpf-recorder
error: container bpf-recorder is not valid for pod spod-2vkz6

root@k8s-master:~# kubectl -n security-profiles-operator patch spod spod --type=merge -p '{"spec":{"enableBpfRecorder":true}}'
securityprofilesoperatordaemon.security-profiles-operator.x-k8s.io/spod patched

root@k8s-master:~# kubectl get pods -n security-profiles-operator
NAME                                                  READY   STATUS             RESTARTS        AGE
security-profiles-operator-8588b78997-4p2zv           1/1     Running            0               22m
security-profiles-operator-8588b78997-nlvxf           1/1     Running            0               22m
security-profiles-operator-8588b78997-wctnn           1/1     Running            0               22m
security-profiles-operator-webhook-8476cd6f8c-d7qqb   1/1     Running            0               21m
security-profiles-operator-webhook-8476cd6f8c-f2zqs   1/1     Running            0               21m
security-profiles-operator-webhook-8476cd6f8c-vq6z2   1/1     Running            0               21m
spod-28qd6                                            2/3     CrashLoopBackOff   7 (5m3s ago)    16m
spod-2msmj                                            2/3     Error              8 (5m6s ago)    16m
spod-rp2vz                                            2/3     CrashLoopBackOff   7 (4m50s ago)   16m

Anything else we need to know?:

Environment:

saschagrunert commented 1 year ago

Hey @shaojini, thank you for the report. Can you extract the crash logs of the spod instances, like spod-28qd6?
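
For reference, a minimal sketch of how those crash logs could be pulled (assuming the previous container instance's logs are still retained; the pod name is taken from the listing above):

kubectl -n security-profiles-operator logs spod-28qd6 -c bpf-recorder --previous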

shaojini commented 1 year ago

root@k8s-master:~# kubectl -n security-profiles-operator logs spod-6579n
Defaulted container "security-profiles-operator" out of: security-profiles-operator, bpf-recorder, metrics, non-root-enabler (init)
I0817 12:45:07.665328 1346435 main.go:260]  "msg"="Set logging verbosity to 0"
I0817 12:45:07.666835 1346435 main.go:266]  "msg"="Profiling support enabled: false"
I0817 12:45:07.667151 1346435 main.go:286] setup "msg"="starting component: spod" "buildDate"="1980-01-01T00:00:00Z" "buildTags"="netgo,osusergo,seccomp,apparmor" "cgoldFlags"="unknown" "compiler"="gc" "dependencies"="cloud.google.com/go/compute/metadata v0.2.3 ,cuelang.org/go v0.5.0 ,filippo.io/edwards25519 v1.0.0 ,github.com/AliyunContainerService/ack-ram-tool/pkg/credentials/alibabacloudsdkgo/helper v0.2.0 ,github.com/Azure/azure-sdk-for-go v68.0.0+incompatible ,github.com/Azure/go-autorest/autorest v0.11.29 ,github.com/Azure/go-autorest/autorest/adal v0.9.22 ,github.com/Azure/go-autorest/autorest/azure/auth v0.5.12 ,github.com/Azure/go-autorest/autorest/azure/cli v0.4.6 ,github.com/Azure/go-autorest/autorest/date v0.3.0 ,github.com/Azure/go-autorest/logger v0.2.1 ,github.com/Azure/go-autorest/tracing v0.6.0 ,github.com/OneOfOne/xxhash v1.2.8 ,github.com/ProtonMail/go-crypto v0.0.0-20230518184743-7afd39499903 ,github.com/acobaugh/osrelease v0.1.0 ,github.com/agnivade/levenshtein v1.1.1 ,github.com/alibabacloud-go/alibabacloud-gateway-spi v0.0.4 ,github.com/alibabacloud-go/cr-20160607 v1.0.1 ,github.com/alibabacloud-go/cr-20181201 v1.0.10 ,github.com/alibabacloud-go/darabonba-openapi v0.1.18 ,github.com/alibabacloud-go/debug v0.0.0-20190504072949-9472017b5c68 ,github.com/alibabacloud-go/endpoint-util v1.1.1 ,github.com/alibabacloud-go/openapi-util v0.0.11 ,github.com/alibabacloud-go/tea v1.1.18 ,github.com/alibabacloud-go/tea-utils v1.4.4 ,github.com/alibabacloud-go/tea-xml v1.1.2 ,github.com/aliyun/credentials-go v1.2.3 ,github.com/aquasecurity/libbpfgo v0.4.9-libbpf-1.2.0 ,github.com/asaskevich/govalidator v0.0.0-20230301143203-a9d515a09cc2 ,github.com/aws/aws-sdk-go-v2 v1.18.1 ,github.com/aws/aws-sdk-go-v2/config v1.18.27 ,github.com/aws/aws-sdk-go-v2/credentials v1.13.26 ,github.com/aws/aws-sdk-go-v2/feature/ec2/imds v1.13.4 ,github.com/aws/aws-sdk-go-v2/internal/configsources v1.1.34 ,github.com/aws/aws-sdk-go-v2/internal/endpoints/v2 v2.4.28 ,github.com/aws/aws-sdk-go-v2/internal/ini v1.3.35 ,github.com/aws/aws-sdk-go-v2/service/ecr v1.15.0 ,github.com/aws/aws-sdk-go-v2/service/ecrpublic v1.12.0 ,github.com/aws/aws-sdk-go-v2/service/internal/presigned-url v1.9.28 ,github.com/aws/aws-sdk-go-v2/service/sso v1.12.12 ,github.com/aws/aws-sdk-go-v2/service/ssooidc v1.14.12 ,github.com/aws/aws-sdk-go-v2/service/sts v1.19.2 ,github.com/aws/smithy-go v1.13.5 ,github.com/awslabs/amazon-ecr-credential-helper/ecr-login v0.0.0-20220228164355-396b2034c795 ,github.com/beorn7/perks v1.0.1 ,github.com/blang/semver v3.5.1+incompatible ,github.com/blang/semver/v4 v4.0.0 ,github.com/buildkite/agent/v3 v3.49.0 ,github.com/cert-manager/cert-manager v1.12.3 ,github.com/cespare/xxhash/v2 v2.2.0 ,github.com/chrismellard/docker-credential-acr-env v0.0.0-20220119192733-fe33c00cee21 ,github.com/clbanning/mxj/v2 v2.5.6 ,github.com/cloudflare/circl v1.3.3 ,github.com/cockroachdb/apd/v2 v2.0.2 ,github.com/common-nighthawk/go-figure v0.0.0-20210622060536-734e95fb86be ,github.com/containerd/stargz-snapshotter/estargz v0.14.3 ,github.com/containers/common v0.55.3 ,github.com/coreos/go-oidc/v3 v3.6.0 ,github.com/cpuguy83/go-md2man/v2 v2.0.2 ,github.com/cyberphone/json-canonicalization v0.0.0-20230514072755-504adb8a8af1 ,github.com/davecgh/go-spew v1.1.1 ,github.com/digitorus/pkcs7 v0.0.0-20221212123742-001c36b64ec3 ,github.com/digitorus/timestamp v0.0.0-20221019182153-ef3b63b79b31 ,github.com/dimchansky/utfbom v1.1.1 ,github.com/docker/cli v24.0.0+incompatible ,github.com/docker/distribution v2.8.2+incompatible 
,github.com/docker/docker v24.0.2+incompatible ,github.com/docker/docker-credential-helpers v0.7.0 ,github.com/emicklei/go-restful/v3 v3.9.0 ,github.com/emicklei/proto v1.10.0 ,github.com/evanphx/json-patch/v5 v5.6.0 ,github.com/fsnotify/fsnotify v1.6.0 ,github.com/gabriel-vasile/mimetype v1.4.2 ,github.com/ghodss/yaml v1.0.0 ,github.com/go-chi/chi v4.1.2+incompatible ,github.com/go-jose/go-jose/v3 v3.0.0 ,github.com/go-logr/logr v1.2.4 ,github.com/go-logr/stdr v1.2.2 ,github.com/go-openapi/analysis v0.21.4 ,github.com/go-openapi/errors v0.20.3 ,github.com/go-openapi/jsonpointer v0.19.6 ,github.com/go-openapi/jsonreference v0.20.2 ,github.com/go-openapi/loads v0.21.2 ,github.com/go-openapi/runtime v0.26.0 ,github.com/go-openapi/spec v0.20.9 ,github.com/go-openapi/strfmt v0.21.7 ,github.com/go-openapi/swag v0.22.4 ,github.com/go-openapi/validate v0.22.1 ,github.com/go-playground/locales v0.14.1 ,github.com/go-playground/universal-translator v0.18.1 ,github.com/go-playground/validator/v10 v10.14.0 ,github.com/gobwas/glob v0.2.3 ,github.com/gogo/protobuf v1.3.2 ,github.com/golang-jwt/jwt/v4 v4.5.0 ,github.com/golang/glog v1.1.0 ,github.com/golang/groupcache v0.0.0-20210331224755-41bb18bfe9da ,github.com/golang/protobuf v1.5.3 ,github.com/golang/snappy v0.0.4 ,github.com/google/certificate-transparency-go v1.1.6 ,github.com/google/gnostic-models v0.6.8 ,github.com/google/go-cmp v0.5.9 ,github.com/google/go-containerregistry v0.16.1 ,github.com/google/go-github/v50 v50.2.0 ,github.com/google/go-querystring v1.1.0 ,github.com/google/gofuzz v1.2.0 ,github.com/google/s2a-go v0.1.4 ,github.com/google/uuid v1.3.0 ,github.com/googleapis/enterprise-certificate-proxy v0.2.4 ,github.com/hashicorp/go-cleanhttp v0.5.2 ,github.com/hashicorp/go-retryablehttp v0.7.2 ,github.com/hashicorp/hcl v1.0.0 ,github.com/imdario/mergo v0.3.16 ,github.com/in-toto/in-toto-golang v0.9.0 ,github.com/jedisct1/go-minisign v0.0.0-20211028175153-1c139d1cc84b ,github.com/jellydator/ttlcache/v3 v3.0.1 ,github.com/jmespath/go-jmespath v0.4.0 ,github.com/josharian/intern v1.0.0 ,github.com/json-iterator/go v1.1.12 ,github.com/klauspost/compress v1.16.6 ,github.com/leodido/go-urn v1.2.4 ,github.com/letsencrypt/boulder v0.0.0-20230213213521-fdfea0d469b6 ,github.com/magiconair/properties v1.8.7 ,github.com/mailru/easyjson v0.7.7 ,github.com/matttproud/golang_protobuf_extensions v1.0.4 ,github.com/mitchellh/go-homedir v1.1.0 ,github.com/mitchellh/go-wordwrap v1.0.1 ,github.com/mitchellh/mapstructure v1.5.0 ,github.com/modern-go/concurrent v0.0.0-20180306012644-bacd9c7ef1dd ,github.com/modern-go/reflect2 v1.0.2 ,github.com/mozillazg/docker-credential-acr-helper v0.3.0 ,github.com/mpvl/unique v0.0.0-20150818121801-cbe035fff7de ,github.com/munnerz/goautoneg v0.0.0-20191010083416-a7dc8b61c822 ,github.com/nozzle/throttler v0.0.0-20180817012639-2ea982251481 ,github.com/nxadm/tail v1.4.8 ,github.com/oklog/ulid v1.3.1 ,github.com/open-policy-agent/opa v0.52.0 ,github.com/opencontainers/go-digest v1.0.0 ,github.com/opencontainers/image-spec v1.1.0-rc4 ,github.com/opencontainers/runtime-spec v1.1.0 ,github.com/openshift/api v0.0.0-20221205111557-f2fbb1d1cd5e ,github.com/opentracing/opentracing-go v1.2.0 ,github.com/pborman/uuid v1.2.1 ,github.com/pelletier/go-toml/v2 v2.0.8 ,github.com/pjbgf/go-apparmor v0.1.2 ,github.com/pkg/errors v0.9.1 ,github.com/prometheus-operator/prometheus-operator/pkg/apis/monitoring v0.67.1 ,github.com/prometheus/client_golang v1.16.0 ,github.com/prometheus/client_model v0.4.0 ,github.com/prometheus/common v0.42.0 
,github.com/prometheus/procfs v0.10.1 ,github.com/protocolbuffers/txtpbfmt v0.0.0-20220428173112-74888fd59c2b ,github.com/rcrowley/go-metrics v0.0.0-20201227073835-cf1acfcdf475 ,github.com/russross/blackfriday/v2 v2.1.0 ,github.com/sassoftware/relic v7.2.1+incompatible ,github.com/seccomp/libseccomp-golang v0.10.0 ,github.com/secure-systems-lab/go-securesystemslib v0.6.0 ,github.com/segmentio/ksuid v1.0.4 ,github.com/shibumi/go-pathspec v1.3.0 ,github.com/sigstore/cosign/v2 v2.1.1 ,github.com/sigstore/fulcio v1.3.1 ,github.com/sigstore/rekor v1.2.2-0.20230601122533-4c81ff246d12 ,github.com/sigstore/sigstore v1.7.1 ,github.com/sigstore/timestamp-authority v1.1.1 ,github.com/sirupsen/logrus v1.9.3 ,github.com/skratchdot/open-golang v0.0.0-20200116055534-eef842397966 ,github.com/spf13/afero v1.9.5 ,github.com/spf13/cast v1.5.1 ,github.com/spf13/cobra v1.7.0 ,github.com/spf13/jwalterweatherman v1.1.0 ,github.com/spf13/pflag v1.0.5 ,github.com/spf13/viper v1.16.0 ,github.com/spiffe/go-spiffe/v2 v2.1.6 ,github.com/subosito/gotenv v1.4.2 ,github.com/syndtr/goleveldb v1.0.1-0.20220721030215-126854af5e6d ,github.com/tchap/go-patricia/v2 v2.3.1 ,github.com/theupdateframework/go-tuf v0.5.2 ,github.com/titanous/rocacheck v0.0.0-20171023193734-afe73141d399 ,github.com/tjfoc/gmsm v1.3.2 ,github.com/transparency-dev/merkle v0.0.2 ,github.com/urfave/cli/v2 v2.25.7 ,github.com/vbatts/tar-split v0.11.3 ,github.com/xanzy/go-gitlab v0.86.0 ,github.com/xeipuuv/gojsonpointer v0.0.0-20190905194746-02993c407bfb ,github.com/xeipuuv/gojsonreference v0.0.0-20180127040603-bd5ef7bd5415 ,github.com/xrash/smetrics v0.0.0-20201216005158-039620a65673 ,github.com/yashtewari/glob-intersection v0.1.0 ,github.com/zeebo/errs v1.3.0 ,go.mongodb.org/mongo-driver v1.11.3 ,go.opencensus.io v0.24.0 ,go.opentelemetry.io/otel v1.16.0 ,go.opentelemetry.io/otel/metric v1.16.0 ,go.opentelemetry.io/otel/trace v1.16.0 ,go.step.sm/crypto v0.32.1 ,go.uber.org/atomic v1.10.0 ,go.uber.org/multierr v1.11.0 ,go.uber.org/zap v1.24.0 ,golang.org/x/crypto v0.12.0 ,golang.org/x/exp v0.0.0-20230522175609-2e198f4a06a1 ,golang.org/x/mod v0.12.0 ,golang.org/x/net v0.14.0 ,golang.org/x/oauth2 v0.9.0 ,golang.org/x/sync v0.3.0 ,golang.org/x/sys v0.11.0 ,golang.org/x/term v0.11.0 ,golang.org/x/text v0.12.0 ,golang.org/x/time v0.3.0 ,gomodules.xyz/jsonpatch/v2 v2.3.0 ,google.golang.org/api v0.128.0 ,google.golang.org/appengine v1.6.7 ,google.golang.org/genproto/googleapis/rpc v0.0.0-20230530153820-e85fd2cbaebc ,google.golang.org/grpc v1.57.0 ,google.golang.org/protobuf v1.31.0 ,gopkg.in/go-jose/go-jose.v2 v2.6.1 ,gopkg.in/inf.v0 v0.9.1 ,gopkg.in/ini.v1 v1.67.0 ,gopkg.in/square/go-jose.v2 v2.6.0 ,gopkg.in/tomb.v1 v1.0.0-20141024135613-dd632973f1e7 ,gopkg.in/yaml.v2 v2.4.0 ,gopkg.in/yaml.v3 v3.0.1 ,k8s.io/api v0.28.0 ,k8s.io/apiextensions-apiserver v0.27.2 ,k8s.io/apimachinery v0.28.0 ,k8s.io/client-go v0.28.0 ,k8s.io/component-base v0.27.2 ,k8s.io/klog/v2 v2.100.1 ,k8s.io/kube-openapi v0.0.0-20230717233707-2695361300d9 ,k8s.io/utils v0.0.0-20230505201702-9f6742963106 ,oras.land/oras-go/v2 v2.2.1 ,sigs.k8s.io/controller-runtime v0.15.1 ,sigs.k8s.io/gateway-api v0.7.0 ,sigs.k8s.io/json v0.0.0-20221116044647-bc3834ca7abd ,sigs.k8s.io/release-utils v0.7.4 ,sigs.k8s.io/structured-merge-diff/v4 v4.2.3 ,sigs.k8s.io/yaml v1.3.0 " "gitCommit"="6d51dc8d1bdae339b47facd5c9b8a0e884c30ff8" "gitCommitDate"="2023-08-17T07:36:21Z" "gitTreeState"="clean" "goVersion"="go1.20.4" "ldFlags"="unknown" "libbpf"="v1.2" "libseccomp"="2.5.4" "platform"="linux/amd64" 
"version"="0.8.1-dev"
I0817 12:45:07.667702 1346435 main.go:365] setup "msg"="watching all namespaces"
I0817 12:45:07.668061 1346435 listener.go:44] controller-runtime/metrics "msg"="Metrics server is starting to listen" "addr"=":8080"
I0817 12:45:07.668442 1346435 metrics.go:217] metrics "msg"="Registering metric: seccomp_profile_error_total"
I0817 12:45:07.668482 1346435 metrics.go:217] metrics "msg"="Registering metric: selinux_profile_audit_total"
I0817 12:45:07.668497 1346435 metrics.go:217] metrics "msg"="Registering metric: apparmor_profile_total"
I0817 12:45:07.668504 1346435 metrics.go:217] metrics "msg"="Registering metric: apparmor_profile_audit_total"
I0817 12:45:07.668512 1346435 metrics.go:217] metrics "msg"="Registering metric: seccomp_profile_total"
I0817 12:45:07.668524 1346435 metrics.go:217] metrics "msg"="Registering metric: seccomp_profile_bpf_total"
I0817 12:45:07.668531 1346435 metrics.go:217] metrics "msg"="Registering metric: selinux_profile_error_total"
I0817 12:45:07.668539 1346435 metrics.go:217] metrics "msg"="Registering metric: apparmor_profile_error_total"
I0817 12:45:07.668546 1346435 metrics.go:217] metrics "msg"="Registering metric: seccomp_profile_audit_total"
I0817 12:45:07.668553 1346435 metrics.go:217] metrics "msg"="Registering metric: selinux_profile_total"
I0817 12:45:07.669531 1346435 grpc.go:60] metrics "msg"="Starting GRPC server API"
I0817 12:45:07.707643 1346435 profilerecorder.go:144] recorder-spod "msg"="Setting up profile recorder" "Node"="192.168.0.11"
I0817 12:45:07.707706 1346435 main.go:486] setup "msg"="starting daemon"
I0817 12:45:07.707891 1346435 server.go:50]  "msg"="starting server" "addr"={"IP":"::","Port":8080,"Zone":""} "kind"="metrics" "path"="/metrics"
I0817 12:45:07.707968 1346435 internal.go:360]  "msg"="Starting server" "addr"={"IP":"::","Port":8085,"Zone":""} "kind"="health probe"
I0817 12:45:07.708018 1346435 controller.go:177]  "msg"="Starting EventSource" "controller"="profile" "controllerGroup"="security-profiles-operator.x-k8s.io" "controllerKind"="SeccompProfile" "source"="kind source: *v1beta1.SeccompProfile"
I0817 12:45:07.708044 1346435 controller.go:177]  "msg"="Starting EventSource" "controller"="profile" "controllerGroup"="security-profiles-operator.x-k8s.io" "controllerKind"="SeccompProfile" "source"="kind source: *v1alpha1.SecurityProfilesOperatorDaemon"
I0817 12:45:07.708060 1346435 controller.go:185]  "msg"="Starting Controller" "controller"="profile" "controllerGroup"="security-profiles-operator.x-k8s.io" "controllerKind"="SeccompProfile"
I0817 12:45:07.708061 1346435 controller.go:177]  "msg"="Starting EventSource" "controller"="profilerecorder" "controllerGroup"="" "controllerKind"="Pod" "source"="kind source: *v1.Pod"
I0817 12:45:07.708076 1346435 controller.go:185]  "msg"="Starting Controller" "controller"="profilerecorder" "controllerGroup"="" "controllerKind"="Pod"
I0817 12:45:07.899659 1346435 controller.go:219]  "msg"="Starting workers" "controller"="profile" "controllerGroup"="security-profiles-operator.x-k8s.io" "controllerKind"="SeccompProfile" "worker count"=1
I0817 12:45:07.941865 1346435 controller.go:219]  "msg"="Starting workers" "controller"="profilerecorder" "controllerGroup"="" "controllerKind"="Pod" "worker count"=1

shaojini commented 1 year ago

Hi, @saschagrunert.

Any comment for this issue? Thanks.

saschagrunert commented 1 year ago

@shaojini we need to find out why the pod crashed, since the logs on https://github.com/kubernetes-sigs/security-profiles-operator/issues/1837#issuecomment-1682228753 do not indicate any crash at all. Do you have the logs of the crashing pod available somewhere?

shaojini commented 1 year ago

Hi, @saschagrunert.

I have uninstalled and re-installed a few times, but the problem is the same. The logs given previously were taken from one of those tries (before reporting the issue, I tried at least twice to confirm it). The restart of the pods may be normal, because patching the spod configuration recreates them (the spod pod names change). However, the crashing pods never manage to restart successfully (maybe a describe of a spod pod reveals some information?):

root@k8s-master:~# kubectl get pods -n security-profiles-operator
NAME                                                  READY   STATUS    RESTARTS   AGE
security-profiles-operator-8588b78997-8cm8z           1/1     Running   0          17h
security-profiles-operator-8588b78997-9rg9j           1/1     Running   0          17h
security-profiles-operator-8588b78997-csrhk           1/1     Running   0          17h
security-profiles-operator-webhook-8476cd6f8c-g9m5v   1/1     Running   0          17h
security-profiles-operator-webhook-8476cd6f8c-nh57n   1/1     Running   0          17h
security-profiles-operator-webhook-8476cd6f8c-qzpk5   1/1     Running   0          17h
spod-lbcnc                                            3/3     Running   0          17h
spod-t5vf6                                            3/3     Running   0          17h
spod-wg9w7                                            3/3     Running   0          17h

root@k8s-master:~# kubectl -n security-profiles-operator logs --selector name=spod -c bpf-recorder
error: container bpf-recorder is not valid for pod spod-lbcnc

root@k8s-master:~# kubectl -n security-profiles-operator patch spod spod --type=merge -p '{"spec":{"enableBpfRecorder":true}}'
securityprofilesoperatordaemon.security-profiles-operator.x-k8s.io/spod patched

root@k8s-master:~# kubectl get pods -n security-profiles-operator
NAME                                                  READY   STATUS             RESTARTS      AGE
security-profiles-operator-8588b78997-8cm8z           1/1     Running            0             17h
security-profiles-operator-8588b78997-9rg9j           1/1     Running            0             17h
security-profiles-operator-8588b78997-csrhk           1/1     Running            0             17h
security-profiles-operator-webhook-8476cd6f8c-g9m5v   1/1     Running            0             17h
security-profiles-operator-webhook-8476cd6f8c-nh57n   1/1     Running            0             17h
security-profiles-operator-webhook-8476cd6f8c-qzpk5   1/1     Running            0             17h
spod-ppm5q                                            3/4     CrashLoopBackOff   5 (54s ago)   4m12s
spod-xn6wt                                            3/4     CrashLoopBackOff   5 (62s ago)   4m11s
spod-xwkft                                            3/4     CrashLoopBackOff   5 (49s ago)   4m11s

root@k8s-master:~# kubectl describe -n security-profiles-operator pod spod-ppm5q
Name:                 spod-ppm5q
Namespace:            security-profiles-operator
Priority:             2000001000
Priority Class Name:  system-node-critical
Service Account:      spod
Node:                 k8s-worker3/192.168.0.11
Start Time:           Tue, 22 Aug 2023 11:36:36 +0300
Labels:               app=security-profiles-operator
                      controller-revision-hash=6f798fcbb9
                      name=spod
                      pod-template-generation=3
Annotations:          openshift.io/scc: privileged
Status:               Running
SeccompProfile:       RuntimeDefault
IP:                   10.0.0.99
IPs:
  IP:           10.0.0.99
Controlled By:  DaemonSet/spod
Init Containers:
  non-root-enabler:
    Container ID:  cri-o://5f1ec4f1c35b36ee0e940fb6d73e553a15bbc5ea1483f18c246aecc549236ebc
    Image:         gcr.io/k8s-staging-sp-operator/security-profiles-operator:latest
    Image ID:      gcr.io/k8s-staging-sp-operator/security-profiles-operator@sha256:40f98b564084d46acac519a515032e5602b6eec480d221771053c96f4057811d
    Port:          <none>
    Host Port:     <none>
    Args:
      non-root-enabler
      --runtime=cri-o
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Tue, 22 Aug 2023 11:36:44 +0300
      Finished:     Tue, 22 Aug 2023 11:36:44 +0300
    Ready:          True
    Restart Count:  0
    Limits:
      ephemeral-storage:  50Mi
      memory:             64Mi
    Requests:
      cpu:                100m
      ephemeral-storage:  10Mi
      memory:             32Mi
    Environment:
      NODE_NAME:       (v1:spec.nodeName)
      KUBELET_DIR:    /var/lib/kubelet
      SPO_VERBOSITY:  0
    Mounts:
      /host from host-root-volume (rw)
      /opt/spo-profiles from operator-profiles-volume (ro)
      /var/lib from host-varlib-volume (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-g8x8b (ro)
      /var/run/secrets/metrics from metrics-cert-volume (rw)
Containers:
  security-profiles-operator:
    Container ID:        cri-o://b8f632a1fcfc144f57b6042ba44fa77ecbb5399163d77fe6a1ffb8529bf7fab8
    Image:               gcr.io/k8s-staging-sp-operator/security-profiles-operator:latest
    Image ID:            gcr.io/k8s-staging-sp-operator/security-profiles-operator@sha256:40f98b564084d46acac519a515032e5602b6eec480d221771053c96f4057811d
    Port:                8085/TCP
    Host Port:           0/TCP
    SeccompProfile:      Localhost
      LocalhostProfile:  security-profiles-operator.json
    Args:
      daemon
      --with-recording=true
    State:          Running
      Started:      Tue, 22 Aug 2023 11:36:47 +0300
    Ready:          True
    Restart Count:  0
    Limits:
      ephemeral-storage:  200Mi
      memory:             128Mi
    Requests:
      cpu:                100m
      ephemeral-storage:  50Mi
      memory:             64Mi
    Liveness:             http-get http://:liveness-port/healthz delay=0s timeout=1s period=10s #success=1 #failure=1
    Startup:              http-get http://:liveness-port/healthz delay=0s timeout=1s period=3s #success=1 #failure=10
    Environment:
      NODE_NAME:             (v1:spec.nodeName)
      OPERATOR_NAMESPACE:   security-profiles-operator (v1:metadata.namespace)
      SPOD_NAME:            spod
      KUBELET_DIR:          /var/lib/kubelet
      HOME:                 /home
      ENABLE_LOG_ENRICHER:  false
      ENABLE_BPF_RECORDER:  false
      SPO_VERBOSITY:        0
    Mounts:
      /etc/selinux.d from selinux-drop-dir (rw)
      /home from home-volume (rw)
      /tmp from tmp-volume (rw)
      /tmp/security-profiles-operator-recordings from profile-recording-output-volume (rw)
      /var/lib/kubelet/seccomp/operator from host-operator-volume (rw)
      /var/run/grpc from grpc-server-volume (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-g8x8b (ro)
      /var/run/selinuxd from selinuxd-private-volume (rw)
  log-enricher:
    Container ID:  cri-o://33b914d4e40dc990d51b7b19e6a7470b3446b90a6786f1a39764d2ec7ca2630e
    Image:         gcr.io/k8s-staging-sp-operator/security-profiles-operator:latest
    Image ID:      gcr.io/k8s-staging-sp-operator/security-profiles-operator@sha256:40f98b564084d46acac519a515032e5602b6eec480d221771053c96f4057811d
    Port:          <none>
    Host Port:     <none>
    Args:
      log-enricher
    State:          Running
      Started:      Tue, 22 Aug 2023 11:36:48 +0300
    Ready:          True
    Restart Count:  0
    Limits:
      ephemeral-storage:  128Mi
      memory:             256Mi
    Requests:
      cpu:                50m
      ephemeral-storage:  10Mi
      memory:             64Mi
    Environment:
      NODE_NAME:       (v1:spec.nodeName)
      KUBELET_DIR:    /var/lib/kubelet
      SPO_VERBOSITY:  0
    Mounts:
      /var/log from host-syslog-volume (ro)
      /var/log/audit from host-auditlog-volume (ro)
      /var/run/grpc from grpc-server-volume (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-g8x8b (ro)
  bpf-recorder:
    Container ID:  cri-o://1f26b655d092359cbd22fc47a72a6f065b0faee1cd92f44b542a5bb129241f62
    Image:         gcr.io/k8s-staging-sp-operator/security-profiles-operator:latest
    Image ID:      gcr.io/k8s-staging-sp-operator/security-profiles-operator@sha256:40f98b564084d46acac519a515032e5602b6eec480d221771053c96f4057811d
    Port:          <none>
    Host Port:     <none>
    Args:
      bpf-recorder
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Tue, 22 Aug 2023 11:42:37 +0300
      Finished:     Tue, 22 Aug 2023 11:42:37 +0300
    Ready:          False
    Restart Count:  6
    Limits:
      ephemeral-storage:  20Mi
      memory:             128Mi
    Requests:
      cpu:                50m
      ephemeral-storage:  10Mi
      memory:             64Mi
    Environment:
      NODE_NAME:       (v1:spec.nodeName)
      KUBELET_DIR:    /var/lib/kubelet
      SPO_VERBOSITY:  0
    Mounts:
      /etc/os-release from host-etc-osrelease-volume (rw)
      /sys/kernel/debug from sys-kernel-debug-volume (ro)
      /tmp from tmp-volume (rw)
      /var/run/grpc from grpc-server-volume (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-g8x8b (ro)
  metrics:
    Container ID:  cri-o://795ced755723f826f0fbb7cb80584fc4e5e0a777340a418102e7c55c9b6f3519
    Image:         gcr.io/kubebuilder/kube-rbac-proxy:v0.14.1
    Image ID:      gcr.io/kubebuilder/kube-rbac-proxy@sha256:928e64203edad8f1bba23593c7be04f0f8410c6e4feb98d9e9c2d00a8ff59048
    Port:          9443/TCP
    Host Port:     0/TCP
    Args:
      --secure-listen-address=0.0.0.0:9443
      --upstream=http://127.0.0.1:8080
      --v=10
      --tls-cert-file=/var/run/secrets/metrics/tls.crt
      --tls-private-key-file=/var/run/secrets/metrics/tls.key
    State:          Running
      Started:      Tue, 22 Aug 2023 11:36:50 +0300
    Ready:          True
    Restart Count:  0
    Limits:
      ephemeral-storage:  20Mi
      memory:             128Mi
    Requests:
      cpu:                50m
      ephemeral-storage:  10Mi
      memory:             32Mi
    Environment:          <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-g8x8b (ro)
      /var/run/secrets/metrics from metrics-cert-volume (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  host-varlib-volume:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib
    HostPathType:  Directory
  host-operator-volume:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/security-profiles-operator
    HostPathType:  DirectoryOrCreate
  operator-profiles-volume:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      security-profiles-operator-profile
    Optional:  false
  selinux-drop-dir:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
  selinuxd-private-volume:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
  host-fsselinux-volume:
    Type:          HostPath (bare host directory volume)
    Path:          /sys/fs/selinux
    HostPathType:  Directory
  host-etcselinux-volume:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/selinux
    HostPathType:  Directory
  host-varlibselinux-volume:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/selinux
    HostPathType:  Directory
  profile-recording-output-volume:
    Type:          HostPath (bare host directory volume)
    Path:          /tmp/security-profiles-operator-recordings
    HostPathType:  DirectoryOrCreate
  host-auditlog-volume:
    Type:          HostPath (bare host directory volume)
    Path:          /var/log/audit
    HostPathType:  DirectoryOrCreate
  host-syslog-volume:
    Type:          HostPath (bare host directory volume)
    Path:          /var/log
    HostPathType:  DirectoryOrCreate
  metrics-cert-volume:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  metrics-server-cert
    Optional:    false
  sys-kernel-debug-volume:
    Type:          HostPath (bare host directory volume)
    Path:          /sys/kernel/debug
    HostPathType:  Directory
  host-etc-osrelease-volume:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/os-release
    HostPathType:  File
  tmp-volume:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
  grpc-server-volume:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
  host-root-volume:
    Type:          HostPath (bare host directory volume)
    Path:          /
    HostPathType:  Directory
  home-volume:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
  kube-api-access-g8x8b:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              kubernetes.io/os=linux
Tolerations:                 node-role.kubernetes.io/control-plane:NoSchedule op=Exists
                             node-role.kubernetes.io/master:NoSchedule op=Exists
                             node.kubernetes.io/disk-pressure:NoSchedule op=Exists
                             node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                             node.kubernetes.io/not-ready:NoExecute op=Exists
                             node.kubernetes.io/pid-pressure:NoSchedule op=Exists
                             node.kubernetes.io/unreachable:NoExecute op=Exists
                             node.kubernetes.io/unschedulable:NoSchedule op=Exists
Events:
  Type     Reason     Age                    From               Message
  ----     ------     ----                   ----               -------
  Normal   Scheduled  6m25s                  default-scheduler  Successfully assigned security-profiles-operator/spod-ppm5q to k8s-worker3
  Normal   Pulling    6m25s                  kubelet            Pulling image "gcr.io/k8s-staging-sp-operator/security-profiles-operator:latest"
  Normal   Pulled     6m18s                  kubelet            Successfully pulled image "gcr.io/k8s-staging-sp-operator/security-profiles-operator:latest" in 6.98679362s (6.986814507s including waiting)
  Normal   Created    6m18s                  kubelet            Created container non-root-enabler
  Normal   Started    6m18s                  kubelet            Started container non-root-enabler
  Normal   Pulling    6m17s                  kubelet            Pulling image "gcr.io/k8s-staging-sp-operator/security-profiles-operator:latest"
  Normal   Pulled     6m16s                  kubelet            Successfully pulled image "gcr.io/k8s-staging-sp-operator/security-profiles-operator:latest" in 987.480618ms (987.493453ms including waiting)
  Normal   Created    6m15s                  kubelet            Created container security-profiles-operator
  Normal   Started    6m15s                  kubelet            Started container security-profiles-operator
  Normal   Pulling    6m15s                  kubelet            Pulling image "gcr.io/k8s-staging-sp-operator/security-profiles-operator:latest"
  Normal   Pulled     6m14s                  kubelet            Successfully pulled image "gcr.io/k8s-staging-sp-operator/security-profiles-operator:latest" in 1.266629881s (1.266641343s including waiting)
  Normal   Created    6m14s                  kubelet            Created container log-enricher
  Normal   Started    6m14s                  kubelet            Started container log-enricher
  Normal   Pulled     6m13s                  kubelet            Successfully pulled image "gcr.io/k8s-staging-sp-operator/security-profiles-operator:latest" in 827.998943ms (828.022247ms including waiting)
  Normal   Pulled     6m13s                  kubelet            Container image "gcr.io/kubebuilder/kube-rbac-proxy:v0.14.1" already present on machine
  Normal   Pulling    6m12s (x2 over 6m14s)  kubelet            Pulling image "gcr.io/k8s-staging-sp-operator/security-profiles-operator:latest"
  Normal   Created    6m12s                  kubelet            Created container metrics
  Normal   Started    6m12s                  kubelet            Started container metrics
  Normal   Created    6m11s (x2 over 6m13s)  kubelet            Created container bpf-recorder
  Normal   Pulled     6m11s                  kubelet            Successfully pulled image "gcr.io/k8s-staging-sp-operator/security-profiles-operator:latest" in 1.006150487s (1.006230461s including waiting)
  Normal   Started    6m10s (x2 over 6m13s)  kubelet            Started container bpf-recorder
  Warning  BackOff    77s (x25 over 6m10s)   kubelet            Back-off restarting failed container bpf-recorder in pod spod-ppm5q_security-profiles-operator(8f1d734d-2b89-47e5-b74a-fd0bd996473f)
root@k8s-master:~#

saschagrunert commented 1 year ago

@shaojini you can see from kubectl describe that the bpf-recorder container has the ID 1f26b655d092359cbd22fc47a72a6f065b0faee1cd92f44b542a5bb129241f62. May I ask you to access the node and run something like sudo crictl logs <ID> to get the logs of the crashing container?
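
A sketch of that step (the jsonpath filter and the cri-o:// prefix stripping are assumptions about the environment shown above):

# on the control plane: print the bpf-recorder container ID (prefixed with cri-o://)
kubectl -n security-profiles-operator get pod spod-ppm5q \
  -o jsonpath='{.status.containerStatuses[?(@.name=="bpf-recorder")].containerID}'
# on the node (here k8s-worker3): pass the ID without the cri-o:// prefix
sudo crictl logs <ID>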

shaojini commented 1 year ago

Hi, @saschagrunert.

I have reproduced the issue again to compare the "describe pod spod-xxxx" output before and after the patching. The difference is that "recording" is enabled in the "daemon" arguments and one extra "bpf-recorder" container is created in the spod pod. In addition, the container IDs (cri-o) of security-profiles-operator and metrics have changed:

From the logs on the node (after re-installing), the error is "container ID does not exist":

root@k8s-worker3:~# sudo crictl logs 723ec7f3c3c798953fa217bed812ac179352bbe11a27597d6568458ad41efe9e

E0822 14:57:23.045998 1389099 remote_runtime.go:415] "ContainerStatus from runtime service failed" err="rpc error: code = NotFound desc = could not find container \"723ec7f3c3c798953fa217bed812ac179352bbe11a27597d6568458ad41efe9e\": container with ID starting with 723ec7f3c3c798953fa217bed812ac179352bbe11a27597d6568458ad41efe9e not found: ID does not exist" containerID="723ec7f3c3c798953fa217bed812ac179352bbe11a27597d6568458ad41efe9e"
FATA[0000] rpc error: code = NotFound desc = could not find container "723ec7f3c3c798953fa217bed812ac179352bbe11a27597d6568458ad41efe9e": container with ID starting with 723ec7f3c3c798953fa217bed812ac179352bbe11a27597d6568458ad41efe9e not found: ID does not exist

shaojini commented 1 year ago

Hi, @saschagrunert.

That ID in the "describe" output is not the actual container ID. I got the ID this way:

root@k8s-worker3:~# crictl ps -a

CONTAINER           IMAGE                                                                                                                               CREATED             STATE               NAME                         ATTEMPT             POD ID              POD
a8fefdb4a1664       gcr.io/k8s-staging-sp-operator/security-profiles-operator@sha256:40f98b564084d46acac519a515032e5602b6eec480d221771053c96f4057811d   2 minutes ago       Exited              bpf-recorder                 53                  70487bf0e53cc       spod-2v9z7

Then I got the error from the logs of that container ID:
root@k8s-worker3:~# sudo crictl logs a8fefdb4a1664
..............................................................................
E0822 16:01:38.078963 1543597 main.go:235] setup "msg"="running security-profiles-operator" "error"="connect to metrics server: connect to local GRPC server: wait on retry: timed out waiting for the condition"

Because the container keeps restarting, its ID also changes all the time, so the logs for a given ID cannot be found after a few minutes.
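
A workaround sketch for the changing ID (assuming the node's crictl supports the --name, --latest and -q flags):

sudo crictl logs "$(sudo crictl ps -a --name bpf-recorder --latest -q)"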

saschagrunert commented 1 year ago

@shaojini the previous logs of the container should still be available, see kubectl logs --previous, at least for a few minutes, as you mentioned.
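
For example, a sketch that dumps the previous bpf-recorder logs from all spod pods (using the name=spod selector from the commands above):

for p in $(kubectl -n security-profiles-operator get pods -l name=spod -o name); do
  kubectl -n security-profiles-operator logs "$p" -c bpf-recorder --previous
done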

shaojini commented 1 year ago

Hi, @saschagrunert.

The bpf-recorder container log has shown the error: "error"="connect to metrics server: connect to local GRPC server: wait on retry: timed out waiting for the condition".

Is that the root cause of the bpf-recorder container failing to start, according to the logs? Do you see the same issue when you configure the bpf-recorder for recording?
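
One possible next step to narrow this down, assuming the spod spec exposes a verbosity field as described in the SPO docs, is to raise the daemon log verbosity and re-check the security-profiles-operator and bpf-recorder container logs:

kubectl -n security-profiles-operator patch spod spod --type=merge -p '{"spec":{"verbosity":1}}'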

k8s-triage-robot commented 7 months ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot commented 6 months ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

- Mark this issue as fresh with /remove-lifecycle rotten
- Close this issue with /close
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot commented 5 months ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

- Reopen this issue with /reopen
- Mark this issue as fresh with /remove-lifecycle rotten
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-ci-robot commented 5 months ago

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to [this](https://github.com/kubernetes-sigs/security-profiles-operator/issues/1837#issuecomment-2021566411):

> The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
>
> This bot triages issues according to the following rules:
> - After 90d of inactivity, `lifecycle/stale` is applied
> - After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
> - After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed
>
> You can:
> - Reopen this issue with `/reopen`
> - Mark this issue as fresh with `/remove-lifecycle rotten`
> - Offer to help out with [Issue Triage][1]
>
> Please send feedback to sig-contributor-experience at [kubernetes/community](https://github.com/kubernetes/community).
>
> /close not-planned
>
> [1]: https://www.kubernetes.dev/docs/guide/issue-triage/

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.