openshift / compliance-operator

Operator providing OpenShift cluster compliance checks
Apache License 2.0

Tailored profile state is Error if the profile that is supposed to be extended does not exist #791

Closed maratsal closed 2 years ago

maratsal commented 2 years ago

I am using Argo CD to deploy the compliance operator and to create the scansettingbinding and tailoredprofile resources.

When I run an Argo CD sync, I use the following annotation together with the sync retry option, so that the application sync goes through and Argo CD retries the sync and creates the scansettingbinding and tailoredprofile resources once the CRDs exist.

    annotations:
      argocd.argoproj.io/sync-options: SkipDryRunOnMissingResource=true

However, if the tailored profile ends up in the Error state, the compliance operator never tries to reconcile it again, and the whole setup doesn't work: the scansettingbinding waits for the tailored profile, but the tailored profile is never fixed.

To fix the issue I need to delete the tailored profile and the scan setting binding and then run the Argo CD application sync one more time.

I can play with the delay of the Argo CD application sync retry, but I think the operator should retry reconciliation of the tailored profile.

Status of scansettingbinding:

    status:
      conditions:
      - lastTransitionTime: "2022-02-07T23:27:16Z"
        message: The TailoredProfile referenced has an error and is not usable
        reason: Invalid
        status: "False"
        type: Ready

Status of tailored profile:

    status:
      errorMessage: 'fetching profile to be extended: Profile.compliance.openshift.io
        "ocp4-cis" not found'
      outputRef:
        name: ""
        namespace: ""
      state: ERROR

jhrozek commented 2 years ago

Hmm, maybe we could reconcile on tailored profiles with a custom reconcile map function that would allow us to reconcile only those TPs that are linked to a binding. Let me check if that's possible. I don't really like retrying, tbh: we'd potentially be flooding the API server indefinitely with requests.
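To illustrate the idea, here is a minimal sketch of such a map function. It is not the operator's code: the `TailoredProfile`, `ScanSettingBinding` and `Request` types are simplified, hypothetical stand-ins, and the sketch assumes the trigger is a change to the profile being extended. In a real controller a function like this would presumably be wired in through controller-runtime's `EnqueueRequestsFromMapFunc` handler.

```go
package main

import "fmt"

// TailoredProfile is a hypothetical, simplified stand-in for the real resource.
type TailoredProfile struct {
	Name    string
	Extends string // name of the Profile this tailored profile extends
}

// ScanSettingBinding is a hypothetical, simplified stand-in for the real resource.
type ScanSettingBinding struct {
	Name     string
	Profiles []string // names of the (tailored) profiles the binding references
}

// Request identifies one object that should be reconciled.
type Request struct{ Name string }

// mapProfileToTailoredProfiles returns a request for every tailored profile
// that extends the changed profile AND is referenced by at least one binding.
// Tailored profiles nobody is waiting for are skipped.
func mapProfileToTailoredProfiles(changed string, tps []TailoredProfile, bindings []ScanSettingBinding) []Request {
	bound := make(map[string]bool)
	for _, b := range bindings {
		for _, p := range b.Profiles {
			bound[p] = true
		}
	}
	var reqs []Request
	for _, tp := range tps {
		if tp.Extends == changed && bound[tp.Name] {
			reqs = append(reqs, Request{Name: tp.Name})
		}
	}
	return reqs
}

func main() {
	tps := []TailoredProfile{
		{Name: "cis-tailored", Extends: "ocp4-cis"},
		{Name: "unbound-tailored", Extends: "ocp4-cis"},
		{Name: "other-tailored", Extends: "ocp4-moderate"},
	}
	bindings := []ScanSettingBinding{
		{Name: "cis-compliance", Profiles: []string{"cis-tailored", "other-tailored"}},
	}
	// Only the bound tailored profile that extends ocp4-cis is enqueued.
	fmt.Println(mapProfileToTailoredProfiles("ocp4-cis", tps, bindings))
}
```

The point of the design is that work is triggered by an event (the missing profile appearing) rather than by polling, so nothing hits the API server while the profile is still absent.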

jhrozek commented 2 years ago

Yeah, that seems to work, let me polish the code and submit.. /assign

maratsal commented 2 years ago

thanks @jhrozek

I am not a big expert in operators, but I was thinking that an operator has to check all resources from time to time and try to reconcile them (set them to the desired state) if something is wrong, without any dependency on other resources. As an example, in the current case the operator keeps trying to reconcile the scansettingbinding. Please see the log sample.

{"level":"error","ts":1644276436.369145,"logger":"controller","msg":"Reconciler error","controller":"scansettingbinding-controller","name":"cis-compliance","namespace":"openshift-compliance","error":"NamedObjectReference openshift-compliance/ocp4-cis-node not found","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime@v0.6.2/pkg/internal/controller/controller.go:209\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker\n\t/remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime@v0.6.2/pkg/internal/controller/controller.go:188\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1\n\t/remote-source/deps/gomod/pkg/mod/k8s.io/apimachinery@v0.19.11/pkg/util/wait/wait.go:155\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil\n\t/remote-source/deps/gomod/pkg/mod/k8s.io/apimachinery@v0.19.11/pkg/util/wait/wait.go:156\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/remote-source/deps/gomod/pkg/mod/k8s.io/apimachinery@v0.19.11/pkg/util/wait/wait.go:133\nk8s.io/apimachinery/pkg/util/wait.Until\n\t/remote-source/deps/gomod/pkg/mod/k8s.io/apimachinery@v0.19.11/pkg/util/wait/wait.go:90"}
{"level":"error","ts":1644276437.374369,"logger":"controller","msg":"Reconciler error","controller":"scansettingbinding-controller","name":"cis-compliance","namespace":"openshift-compliance","error":"NamedObjectReference openshift-compliance/ocp4-cis-node not found","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime@v0.6.2/pkg/internal/controller/controller.go:209\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker\n\t/remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime@v0.6.2/pkg/internal/controller/controller.go:188\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1\n\t/remote-source/deps/gomod/pkg/mod/k8s.io/apimachinery@v0.19.11/pkg/util/wait/wait.go:155\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil\n\t/remote-source/deps/gomod/pkg/mod/k8s.io/apimachinery@v0.19.11/pkg/util/wait/wait.go:156\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/remote-source/deps/gomod/pkg/mod/k8s.io/apimachinery@v0.19.11/pkg/util/wait/wait.go:133\nk8s.io/apimachinery/pkg/util/wait.Until\n\t/remote-source/deps/gomod/pkg/mod/k8s.io/apimachinery@v0.19.11/pkg/util/wait/wait.go:90"}
{"level":"error","ts":1644276438.3788202,"logger":"controller","msg":"Reconciler error","controller":"scansettingbinding-controller","name":"cis-compliance","namespace":"openshift-compliance","error":"NamedObjectReference openshift-compliance/ocp4-cis-node not found","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime@v0.6.2/pkg/internal/controller/controller.go:209\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker\n\t/remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime@v0.6.2/pkg/internal/controller/controller.go:188\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1\n\t/remote-source/deps/gomod/pkg/mod/k8s.io/apimachinery@v0.19.11/pkg/util/wait/wait.go:155\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil\n\t/remote-source/deps/gomod/pkg/mod/k8s.io/apimachinery@v0.19.11/pkg/util/wait/wait.go:156\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/remote-source/deps/gomod/pkg/mod/k8s.io/apimachinery@v0.19.11/pkg/util/wait/wait.go:133\nk8s.io/apimachinery/pkg/util/wait.Until\n\t/remote-source/deps/gomod/pkg/mod/k8s.io/apimachinery@v0.19.11/pkg/util/wait/wait.go:90"}
{"level":"error","ts":1644276439.3839622,"logger":"controller","msg":"Reconciler error","controller":"scansettingbinding-controller","name":"cis-compliance","namespace":"openshift-compliance","error":"NamedObjectReference openshift-compliance/ocp4-cis-node not found","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime@v0.6.2/pkg/internal/controller/controller.go:209\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker\n\t/remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime@v0.6.2/pkg/internal/controller/controller.go:188\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1\n\t/remote-source/deps/gomod/pkg/mod/k8s.io/apimachinery@v0.19.11/pkg/util/wait/wait.go:155\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil\n\t/remote-source/deps/gomod/pkg/mod/k8s.io/apimachinery@v0.19.11/pkg/util/wait/wait.go:156\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/remote-source/deps/gomod/pkg/mod/k8s.io/apimachinery@v0.19.11/pkg/util/wait/wait.go:133\nk8s.io/apimachinery/pkg/util/wait.Until\n\t/remote-source/deps/gomod/pkg/mod/k8s.io/apimachinery@v0.19.11/pkg/util/wait/wait.go:90"}
{"level":"error","ts":1644276440.3922532,"logger":"controller","msg":"Reconciler error","controller":"scansettingbinding-controller","name":"cis-compliance","namespace":"openshift-compliance","error":"NamedObjectReference openshift-compliance/ocp4-cis-node not found","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime@v0.6.2/pkg/internal/controller/controller.go:209\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker\n\t/remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime@v0.6.2/pkg/internal/controller/controller.go:188\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1\n\t/remote-source/deps/gomod/pkg/mod/k8s.io/apimachinery@v0.19.11/pkg/util/wait/wait.go:155\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil\n\t/remote-source/deps/gomod/pkg/mod/k8s.io/apimachinery@v0.19.11/pkg/util/wait/wait.go:156\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/remote-source/deps/gomod/pkg/mod/k8s.io/apimachinery@v0.19.11/pkg/util/wait/wait.go:133\nk8s.io/apimachinery/pkg/util/wait.Until\n\t/remote-source/deps/gomod/pkg/mod/k8s.io/apimachinery@v0.19.11/pkg/util/wait/wait.go:90"}
{"level":"error","ts":1644276441.3998294,"logger":"controller","msg":"Reconciler error","controller":"scansettingbinding-controller","name":"cis-compliance","namespace":"openshift-compliance","error":"NamedObjectReference openshift-compliance/ocp4-cis-node not found","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime@v0.6.2/pkg/internal/controller/controller.go:209\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker\n\t/remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime@v0.6.2/pkg/internal/controller/controller.go:188\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1\n\t/remote-source/deps/gomod/pkg/mod/k8s.io/apimachinery@v0.19.11/pkg/util/wait/wait.go:155\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil\n\t/remote-source/deps/gomod/pkg/mod/k8s.io/apimachinery@v0.19.11/pkg/util/wait/wait.go:156\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/remote-source/deps/gomod/pkg/mod/k8s.io/apimachinery@v0.19.11/pkg/util/wait/wait.go:133\nk8s.io/apimachinery/pkg/util/wait.Until\n\t/remote-source/deps/gomod/pkg/mod/k8s.io/apimachinery@v0.19.11/pkg/util/wait/wait.go:90"}
{"level":"error","ts":1644276442.4042456,"logger":"controller","msg":"Reconciler error","controller":"scansettingbinding-controller","name":"cis-compliance","namespace":"openshift-compliance","error":"NamedObjectReference openshift-compliance/ocp4-cis-node not found","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime@v0.6.2/pkg/internal/controller/controller.go:209\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker\n\t/remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime@v0.6.2/pkg/internal/controller/controller.go:188\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1\n\t/remote-source/deps/gomod/pkg/mod/k8s.io/apimachinery@v0.19.11/pkg/util/wait/wait.go:155\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil\n\t/remote-source/deps/gomod/pkg/mod/k8s.io/apimachinery@v0.19.11/pkg/util/wait/wait.go:156\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/remote-source/deps/gomod/pkg/mod/k8s.io/apimachinery@v0.19.11/pkg/util/wait/wait.go:133\nk8s.io/apimachinery/pkg/util/wait.Until\n\t/remote-source/deps/gomod/pkg/mod/k8s.io/apimachinery@v0.19.11/pkg/util/wait/wait.go:90"}
{"level":"error","ts":1644276443.40741,"logger":"controller","msg":"Reconciler error","controller":"scansettingbinding-controller","name":"cis-compliance","namespace":"openshift-compliance","error":"NamedObjectReference openshift-compliance/ocp4-cis-node not found","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime@v0.6.2/pkg/internal/controller/controller.go:209\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker\n\t/remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime@v0.6.2/pkg/internal/controller/controller.go:188\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1\n\t/remote-source/deps/gomod/pkg/mod/k8s.io/apimachinery@v0.19.11/pkg/util/wait/wait.go:155\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil\n\t/remote-source/deps/gomod/pkg/mod/k8s.io/apimachinery@v0.19.11/pkg/util/wait/wait.go:156\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/remote-source/deps/gomod/pkg/mod/k8s.io/apimachinery@v0.19.11/pkg/util/wait/wait.go:133\nk8s.io/apimachinery/pkg/util/wait.Until\n\t/remote-source/deps/gomod/pkg/mod/k8s.io/apimachinery@v0.19.11/pkg/util/wait/wait.go:90"}
{"level":"error","ts":1644276444.4136932,"logger":"controller","msg":"Reconciler error","controller":"scansettingbinding-controller","name":"cis-compliance","namespace":"openshift-compliance","error":"NamedObjectReference openshift-compliance/ocp4-cis-node not found","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime@v0.6.2/pkg/internal/controller/controller.go:209\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker\n\t/remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime@v0.6.2/pkg/internal/controller/controller.go:188\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1\n\t/remote-source/deps/gomod/pkg/mod/k8s.io/apimachinery@v0.19.11/pkg/util/wait/wait.go:155\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil\n\t/remote-source/deps/gomod/pkg/mod/k8s.io/apimachinery@v0.19.11/pkg/util/wait/wait.go:156\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/remote-source/deps/gomod/pkg/mod/k8s.io/apimachinery@v0.19.11/pkg/util/wait/wait.go:133\nk8s.io/apimachinery/pkg/util/wait.Until\n\t/remote-source/deps/gomod/pkg/mod/k8s.io/apimachinery@v0.19.11/pkg/util/wait/wait.go:90"}
{"level":"error","ts":1644276445.7000763,"logger":"controller","msg":"Reconciler error","controller":"scansettingbinding-controller","name":"cis-compliance","namespace":"openshift-compliance","error":"NamedObjectReference openshift-compliance/ocp4-cis-node not found","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime@v0.6.2/pkg/internal/controller/controller.go:209\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker\n\t/remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime@v0.6.2/pkg/internal/controller/controller.go:188\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1\n\t/remote-source/deps/gomod/pkg/mod/k8s.io/apimachinery@v0.19.11/pkg/util/wait/wait.go:155\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil\n\t/remote-source/deps/gomod/pkg/mod/k8s.io/apimachinery@v0.19.11/pkg/util/wait/wait.go:156\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/remote-source/deps/gomod/pkg/mod/k8s.io/apimachinery@v0.19.11/pkg/util/wait/wait.go:133\nk8s.io/apimachinery/pkg/util/wait.Until\n\t/remote-source/deps/gomod/pkg/mod/k8s.io/apimachinery@v0.19.11/pkg/util/wait/wait.go:90"}

So maybe, instead of checking the tailored profile only when it has a scan setting binding, the operator could try to reconcile the tailored profile without that dependency, not at fixed intervals but at gradually increasing ones?
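Gradually increasing intervals are usually implemented as exponential backoff with a cap. A minimal sketch of the delay calculation follows; the function name `backoffDelay` and the one-second base and one-minute cap are illustrative assumptions, not values taken from the operator.

```go
package main

import (
	"fmt"
	"time"
)

// backoffDelay returns how long to wait before retry number attempt (0-based).
// The delay starts at base, doubles on every attempt, and never exceeds max.
func backoffDelay(attempt int, base, max time.Duration) time.Duration {
	d := base
	for i := 0; i < attempt; i++ {
		d *= 2
		if d >= max {
			return max
		}
	}
	if d > max {
		return max
	}
	return d
}

func main() {
	// Illustrative values only: 1s base delay, capped at 1 minute.
	for attempt := 0; attempt < 8; attempt++ {
		fmt.Println(attempt, backoffDelay(attempt, time.Second, time.Minute))
	}
}
```

With a cap in place the retries never stop, but their rate settles at one request per `max` interval instead of one per second.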

just thinking out loud :)

jhrozek commented 2 years ago

You're seeing the repeated message because it's an error, and Kubernetes controllers typically reconcile again on errors. See e.g. https://cloud.redhat.com/blog/kubernetes-operators-best-practices. For even more in-depth details, check the "Programming Kubernetes" book by Hausenblas and Schimanski.
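A toy model of that behaviour, as a sketch only: the queue, `runQueue` and `reconcileFunc` are hypothetical and far simpler than controller-runtime's work queue. It assumes, as the thread suggests but does not confirm, that the tailored profile reconciler records the ERROR state in the status and returns no error, which is why it is not retried.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound mirrors the error text from the binding controller's log.
var errNotFound = errors.New("NamedObjectReference openshift-compliance/ocp4-cis-node not found")

// reconcileFunc is a hypothetical reconciler: a non-nil error asks for a retry.
type reconcileFunc func(name string) error

// runQueue processes items in order and re-adds an item whenever its
// reconcile returns an error, up to maxAttempts tries per item.
// It returns how many times each item was reconciled.
func runQueue(items []string, reconcile reconcileFunc, maxAttempts int) map[string]int {
	attempts := make(map[string]int)
	queue := append([]string(nil), items...)
	for len(queue) > 0 {
		name := queue[0]
		queue = queue[1:]
		attempts[name]++
		if err := reconcile(name); err != nil && attempts[name] < maxAttempts {
			queue = append(queue, name) // error returned: requeue the item
		}
	}
	return attempts
}

func main() {
	reconcile := func(name string) error {
		if name == "cis-compliance" {
			// Binding: the failure is returned as an error, so it is retried.
			return errNotFound
		}
		// Tailored profile (assumed): ERROR is stored in status, nil is returned.
		return nil
	}
	fmt.Println(runQueue([]string{"cis-compliance", "ocp4-cis-node"}, reconcile, 3))
}
```

In this model the binding is reconciled until the attempt limit is reached, while the tailored profile is reconciled once and then left alone until some event enqueues it again.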

maratsal commented 2 years ago

Thanks, I will check it out.

And I understand that the controller is trying to reconcile on the error. What I was thinking is: shouldn't the behaviour be the same for the tailored profile as well? If the state is error, just keep trying to reconcile it?

jhrozek commented 2 years ago

Yes, but each controller typically watches only a single primary resource. What I'm adding is exactly that: reconciling on changes to a secondary resource. I'll post the patch shortly, just adding tests.