redhat-cop / namespace-configuration-operator

The namespace-configuration-operator helps keep configurations related to Users, Groups, and Namespaces aligned with one or more policies specified as CRs.
Apache License 2.0

Some namespaces fail to be reconciled consistently #60

Closed: bergerx closed this 4 years ago

bergerx commented 4 years ago

We have a template that gets applied to every namespace except one specific namespace, on multiple clusters with the same configuration.

I haven't yet found a way to reliably reproduce the issue, nor have I gone through the code to debug it, but here are two log lines that keep repeating during reconciles and may be related (I changed the log format a little to make them easier to read):

{
  "level": "error",
  "ts": 1595859877.1848087,
  "logger": "controller_patchlocker",
  "msg": "unable to update status for",
  "object": {
    "apiVersion": "redhatcop.redhat.io/v1alpha1",
    "kind": "NamespaceConfig",
    "name": "networkpolicy-allow-on-system-namespaces"
  },
  "error": "Operation cannot be fulfilled on namespaceconfigs.redhatcop.redhat.io \"networkpolicy-allow-on-system-namespaces\": the object has been modified; please apply your changes to the latest version and try again",
  "stacktrace": "github.com/go-logr/zapr.(*zapLogger).Error
    /home/travis/gopath/pkg/mod/github.com/go-logr/zapr@v0.1.1/zapr.go:128
github.com/redhat-cop/operator-utils/pkg/util/lockedresourcecontroller.(*EnforcingReconciler).ManageSuccess
    /home/travis/gopath/pkg/mod/github.com/redhat-cop/operator-utils@v0.3.3/pkg/util/lockedresourcecontroller/enforcing-reconciler.go:170
github.com/redhat-cop/namespace-configuration-operator/pkg/controller/namespaceconfig.(*ReconcileNamespaceConfig).Reconcile
    /home/travis/gopath/src/github.com/redhat-cop/namespace-configuration-operator/pkg/controller/namespaceconfig/namespaceconfig_controller.go:195
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
    /home/travis/gopath/pkg/mod/sigs.k8s.io/controller-runtime@v0.6.0/pkg/internal/controller/controller.go:256
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
    /home/travis/gopath/pkg/mod/sigs.k8s.io/controller-runtime@v0.6.0/pkg/internal/controller/controller.go:232
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker
    /home/travis/gopath/pkg/mod/sigs.k8s.io/controller-runtime@v0.6.0/pkg/internal/controller/controller.go:211
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1
    /home/travis/gopath/pkg/mod/k8s.io/apimachinery@v0.18.2/pkg/util/wait/wait.go:155
k8s.io/apimachinery/pkg/util/wait.BackoffUntil
    /home/travis/gopath/pkg/mod/k8s.io/apimachinery@v0.18.2/pkg/util/wait/wait.go:156
k8s.io/apimachinery/pkg/util/wait.JitterUntil
    /home/travis/gopath/pkg/mod/k8s.io/apimachinery@v0.18.2/pkg/util/wait/wait.go:133
k8s.io/apimachinery/pkg/util/wait.Until
    /home/travis/gopath/pkg/mod/k8s.io/apimachinery@v0.18.2/pkg/util/wait/wait.go:90"
}
{
  "level": "error",
  "ts": 1595859877.1848984,
  "logger": "controller-runtime.controller",
  "msg": "Reconciler error",
  "controller": "namespace-config-operator",
  "request": "/networkpolicy-allow-on-system-namespaces",
  "error": "Operation cannot be fulfilled on namespaceconfigs.redhatcop.redhat.io \"networkpolicy-allow-on-system-namespaces\": the object has been modified; please apply your changes to the latest version and try again",
  "stacktrace": "github.com/go-logr/zapr.(*zapLogger).Error
    /home/travis/gopath/pkg/mod/github.com/go-logr/zapr@v0.1.1/zapr.go:128
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
    /home/travis/gopath/pkg/mod/sigs.k8s.io/controller-runtime@v0.6.0/pkg/internal/controller/controller.go:258
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
    /home/travis/gopath/pkg/mod/sigs.k8s.io/controller-runtime@v0.6.0/pkg/internal/controller/controller.go:232
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker
    /home/travis/gopath/pkg/mod/sigs.k8s.io/controller-runtime@v0.6.0/pkg/internal/controller/controller.go:211
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1
    /home/travis/gopath/pkg/mod/k8s.io/apimachinery@v0.18.2/pkg/util/wait/wait.go:155
k8s.io/apimachinery/pkg/util/wait.BackoffUntil
    /home/travis/gopath/pkg/mod/k8s.io/apimachinery@v0.18.2/pkg/util/wait/wait.go:156
k8s.io/apimachinery/pkg/util/wait.JitterUntil
    /home/travis/gopath/pkg/mod/k8s.io/apimachinery@v0.18.2/pkg/util/wait/wait.go:133
k8s.io/apimachinery/pkg/util/wait.Until
    /home/travis/gopath/pkg/mod/k8s.io/apimachinery@v0.18.2/pkg/util/wait/wait.go:90"
}

We were initially on an older version and have just upgraded the operator to the recent v0.2.1 release, which did not help with the issue; the logs above are what we are getting on v0.2.1.

raffaelespazzoli commented 4 years ago

The update status error is most often innocuous. There is a clear reason for it, and I have not found a way to eliminate it. You should see the system eventually converge to a stable and correct state. If that is not happening in your situation, let's dig deeper; I'll need to see more logs from before and after the error messages.

bergerx commented 4 years ago

Nope, this issue persisted forever. It never resolved on its own and kept the provisioner in the reconcile loop indefinitely, producing huge logs I guess, since running kubectl logs --tail 10 -f <pod-id> showed a non-stop flow of log lines.

There were no particular logs around these, just a few regular reconcile logs. I don't think we have an affected instance anymore, but I'll try to find one if we still do.

This issue was happening on one NamespaceConfig and one particular matching namespace (kube-system). We worked around it by removing the problematic NamespaceConfig. I tried to reproduce the issue but, strangely, was not successful. I will try to gather more info, or I will close this issue.

raffaelespazzoli commented 4 years ago

OK, so let me know if we can actually troubleshoot this one; if not, we should close it. Recent releases of this operator have improved data validation, which prevents the operator from entering some of the loops you describe. However, this is not enough to say that the issue has been solved.

raffaelespazzoli commented 4 years ago

@bergerx are you still experiencing the problem? May I close this issue?

bergerx commented 4 years ago

Let's close this; I can open another one if I hit it again and have more details.

(Sent by email in reply to raffaelespazzoli's comment of Mon 31 Aug 2020, 17:55.)