openshift / origin

Conformance test suite for OpenShift
http://www.openshift.org
Apache License 2.0

[3.10] Continuous <nil> garbage collector errors #22019

Closed: rezie closed this issue 4 years ago

rezie commented 5 years ago

After upgrading from 3.9 to 3.10, I'm seeing a lot of garbage collector errors and warnings that repeat roughly every second in the master controller logs:

E0212 14:15:14.330548       1 reflector.go:205] github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/controller/garbagecollector/graph_builder.go:125: Failed to list <nil>: the server could not find the requested resource

<repeat some number of times, then finally...>
--
  W0212 14:22:24.502724       1 reflector.go:341] github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/controller/garbagecollector/graph_builder.go:125: watch of <nil> ended with: unexpected object: &{map[code:410 kind:Status apiVersion:v1 metadata:map[] status:Failure message:The resourceVersion for the provided watch is too old. reason:Expired]}

<then repeat from the top>

Raising the loglevel to 6 doesn't make this any easier to diagnose. Any suggestions or ideas to help with troubleshooting?
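
For anyone trying to narrow this down: the garbage collector watches every deletable resource that API discovery advertises, so an error like this usually points at a resource that discovery still lists but that can no longer be served (for example a leftover CRD or an aggregated API whose backing service is gone). A rough sketch of checks that might surface the culprit, assuming cluster-admin access:

```sh
# List aggregated API services and check for any whose Available condition
# is False (visible with `oc describe apiservice <name>`).
oc get apiservices

# List CustomResourceDefinitions; a CRD left behind by an uninstalled
# component can have the same effect.
oc get crd
```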

Version

oc v3.10.0+67ef696-102 kubernetes v1.10.0+b81c8f8

Steps To Reproduce

At least in our case, just upgrade to 3.10.

xavierbaude commented 5 years ago

I get the same error with a Red Hat cluster (upgraded from 3.9 to 3.10.111). I also hit another issue with this upgrade: objects cannot be deleted. A foreground finalizer is set on the DeploymentConfig but nothing happens (the same goes for Job, Route, ReplicationController, and Pod objects...). When I remove the finalizer, the object disappears, but the ReplicationController still exists! Everything seems to point to the garbage collector.

Edit: FYI, my bug was caused by Istio. After removing all the Istio CRDs everything is fine, and I no longer see this log message.
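
For reference, a rough sketch of how leftover CRDs like that might be found and removed (the grep pattern and CRD name are placeholders, not taken from this thread):

```sh
# List any CRDs left behind by the Istio install (the pattern is a guess).
oc get crd | grep -i istio

# Delete a leftover CRD once nothing depends on it any more; note that
# deleting a CRD also deletes every custom resource of that type.
oc delete crd <crd-name>
```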

rezie commented 5 years ago

This seems to persist after upgrading to 3.11 from that 3.10 cluster mentioned earlier. I also did a 3.11 install from scratch and the same issue shows up.

> I get the same error with a Red Hat cluster (upgraded from 3.9 to 3.10.111). I also hit another issue with this upgrade: objects cannot be deleted. A foreground finalizer is set on the DeploymentConfig but nothing happens (the same goes for Job, Route, ReplicationController, and Pod objects...). When I remove the finalizer, the object disappears, but the ReplicationController still exists! Everything seems to point to the garbage collector.

Re: unable to delete objects: we also hit that at some point. Try removing the finalizer (or, more generally, updating any mutable field on the resource); that should cause the resource to be removed immediately.
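
A sketch of what that looks like with `oc patch`, assuming the stuck object is a DeploymentConfig (the name is a placeholder):

```sh
# Clear the finalizers on a stuck DeploymentConfig so deletion can proceed.
# Only do this if the cleanup the finalizer was waiting for is no longer needed.
oc patch dc/<name> --type=merge -p '{"metadata":{"finalizers":null}}'
```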

openshift-bot commented 4 years ago

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close. Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

openshift-bot commented 4 years ago

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity. Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten /remove-lifecycle stale

openshift-bot commented 4 years ago

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen. Mark the issue as fresh by commenting /remove-lifecycle rotten. Exclude this issue from closing again by commenting /lifecycle frozen.

/close

openshift-ci-robot commented 4 years ago

@openshift-bot: Closing this issue.

In response to [this](https://github.com/openshift/origin/issues/22019#issuecomment-518018128):

> Rotten issues close after 30d of inactivity.
>
> Reopen the issue by commenting `/reopen`. Mark the issue as fresh by commenting `/remove-lifecycle rotten`. Exclude this issue from closing again by commenting `/lifecycle frozen`.
>
> /close

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

SakshamAbhi commented 2 weeks ago

/reopen

openshift-ci[bot] commented 2 weeks ago

@SakshamAbhi: You can't reopen an issue/PR unless you authored it or you are a collaborator.

In response to [this](https://github.com/openshift/origin/issues/22019#issuecomment-2089932749):

> /reopen

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.