kubernetes-client / java

Official Java client library for Kubernetes
http://kubernetes.io/
Apache License 2.0

InterruptedException during graceful shutdown with Kubernetes Java Client #2778

Closed junger-dev closed 7 months ago

junger-dev commented 1 year ago

Issue Description:

During the graceful shutdown process of my Spring Boot application in a Kubernetes environment, I noticed that the Kubernetes Client's informer and related threads are getting interrupted. Here's a snippet of the logs during the shutdown process:

2023-08-30 19:01:07.515 [SpringApplicationShutdownHook] INFO o.s.b.w.e.t.GracefulShutdown:shutDownGracefully:53 - Commencing graceful shutdown. Waiting for active requests to complete
2023-08-30 19:01:07.521 [tomcat-shutdown] INFO o.s.b.w.e.t.GracefulShutdown:doShutdown:78 - Graceful shutdown complete
2023-08-30 19:01:07.539 [controller-reflector-io.kubernetes.client.openapi.models.V1ConfigMap-1] INFO i.k.c.i.c.ReflectorRunnable:run:162 - class io.kubernetes.client.openapi.models.V1ConfigMap#Read timeout retry list and watch
2023-08-30 19:01:07.542 [informer-controller-V1ConfigMap] ERROR i.k.c.i.c.Controller:processLoop:164 - DefaultController#processLoop get interrupted null
java.lang.InterruptedException: null
    at java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1638)
    at io.kubernetes.client.informer.cache.DeltaFIFO.pop(DeltaFIFO.java:318)
    at io.kubernetes.client.informer.cache.Controller.processLoop(Controller.java:162)
    at io.kubernetes.client.informer.cache.Controller.run(Controller.java:130)
    at java.base/java.lang.Thread.run(Thread.java:833)
2023-08-30 19:01:07.542 [pool-11-thread-1] ERROR i.k.c.i.c.ProcessorListener:run:96 - processor interrupted: {}
java.lang.InterruptedException: null
    at java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1638)
    at java.base/java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:435)
    at io.kubernetes.client.informer.cache.ProcessorListener.run(ProcessorListener.java:58)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
    at java.base/java.lang.Thread.run(Thread.java:833)
2023-08-30 19:01:09.860 [SpringApplicationShutdownHook] INFO o.s.o.j.LocalContainerEntityManagerFactoryBean:destroy:651 - Closing JPA EntityManagerFactory for persistence unit 'default'

Environment:

# configmap.yaml

apiVersion: v1
data:
  application.properties: |
    my-api.use-db=false
kind: ConfigMap
metadata:
  name: my-api-config
  namespace: my-namespace
# bootstrap.yaml

spring:
  cloud:
    kubernetes:
      enabled: true
      config:
        sources:
        - name: my-api-config
        use-name-as-prefix: false
        enabled: true
        namespace: my-namespace
        name: my-api-config
      reload:
        enabled: true

Expected Behavior:

The Kubernetes Client's threads (like the informer) should not be interrupted during a graceful shutdown, or should be able to handle interruptions more gracefully.

Actual Behavior:

The threads are getting interrupted and are throwing InterruptedException.

Question:

Is there a recommended way to clean up Kubernetes Client resources during a pod's preStop hook, or another mechanism to avoid such interruptions?

Any insights or recommendations on how to handle this situation would be greatly appreciated.

brendandburns commented 1 year ago

Is this causing a problem? The situation is that the informer is waiting for an event. When you choose to shut down, that wait needs to be broken, or the shutdown will hang until another event arrives.
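To make that mechanism concrete, here is a stand-alone sketch that uses only the JDK, not the Kubernetes client; the class name `InterruptOnShutdownDemo` and the event string are hypothetical. A worker blocks in `LinkedBlockingQueue.take()`, much like `ProcessorListener.run` in the stack trace above. `ExecutorService.shutdownNow()` interrupts that wait, and the worker treats the resulting `InterruptedException` as a normal exit instead of logging it at ERROR.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical JDK-only model of an informer-style worker blocked on an empty queue. */
public class InterruptOnShutdownDemo {

    /** Returns true if the worker left its loop because shutdownNow() interrupted its wait. */
    public static boolean workerExitsViaInterrupt() {
        BlockingQueue<String> events = new LinkedBlockingQueue<>();
        CountDownLatch handledFirst = new CountDownLatch(1);
        AtomicBoolean exitedViaInterrupt = new AtomicBoolean(false);
        ExecutorService pool = Executors.newSingleThreadExecutor();

        pool.execute(() -> {
            while (true) {
                try {
                    String event = events.take(); // parks here while no events arrive
                    System.out.println("handled " + event);
                    handledFirst.countDown();
                } catch (InterruptedException e) {
                    // Expected during shutdown: note it, restore the flag, and exit quietly
                    // rather than logging a stack trace at ERROR level.
                    exitedViaInterrupt.set(true);
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        });

        try {
            events.put("ADDED my-api-config");
            handledFirst.await(5, TimeUnit.SECONDS); // worker is now back in (or entering) take()
            // With shutdown() alone, awaitTermination would time out: take() never returns.
            pool.shutdownNow();
            return pool.awaitTermination(5, TimeUnit.SECONDS) && exitedViaInterrupt.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("worker exited via interrupt: " + workerExitsViaInterrupt());
    }
}
```

Catching the exception, restoring the interrupt flag, and returning is the conventional way to honor an interrupt; whether the client library itself should log this case at a lower level is a separate question.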

I suppose we could deliver a "watch stopping" event or something like that on shutdown, but this really shouldn't cause any problems (other than printing the exception into the logs) if you are shutting down your application anyway.
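One way to read the "watch stopping" idea is the poison-pill pattern: shutdown enqueues a sentinel that wakes the blocked consumer, which then returns normally, so no `InterruptedException` is raised. This is a hypothetical JDK-only sketch (`StopEventDemo` and `STOP` are made-up names), not how the client library currently works.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical sketch: wake a blocked consumer with a sentinel event instead of an interrupt. */
public class StopEventDemo {

    // Compared by identity, so it can never collide with a real event.
    private static final Object STOP = new Object();

    /** Returns true if the consumer stopped via the sentinel and never saw an InterruptedException. */
    public static boolean consumerStopsWithoutInterrupt() {
        BlockingQueue<Object> events = new LinkedBlockingQueue<>();
        AtomicBoolean sawStop = new AtomicBoolean(false);
        AtomicBoolean sawInterrupt = new AtomicBoolean(false);

        Thread consumer = new Thread(() -> {
            try {
                while (true) {
                    Object event = events.take();
                    if (event == STOP) { // the "watch stopping" event: leave the loop normally
                        sawStop.set(true);
                        return;
                    }
                    System.out.println("handled " + event);
                }
            } catch (InterruptedException e) {
                sawInterrupt.set(true);
                Thread.currentThread().interrupt();
            }
        }, "consumer");

        consumer.start();
        try {
            events.put("MODIFIED my-api-config");
            events.put(STOP); // queued behind any pending real events
            consumer.join(TimeUnit.SECONDS.toMillis(5));
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !consumer.isAlive() && sawStop.get() && !sawInterrupt.get();
    }

    public static void main(String[] args) {
        System.out.println("stopped cleanly: " + consumerStopsWithoutInterrupt());
    }
}
```

Because the sentinel is queued behind existing events, pending work is drained first; the trade-off is that a consumer stuck in a slow handler will not see the sentinel until that handler returns.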

brendandburns commented 1 year ago

cc @yue9944882 for thoughts.

junger-dev commented 1 year ago

@brendandburns Thank you for the swift response.

> Is this causing a problem?

No, there are no issues with the application.

I understand that the informer waits for an event and must be interrupted so that the application doesn't hang during shutdown. My primary concern is that the exception is logged even though the application appears to shut down properly. In an environment where monitoring and alerting tools react to such exceptions, this can trigger false alarms, and it currently does.

Is there an existing configuration or workaround to address this behavior in the current version? If not, would excluding this error log from our monitoring alert tool be the best approach?
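If the alerting pipeline can filter log records, one option is to drop only the expected case: an `InterruptedException` from an informer logger after shutdown has begun. This sketch uses `java.util.logging.Filter` to stay within the JDK; a Spring Boot application typically logs through SLF4J/Logback, so the same predicate would need porting to that framework's filter API (not shown). The class name and the `shuttingDown` flag are hypothetical; the logger prefix is inferred from the abbreviated `i.k.c.i.c.*` names in the log above.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.logging.Filter;
import java.util.logging.LogRecord;

/**
 * Hypothetical filter: drops InterruptedException records from Kubernetes informer
 * loggers once shutdown has begun, and passes every other record through unchanged.
 */
public class ShutdownInterruptFilter implements Filter {

    // Assumed package of Controller and ProcessorListener (logged as i.k.c.i.c.*).
    private static final String INFORMER_LOGGER_PREFIX = "io.kubernetes.client.informer";
    private static final int MAX_CAUSE_DEPTH = 16; // guards against pathological cause cycles

    private final AtomicBoolean shuttingDown;

    public ShutdownInterruptFilter(AtomicBoolean shuttingDown) {
        this.shuttingDown = shuttingDown;
    }

    @Override
    public boolean isLoggable(LogRecord record) {
        if (!shuttingDown.get()) {
            return true; // before shutdown an interrupt is unexpected and should still alert
        }
        String logger = record.getLoggerName();
        boolean fromInformer = logger != null && logger.startsWith(INFORMER_LOGGER_PREFIX);
        return !(fromInformer && causedByInterrupt(record.getThrown()));
    }

    /** True if the throwable or any of its causes is an InterruptedException. */
    static boolean causedByInterrupt(Throwable thrown) {
        Throwable current = thrown;
        for (int depth = 0; current != null && depth < MAX_CAUSE_DEPTH; depth++) {
            if (current instanceof InterruptedException) {
                return true;
            }
            current = current.getCause();
        }
        return false;
    }
}
```

Attach it with `Handler.setFilter` on the handler that feeds alerting: a filter set on a `Logger` only sees records logged directly on that logger, not those from its child loggers. Setting the flag from a shutdown hook keeps genuine interrupts during normal operation visible.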

Thank you again.

brendandburns commented 1 year ago

The SharedInformerFactory class allows you to pass in a custom exception handler:

https://github.com/kubernetes-client/java/blob/a14d0f30cc85ef2ed8cccff7eab7a183ee8190e3/util/src/main/java/io/kubernetes/client/informer/SharedInformerFactory.java#L140

This handler controls what happens when an exception is thrown (or at least it should).
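A handler that suppresses only interruptions might look like the following. This assumes the factory's exception-handler parameter is a `BiConsumer` of the resource class and the `Throwable`; check the linked source for the exact generic signature. If it is typed on a concrete class such as `V1ConfigMap`, adapt it with a lambda like `(type, err) -> handler.accept(type, err)`. `InformerExceptionHandlers` and `ignoringInterrupts` are hypothetical names, and whether the `Controller`/`ProcessorListener` log lines shown above pass through this handler at all is not established here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;

/** Hypothetical helpers for building an informer exception handler (assumed BiConsumer shape). */
public final class InformerExceptionHandlers {

    private InformerExceptionHandlers() {}

    /**
     * Wraps {@code report} so that InterruptedException, which is expected when informers
     * are stopped at shutdown, is ignored while every other failure is still reported.
     */
    public static BiConsumer<Class<?>, Throwable> ignoringInterrupts(BiConsumer<Class<?>, Throwable> report) {
        return (apiType, error) -> {
            if (error instanceof InterruptedException) {
                // Nothing to report; the thread that caught the exception owns its interrupt status.
                return;
            }
            report.accept(apiType, error);
        };
    }

    public static void main(String[] args) {
        List<String> reported = new ArrayList<>();
        BiConsumer<Class<?>, Throwable> handler =
                ignoringInterrupts((type, err) -> reported.add(type.getSimpleName() + ": " + err.getMessage()));

        // String.class stands in for a real resource class such as V1ConfigMap.class.
        handler.accept(String.class, new InterruptedException("shutdown"));
        handler.accept(String.class, new IllegalStateException("watch failed"));
        System.out.println(reported); // prints [String: watch failed]
    }
}
```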

It's possible that the necessary plumbing isn't in place to make this work from Spring.

Please take a look at that code and see if it works for you. If not, can you add more details here about what is needed?

Thanks!

junger-dev commented 1 year ago

@brendandburns

Thank you for pointing out the SharedInformerFactory class and its capabilities. While it lets you pass in a custom exception handler to deal with the InterruptedException, I believe a better approach would be to ensure that the informer and its associated resources are stopped properly before the application shuts down. Preventing the exception, rather than handling it after it occurs, would make the shutdown process more stable and predictable.

Given that Kubernetes already has a preStop hook functionality, we can leverage this by triggering an endpoint like /kubernetes-client/informers/stop before the actual application shutdown. It would be beneficial if a future release of the Kubernetes Client could introduce methods like stop() or something similar for components like Controller. This would enable applications to shut down these components gracefully and release the associated resources.
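Here is a sketch of the kind of `stop()` method proposed above, under stated assumptions: `StoppableProcessor` is a hypothetical JDK-only class, not the client's `Controller`. Instead of blocking indefinitely in `take()`, the loop calls `poll()` with a timeout and re-checks a flag, so `stop()` ends the loop within one poll interval without interrupting the thread. A preStop endpoint such as the suggested `/kubernetes-client/informers/stop` path would simply call `stop()`.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical processor whose stop() ends the loop without Thread.interrupt(). */
public class StoppableProcessor implements Runnable {

    private final BlockingQueue<String> events = new LinkedBlockingQueue<>();
    private final AtomicBoolean running = new AtomicBoolean(true);
    private final AtomicBoolean interrupted = new AtomicBoolean(false);
    private final long pollMillis;

    public StoppableProcessor(long pollMillis) {
        this.pollMillis = pollMillis;
    }

    public void enqueue(String event) {
        events.add(event);
    }

    /** Idempotent; safe to call from a preStop endpoint and again from a shutdown hook. */
    public void stop() {
        running.set(false);
    }

    public boolean wasInterrupted() {
        return interrupted.get();
    }

    @Override
    public void run() {
        while (running.get()) {
            try {
                // Wakes up at least every pollMillis to re-check the running flag.
                String event = events.poll(pollMillis, TimeUnit.MILLISECONDS);
                if (event != null) {
                    System.out.println("handled " + event);
                }
            } catch (InterruptedException e) {
                interrupted.set(true); // only reached if something still interrupts the thread
                Thread.currentThread().interrupt();
                return;
            }
        }
    }

    /** Starts a processor, stops it, and reports whether it exited in time without an interrupt. */
    public static boolean stopsCleanly() {
        StoppableProcessor processor = new StoppableProcessor(50);
        Thread worker = new Thread(processor, "stoppable-processor");
        worker.start();
        processor.enqueue("ADDED my-api-config");
        processor.stop();
        try {
            worker.join(TimeUnit.SECONDS.toMillis(5));
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !worker.isAlive() && !processor.wasInterrupted();
    }

    public static void main(String[] args) {
        System.out.println("stopped cleanly: " + stopsCleanly());
    }
}
```

The design trades a small, bounded wake-up cost (one `poll` timeout per idle interval) for a shutdown that never needs `Thread.interrupt()`. In this sketch, events still queued when `stop()` is called may be left unprocessed; a drain step would be needed if they matter.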

brendandburns commented 1 year ago

Given that interrupting the wait as you suggest is complicated to implement, and that we have a custom exception handler to suppress the log, I don't think this is a high priority to fix. However, if you want to send a PR implementing the approach you suggest, we would be happy to review it.

Thanks!

k8s-triage-robot commented 9 months ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

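This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed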

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot commented 8 months ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

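This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed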

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot commented 7 months ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

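This bot triages issues according to the following rules:
- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:
- Reopen this issue with `/reopen`
- Mark this issue as fresh with `/remove-lifecycle rotten`
- Offer to help out with Issue Triage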

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-ci-robot commented 7 months ago

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to [this](https://github.com/kubernetes-client/java/issues/2778#issuecomment-2024152367):

> The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
>
> This bot triages issues according to the following rules:
> - After 90d of inactivity, `lifecycle/stale` is applied
> - After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
> - After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed
>
> You can:
> - Reopen this issue with `/reopen`
> - Mark this issue as fresh with `/remove-lifecycle rotten`
> - Offer to help out with [Issue Triage][1]
>
> Please send feedback to sig-contributor-experience at [kubernetes/community](https://github.com/kubernetes/community).
>
> /close not-planned
>
> [1]: https://www.kubernetes.dev/docs/guide/issue-triage/

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.