Open bhellema opened 1 year ago
@bhellema how often you see this ? is it resolvedd shortly after?
thinking if this might be an issue in JOSDK:
So, the ... && hasSynced() && isWatching()
is also part of the expression. So hasSynced might be problem on startup. But isWatching
could be problem any time, since watch might be down.
However this coin has two sides, if the informer is running but not watching it is a problem. But only if it happens for a longer time period. Since it might resemble a real problem with the watch (like lost permissions to watch a resource). I will add log messages for the next patch release, so we can verify if this is the problem.
@csviri the issue showed up pretty frequently and wouldn't resolve.
Is there a release with this debug logging that I can try?
@bhellema just released it: https://github.com/java-operator-sdk/java-operator-sdk/releases/tag/v4.2.4
see logging: https://github.com/java-operator-sdk/java-operator-sdk/pull/1737/files
The corresponding Quarkus extension release is 5.0.4. Please let us know if that addresses your issue.
I believe this has been addressed in the 5.0.4 version so closing this issue for now. Please re-open if you're still encounter the problem.
@metacosm, sorry for the very long delay in getting back to this, I finally had a bit of time to try to upgrade things. I'm still having issues using the latest quarkus-operator-sdk 6.0.1.
It looks like we're having issues getting the Kubernetes Client injected into the Health check for some reason after having upgraded. I'm still investigating.
{"time":"2023-05-10T17:28:52.489Z","class":"io.smallrye.health","level":"ERROR","message":"SRHCK01000: Error processing Health Checks [Error Occurred After Shutdown]","stackTrace":"java.lang.RuntimeException: Error injecting io.fabric8.kubernetes.client.KubernetesClient com.adobe.granite.contentbackflow.operator.probes.ContentBackflowHealthCheck.kubernetesClient\n\tat com.adobe.granite.contentbackflow.operator.probes.ContentBackflowHealthCheck_Bean.doCreate(Unknown Source)\n\tat com.adobe.granite.contentbackflow.operator.probes.ContentBackflowHealthCheck_Bean.create(Unknown Source)\n\tat com.adobe.granite.contentbackflow.operator.probes.ContentBackflowHealthCheck_Bean.create(Unknown Source)\n\tat io.quarkus.arc.impl.AbstractSharedContext.createInstanceHandle(AbstractSharedContext.java:113)\n\tat io.quarkus.arc.impl.AbstractSharedContext$1.get(AbstractSharedContext.java:37)\n\tat io.quarkus.arc.impl.AbstractSharedContext$1.get(AbstractSharedContext.java:34)\n\tat io.quarkus.arc.impl.LazyValue.get(LazyValue.java:26)\n\tat io.quarkus.arc.impl.ComputingCache.computeIfAbsent(ComputingCache.java:69)\n\tat io.quarkus.arc.impl.AbstractSharedContext.get(AbstractSharedContext.java:34)\n\tat io.quarkus.arc.impl.ClientProxies.getApplicationScopedDelegate(ClientProxies.java:21)\n\tat com.adobe.granite.contentbackflow.operator.probes.ContentBackflowHealthCheck_ClientProxy.arc$delegate(Unknown Source)\n\tat com.adobe.granite.contentbackflow.operator.probes.ContentBackflowHealthCheck_ClientProxy.call(Unknown Source)\n\tat io.smallrye.context.impl.wrappers.SlowContextualSupplier.get(SlowContextualSupplier.java:21)\n\tat io.smallrye.mutiny.operators.uni.builders.UniCreateFromItemSupplier.subscribe(UniCreateFromItemSupplier.java:28)\n\tat io.smallrye.mutiny.operators.AbstractUni.subscribe(AbstractUni.java:36)\n\tat io.smallrye.mutiny.operators.uni.UniOnFailureFlatMap.subscribe(UniOnFailureFlatMap.java:31)\n\tat io.smallrye.mutiny.operators.AbstractUni.subscribe(AbstractUni.java:36)\n\tat io.smallrye.mutiny.operators.uni.UniOnItemTransform.subscribe(UniOnItemTransform.java:22)\n\tat io.smallrye.mutiny.operators.AbstractUni.subscribe(AbstractUni.java:36)\n\tat io.smallrye.mutiny.operators.uni.UniAndCombination$UniHandler.subscribe(UniAndCombination.java:233)\n\tat io.smallrye.mutiny.operators.uni.UniAndCombination$AndSupervisor.run(UniAndCombination.java:87)\n\tat io.smallrye.mutiny.operators.uni.UniAndCombination.subscribe(UniAndCombination.java:53)\n\tat io.smallrye.mutiny.operators.AbstractUni.subscribe(AbstractUni.java:36)\n\tat io.smallrye.mutiny.operators.uni.UniBlockingAwait.await(UniBlockingAwait.java:60)\n\tat io.smallrye.mutiny.groups.UniAwait.atMost(UniAwait.java:65)\n\tat io.smallrye.health.SmallRyeHealthReporter.getReadiness(SmallRyeHealthReporter.java:232)\n\tat io.smallrye.health.SmallRyeHealthReporter_ClientProxy.getReadiness(Unknown Source)\n\tat io.quarkus.smallrye.health.runtime.SmallRyeReadinessHandler.getHealth(SmallRyeReadinessHandler.java:11)\n\tat io.quarkus.smallrye.health.runtime.SmallRyeHealthHandlerBase.doHandle(SmallRyeHealthHandlerBase.java:44)\n\tat io.quarkus.smallrye.health.runtime.SmallRyeHealthHandlerBase.handle(SmallRyeHealthHandlerBase.java:31)\n\tat io.quarkus.smallrye.health.runtime.SmallRyeReadinessHandler.handle(SmallRyeReadinessHandler.java:7)\n\tat io.quarkus.smallrye.health.runtime.SmallRyeHealthHandlerBase.handle(SmallRyeHealthHandlerBase.java:19)\n\tat io.vertx.ext.web.impl.BlockingHandlerDecorator.lambda$handle$0(BlockingHandlerDecorator.java:48)\n\tat io.vertx.core.impl.ContextBase.lambda$null$0(ContextBase.java:137)\n\tat io.vertx.core.impl.ContextInternal.dispatch(ContextInternal.java:264)\n\tat io.vertx.core.impl.ContextBase.lambda$executeBlocking$1(ContextBase.java:135)\n\tat io.quarkus.vertx.core.runtime.VertxCoreRecorder$14.runWith(VertxCoreRecorder.java:576)\n\tat org.jboss.threads.EnhancedQueueExecutor$Task.run(EnhancedQueueExecutor.java:2513)\n\tat org.jboss.threads.EnhancedQueueExecutor$ThreadBody.run(EnhancedQueueExecutor.java:1538)\n\tat org.jboss.threads.DelegatingRunnable.run(DelegatingRunnable.java:29)\n\tat org.jboss.threads.ThreadLocalResettingRunnable.run(ThreadLocalResettingRunnable.java:29)\n\tat io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)\n\tat java.base/java.lang.Thread.run(Thread.java:834)\nCaused by: java.lang.NullPointerException\n\tat io.quarkus.it.openshift.client.runtime.OpenShiftClientProducer_ProducerMethod_openShiftClient_ccf49c646fbe836d0e8b8406400d0ad276cd5438_Bean.get(Unknown Source)\n\tat io.quarkus.it.openshift.client.runtime.OpenShiftClientProducer_ProducerMethod_openShiftClient_ccf49c646fbe836d0e8b8406400d0ad276cd5438_Bean.get(Unknown Source)\n\t... 43 more\n","errorType":"java.lang.RuntimeException","errorMessage":"Error injecting io.fabric8.kubernetes.client.KubernetesClient com.adobe.granite.contentbackflow.operator.probes.ContentBackflowHealthCheck.kubernetesClient"}
@bhellema are you still experiencing issues?
Since upgrading to 5.0.0 we're noticing different behavior when it comes to starting up our operator. Sometimes we see the following health check status in our logs indicating that
Quarkus Operator SDK health check
is DOWN.Could there be a race condition here on this line?
@metacosm any chance you have some insight here wrt to this potentially being a race condition? I can have mixed results when starting our operator up.