micrometer-metrics / micrometer

An application observability facade for the most popular observability tools. Think SLF4J, but for observability.
https://micrometer.io
Apache License 2.0
4.48k stars 992 forks source link

Stackdriver exception trying to publish metrics during shutdown #4353

Closed dimovelev closed 9 months ago

dimovelev commented 1 year ago

Describe the bug Whenever an application is shut down, the stack driver registry tries to publish its metrics and throws an exception with the error UnavailableException: io.grpc.StatusRuntimeException: UNAVAILABLE: Channel shutdown invoked

Environment Google Cloud Run Spring Boot 3.1.5 micrometer-registry-stackdriver 1.11.5 spring-cloud-gcp 4.8.2

To Reproduce

  1. Create an empty spring boot application e.g. from https://start.spring.io/
  2. Add the following dependencies
    <dependency>
    <groupId>com.google.cloud</groupId>
    <artifactId>spring-cloud-gcp-starter</artifactId>
    </dependency>
    <dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
    </dependency>
    <dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-registry-stackdriver</artifactId>
    </dependency>
  3. Enable the micrometer metrics (e.g. application.properties)
    spring.cloud.gcp.metrics.enabled=true
  4. Login to GCP so that there are default credentials etc for GCP available
  5. Run the application. It will start and shutdown immediately as we have not added web server
  6. During shutdown you will get this exception stack trace
    
    2023-11-13T10:21:08.451+01:00  INFO 25564 --- [           main] com.example.demo.DemoApplication         : Starting DemoApplication using Java 21 with PID 25564 (D:\src\demo\target\classes started by XXX in D:\src\demo)
    2023-11-13T10:21:08.453+01:00  INFO 25564 --- [           main] com.example.demo.DemoApplication         : No active profile set, falling back to 1 default profile: "default"
    2023-11-13T10:21:09.076+01:00  INFO 25564 --- [           main] c.g.c.s.core.DefaultCredentialsProvider  : Default credentials provider for user XXX.apps.googleusercontent.com
    2023-11-13T10:21:09.076+01:00  INFO 25564 --- [           main] c.g.c.s.core.DefaultCredentialsProvider  : Scopes in use by default credentials: [https://www.googleapis.com/auth/pubsub, https://www.googleapis.com/auth/spanner.admin, https://www.googleapis.com/auth/spanner.data, https://www.googleapis.com/auth/datastore, https://www.googleapis.com/auth/sqlservice.admin, https://www.googleapis.com/auth/devstorage.read_only, https://www.googleapis.com/auth/devstorage.read_write, https://www.googleapis.com/auth/cloudruntimeconfig, https://www.googleapis.com/auth/trace.append, https://www.googleapis.com/auth/cloud-platform, https://www.googleapis.com/auth/cloud-vision, https://www.googleapis.com/auth/bigquery, https://www.googleapis.com/auth/monitoring.write]
    2023-11-13T10:21:09.413+01:00  INFO 25564 --- [           main] c.g.c.s.a.c.GcpContextAutoConfiguration  : The default project ID is XXX
    2023-11-13T10:21:10.052+01:00  INFO 25564 --- [           main] i.m.c.instrument.push.PushMeterRegistry  : publishing metrics for StackdriverMeterRegistry every 1m
    2023-11-13T10:21:10.415+01:00  INFO 25564 --- [           main] com.example.demo.DemoApplication         : Started DemoApplication in 2.263 seconds (process running for 2.877)
    2023-11-13T10:21:10.473+01:00  WARN 25564 --- [ionShutdownHook] i.m.s.StackdriverMeterRegistry           : Failed to pre populate verified descriptors for XXX

java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@2f35496c[Not completed, task = java.util.concurrent.Executors$RunnableAdapter@451b64f4[Wrapped task = TrustedListenableFutureTask@285303a0[status=PENDING, info=[task=[running=[NOT STARTED YET], com.google.api.gax.rpc.AttemptCallable@7fa60050]]]]] rejected from java.util.concurrent.ScheduledThreadPoolExecutor@7d04529c[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 0] at java.base/java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2081) ~[na:na] at java.base/java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:841) ~[na:na] at java.base/java.util.concurrent.ScheduledThreadPoolExecutor.delayedExecute(ScheduledThreadPoolExecutor.java:340) ~[na:na] at java.base/java.util.concurrent.ScheduledThreadPoolExecutor.schedule(ScheduledThreadPoolExecutor.java:562) ~[na:na] at com.google.common.util.concurrent.MoreExecutors$ScheduledListeningDecorator.schedule(MoreExecutors.java:673) ~[guava-32.1.2-jre.jar:na] at com.google.api.gax.retrying.ScheduledRetryingExecutor.submit(ScheduledRetryingExecutor.java:116) ~[gax-2.37.0.jar:2.37.0] at com.google.api.gax.retrying.CallbackChainRetryingFuture$AttemptCompletionListener.handle(CallbackChainRetryingFuture.java:137) ~[gax-2.37.0.jar:2.37.0] at com.google.api.gax.retrying.CallbackChainRetryingFuture$AttemptCompletionListener.run(CallbackChainRetryingFuture.java:117) ~[gax-2.37.0.jar:2.37.0] at com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:31) ~[guava-32.1.2-jre.jar:na] at com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:1286) ~[guava-32.1.2-jre.jar:na] at com.google.common.util.concurrent.AbstractFuture.addListener(AbstractFuture.java:760) ~[guava-32.1.2-jre.jar:na] at com.google.api.core.AbstractApiFuture.addListener(AbstractApiFuture.java:45) ~[api-common-2.20.0.jar:2.20.0] at com.google.api.gax.retrying.CallbackChainRetryingFuture.setAttemptFuture(CallbackChainRetryingFuture.java:93) ~[gax-2.37.0.jar:2.37.0] at com.google.api.gax.rpc.AttemptCallable.call(AttemptCallable.java:88) ~[gax-2.37.0.jar:2.37.0] at com.google.api.gax.rpc.RetryingCallable.futureCall(RetryingCallable.java:63) ~[gax-2.37.0.jar:2.37.0] at com.google.api.gax.rpc.RetryingCallable.futureCall(RetryingCallable.java:41) ~[gax-2.37.0.jar:2.37.0] at com.google.api.gax.rpc.PagedCallable.futureCall(PagedCallable.java:62) ~[gax-2.37.0.jar:2.37.0] at com.google.api.gax.rpc.UnaryCallable$1.futureCall(UnaryCallable.java:126) ~[gax-2.37.0.jar:2.37.0] at com.google.api.gax.rpc.UnaryCallable.futureCall(UnaryCallable.java:87) ~[gax-2.37.0.jar:2.37.0] at com.google.api.gax.rpc.UnaryCallable.call(UnaryCallable.java:112) ~[gax-2.37.0.jar:2.37.0] at com.google.cloud.monitoring.v3.MetricServiceClient.listMetricDescriptors(MetricServiceClient.java:741) ~[google-cloud-monitoring-3.31.0.jar:3.31.0] at io.micrometer.stackdriver.StackdriverMeterRegistry$Batch.prePopulateVerifiedDescriptors(StackdriverMeterRegistry.java:429) ~[micrometer-registry-stackdriver-1.11.5.jar:1.11.5] at io.micrometer.stackdriver.StackdriverMeterRegistry$Batch.createMetricDescriptorIfNecessary(StackdriverMeterRegistry.java:383) ~[micrometer-registry-stackdriver-1.11.5.jar:1.11.5] at io.micrometer.stackdriver.StackdriverMeterRegistry$Batch.createTimeSeries(StackdriverMeterRegistry.java:358) ~[micrometer-registry-stackdriver-1.11.5.jar:1.11.5] at io.micrometer.stackdriver.StackdriverMeterRegistry$Batch.createTimeSeries(StackdriverMeterRegistry.java:324) ~[micrometer-registry-stackdriver-1.11.5.jar:1.11.5] at io.micrometer.stackdriver.StackdriverMeterRegistry.createGauge(StackdriverMeterRegistry.java:227) ~[micrometer-registry-stackdriver-1.11.5.jar:1.11.5] at io.micrometer.stackdriver.StackdriverMeterRegistry.lambda$publish$1(StackdriverMeterRegistry.java:158) ~[micrometer-registry-stackdriver-1.11.5.jar:1.11.5] at io.micrometer.core.instrument.Meter.match(Meter.java:100) ~[micrometer-core-1.11.5.jar:1.11.5] at io.micrometer.stackdriver.StackdriverMeterRegistry.lambda$publish$10(StackdriverMeterRegistry.java:158) ~[micrometer-registry-stackdriver-1.11.5.jar:1.11.5] at java.base/java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:273) ~[na:na] at java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1708) ~[na:na] at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509) ~[na:na] at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499) ~[na:na] at java.base/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:921) ~[na:na] at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) ~[na:na] at java.base/java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:682) ~[na:na] at io.micrometer.stackdriver.StackdriverMeterRegistry.publish(StackdriverMeterRegistry.java:163) ~[micrometer-registry-stackdriver-1.11.5.jar:1.11.5] at io.micrometer.core.instrument.step.StepMeterRegistry.close(StepMeterRegistry.java:146) ~[micrometer-core-1.11.5.jar:1.11.5] at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103) ~[na:na] at java.base/java.lang.reflect.Method.invoke(Method.java:580) ~[na:na] at org.springframework.beans.factory.support.DisposableBeanAdapter.invokeCustomDestroyMethod(DisposableBeanAdapter.java:303) ~[spring-beans-6.0.13.jar:6.0.13] at org.springframework.beans.factory.support.DisposableBeanAdapter.destroy(DisposableBeanAdapter.java:238) ~[spring-beans-6.0.13.jar:6.0.13] at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.destroyBean(DefaultSingletonBeanRegistry.java:587) ~[spring-beans-6.0.13.jar:6.0.13] at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.destroySingleton(DefaultSingletonBeanRegistry.java:559) ~[spring-beans-6.0.13.jar:6.0.13] at org.springframework.beans.factory.support.DefaultListableBeanFactory.destroySingleton(DefaultListableBeanFactory.java:1189) ~[spring-beans-6.0.13.jar:6.0.13] at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.destroySingletons(DefaultSingletonBeanRegistry.java:520) ~[spring-beans-6.0.13.jar:6.0.13] at org.springframework.beans.factory.support.DefaultListableBeanFactory.destroySingletons(DefaultListableBeanFactory.java:1182) ~[spring-beans-6.0.13.jar:6.0.13] at org.springframework.context.support.AbstractApplicationContext.destroyBeans(AbstractApplicationContext.java:1118) ~[spring-context-6.0.13.jar:6.0.13] at org.springframework.context.support.AbstractApplicationContext.doClose(AbstractApplicationContext.java:1084) ~[spring-context-6.0.13.jar:6.0.13] at org.springframework.context.support.AbstractApplicationContext.close(AbstractApplicationContext.java:1035) ~[spring-context-6.0.13.jar:6.0.13] at org.springframework.boot.SpringApplicationShutdownHook.closeAndWait(SpringApplicationShutdownHook.java:145) ~[spring-boot-3.1.5.jar:3.1.5] at java.base/java.lang.Iterable.forEach(Iterable.java:75) ~[na:na] at org.springframework.boot.SpringApplicationShutdownHook.run(SpringApplicationShutdownHook.java:114) ~[spring-boot-3.1.5.jar:3.1.5] at java.base/java.lang.Thread.run(Thread.java:1583) ~[na:na] Suppressed: com.google.api.gax.rpc.AsyncTaskException: Asynchronous task failed at com.google.api.gax.rpc.ApiExceptions.callAndTranslateApiException(ApiExceptions.java:57) ~[gax-2.37.0.jar:2.37.0] ... 35 common frames omitted

2023-11-13T10:21:10.486+01:00 WARN 25564 --- [ionShutdownHook] i.m.s.StackdriverMeterRegistry : failed to create metric descriptor in Stackdriver for meter MeterId{name='jvm.buffer.memory.used', tags=[tag(id=direct)]}

com.google.api.gax.rpc.UnavailableException: io.grpc.StatusRuntimeException: UNAVAILABLE: Channel shutdown invoked at com.google.api.gax.rpc.ApiExceptionFactory.createException(ApiExceptionFactory.java:112) ~[gax-2.37.0.jar:2.37.0] at com.google.api.gax.rpc.ApiExceptionFactory.createException(ApiExceptionFactory.java:41) ~[gax-2.37.0.jar:2.37.0] at com.google.api.gax.grpc.GrpcApiExceptionFactory.create(GrpcApiExceptionFactory.java:86) ~[gax-grpc-2.37.0.jar:2.37.0] at com.google.api.gax.grpc.GrpcApiExceptionFactory.create(GrpcApiExceptionFactory.java:66) ~[gax-grpc-2.37.0.jar:2.37.0] at com.google.api.gax.grpc.GrpcExceptionCallable$ExceptionTransformingFuture.onFailure(GrpcExceptionCallable.java:97) ~[gax-grpc-2.37.0.jar:2.37.0] at com.google.api.core.ApiFutures$1.onFailure(ApiFutures.java:84) ~[api-common-2.20.0.jar:2.20.0] at com.google.common.util.concurrent.Futures$CallbackListener.run(Futures.java:1127) ~[guava-32.1.2-jre.jar:na] at com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:31) ~[guava-32.1.2-jre.jar:na] at com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:1286) ~[guava-32.1.2-jre.jar:na] at com.google.common.util.concurrent.AbstractFuture.complete(AbstractFuture.java:1055) ~[guava-32.1.2-jre.jar:na] at com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:807) ~[guava-32.1.2-jre.jar:na] at io.grpc.stub.ClientCalls$GrpcFuture.setException(ClientCalls.java:568) ~[grpc-stub-1.59.0.jar:1.59.0] at io.grpc.stub.ClientCalls$UnaryStreamToFuture.onClose(ClientCalls.java:538) ~[grpc-stub-1.59.0.jar:1.59.0] at io.grpc.PartialForwardingClientCallListener.onClose(PartialForwardingClientCallListener.java:39) ~[grpc-api-1.59.0.jar:1.59.0] at io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:23) ~[grpc-api-1.59.0.jar:1.59.0] at io.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose(ForwardingClientCallListener.java:40) ~[grpc-api-1.59.0.jar:1.59.0] at com.google.api.gax.grpc.ChannelPool$ReleasingClientCall$1.onClose(ChannelPool.java:570) ~[gax-grpc-2.37.0.jar:2.37.0] at io.grpc.internal.ClientCallImpl.closeObserver(ClientCallImpl.java:574) ~[grpc-core-1.59.0.jar:1.59.0] at io.grpc.internal.ClientCallImpl.access$300(ClientCallImpl.java:72) ~[grpc-core-1.59.0.jar:1.59.0] at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInternal(ClientCallImpl.java:742) ~[grpc-core-1.59.0.jar:1.59.0] at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInContext(ClientCallImpl.java:723) ~[grpc-core-1.59.0.jar:1.59.0] at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37) ~[grpc-core-1.59.0.jar:1.59.0] at io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:133) ~[grpc-core-1.59.0.jar:1.59.0] at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) ~[na:na] at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) ~[na:na] at java.base/java.lang.Thread.run(Thread.java:1583) ~[na:na] Suppressed: com.google.api.gax.rpc.AsyncTaskException: Asynchronous task failed at com.google.api.gax.rpc.ApiExceptions.callAndTranslateApiException(ApiExceptions.java:57) ~[gax-2.37.0.jar:2.37.0] at com.google.api.gax.rpc.UnaryCallable.call(UnaryCallable.java:112) ~[gax-2.37.0.jar:2.37.0] at com.google.cloud.monitoring.v3.MetricServiceClient.createMetricDescriptor(MetricServiceClient.java:1127) ~[google-cloud-monitoring-3.31.0.jar:3.31.0] at io.micrometer.stackdriver.StackdriverMeterRegistry$Batch.createMetricDescriptorIfNecessary(StackdriverMeterRegistry.java:405) ~[micrometer-registry-stackdriver-1.11.5.jar:1.11.5] at io.micrometer.stackdriver.StackdriverMeterRegistry$Batch.createTimeSeries(StackdriverMeterRegistry.java:358) ~[micrometer-registry-stackdriver-1.11.5.jar:1.11.5] at io.micrometer.stackdriver.StackdriverMeterRegistry$Batch.createTimeSeries(StackdriverMeterRegistry.java:324) ~[micrometer-registry-stackdriver-1.11.5.jar:1.11.5] at io.micrometer.stackdriver.StackdriverMeterRegistry.createGauge(StackdriverMeterRegistry.java:227) ~[micrometer-registry-stackdriver-1.11.5.jar:1.11.5] at io.micrometer.stackdriver.StackdriverMeterRegistry.lambda$publish$1(StackdriverMeterRegistry.java:158) ~[micrometer-registry-stackdriver-1.11.5.jar:1.11.5] at io.micrometer.core.instrument.Meter.match(Meter.java:100) ~[micrometer-core-1.11.5.jar:1.11.5] at io.micrometer.stackdriver.StackdriverMeterRegistry.lambda$publish$10(StackdriverMeterRegistry.java:158) ~[micrometer-registry-stackdriver-1.11.5.jar:1.11.5] at java.base/java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:273) ~[na:na] at java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1708) ~[na:na] at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509) ~[na:na] at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499) ~[na:na] at java.base/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:921) ~[na:na] at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) ~[na:na] at java.base/java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:682) ~[na:na] at io.micrometer.stackdriver.StackdriverMeterRegistry.publish(StackdriverMeterRegistry.java:163) ~[micrometer-registry-stackdriver-1.11.5.jar:1.11.5] at io.micrometer.core.instrument.step.StepMeterRegistry.close(StepMeterRegistry.java:146) ~[micrometer-core-1.11.5.jar:1.11.5] at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103) ~[na:na] at java.base/java.lang.reflect.Method.invoke(Method.java:580) ~[na:na] at org.springframework.beans.factory.support.DisposableBeanAdapter.invokeCustomDestroyMethod(DisposableBeanAdapter.java:303) ~[spring-beans-6.0.13.jar:6.0.13] at org.springframework.beans.factory.support.DisposableBeanAdapter.destroy(DisposableBeanAdapter.java:238) ~[spring-beans-6.0.13.jar:6.0.13] at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.destroyBean(DefaultSingletonBeanRegistry.java:587) ~[spring-beans-6.0.13.jar:6.0.13] at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.destroySingleton(DefaultSingletonBeanRegistry.java:559) ~[spring-beans-6.0.13.jar:6.0.13] at org.springframework.beans.factory.support.DefaultListableBeanFactory.destroySingleton(DefaultListableBeanFactory.java:1189) ~[spring-beans-6.0.13.jar:6.0.13] at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.destroySingletons(DefaultSingletonBeanRegistry.java:520) ~[spring-beans-6.0.13.jar:6.0.13] at org.springframework.beans.factory.support.DefaultListableBeanFactory.destroySingletons(DefaultListableBeanFactory.java:1182) ~[spring-beans-6.0.13.jar:6.0.13] at org.springframework.context.support.AbstractApplicationContext.destroyBeans(AbstractApplicationContext.java:1118) ~[spring-context-6.0.13.jar:6.0.13] at org.springframework.context.support.AbstractApplicationContext.doClose(AbstractApplicationContext.java:1084) ~[spring-context-6.0.13.jar:6.0.13] at org.springframework.context.support.AbstractApplicationContext.close(AbstractApplicationContext.java:1035) ~[spring-context-6.0.13.jar:6.0.13] at org.springframework.boot.SpringApplicationShutdownHook.closeAndWait(SpringApplicationShutdownHook.java:145) ~[spring-boot-3.1.5.jar:3.1.5] at java.base/java.lang.Iterable.forEach(Iterable.java:75) ~[na:na] at org.springframework.boot.SpringApplicationShutdownHook.run(SpringApplicationShutdownHook.java:114) ~[spring-boot-3.1.5.jar:3.1.5] ... 1 common frames omitted Caused by: io.grpc.StatusRuntimeException: UNAVAILABLE: Channel shutdown invoked at io.grpc.Status.asRuntimeException(Status.java:537) ~[grpc-api-1.59.0.jar:1.59.0] ... 14 common frames omitted


**Expected behavior**
* No exception during shutdown
* The metrics are successfully pushed during shutdown

**Additional context**
I have not debugged through the code. However, the io.micrometer.core.instrument.push.PushMeterRegistry#close seems to call stop(), publish(), close() when the context is closed. The io.micrometer.stackdriver.StackdriverMeterRegistry#stop calls client.shutdownNow() which I guess is not correct. Maybe this should be rather done in close() and not in stop(). Just a wild guess.
dimovelev commented 1 year ago

FYI - locally I changed the two methods and do not get this exception anymore (I get a different one - something with header size too large but it seems to be a different issue):

    @Override
    public void stop() { // or just remove me
        super.stop();
    }

    @Override
    public void close() {
        try {
            super.close();
        } finally {
            if (client != null) {
                client.close();
            }
        }
    }
jonatan-ivanov commented 12 months ago

Is this happening all the time or occasionally? If the latter, how frequently? We changed a few things in 1.11.6 (and 1.12.0), would you mind checking if it resolves the issue?

dimovelev commented 12 months ago

@jonatan-ivanov this happens only when the application is being shut down - e.g. a new revision is being deployed and all old revision instances are being shut down. The bigger impact is actually that it also happens in cloudrun jobs when they finish and as a result the metrics are not pushed when they finish.

dimovelev commented 12 months ago

@jonatan-ivanov i changed the micrometer.version maven property in the reproducer to use a different micrometer version (and verified that it is being used) and it still fails both with 1.11.6 and with 1.12.0

mieseprem commented 11 months ago

Hey there, it seems we are facing the same issue. To get more ideas of what changes lead to the observed behaviour I will write this comment.

We started to face this issue when we migrated from Springboot 2.7 to 3.1. Our workaround was (still is) to pin Micrometer to version 1.10.6 (all afterwards - including 1.10.7) shows all the error messages you described above. However, as of Springboot 3.2 the (ugly) Micrometer version pinning isn't working anyway. Thats why I was googleing around and stranded here.

def micrometer_version_pin = '1.10.6'
dependencies {
    // [...]

    implementation "io.micrometer:micrometer-commons:${micrometer_version_pin}!!"
    implementation "io.micrometer:micrometer-core:${micrometer_version_pin}!!"
    implementation "io.micrometer:micrometer-observation:${micrometer_version_pin}!!"
    implementation "io.micrometer:micrometer-registry-stackdriver:${micrometer_version_pin}!!"

   // [...]
}
shakuzen commented 11 months ago

Our workaround was (still is) to pin Micrometer to version 1.10.6 (all afterwards - including 1.10.7) shows all the error messages you described above.

Thank you for the details. That sounds like it was https://github.com/micrometer-metrics/micrometer/issues/3759, included in 1.10.7, that caused this to start happening, which makes sense. The proposed fix sounds reasonable. I'd like to see if there is any way we can effectively test the behavior locally with a stub of the stackdriver metrics service. That could help us avoid reintroducing this issue in the future and make sure it stays fixed.

dimovelev commented 11 months ago

@shakuzen I could not find an emulator or anything remotely helpful to integration test it :(

marcingrzejszczak commented 10 months ago

So for StackdriverMeterRegistry, wouldn't a change from

@Override
    public void stop() {
        if (client != null)
            client.shutdownNow();
        super.stop();
    }

to

@Override
    public void stop() {
        if (client != null && !client.isShutdown())
            client.shutdownNow();
        super.stop();
    }

fix the problem? I understand that we're closing an already closed client? We would need to check if the metrics are still being sent on stop. Still no luck with providing some stub for Stackdriver metrics server?

dimovelev commented 10 months ago

@marcingrzejszczak would that not mean that no metrics will be pushed upon shutdown? We do not want to get the exception, but we also want to push the metrics one last time upon shutdown.

marcingrzejszczak commented 10 months ago

From my understanding the suggested change would shut down the client only if it hasn't been already shut down but maybe I'm missing sth.

mieseprem commented 9 months ago

For me it looks like the most fundamental change is to move shutdown of the client from stop() to close() method. I don't know in which phase of application lifecycle the phases take place. But maybe this already does the trick.

shakuzen commented 9 months ago

Could folks affected by this issue please try with the latest snapshots to make sure the fix is working and not causing any other issues? Specifically, it's in 1.10.14-SNAPSHOT, 1.11.9-SNAPSHOT, and 1.12.3-SNAPSHOT available from the snapshot repo.

sebastiankonieczek commented 9 months ago

1.12.3-SNAPSHOT -> Fix seems to work properly, metrics are being published and client is shutdown gracefully afterwards:

2024-02-06T10:55:07.340+01:00 DEBUG 19503 --- [tp1973538135-41] i.m.s.StackdriverMeterRegistry           : successfully sent 9 TimeSeries to Stackdriver
2024-02-06T10:55:07.341+01:00 DEBUG 19503 --- [tp1973538135-41] com.google.api.gax.grpc.ChannelPool      : Initiating immediate shutdown due to explicit request
2024-02-06T10:55:07.347+01:00 DEBUG 19503 --- [tp1973538135-41] com.google.api.gax.grpc.ChannelPool      : Initiating graceful shutdown due to explicit request

No other issue recognized so far.

dimovelev commented 9 months ago

I was a bit slow to respond - we have this fix already rolled out to multiple applications and it works. Thank you for merging it