Closed loudywen closed 4 years ago
Scenario (has all CCM setup, has this sample for testing https://github.com/sophiaso/spring-cosmosdb-refresh/blob/master/src/main/java/com/example/cosmosdbrefresh/CosmosConfiguration.java ) steps:
update some wrong value for the URL in ccm (threads before refresh)
hit the application rest endpoint to trigger refresh
application will pick the new URL, refresh the connection and raise exception after some seconds
Caused by: reactor.core.Exceptions$ReactiveException: io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection timed out: no further information: sql-dev-azure-cosmosdb.documents.azure.com/104.215.113.77:4435
at reactor.core.Exceptions.propagate(Exceptions.java:336)
at reactor.core.publisher.BlockingSingleSubscriber.blockingGet(BlockingSingleSubscriber.java:91)
at reactor.core.publisher.Mono.block(Mono.java:1663)
at com.azure.data.cosmos.internal.RxDocumentClientImpl.initializeGatewayConfigurationReader(RxDocumentClientImpl.java:246)
at com.azure.data.cosmos.internal.RxDocumentClientImpl.init(RxDocumentClientImpl.java:265)
at com.azure.data.cosmos.internal.AsyncDocumentClient$Builder.build(AsyncDocumentClient.java:185)
at com.azure.data.cosmos.CosmosClient.<init>(CosmosClient.java:54)
at com.azure.data.cosmos.CosmosClientBuilder.build(CosmosClientBuilder.java:203)
at com.microsoft.azure.spring.data.cosmosdb.CosmosDbFactory.getCosmosClient(CosmosDbFactory.java:59)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.springframework.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:282)
at org.springframework.cloud.context.scope.GenericScope$LockedScopedProxyFactoryBean.invoke(GenericScope.java:494)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186)
at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:688)
at com.microsoft.azure.spring.data.cosmosdb.CosmosDbFactory$$EnhancerBySpringCGLIB$$f659ab01.getCosmosClient(<generated>)
at com.microsoft.azure.spring.data.cosmosdb.core.CosmosTemplate.<init>(CosmosTemplate.java:78)
at com.walmart.intech.armada.api.DatabaseConfiguration.documentDbTemplate(DatabaseConfiguration.java:52)
at com.walmart.intech.armada.api.DatabaseConfiguration$$EnhancerBySpringCGLIB$$cb8ee2a6.CGLIB$documentDbTemplate$1(<generated>)
at com.walmart.intech.armada.api.DatabaseConfiguration$$EnhancerBySpringCGLIB$$cb8ee2a6$$FastClassBySpringCGLIB$$e37f77bc.invoke(<generated>)
at org.springframework.cglib.proxy.MethodProxy.invokeSuper(MethodProxy.java:244)
at org.springframework.context.annotation.ConfigurationClassEnhancer$BeanMethodInterceptor.intercept(ConfigurationClassEnhancer.java:363)
at com.walmart.intech.armada.api.DatabaseConfiguration$$EnhancerBySpringCGLIB$$cb8ee2a6.documentDbTemplate(<generated>)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.springframework.beans.factory.support.SimpleInstantiationStrategy.instantiate(SimpleInstantiationStrategy.java:154)
... 113 common frames omitted
Suppressed: java.lang.Exception: #block terminated with an error
at reactor.core.publisher.BlockingSingleSubscriber.blockingGet(BlockingSingleSubscriber.java:93)
... 141 common frames omitted
Caused by: io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection timed out: no further information: sql-dev-azure-cosmosdb.documents.azure.com/104.215.113.77:4435
Suppressed: reactor.core.publisher.FluxOnAssembly$OnAssemblyException:
Assembly trace from producer [reactor.core.publisher.MonoCreate] :
reactor.core.publisher.Mono.create(Mono.java:185)
reactor.netty.resources.PooledConnectionProvider$PooledConnectionAllocator.connectChannel(PooledConnectionProvider.java:241)
Error has been observed at the following site(s):
|_ Mono.create ⇢ at reactor.netty.resources.PooledConnectionProvider$PooledConnectionAllocator.connectChannel(PooledConnectionProvider.java:241)
|_ Mono.create ⇢ at reactor.netty.resources.PooledConnectionProvider.acquire(PooledConnectionProvider.java:135)
|_ Mono.create ⇢ at reactor.netty.http.client.HttpClientConnect$MonoHttpConnect.subscribe(HttpClientConnect.java:284)
|_ Mono.retry ⇢ at reactor.netty.http.client.HttpClientConnect$MonoHttpConnect.subscribe(HttpClientConnect.java:321)
|_ Mono.flatMapMany ⇢ at reactor.netty.http.client.HttpClientFinalizer.responseConnection(HttpClientFinalizer.java:85)
|_ Flux.single ⇢ at com.azure.data.cosmos.internal.http.ReactorNettyClient.send(ReactorNettyClient.java:106)
|_ Mono.flatMap ⇢ at com.azure.data.cosmos.internal.directconnectivity.HttpClientUtils.parseResponseAsync(HttpClientUtils.java:19)
|_ Mono.map ⇢ at com.azure.data.cosmos.internal.directconnectivity.GatewayServiceConfigurationReader.toDatabaseAccountObservable(GatewayServiceConfigurationReader.java:157)
|_ Mono.error ⇢ at com.azure.data.cosmos.internal.GlobalEndpointManager.lambda$getDatabaseAccountFromAnyLocationsAsync$2(GlobalEndpointManager.java:97)
|_ Mono.onErrorResume ⇢ at com.azure.data.cosmos.internal.GlobalEndpointManager.getDatabaseAccountFromAnyLocationsAsync(GlobalEndpointManager.java:93)
|_ Mono.doOnSuccess ⇢ at com.azure.data.cosmos.internal.directconnectivity.GatewayServiceConfigurationReader.initializeReaderAsync(GatewayServiceConfigurationReader.java:142)
Stack trace:
Caused by: java.net.ConnectException: Connection timed out: no further information
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:327)
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:336)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:685)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:632)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:549)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:511)
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:918)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.lang.Thread.run(Thread.java:748)
at this point new threads are created
now u can see parallel-6 7 8 and cosmos-rntbd-nio-7- are created and the old parallel-12345 and cosmos-rntbd-nio-2- still exist. and the parallel-12345 is taking the wrong URL as connection and it will keep trying to connect to the wrong URL after some period in the backend. But the controller is taking the correct URL without issue.
So, what is the way to kill those old threads before refresh?
Try closing the client properly using close()
API on CosmosClient
and re-opening it with new connection details when you are calling refreshScope
I did try it. but still, it looks like nothing happen. that cosmoclient is not killing the asyncDocumentClient
@EventListener(RefreshScopeRefreshedEvent.class)
public synchronized void onRefresh(RefreshScopeRefreshedEvent event) {
CosmosClient cc = (CosmosClient) this.applicationContext.getBean("cosmosClient");
cc.close();
}
I have this listener for closing the client. but still the parallel will still stay there and just keep create a new one for the new connection
@kushagraThapar can u please give any suggestions?
Is this the bug?
@loudywen - I don't think that should be a bug. However, there is a way to find out. Can you please enable debug logs for cosmos SDK and see the below logs when you close the client ?
GlobalEndpointManager closed.
@kushagraThapar I enabled the debug. but I dont see GlobalEndpointManager closed.
2020-08-11 19:31:57,620 INFO [http-nio-8090-exec-1] c.a.d.c.i.RxDocumentClientImpl <init> - Initializing DocumentClient with serviceEndpoint [https://xxxx:4435/], connectionPolicy [ConnectionPolicy{requestTimeoutInMillis=2000, mediaRequestTimeoutInMillis=300000, connectionMode=DIRECT, maxPoolSize=1000, idleConnectionTimeoutInMillis=5000, userAgentSuffix=';spring-data/2.2.4;', retryOptions=RetryOptions{maxRetryAttemptsOnThrottledRequests=0, maxRetryWaitTimeInSeconds=3}, enableEndpointDiscovery=true, preferredLocations=null, usingMultipleWriteLocations=false, inetSocketProxyAddress=null}], consistencyLevel [Session], directModeProtocol [Tcp]
2020-08-11 19:31:57,622 DEBUG [http-nio-8090-exec-1] c.a.d.c.i.GlobalEndpointManager startRefreshLocationTimerAsync - registering a refresh in [300000] ms
2020-08-11 19:31:57,636 DEBUG [parallel-6] c.a.d.c.i.GlobalEndpointManager lambda$startRefreshLocationTimerAsync$11 - startRefreshLocationTimerAsync() - Invoking refresh, I was registered on [2020-08-11T14:31:57.623]
2020-08-11 19:31:57,637 INFO [parallel-6] c.a.d.c.i.RxDocumentClientImpl getDatabaseAccountFromEndpoint - Getting database account endpoint from https://sql-dev-azure-cosmosdb.documents.azure.com:4435/
2020-08-11 19:32:18,749 ERROR [reactor-http-nio-8] c.a.d.c.internal.RxGatewayStoreModel lambda$toDocumentServiceResponse$2 - Network failure
io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection timed out: no further information: xxx.com/xxx.215.113.xx:4435
Suppressed: reactor.core.publisher.FluxOnAssembly$OnAssemblyException:
Assembly trace from producer [reactor.core.publisher.MonoCreate] :
reactor.core.publisher.Mono.create(Mono.java:185)
reactor.netty.resources.PooledConnectionProvider$PooledConnectionAllocator.connectChannel(PooledConnectionProvider.java:241)
Error has been observed at the following site(s):
|_ Mono.create ⇢ at reactor.netty.resources.PooledConnectionProvider$PooledConnectionAllocator.connectChannel(PooledConnectionProvider.java:241)
|_ Mono.create ⇢ at reactor.netty.resources.PooledConnectionProvider.acquire(PooledConnectionProvider.java:135)
|_ Mono.create ⇢ at reactor.netty.http.client.HttpClientConnect$MonoHttpConnect.subscribe(HttpClientConnect.java:284)
|_ Mono.retry ⇢ at reactor.netty.http.client.HttpClientConnect$MonoHttpConnect.subscribe(HttpClientConnect.java:321)
|_ Mono.flatMapMany ⇢ at reactor.netty.http.client.HttpClientFinalizer.responseConnection(HttpClientFinalizer.java:85)
|_ Flux.single ⇢ at com.azure.data.cosmos.internal.http.ReactorNettyClient.send(ReactorNettyClient.java:106)
|_ Mono.flatMap ⇢ at com.azure.data.cosmos.internal.RxGatewayStoreModel.toDocumentServiceResponse(RxGatewayStoreModel.java:246)
|_ Mono.map ⇢ at com.azure.data.cosmos.internal.RxGatewayStoreModel.toDocumentServiceResponse(RxGatewayStoreModel.java:283)
Stack trace:
Caused by: java.net.ConnectException: Connection timed out: no further information
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:327)
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:336)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:685)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:632)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:549)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:511)
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:918)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.lang.Thread.run(Thread.java:748)
2020-08-11 19:32:18,755 WARN [reactor-http-nio-8] c.a.d.c.i.WebExceptionRetryPolicy shouldRetry - Received retriable web exception, will retry
com.azure.data.cosmos.CosmosClientException: null
at com.azure.data.cosmos.BridgeInternal.createCosmosClientException(BridgeInternal.java:346)
Suppressed: reactor.core.publisher.FluxOnAssembly$OnAssemblyException:
Assembly trace from producer [reactor.core.publisher.MonoError] :
reactor.core.publisher.Mono.error(Mono.java:281)
com.azure.data.cosmos.internal.RxGatewayStoreModel.lambda$toDocumentServiceResponse$2(RxGatewayStoreModel.java:298)
Error has been observed at the following site(s):
|_ Mono.error ⇢ at com.azure.data.cosmos.internal.RxGatewayStoreModel.lambda$toDocumentServiceResponse$2(RxGatewayStoreModel.java:298)
|_ Mono.onErrorResume ⇢ at com.azure.data.cosmos.internal.RxGatewayStoreModel.toDocumentServiceResponse(RxGatewayStoreModel.java:284)
|_ Flux.single ⇢ at com.azure.data.cosmos.internal.RxGatewayStoreModel.lambda$invokeAsync$3(RxGatewayStoreModel.java:365)
|_ Mono.defer ⇢ at com.azure.data.cosmos.internal.BackoffRetryUtility.executeRetry(BackoffRetryUtility.java:31)
Stack trace:
at com.azure.data.cosmos.BridgeInternal.createCosmosClientException(BridgeInternal.java:346)
at com.azure.data.cosmos.internal.RxGatewayStoreModel.lambda$toDocumentServiceResponse$2(RxGatewayStoreModel.java:296)
at reactor.core.publisher.FluxOnErrorResume$ResumeSubscriber.onError(FluxOnErrorResume.java:88)
at reactor.core.publisher.FluxMapFuseable$MapFuseableSubscriber.onError(FluxMapFuseable.java:134)
at reactor.core.publisher.MonoFlatMap$FlatMapMain.onError(MonoFlatMap.java:165)
at reactor.core.publisher.MonoSingle$SingleSubscriber.onError(MonoSingle.java:141)
at reactor.core.publisher.MonoFlatMapMany$FlatMapManyMain.onError(MonoFlatMapMany.java:197)
at reactor.core.publisher.FluxRetryPredicate$RetryPredicateSubscriber.onError(FluxRetryPredicate.java:101)
at reactor.core.publisher.MonoCreate$DefaultMonoSink.error(MonoCreate.java:183)
at reactor.netty.http.client.HttpClientConnect$MonoHttpConnect$TcpClientSubscriber.onError(HttpClientConnect.java:345)
at reactor.core.publisher.MonoCreate$DefaultMonoSink.error(MonoCreate.java:183)
at reactor.netty.resources.PooledConnectionProvider$DisposableAcquire.onError(PooledConnectionProvider.java:492)
at reactor.netty.internal.shaded.reactor.pool.AbstractPool$Borrower.fail(AbstractPool.java:378)
at reactor.netty.internal.shaded.reactor.pool.SimplePool.lambda$drainLoop$7(SimplePool.java:203)
at reactor.core.publisher.LambdaMonoSubscriber.doError(LambdaMonoSubscriber.java:152)
at reactor.core.publisher.LambdaMonoSubscriber.onError(LambdaMonoSubscriber.java:147)
at reactor.core.publisher.MonoCreate$DefaultMonoSink.error(MonoCreate.java:183)
at reactor.netty.resources.PooledConnectionProvider$PooledConnectionAllocator$PooledConnectionInitializer.operationComplete(PooledConnectionProvider.java:303)
at reactor.netty.resources.PooledConnectionProvider$PooledConnectionAllocator$PooledConnectionInitializer.operationComplete(PooledConnectionProvider.java:254)
at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:500)
at io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:493)
at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:472)
at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:413)
at io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:538)
at io.netty.util.concurrent.DefaultPromise.setFailure0(DefaultPromise.java:531)
at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:111)
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.fulfillConnectPromise(AbstractNioChannel.java:323)
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:339)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:685)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:632)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:549)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:511)
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:918)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.lang.Thread.run(Thread.java:748)
Caused by: io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection timed out: no further information: xxx:4435
Suppressed: reactor.core.publisher.FluxOnAssembly$OnAssemblyException:
Assembly trace from producer [reactor.core.publisher.MonoCreate] :
reactor.core.publisher.Mono.create(Mono.java:185)
reactor.netty.resources.PooledConnectionProvider$PooledConnectionAllocator.connectChannel(PooledConnectionProvider.java:241)
Error has been observed at the following site(s):
|_ Mono.create ⇢ at reactor.netty.resources.PooledConnectionProvider$PooledConnectionAllocator.connectChannel(PooledConnectionProvider.java:241)
|_ Mono.create ⇢ at reactor.netty.resources.PooledConnectionProvider.acquire(PooledConnectionProvider.java:135)
|_ Mono.create ⇢ at reactor.netty.http.client.HttpClientConnect$MonoHttpConnect.subscribe(HttpClientConnect.java:284)
|_ Mono.retry ⇢ at reactor.netty.http.client.HttpClientConnect$MonoHttpConnect.subscribe(HttpClientConnect.java:321)
|_ Mono.flatMapMany ⇢ at reactor.netty.http.client.HttpClientFinalizer.responseConnection(HttpClientFinalizer.java:85)
|_ Flux.single ⇢ at com.azure.data.cosmos.internal.http.ReactorNettyClient.send(ReactorNettyClient.java:106)
|_ Mono.flatMap ⇢ at com.azure.data.cosmos.internal.RxGatewayStoreModel.toDocumentServiceResponse(RxGatewayStoreModel.java:246)
|_ Mono.map ⇢ at com.azure.data.cosmos.internal.RxGatewayStoreModel.toDocumentServiceResponse(RxGatewayStoreModel.java:283)
@kushagraThapar from the log u can see. there is no closed happen.
2020-08-11 19:34:32,360 DEBUG [reactor-http-nio-3] c.a.d.c.i.GlobalEndpointManager lambda$startRefreshLocationTimerAsync$10 - db account retrieved
2020-08-11 19:34:32,360 DEBUG [reactor-http-nio-3] c.a.d.c.i.GlobalEndpointManager lambda$refreshLocationPrivateAsync$9 - refreshLocationPrivateAsync() refreshing locations
2020-08-11 19:34:32,518 DEBUG [reactor-http-nio-3] c.a.d.c.i.GlobalEndpointManager lambda$refreshLocationPrivateAsync$9 - shouldRefreshEndpoints: false, nothing to do.
2020-08-11 19:34:32,669 DEBUG [http-nio-8090-exec-2] c.a.d.c.i.GlobalEndpointManager lambda$refreshLocationAsync$6 - refreshLocationAsync() invoked
2020-08-11 19:34:32,670 DEBUG [http-nio-8090-exec-2] c.a.d.c.i.GlobalEndpointManager lambda$refreshLocationAsync$6 - will refresh
2020-08-11 19:34:32,671 DEBUG [http-nio-8090-exec-2] c.a.d.c.i.GlobalEndpointManager lambda$refreshLocationPrivateAsync$9 - refreshLocationPrivateAsync() refreshing locations
2020-08-11 19:34:32,672 DEBUG [http-nio-8090-exec-2] c.a.d.c.i.GlobalEndpointManager lambda$refreshLocationPrivateAsync$9 - shouldRefreshEndpoints: false, nothing to do.
here is the log when I gave the correct URL. so during the refresScope trigger, there no closed happen! @kushagraThapar
2020-08-11 19:43:22,035 DEBUG [parallel-3] c.a.d.c.i.GlobalEndpointManager lambda$startRefreshLocationTimerAsync$11 - startRefreshLocationTimerAsync() - Invoking refresh, I was registered on [2020-08-11T14:38:22.027]
2020-08-11 19:43:22,035 INFO [parallel-3] c.a.d.c.i.RxDocumentClientImpl getDatabaseAccountFromEndpoint - Getting database account endpoint from https://xxxx:4435/
@kushagraThapar see this. after some period of time. some of the paraller-x thread will pick up the previous wrong value and try to connect...
I mean, I guess there is something that triggers the RxDocumentClientImpl
to init the connection with the wrong value. but I have no idea who is triggering it. This issue can be solved if I am able to close the previous cosmos threads. Please help,.
@loudywen - thanks for attaching the logs, however, I am not seeing any logs emitting for even closing cosmos client.
That indicates that your onRefresh
API is not getting called.
If it would have been getting called, you should see logs something like this:
10:45:56.700 [reactor-http-nio-3] INFO c.a.c.i.RxDocumentClientImpl - Shutting down ...
10:45:56.700 [reactor-http-nio-3] INFO c.a.c.i.RxDocumentClientImpl - Closing Global Endpoint Manager ...
10:45:56.701 [reactor-http-nio-3] INFO c.a.c.i.RxDocumentClientImpl - Closing StoreClientFactory ...
10:45:56.703 [reactor-http-nio-3] INFO c.a.c.i.RxDocumentClientImpl - Shutting down reactorHttpClient ...
10:45:56.710 [reactor-http-nio-3] INFO c.a.c.i.RxDocumentClientImpl - Shutting down completed.
@kushagraThapar, I am not explicitly calling the refresh endpoint.
like I said I just have POJO annotated with @ConfigurationProperties
and also the configuration class https://github.com/sophiaso/spring-cosmosdb-refresh/blob/master/src/main/java/com/example/cosmosdbrefresh/CosmosConfiguration.java ) .
and use @RefreshScope
once the CCM updated, I just let the Spring handle the rest of the stuff. I didn't hit any refresh
endpoint at all.
can you provide me an example of how to trigger the refresh programmatically?
2020-08-11 20:36:52,056 INFO [scm-timer-thread-1] c.a.d.c.i.RxDocumentClientImpl close - Shutting down ...
2020-08-11 20:36:52,056 DEBUG [scm-timer-thread-1] c.a.d.c.i.GlobalEndpointManager close - GlobalEndpointManager closed.
I call this within the EnvironmentChangeEvent and I am able to see above two log only. @kushagraThapar
CosmosClient cosmosClient = (CosmosClient) this.applicationContext.getBean("cosmosClient");
cosmosClient.close();;
startRefreshLocationTimerAsync keep triggering the previous value.
2020-08-11 21:09:00,257 DEBUG [parallel-8] c.a.d.c.i.GlobalEndpointManager lambda$startRefreshLocationTimerAsync$11 - startRefreshLocationTimerAsync() - Invoking refresh, I was registered on [2020-08-11T16:04:00.254]
@kushagraThapar can u try from your end and see if the connection (threads) able to close/kill. I still think there is something wrong here.
@loudywen - sure, I will try it out and will let you know if I see these threads hanging around.
@loudywen - meanwhile, can you please provide full set of logs ?
@kushagraThapar here u go spring-boot-logger.log
again the scenario I am testing is:
I just don't understand why those previous values still exist and picked up by some of those parallel-x threads. my assumption is. if the cosmos able to shut down the old thread pool correctly and recreate a new one during refresh, this issue should be solved.
@loudywen - thanks for providing the detailed logs, I looked at them and I can't find where actually the client got closed and reinstantiated.
Since spring creates a bean initially for CosmosClient
, I think when you update the URL through your application, the existing CosmosClient
is not getting closed. I am still not sure how you have hooked up the refresh in your application - but what I am trying to say is that you will have to close the client manually and recreate it from scratch with new URL and key.
That being said, you will have to find out how spring is managing the CosmosClient
bean, and how you can reinstantiate that bean.
Since this repository is of @sophiaso, I guess he will have much more information on this.
@sophiaso - can you please help us here ? The problem is old client is not getting closed - which is leaving some old threads behind.
@kushagraThapar @sophiaso ya, close is not happening even I call the refresh (manually). that is why I said something is wrong here. @sophiaso if you run your sample and do the same scenario I did. I think u should see the same issue.
The code that @sophiaso shows in his repo should be the correct way to closes the bean and re-instantiate the bean and I don't see any issue of it
Also, the other way I tried to close the cosmosClient is just directly calling the cosmosclient.close()
after my application started, still, none of the thread pool is being closed.
@kushagraThapar Have you try to test it on your side?
@loudywen - I tested this issue with the sample code for spring-data-cosmosdb repo and I couldn't repro the issue at all. I see that closing the client actually closes GlobalEndpointManager
and there are no threads hanging around related to that.
However, I am speculating the issue with your application is that there are multiple clients and not all of them are getting closed properly in your application. I would suggest making sure all the clients are closed properly on your end. Please debug the application on your end and see if the clients are getting initialized and then are getting closed properly.
Like in the above error logs that you shared, there were no close
calls happening. I think that is the major issue.
@kushagraThapar thanks for the suggestion. But I do understand what you suggest and I do follow the steps that you asked to do. But the issue still exists no matter what.
By the way, what version are you using for test? I am using:
<dependency>
<groupId>com.microsoft.azure</groupId>
<artifactId>spring-data-cosmosdb</artifactId>
<version>2.2.4</version>
</dependency>
@loudywen I tested this with spring-data-cosmosdb 2.3.0 and with the new version of azure-spring-data-cosmos that will be released today - 3.0.0-beta.1 And I couldn't repro the issue..
@kushagraThapar so in the future, should I use spring-data-cosmosdb 2.3.0 or azure-spring-data-cosmos
I can see that the azure-spring-data-cosmos rewrote the AbstractCosmosConfiguration
completely and look more reasonable than the one in spring-data-cosmosdb
@loudywen - new -> azure-spring-data-cosmos
will be released today. You should use the new one going forward and let's also move this issue to the new version of the SDK, where it will much easier to debug the issue and fix it.
For your reference, here are the docs on the new version - https://github.com/Azure/azure-sdk-for-java/blob/master/sdk/cosmos/azure-spring-data-cosmos-core/README.md And here is the changelog : https://github.com/Azure/azure-sdk-for-java/pull/14187
This will be released today and you should be able to use it.
@loudywen closing this issue here, as we are tracking it in the new repo.
Hi, I would like to know is there any ways I can kill or stop the cosmos DB programmatically? I am trying the use @RefreshScope to refresh the cosmos DB connection. But what I am seeing is after the refreshScope triggered. Spring is able to refresh the cosmos DB connection (new parallel-* threads are created) and inject it into other components. However, the old cosmosDB threads still running in the background and it will keep trying to connect to the old URL.
So what I am trying to do is have a listener to listen to the refresh event, once the method is called, I want to kill all the cosmos DB connection. But I have no idea what I should use to stop the cosmos DB instance before it reinitial by Spring