Over time with smoker running we have received a closed connection error that causes cascade failure. We need to investigate the cause and remediation for the error events.
Expected Behavior
Run continuously without error.
We will likely need to build a catch and reconnect behaviour for connection closed errors.
2020-04-22T14:29:38.480396399Z 2020-04-22 14:29:34.034 [reactor-http-epoll-4] INFO c.m.cse.helium.app.dao.ActorsDao.getActorById (39) - Call to getActorById (nm0000751)
2020-04-22T14:29:55.800365825Z 2020-04-22 14:29:42.512 [reactor-http-epoll-1] WARN r.n.http.client.HttpClientConnect.warn (299) - [id: 0x4450c32d, L:/172.16.4.7:51940 ! R:heliume2e-cosmos-centralus.documents.azure.com/52.173.148.217:443] The connection observed an error
2020-04-22T14:29:55.800411127Z reactor.netty.http.client.PrematureCloseException: Connection has been closed BEFORE response, while sending request body
2020-04-22T14:32:36.607516910Z 2020-04-22 14:31:07.144 [reactor-http-epoll-1] ERROR c.a.d.c.internal.RxGatewayStoreModel.lambda$toDocumentServiceResponse$2 (295) - Network failure
2020-04-22T14:32:36.607560311Z reactor.netty.http.client.PrematureCloseException: Connection has been closed BEFORE response, while sending request body
2020-04-22T14:32:47.527074997Z 2020-04-22 14:30:58.092 [cosmos-rntbd-nio-2-2] WARN c.a.d.c.i.d.r.RntbdClientChannelPool.lambda$operationComplete$0 (573) - Channel([id: 0x27143d0c, L:0.0.0.0/0.0.0.0:55840 ! R:dz5prdddc03-docdb-1.documents.azure.com/13.89.41.245:11506]) health check request failed due to:
2020-04-22T14:32:47.527117999Z java.nio.channels.ClosedChannelException: null
2020-04-22T14:32:47.527125499Z at com.azure.data.cosmos.internal.directconnectivity.rntbd.RntbdRequestManager.channelUnregistered(Unknown Source)
2020-04-22T14:33:41.767464938Z 2020-04-22 14:33:38.554 [reactor-http-epoll-4] INFO c.m.c.h.a.c.MoviesController.getMovie (46) - getMovie (movieId=tt0808510)
2020-04-22T14:33:53.823000096Z 2020-04-22 14:31:25.658 [reactor-http-epoll-3] INFO c.m.c.h.a.c.ActorsController.getActor (49) - getActor (actorId=nm0035060)
2020-04-22T14:34:05.452909249Z 2020-04-22 14:33:53.822 [reactor-http-epoll-3] INFO c.m.cse.helium.app.dao.ActorsDao.getActorById (39) - Call to getActorById (nm0035060)
2020-04-22T14:34:08.210787822Z 2020-04-22 14:34:03.130 [cosmos-rntbd-nio-2-2] WARN i.n.util.concurrent.DefaultPromise.notifyListener0 (580) - An exception was thrown by com.azure.data.cosmos.internal.directconnectivity.rntbd.RntbdClientChannelPool$AcquireListener$$Lambda$1259/410588397.operationComplete()
2020-04-22T14:34:08.210832123Z java.lang.IllegalStateException: null
2020-04-22T14:34:08.210839924Z at com.google.common.base.Preconditions.checkState(Preconditions.java:488)
2020-04-22T14:34:08.210845424Z at com.azure.data.cosmos.internal.directconnectivity.rntbd.RntbdClientChannelPool.decrementAndRunTaskQueue(RntbdClientChannelPool.java:422)
2020-04-22T14:34:08.210850724Z at com.azure.data.cosmos.internal.directconnectivity.rntbd.RntbdClientChannelPool.access$600(RntbdClientChannelPool.java:45)
2020-04-22T14:34:08.210855724Z at com.azure.data.cosmos.internal.directconnectivity.rntbd.RntbdClientChannelPool$AcquireListener.fail(RntbdClientChannelPool.java:588)
2020-04-22T14:34:08.210860624Z at com.azure.data.cosmos.internal.directconnectivity.rntbd.RntbdClientChannelPool$AcquireListener.lambda$operationComplete$0(RntbdClientChannelPool.java:574)
2020-04-22T14:34:08.210878925Z at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:577)
2020-04-22T14:34:08.210884225Z at io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:570)
2020-04-22T14:34:08.210889325Z at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:549)
2020-04-22T14:34:08.210893825Z at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:490)
2020-04-22T14:34:08.210898325Z at io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:615)
2020-04-22T14:34:08.210903225Z at io.netty.util.concurrent.DefaultPromise.setFailure0(DefaultPromise.java:608)
2020-04-22T14:34:08.210907625Z at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:117)
2020-04-22T14:34:08.210912026Z at io.netty.util.internal.PromiseNotificationUtil.tryFailure(PromiseNotificationUtil.java:64)
2020-04-22T14:34:08.210916526Z at io.netty.channel.DelegatingChannelPromiseNotifier.operationComplete(DelegatingChannelPromiseNotifier.java:57)
2020-04-22T14:34:08.210921026Z at io.netty.channel.DelegatingChannelPromiseNotifier.operationComplete(DelegatingChannelPromiseNotifier.java:31)
2020-04-22T14:34:08.210925526Z at io.netty.channel.AbstractCoalescingBufferQueue.releaseAndCompleteAll(AbstractCoalescingBufferQueue.java:340)
2020-04-22T14:34:08.210931126Z at io.netty.channel.AbstractCoalescingBufferQueue.releaseAndFailAll(AbstractCoalescingBufferQueue.java:207)
2020-04-22T14:34:08.210935726Z at com.azure.data.cosmos.internal.directconnectivity.rntbd.RntbdRequestManager.completeAllPendingRequestsExceptionally(RntbdRequestManager.java:598)
2020-04-22T14:34:08.210940226Z at com.azure.data.cosmos.internal.directconnectivity.rntbd.RntbdRequestManager.channelUnregistered(RntbdRequestManager.java:250)
2020-04-22T14:34:08.210944726Z at io.netty.channel.AbstractChannelHandlerContext.invokeChannelUnregistered(AbstractChannelHandlerContext.java:196)
2020-04-22T14:34:08.210949526Z at io.netty.channel.AbstractChannelHandlerContext.invokeChannelUnregistered(AbstractChannelHandlerContext.java:182)
2020-04-22T14:34:08.210954227Z at io.netty.channel.AbstractChannelHandlerContext.fireChannelUnregistered(AbstractChannelHandlerContext.java:175)
2020-04-22T14:34:08.210958727Z at io.netty.channel.CombinedChannelDuplexHandler$DelegatingChannelHandlerContext.fireChannelUnregistered(CombinedChannelDuplexHandler.java:406)
2020-04-22T14:34:08.210963127Z at io.netty.channel.ChannelInboundHandlerAdapter.channelUnregistered(ChannelInboundHandlerAdapter.java:57)
2020-04-22T14:34:08.210967727Z at io.netty.channel.CombinedChannelDuplexHandler.channelUnregistered(CombinedChannelDuplexHandler.java:201)
2020-04-22T14:34:08.210972127Z at io.netty.channel.AbstractChannelHandlerContext.invokeChannelUnregistered(AbstractChannelHandlerContext.java:196)
2020-04-22T14:34:08.210976627Z at io.netty.channel.AbstractChannelHandlerContext.invokeChannelUnregistered(AbstractChannelHandlerContext.java:182)
2020-04-22T14:34:08.210984827Z at io.netty.channel.AbstractChannelHandlerContext.fireChannelUnregistered(AbstractChannelHandlerContext.java:175)
2020-04-22T14:34:08.210989428Z at io.netty.channel.DefaultChannelPipeline$HeadContext.channelUnregistered(DefaultChannelPipeline.java:1388)
2020-04-22T14:34:08.210993928Z at io.netty.channel.AbstractChannelHandlerContext.invokeChannelUnregistered(AbstractChannelHandlerContext.java:196)
2020-04-22T14:34:08.210998428Z at io.netty.channel.AbstractChannelHandlerContext.invokeChannelUnregistered(AbstractChannelHandlerContext.java:182)
2020-04-22T14:34:08.211002928Z at io.netty.channel.DefaultChannelPipeline.fireChannelUnregistered(DefaultChannelPipeline.java:821)
2020-04-22T14:34:08.211007328Z at io.netty.channel.AbstractChannel$AbstractUnsafe$8.run(AbstractChannel.java:826)
2020-04-22T14:34:08.211011728Z at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
2020-04-22T14:34:08.211016528Z at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
2020-04-22T14:34:08.211021128Z at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:497)
2020-04-22T14:34:08.211025328Z at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
2020-04-22T14:34:08.211029929Z at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
2020-04-22T14:34:08.211034229Z at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
2020-04-22T14:34:08.211038729Z at java.lang.Thread.run(Thread.java:748)
Description
Over time with smoker running we have received a closed connection error that causes cascade failure. We need to investigate the cause and remediation for the error events.
Expected Behavior
Run continuously without error.
We will likely need to build a catch and reconnect behaviour for connection closed errors.
Dependency
Reproduce
Additional Context
References: #253
Error log info: