retaildevcrews / helium-java

MIT License
2 stars 5 forks source link

Closed Connection Error Occurring #202

Closed jofultz closed 4 years ago

jofultz commented 4 years ago

Description

Over time with smoker running we have received a closed connection error that causes cascade failure. We need to investigate the cause and remediation for the error events.

Expected Behavior

Run continuously without error.

We will likely need to build a catch and reconnect behaviour for connection closed errors.

Dependency

Reproduce

  1. Run Helium Java
  2. Run Smoker
  3. Wait for an unknown about of time.
  4. See error

Additional Context

References: #253

Error log info:

2020-04-22T14:29:38.480396399Z 2020-04-22 14:29:34.034 [reactor-http-epoll-4] INFO  c.m.cse.helium.app.dao.ActorsDao.getActorById (39) - Call to getActorById (nm0000751)

2020-04-22T14:29:55.800365825Z 2020-04-22 14:29:42.512 [reactor-http-epoll-1] WARN  r.n.http.client.HttpClientConnect.warn (299) - [id: 0x4450c32d, L:/172.16.4.7:51940 ! R:heliume2e-cosmos-centralus.documents.azure.com/52.173.148.217:443] The connection observed an error
2020-04-22T14:29:55.800411127Z reactor.netty.http.client.PrematureCloseException: Connection has been closed BEFORE response, while sending request body

2020-04-22T14:32:36.607516910Z 2020-04-22 14:31:07.144 [reactor-http-epoll-1] ERROR c.a.d.c.internal.RxGatewayStoreModel.lambda$toDocumentServiceResponse$2 (295) - Network failure
2020-04-22T14:32:36.607560311Z reactor.netty.http.client.PrematureCloseException: Connection has been closed BEFORE response, while sending request body

2020-04-22T14:32:47.527074997Z 2020-04-22 14:30:58.092 [cosmos-rntbd-nio-2-2] WARN  c.a.d.c.i.d.r.RntbdClientChannelPool.lambda$operationComplete$0 (573) - Channel([id: 0x27143d0c, L:0.0.0.0/0.0.0.0:55840 ! R:dz5prdddc03-docdb-1.documents.azure.com/13.89.41.245:11506]) health check request failed due to:
2020-04-22T14:32:47.527117999Z java.nio.channels.ClosedChannelException: null
2020-04-22T14:32:47.527125499Z                  at com.azure.data.cosmos.internal.directconnectivity.rntbd.RntbdRequestManager.channelUnregistered(Unknown Source)

2020-04-22T14:33:41.767464938Z 2020-04-22 14:33:38.554 [reactor-http-epoll-4] INFO  c.m.c.h.a.c.MoviesController.getMovie (46) - getMovie (movieId=tt0808510)

2020-04-22T14:33:53.823000096Z 2020-04-22 14:31:25.658 [reactor-http-epoll-3] INFO  c.m.c.h.a.c.ActorsController.getActor (49) - getActor (actorId=nm0035060)

2020-04-22T14:34:05.452909249Z 2020-04-22 14:33:53.822 [reactor-http-epoll-3] INFO  c.m.cse.helium.app.dao.ActorsDao.getActorById (39) - Call to getActorById (nm0035060)
2020-04-22T14:34:08.210787822Z 2020-04-22 14:34:03.130 [cosmos-rntbd-nio-2-2] WARN  i.n.util.concurrent.DefaultPromise.notifyListener0 (580) - An exception was thrown by com.azure.data.cosmos.internal.directconnectivity.rntbd.RntbdClientChannelPool$AcquireListener$$Lambda$1259/410588397.operationComplete()
2020-04-22T14:34:08.210832123Z java.lang.IllegalStateException: null
2020-04-22T14:34:08.210839924Z                  at com.google.common.base.Preconditions.checkState(Preconditions.java:488)
2020-04-22T14:34:08.210845424Z                  at com.azure.data.cosmos.internal.directconnectivity.rntbd.RntbdClientChannelPool.decrementAndRunTaskQueue(RntbdClientChannelPool.java:422)
2020-04-22T14:34:08.210850724Z                  at com.azure.data.cosmos.internal.directconnectivity.rntbd.RntbdClientChannelPool.access$600(RntbdClientChannelPool.java:45)
2020-04-22T14:34:08.210855724Z                  at com.azure.data.cosmos.internal.directconnectivity.rntbd.RntbdClientChannelPool$AcquireListener.fail(RntbdClientChannelPool.java:588)
2020-04-22T14:34:08.210860624Z                  at com.azure.data.cosmos.internal.directconnectivity.rntbd.RntbdClientChannelPool$AcquireListener.lambda$operationComplete$0(RntbdClientChannelPool.java:574)
2020-04-22T14:34:08.210878925Z                  at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:577)
2020-04-22T14:34:08.210884225Z                  at io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:570)
2020-04-22T14:34:08.210889325Z                  at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:549)
2020-04-22T14:34:08.210893825Z                  at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:490)
2020-04-22T14:34:08.210898325Z                  at io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:615)
2020-04-22T14:34:08.210903225Z                  at io.netty.util.concurrent.DefaultPromise.setFailure0(DefaultPromise.java:608)
2020-04-22T14:34:08.210907625Z                  at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:117)
2020-04-22T14:34:08.210912026Z                  at io.netty.util.internal.PromiseNotificationUtil.tryFailure(PromiseNotificationUtil.java:64)
2020-04-22T14:34:08.210916526Z                  at io.netty.channel.DelegatingChannelPromiseNotifier.operationComplete(DelegatingChannelPromiseNotifier.java:57)
2020-04-22T14:34:08.210921026Z                  at io.netty.channel.DelegatingChannelPromiseNotifier.operationComplete(DelegatingChannelPromiseNotifier.java:31)
2020-04-22T14:34:08.210925526Z                  at io.netty.channel.AbstractCoalescingBufferQueue.releaseAndCompleteAll(AbstractCoalescingBufferQueue.java:340)
2020-04-22T14:34:08.210931126Z                  at io.netty.channel.AbstractCoalescingBufferQueue.releaseAndFailAll(AbstractCoalescingBufferQueue.java:207)
2020-04-22T14:34:08.210935726Z                  at com.azure.data.cosmos.internal.directconnectivity.rntbd.RntbdRequestManager.completeAllPendingRequestsExceptionally(RntbdRequestManager.java:598)
2020-04-22T14:34:08.210940226Z                  at com.azure.data.cosmos.internal.directconnectivity.rntbd.RntbdRequestManager.channelUnregistered(RntbdRequestManager.java:250)
2020-04-22T14:34:08.210944726Z                  at io.netty.channel.AbstractChannelHandlerContext.invokeChannelUnregistered(AbstractChannelHandlerContext.java:196)
2020-04-22T14:34:08.210949526Z                  at io.netty.channel.AbstractChannelHandlerContext.invokeChannelUnregistered(AbstractChannelHandlerContext.java:182)
2020-04-22T14:34:08.210954227Z                  at io.netty.channel.AbstractChannelHandlerContext.fireChannelUnregistered(AbstractChannelHandlerContext.java:175)
2020-04-22T14:34:08.210958727Z                  at io.netty.channel.CombinedChannelDuplexHandler$DelegatingChannelHandlerContext.fireChannelUnregistered(CombinedChannelDuplexHandler.java:406)
2020-04-22T14:34:08.210963127Z                  at io.netty.channel.ChannelInboundHandlerAdapter.channelUnregistered(ChannelInboundHandlerAdapter.java:57)
2020-04-22T14:34:08.210967727Z                  at io.netty.channel.CombinedChannelDuplexHandler.channelUnregistered(CombinedChannelDuplexHandler.java:201)
2020-04-22T14:34:08.210972127Z                  at io.netty.channel.AbstractChannelHandlerContext.invokeChannelUnregistered(AbstractChannelHandlerContext.java:196)
2020-04-22T14:34:08.210976627Z                  at io.netty.channel.AbstractChannelHandlerContext.invokeChannelUnregistered(AbstractChannelHandlerContext.java:182)
2020-04-22T14:34:08.210984827Z                  at io.netty.channel.AbstractChannelHandlerContext.fireChannelUnregistered(AbstractChannelHandlerContext.java:175)
2020-04-22T14:34:08.210989428Z                  at io.netty.channel.DefaultChannelPipeline$HeadContext.channelUnregistered(DefaultChannelPipeline.java:1388)
2020-04-22T14:34:08.210993928Z                  at io.netty.channel.AbstractChannelHandlerContext.invokeChannelUnregistered(AbstractChannelHandlerContext.java:196)
2020-04-22T14:34:08.210998428Z                  at io.netty.channel.AbstractChannelHandlerContext.invokeChannelUnregistered(AbstractChannelHandlerContext.java:182)
2020-04-22T14:34:08.211002928Z                  at io.netty.channel.DefaultChannelPipeline.fireChannelUnregistered(DefaultChannelPipeline.java:821)
2020-04-22T14:34:08.211007328Z                  at io.netty.channel.AbstractChannel$AbstractUnsafe$8.run(AbstractChannel.java:826)
2020-04-22T14:34:08.211011728Z                  at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
2020-04-22T14:34:08.211016528Z                  at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
2020-04-22T14:34:08.211021128Z                  at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:497)
2020-04-22T14:34:08.211025328Z                  at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
2020-04-22T14:34:08.211029929Z                  at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
2020-04-22T14:34:08.211034229Z                  at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
2020-04-22T14:34:08.211038729Z                  at java.lang.Thread.run(Thread.java:748)
jofultz commented 4 years ago

I sent a note to Kushagra to see if this is something that is familiar to him.

jofultz commented 4 years ago

Moving to in-progress since the related issue (253) is already in progress. An update to the SDK is now expected 15 July 2020.

jofultz commented 4 years ago

From Kushagra's note on issue 10440 it looks like we may move to the spring-data-cosmosdb v2.3.0 to resolve the issue.