grpc / grpc-java

The Java gRPC implementation. HTTP/2 based RPC
https://grpc.io/docs/languages/java/
Apache License 2.0

Improve handling of stream ID exhaustion #1809

Open gorset opened 8 years ago

gorset commented 8 years ago

The HTTP/2 spec limits stream identifiers to 31 bits, and client-initiated streams use only odd IDs, which caps a connection at 2^30 client-initiated streams; once the limit is reached, a channel must create a new transport.
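For reference, the 2^30 figure follows directly from the HTTP/2 stream-numbering rules; a minimal sketch of the arithmetic (class name is arbitrary):

```java
public class StreamIdMath {
    public static void main(String[] args) {
        // RFC 7540 §5.1.1: stream identifiers are 31-bit unsigned
        // integers, and client-initiated streams must use odd IDs
        // (1, 3, 5, ...), so one connection can carry at most 2^30
        // client-initiated streams before the IDs run out.
        long maxStreamId = (1L << 31) - 1;            // 2147483647
        long maxClientStreams = (maxStreamId + 1) / 2;
        System.out.println(maxClientStreams);         // 1073741824 == 2^30
    }
}
```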

A couple of challenges with the current implementation:

The implication is that a channel can become unavailable for an unbounded amount of time. Here are a couple of possible improvements:

  1. Forcefully cancel active RPCs so that the transport can shut down quickly and completely, allowing a new transport to be opened. This is somewhat consistent with the current behavior, where RPCs fail due to stream ID exhaustion, but it is still suboptimal and shares the problems the current implementation has with failing RPCs. Given that a client already has to deal with various error situations, this could be a good enough solution.
  2. Start a new transport without waiting for the current transport to shut down. The worst case is creating one transport per long-lived stream per ~1 billion requests, which is perhaps rare enough, given that even a few thousand requests per second would take a few days to exhaust the IDs.
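Option 2 could be sketched roughly as follows; all names here are illustrative, not actual grpc-java internals:

```java
// Hypothetical sketch of option 2: allocate odd client stream IDs and,
// once the counter nears the HTTP/2 limit, signal that new RPCs should
// go to a freshly created transport while the old one drains.
public class TransportSwapSketch {
    static final int MAX_STREAM_ID = Integer.MAX_VALUE;      // 2^31 - 1
    static final int SWAP_THRESHOLD = MAX_STREAM_ID - 1_000; // headroom (arbitrary)

    private int nextStreamId = 1; // client streams are odd; advance by 2

    int allocateStreamId() {
        int id = nextStreamId;
        nextStreamId += 2;
        return id;
    }

    // When true, the channel would open a replacement transport for new
    // RPCs instead of failing them, letting in-flight streams finish.
    boolean shouldSwapTransport() {
        return nextStreamId > SWAP_THRESHOLD;
    }
}
```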

See background discussion on https://groups.google.com/d/msg/grpc-io/GfeL3lse6lM/PZ-cy8qkAwAJ

ejona86 commented 8 years ago

Start a new transport without waiting for the current transport to shut down.

This is the way we'll probably go. There are already other cases where we do the same, such as when we receive a GOAWAY.

ejona86 commented 8 years ago

I've split out #1819, because that fix is trivial. I'm leaving this one open to reduce how many RPCs fail when we do the swap-over. I'd be strongly tempted to just swap over prematurely, such as after 2^29 RPCs (or 2^30 - some hard-coded number of RPCs * 2).

gorset commented 8 years ago

Thanks, sounds good!

ejona86 commented 8 years ago

Actually, this race will be fixed once we have retry support, since we'll automatically retry rejected streams (streams that we know have not been processed by the server). That's normally done for GOAWAY, but it would apply here as well.
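The retry support mentioned above is driven by a service config. A sketch of a retry policy that retries UNAVAILABLE failures, built with plain stdlib maps (the service name `example.Greeter` and all values are placeholders); such a map would be passed to `ManagedChannelBuilder.defaultServiceConfig(...)` together with `enableRetry()`:

```java
import java.util.List;
import java.util.Map;

public class RetryConfigSketch {
    public static void main(String[] args) {
        // Retry policy: up to 4 attempts with exponential backoff,
        // retrying only RPCs that failed with UNAVAILABLE.
        Map<String, Object> retryPolicy = Map.of(
            "maxAttempts", 4.0,
            "initialBackoff", "0.1s",
            "maxBackoff", "1s",
            "backoffMultiplier", 2.0,
            "retryableStatusCodes", List.of("UNAVAILABLE"));

        // Scope the policy to a (placeholder) service.
        Map<String, Object> methodConfig = Map.of(
            "name", List.of(Map.of("service", "example.Greeter")),
            "retryPolicy", retryPolicy);

        Map<String, Object> serviceConfig = Map.of(
            "methodConfig", List.of(methodConfig));

        System.out.println(serviceConfig);
    }
}
```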

wudan3551 commented 5 years ago

Could anyone tell me what the specific solution is? @ejona86 My application prints an error log like:

io.grpc.StatusRuntimeException: UNAVAILABLE: Stream IDs have been exhausted
    at io.grpc.Status.asRuntimeException(Status.java:526)
    at io.grpc.stub.ClientCalls$UnaryStreamToFuture.onClose(ClientCalls.java:482)
    at io.grpc.PartialForwardingClientCallListener.onClose(PartialForwardingClientCallListener.java:39)
    at io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:23)
    at io.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose(ForwardingClientCallListener.java:40)
    at io.grpc.PartialForwardingClientCallListener.onClose(PartialForwardingClientCallListener.java:39)
    at io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:23)
    at io.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose(ForwardingClientCallListener.java:40)
    at io.grpc.internal.CensusStatsModule$StatsClientInterceptor$1$1.onClose(CensusStatsModule.java:678)
    at io.grpc.PartialForwardingClientCallListener.onClose(PartialForwardingClientCallListener.java:39)
    at io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:23)
    at io.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose(ForwardingClientCallListener.java:40)
    at io.grpc.internal.CensusTracingModule$TracingClientInterceptor$1$1.onClose(CensusTracingModule.java:397)
    at io.grpc.internal.ClientCallImpl.closeObserver(ClientCallImpl.java:459)
    at io.grpc.internal.ClientCallImpl.access$300(ClientCallImpl.java:63)
    at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl.close(ClientCallImpl.java:546)
    at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl.access$600(ClientCallImpl.java:467)
    at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInContext(ClientCallImpl.java:584)
    at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
    at io.grpc.internal.SerializeReentrantCallsDirectExecutor.execute(SerializeReentrantCallsDirectExecutor.java:49)
    at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl.closed(ClientCallImpl.java:588)
    at io.grpc.internal.ForwardingClientStreamListener.closed(ForwardingClientStreamListener.java:39)
    at io.grpc.internal.InternalSubchannel$CallTracingTransport$1$1.closed(InternalSubchannel.java:716)
    at io.grpc.internal.AbstractClientStream$TransportState.closeListener(AbstractClientStream.java:452)
    at io.grpc.internal.AbstractClientStream$TransportState.access$400(AbstractClientStream.java:212)
    at io.grpc.internal.AbstractClientStream$TransportState$1.run(AbstractClientStream.java:435)
    at io.grpc.internal.AbstractClientStream$TransportState.deframerClosed(AbstractClientStream.java:273)
    at io.grpc.internal.Http2ClientStreamTransportState.deframerClosed(Http2ClientStreamTransportState.java:31)
    at io.grpc.internal.MessageDeframer.close(MessageDeframer.java:229)
    at io.grpc.internal.AbstractStream$TransportState.closeDeframer(AbstractStream.java:181)
    at io.grpc.internal.AbstractClientStream$TransportState.transportReportStatus(AbstractClientStream.java:438)
    at io.grpc.netty.shaded.io.grpc.netty.NettyClientHandler.createStream(NettyClientHandler.java:490)
    at io.grpc.netty.shaded.io.grpc.netty.NettyClientHandler.write(NettyClientHandler.java:300)
    at io.grpc.netty.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:738)
    at io.grpc.netty.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:730)
    at io.grpc.netty.shaded.io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:816)
    at io.grpc.netty.shaded.io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:723)
    at io.grpc.netty.shaded.io.netty.channel.DefaultChannelPipeline.write(DefaultChannelPipeline.java:1061)
    at io.grpc.netty.shaded.io.netty.channel.AbstractChannel.write(AbstractChannel.java:295)
    at io.grpc.netty.shaded.io.grpc.netty.WriteQueue$AbstractQueuedCommand.run(WriteQueue.java:174)
    at io.grpc.netty.shaded.io.grpc.netty.WriteQueue.flush(WriteQueue.java:112)
    at io.grpc.netty.shaded.io.grpc.netty.WriteQueue.access$000(WriteQueue.java:32)
    at io.grpc.netty.shaded.io.grpc.netty.WriteQueue$1.run(WriteQueue.java:44)
    at io.grpc.netty.shaded.io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
    at io.grpc.netty.shaded.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:404)
    at io.grpc.netty.shaded.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:462)
    at io.grpc.netty.shaded.io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:897)
    at io.grpc.netty.shaded.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.lang.Thread.run(Thread.java:748)

ejona86 commented 5 years ago

@wudan3551, that failure is what this issue is about. When we run out of stream IDs, we lose a few RPCs during the swap-over. It should only be a few, though.
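Until the swap-over is fully transparent, a caller can treat this UNAVAILABLE failure like any other transient error and retry it. A generic sketch of such a helper (not a grpc-java API; in real code you would catch `StatusRuntimeException` and retry only when the code is `Status.Code.UNAVAILABLE`):

```java
import java.util.concurrent.Callable;

public class RetryUtil {
    // Generic retry-with-backoff helper (illustrative; prefer grpc-java's
    // built-in retry support via the service config where available).
    static <T> T callWithRetry(Callable<T> call, int maxAttempts,
                               long initialBackoffMs) throws Exception {
        Exception last = null;
        long backoff = initialBackoffMs;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.call();
            } catch (Exception e) { // in practice: StatusRuntimeException,
                last = e;           // retried only for UNAVAILABLE
                if (attempt < maxAttempts) {
                    Thread.sleep(backoff);
                    backoff *= 2;   // exponential backoff
                }
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        // Simulate an RPC that fails twice, then succeeds.
        int[] tries = {0};
        String result = callWithRetry(() -> {
            if (++tries[0] < 3) throw new RuntimeException("UNAVAILABLE");
            return "ok";
        }, 4, 1);
        System.out.println(result + " after " + tries[0] + " attempts");
        // prints "ok after 3 attempts"
    }
}
```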