srujann closed this issue 5 months ago.
There's a lot here. To confirm I understand, there are two problems:
Is that right?
That's correct. In addition:
@srujann @ejona I also tried to run the helloworld example with a request message larger than 4 MB. I only saw the expected exceptions in the server-side log (nothing unexpected), and the client-side log showed a reasonable message:
Jun 27, 2019 11:15:41 AM io.grpc.examples.helloworld.HelloWorldClient greet
INFO: Will try to greet ...
Jun 27, 2019 11:15:41 AM io.grpc.examples.helloworld.HelloWorldClient greet
WARNING: RPC failed: Status{code=CANCELLED, description=HTTP/2 error code: CANCEL
Received Rst Stream, cause=null}
But when the response message is larger than 4 MB, I get the same exceptions.
@ejona86 Is this bug going to be fixed any time soon? I am using version 1.35.1 and still seeing the issue where the client does not receive an informative message when it sends a request exceeding 4 MB.
Server logs:
{"level":"WARNING","logNameSource":"io.grpc.netty.NettyServerStream","message":"Exception processing message","extendedMessage":"io.grpc.StatusRuntimeException: RESOURCE_EXHAUSTED: gRPC message exceeds maximum size 4194304: 6144592"}
Client logs:
Status{code=CANCELLED, description=RST_STREAM closed stream. HTTP/2 error code: CANCEL, cause=null}
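For context, the 4194304 in the server log above is gRPC's default 4 MiB inbound message limit. A simplified sketch of the check the receiving side performs is below; `SizeCheck` and `check` are hypothetical names for illustration, not grpc-java's internal `MessageDeframer` API.

```java
// Illustrative sketch only; not the real MessageDeframer API.
final class SizeCheck {
    // gRPC's default max inbound message size: 4 MiB.
    static final int DEFAULT_MAX_INBOUND_MESSAGE_SIZE = 4 * 1024 * 1024; // 4194304

    /** Returns null if the message fits, else a RESOURCE_EXHAUSTED-style description. */
    static String check(int messageLength, int maxInboundMessageSize) {
        if (messageLength > maxInboundMessageSize) {
            return "RESOURCE_EXHAUSTED: gRPC message exceeds maximum size "
                + maxInboundMessageSize + ": " + messageLength;
        }
        return null;
    }
}
```

The bug under discussion is that this status stays on the server side: the peer only observes the stream reset (HTTP/2 CANCEL), not the RESOURCE_EXHAUSTED status.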
Python server: fine
grpc-java v1.44.0 grpc-kotlin v1.2.1
OpenJDK 17.0.1 on macOS: all fine
OpenJDK 11.0.11 on Ubuntu:
io.grpc.StatusException: RESOURCE_EXHAUSTED: Received message larger than max (4516559 vs. 4194304)
at io.grpc.Status.asException(Status.java:550)
at io.grpc.kotlin.ClientCalls$rpcImpl$1$1$1.onClose(ClientCalls.kt:295)
at io.grpc.internal.ClientCallImpl.closeObserver(ClientCallImpl.java:562)
at io.grpc.internal.ClientCallImpl.access$300(ClientCallImpl.java:70)
at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInternal(ClientCallImpl.java:743)
at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInContext(ClientCallImpl.java:722)
at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
at io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:133)
Config:
val maxMsgBytes = DecimalByteUnit.MEGABYTES.toBytes(100)
val serviceConfig = mapOf(
    "methodConfig" to listOf(
        mapOf(
            "name" to listOf(mapOf("service" to "yanic.Yanic")),
            "retryPolicy" to mapOf(
                "maxAttempts" to "20",
                "initialBackoff" to "0.1s",
                "maxBackoff" to "10s",
                "backoffMultiplier" to "2",
                "retryableStatusCodes" to listOf("UNAVAILABLE"),
            ),
            "waitForReady" to true,
            "maxRequestMessageBytes" to maxMsgBytes.toString(),
            "maxResponseMessageBytes" to maxMsgBytes.toString(),
        ),
    ),
)
val chan =
    Grpc.newChannelBuilderForAddress("localhost", port, InsecureChannelCredentials.create())
        .enableRetry()
        .defaultServiceConfig(serviceConfig)
        .retryBufferSize(maxMsgBytes)
        .maxInboundMessageSize(maxMsgBytes.toInt())
        .maxInboundMetadataSize(maxMsgBytes.toInt())
        .build()
val opts = CallOptions.DEFAULT
    .withWaitForReady()
    .withMaxInboundMessageSize(maxMsgBytes.toInt())
    .withMaxOutboundMessageSize(maxMsgBytes.toInt())
I believe I have reproduced this issue on v1.64.0-SNAPSHOT (aka the tip of master) and understand the issue, at least in the Netty-based gRPC implementation: the oversized message is detected in MessageDeframer and propagated as an exception to AbstractStream, which passes the underlying Status to a NettyServerStream.TransportState (specifically the deframeFailed method). That in turn issues a CancelServerStreamCommand, which closes the stream without notifying the peer about the status. So the client sees RST_STREAM with no context.
If we want the client to receive the status, I think we'll need to explicitly send trailers (which should also serve to close the stream). I think that https://github.com/ryanpbrewster/grpc-java/pull/1 works, but will clean it up and send it out for review.
The difficulty with fixing this is that the deframer and framer run on different threads, and there's no way for us to enqueue work to the framing thread. Cancellation/resets are already expected to be racy and handled cleanly. But having the deframer close with trailers 1) can trigger in the middle of sending a message, which the receiver won't like (but that's probably okay), and 2) will require the Netty handler to notice and throw away later data and trailers created by the framer.
To explain (1) a bit more: when serializing protobuf, the framer produces chunks of data containing zero or more messages. A large message is split across multiple chunks of data. Those chunks are passed via a queue from the framing thread to the Netty handler, where they are sent. Any trailer triggered by the deframer will go into that same queue.
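The framing described above can be sketched as follows. The 5-byte prefix (a 1-byte compressed flag plus a 4-byte big-endian length) is gRPC's documented wire format; the `Framing` class and its chunking helper are illustrative only, not grpc-java's actual framer.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of gRPC's length-prefixed framing; not grpc-java's framer.
final class Framing {
    /** Prepends the 5-byte gRPC message prefix: compressed flag + big-endian length. */
    static byte[] lengthPrefix(byte[] message, boolean compressed) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(compressed ? 1 : 0);
        int len = message.length;
        out.write((len >>> 24) & 0xff);
        out.write((len >>> 16) & 0xff);
        out.write((len >>> 8) & 0xff);
        out.write(len & 0xff);
        out.write(message, 0, len);
        return out.toByteArray();
    }

    /** Splits a framed message into DATA-frame-sized chunks (HTTP/2 default max 16384). */
    static List<byte[]> chunk(byte[] framed, int maxFrameSize) {
        List<byte[]> chunks = new ArrayList<>();
        for (int off = 0; off < framed.length; off += maxFrameSize) {
            int n = Math.min(maxFrameSize, framed.length - off);
            byte[] c = new byte[n];
            System.arraycopy(framed, off, c, 0, n);
            chunks.add(c);
        }
        return chunks;
    }
}
```

This is why a trailer enqueued by the deframer can land between chunks of a half-sent message: the receiver would see a length prefix promising more bytes than ever arrive.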
If the deframer enqueues a command to send trailers + close, my understanding is that making the receiver unhappy is probably okay: the stream is going to be closed one way or another, and at least this way the receiver gets some context about why.
Well, my point was right now you wouldn't, in certain cases. You will see an error about a truncated message. I still think that's okay, and is a step in the right direction. We just may need future work in the receiver to expose "an error in the trailers if the trailers had an error, otherwise generate an error."
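The receiver-side rule described here ("an error in the trailers if the trailers had an error, otherwise generate an error") might look something like the sketch below; all names are hypothetical.

```java
// Hypothetical sketch of receiver-side status selection; not grpc-java API.
final class ReceiverStatus {
    /**
     * Prefer an error carried in the trailers; only if the trailers were OK,
     * synthesize an error for a stream that was cut off mid-message.
     */
    static String statusFor(String trailerStatus, boolean messageTruncated) {
        if (!"OK".equals(trailerStatus)) {
            return trailerStatus; // e.g. "RESOURCE_EXHAUSTED: ..."
        }
        if (messageTruncated) {
            return "INTERNAL: stream closed mid-message (truncated)";
        }
        return "OK";
    }
}
```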
If the stream is closed, won't the handler automatically notice and discard any future data?
Once you get to a certain level, yes. But plenty of this code has never needed to deal with this, so 1) we need to audit/be prepared for broken code and 2) consider how errors are handled.
Makes sense, and understood. I'm going to open a PR and link it to this issue. If you don't mind, I'll tag you there and we can discuss the specific implementation.
Some unexpected exceptions are seen in the server log when a server response exceeds the max message size of the client channel, or vice versa. The client log does show valid errors.
This issue is consistently reproducible with a hello-world RPC example where the request/response is a huge string larger than the default message size of 4 MiB.
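To reproduce, the payload only needs to be one byte over the default 4 MiB limit. A minimal helper for building such a string (hypothetical names, for illustration):

```java
// Hypothetical repro helper; builds a payload just over gRPC's default limit.
final class Repro {
    static final int DEFAULT_LIMIT = 4 * 1024 * 1024; // 4194304 bytes

    /** A string one char over the limit (each 'x' is 1 byte in UTF-8). */
    static String oversizedPayload() {
        char[] chars = new char[DEFAULT_LIMIT + 1];
        java.util.Arrays.fill(chars, 'x');
        return new String(chars);
    }
}
```

Sending this as the request (or echoing it back as the response) in the helloworld example triggers the behavior described below.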
What version of gRPC are you using?
1.9.0
When server response exceeds max message size
Unexpected Exceptions in the server log:
Client log:
When client request exceeds max message size
Valid exception, followed by some unexpected exceptions, in the server log:
Client log: