square / wire

gRPC and protocol buffers for Android, Kotlin, Swift and Java.
https://square.github.io/wire/
Apache License 2.0
4.23k stars 571 forks source link

java.lang.IllegalStateException: too early; can't read the trailers yet #3049

Closed YuriiKot closed 3 days ago

YuriiKot commented 1 month ago

Hello!

We have an Android app with huge amount of users and getting crash in Android app

java.lang.IllegalStateException: too early; can't read the trailers yet
        at okhttp3.internal.http2.Http2Stream.trailers(Http2Stream:163)
        at okhttp3.internal.http2.Http2ExchangeCodec.trailers(Http2ExchangeCodec:118)
        at okhttp3.internal.connection.Exchange.trailers(Exchange:140)
        at okhttp3.Response.trailers(Response:180)
        at com.squareup.wire.internal.GrpcKt.grpcResponseToException(GrpcKt:195)
        at com.squareup.wire.internal.GrpcKt$readFromResponseBodyCallback$1$onResponse$1.invokeSuspend(GrpcKt:109)
        at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(BaseContinuationImpl:33)
        at kotlinx.coroutines.DispatchedTask.run(DispatchedTask:108)
        at kotlinx.coroutines.EventLoopImplBase.processNextEvent(EventLoopImplBase:280)
        at kotlinx.coroutines.BlockingCoroutine.joinBlocking(BlockingCoroutine:85)
        at kotlinx.coroutines.BuildersKt__BuildersKt.runBlocking(BuildersKt__BuildersKt:59)
        at kotlinx.coroutines.BuildersKt.runBlocking(BuildersKt:1)
        at kotlinx.coroutines.BuildersKt__BuildersKt.runBlocking$default(BuildersKt__BuildersKt:38)
        at kotlinx.coroutines.BuildersKt.runBlocking$default(BuildersKt:1)
        at com.squareup.wire.internal.GrpcKt$readFromResponseBodyCallback$1.onResponse(GrpcKt:98)
        at okhttp3.internal.connection.RealCall$AsyncCall.run(RealCall:519)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:644)
        at java.lang.Thread.run(Thread.java:1012)

Related thread trace

Thread 2919 - OkHttp https://api.xxx.com/... - (WAITING)
        at java.lang.Object.wait(Object.java:-2)
        at java.lang.Object.wait(Object.java:386)
        at java.lang.Object.wait(Object.java:524)
        at okhttp3.internal.http2.Http2Stream.waitForIo$okhttp(Http2Stream:714)
        at okhttp3.internal.http2.Http2Stream$FramingSource.read(Http2Stream:376)
        at okhttp3.internal.connection.Exchange$ResponseBodySource.read(Exchange:281)
        at okio.RealBufferedSource.exhausted(RealBufferedSource:200)
        at com.squareup.wire.internal.GrpcMessageSource.read(GrpcMessageSource:37)
        at com.squareup.wire.internal.GrpcKt$readFromResponseBodyCallback$1$onResponse$1.invokeSuspend(GrpcKt:104)
        at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(BaseContinuationImpl:33)
        at kotlinx.coroutines.DispatchedTask.run(DispatchedTask:108)
        at kotlinx.coroutines.EventLoopImplBase.processNextEvent(EventLoopImplBase:280)
        at kotlinx.coroutines.BlockingCoroutine.joinBlocking(BlockingCoroutine:85)
        at kotlinx.coroutines.BuildersKt__BuildersKt.runBlocking(BuildersKt__BuildersKt:59)
        at kotlinx.coroutines.BuildersKt.runBlocking(BuildersKt:1)
        at kotlinx.coroutines.BuildersKt__BuildersKt.runBlocking$default(BuildersKt__BuildersKt:38)
        at kotlinx.coroutines.BuildersKt.runBlocking$default(BuildersKt:1)
        at com.squareup.wire.internal.GrpcKt$readFromResponseBodyCallback$1.onResponse(GrpcKt:98)
        at okhttp3.internal.connection.RealCall$AsyncCall.run(RealCall:519)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:644)
        at java.lang.Thread.run(Thread.java:1012)

Most of the time happens in background after activity stop or resume and trying to establish connection

Steps to reproduce:

  1. Start bidi
  2. Throw IOException here
  3. Throw IllegalStateException here

It's rare edge case for prod that you need to have those 2 exceptions, but still around 0.002%

oldergod commented 1 month ago

Thanks for reporting. I fail to see the steps you list in the stacktrace you give us; for instance the stacktrace has grpcResponseToException once but you say it's called twice.

It could be that OkHttp throws on reading, and that after OkHttp threw, Wire tries to read the trailers and this fails.

YuriiKot commented 1 month ago

@oldergod You are right, updated steps to reproduce. I guess IOException is really thrown during read. New steps give similar stacktrace

oldergod commented 1 month ago

Is there no trace from OkHttp's read? And your trace isn't analyzable by the IDE right now, are you modifying it, or is it how the play console shows it?

YuriiKot commented 1 month ago

This trace is from Bugsnag reported thread. I see trace from another thread though


Object.java:-2 java.lang.Object.wait
Object.java:386 java.lang.Object.wait
Object.java:524 java.lang.Object.wait
Http2Stream:714 okhttp3.internal.http2.Http2Stream.waitForIo$okhttp
Http2Stream:376 okhttp3.internal.http2.Http2Stream$FramingSource.read
Exchange:281 okhttp3.internal.connection.Exchange$ResponseBodySource.read
RealBufferedSource:265 okio.RealBufferedSource.read
RealBufferedSource:76 okio.RealBufferedSource.read
OkHttpNetwork:189 io.bitdrift.capture.network.okhttp.OkHttpNetwork$StreamState.consumeResponse
OkHttpNetwork:97 io.bitdrift.capture.network.okhttp.OkHttpNetwork$StreamState.access$consumeResponse
OkHttpNetwork:138 io.bitdrift.capture.network.okhttp.OkHttpNetwork$StreamState$2$onResponse$1.run
ThreadPoolExecutor.java:1145 java.util.concurrent.ThreadPoolExecutor.runWorker
ThreadPoolExecutor.java:644 java.util.concurrent.ThreadPoolExecutor$Worker.run
Thread.java:1012 java.lang.Thread.run
oldergod commented 1 month ago

Got it. I recommend using the "raw" version of those when sharing with others.

image

YuriiKot commented 1 month ago

Thanks, good call. I also found another thread with proper trace and updated the question accordingly. In previous messages I guess it was bitdrift logger or something

YuriiKot commented 2 weeks ago

@oldergod Do you have any recommendations or a workaround to address the issue?

oldergod commented 1 week ago

I think OkHttp throws on reading and Wire is wrong in reading the trailers if it cannot. If you don't know what the problem is in the reading (or if the cause is elsewhere), I don't see much. I'll try to update Wire so that we don't read the trailers if we are not to.

YuriiKot commented 1 week ago

Thanks. I don't see reading problem. Our current setup works in prod for 10M+ users. This might be some edge case, but still appears to be the top 1 crash

oldergod commented 3 days ago

Closed with #3087 hopefully. Will be in next release happening this week.