square / okhttp

Square’s meticulous HTTP client for the JVM, Android, and GraalVM.
https://square.github.io/okhttp/
Apache License 2.0
45.85k stars 9.16k forks source link

java.net.SocketTimeoutException from HTTP/2 connection leaves dead okhttp clients in pool #3146

Closed kenyee closed 1 year ago

kenyee commented 7 years ago

Tried writing a unit test w/ TestButler on Android w/ no luck, so I'll write up the steps to reproduce this and include some sample code. This happens if you connect to an HTTP/2 server and your network goes down while the okhttp client is connected to it: 1) create an okhttp client 2) tell it to read from the HTTP/2 server 3) bring the network down 4) tell it to read from the HTTP/2 server (it'll get a SocketTimeoutException) 5) bring the network back up 6) tell it to read from the HTTP/2 server again (it'll be stuck w/ SocketTimeoutExceptions) 7) if you create new http clients at this point, it'll work, but the dead http client will eventually come back in the pool and fail.

okhttp client should attempt to reopen the HTTP/2 connection instead of being stuck in this state

Code sample for Android (create a trivial view w/ a button and a textview):

public class MainActivity extends AppCompatActivity {
    OkHttpClient okhttpClient = new OkHttpClient();

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.activity_main);

        Button loadButton = (Button) findViewById(R.id.loadButton);
        TextView outputView = (TextView) findViewById(R.id.outputView);

        loadButton.setOnClickListener(view -> Observable.fromCallable(() -> {
                    Request request = new Request.Builder()
                            .url(<INSERT URL TO YOUR HTTP/2 SERVER HERE>)
                            .build();

                    Response response = okhttpClient.newCall(request).execute();

                    return response.body().string();
                })
                .subscribeOn(Schedulers.io())
                .observeOn(AndroidSchedulers.mainThread())
                .subscribe(outputView::setText, t -> outputView.setText(t.toString()))
        );
    }
}
kenyee commented 7 years ago

FYI, we found a workaround...set the connectionPool in the builder so it uses a new connection pool w/ a size of zero and also turn off HTTP/2 support by setting a new protocolList in the builder with only HTTP/1.1 support.

swankjesse commented 7 years ago

You’re using 3.6.0?

kenyee commented 7 years ago

yep...3.6.0 unfortunately. Thought about rolling back to pre-http/2 support but that would mean 2.2 which is too far back because of all the okhttp3 dependencies :-(

swankjesse commented 7 years ago

Oh that's terrible. We've had problems with similar failures before but I thought we'd fixed ’em all. If you can make a test case that'd be handy, otherwise I'll try to look soon.

In the interim you can disable HTTP/2 with the protocols list in the OkHttpClient.Builder.

dave-r12 commented 7 years ago

Correct me if I'm wrong, but I think part of this is working as expected. HTTP/2 connections can carry N outstanding requests. If one of those requests times out and the HTTP/2 connection is closed, then the other N - 1 requests are also lost. I think the intent is that for HTTP/2 connections, a timeout does not necessarily mean the connection is bad.

Is it surprising to 'bring the network down' and not receive any sort of socket exception reading or writing?

kenyee commented 7 years ago

N-1 requests being lost is fine if the connection is down. The issue is that it doesn't recover when you bring the network back up...i.e., the broken idle connection objects are in the pool stay there and when you try connecting again, you can't connect until the user kills off your app to restart everything...

kenyee commented 7 years ago

@swankjesse : I couldn't figure out how to write a test for this because making all the sockets disconnected was happening at at an OS level. Tried to write and Android Test Butler one (to flip the network switch on/off on an Android emulator) but the current version of that has issues and probably wouldn't work in this code base :-)

swankjesse commented 7 years ago

So our attempts to write to the socket are failing silently? Might need to steal the automatic pings that we added for web sockets.

kenyee commented 7 years ago

Essentially...not that they're failing silently, but they're dead sockets and they're stuck in the pool. We traced through a bit of the code and saw some code that was pulling the a dead socket out of the pool each time it tried to use one which should have cleared things up after 5 dead sockets were pulled out but the network layer still appeared stuck unless we purged the pool w/ evictAll() or waited for the 5 min eviction timeout. Wasn't obvious what a proper fix was... HTTP/2 essentially behaves like web sockets so you're probably on the right track...

laurencedawson commented 7 years ago

Pretty sure this issue is another manifestation of this one:

https://github.com/square/okhttp/issues/3118

kenyee commented 7 years ago

I'm sure it's not. We don't see any SSL Handshake exceptions.

This bug is actually probably two bugs because we had to disable the connection pool and the HTTP/2 support. #3118 might be affected by the connection pool bug (it doesn't clear the broken idle connection objects in the pool).

laurencedawson commented 7 years ago

I've seen what you've described but then also the ssl exceptions. Same steps to reproduce as you outlined.

aheliver commented 7 years ago

any updates? I've same issue

kenyee commented 7 years ago

The workaround I described works in our QA testing so far :-)

aheliver commented 7 years ago

@kenyee setting new pool works, but I wonder when an update will arrive?

servlette commented 7 years ago

Is this issue resolved? I ran into the same issue using 3.5.0. I am using OkHttp to send push to Apple http/2. Yesterday I had this issue resulting in almost 80k push messages not getting delivered.

Caused by: java.net.SocketTimeoutException: timeout
    at okhttp3.internal.http2.Http2Stream$StreamTimeout.newTimeoutException(Http2Stream.java:587) ~[okhttp-3.5.0.jar:?]
    at okhttp3.internal.http2.Http2Stream$StreamTimeout.exitAndThrowIfTimedOut(Http2Stream.java:595) ~[okhttp-3.5.0.jar:?]
    at okhttp3.internal.http2.Http2Stream.getResponseHeaders(Http2Stream.java:140) ~[okhttp-3.5.0.jar:?]
    at okhttp3.internal.http2.Http2Codec.readResponseHeaders(Http2Codec.java:115) ~[okhttp-3.5.0.jar:?]
    at okhttp3.internal.http.CallServerInterceptor.intercept(CallServerInterceptor.java:54) ~[okhttp-3.5.0.jar:?]
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92) ~[okhttp-3.5.0.jar:?]
    at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:45) ~[okhttp-3.5.0.jar:?]
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92) ~[okhttp-3.5.0.jar:?]
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67) ~[okhttp-3.5.0.jar:?]
    at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:93) ~[okhttp-3.5.0.jar:?]
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92) ~[okhttp-3.5.0.jar:?]
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67) ~[okhttp-3.5.0.jar:?]
    at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93) ~[okhttp-3.5.0.jar:?]
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92) ~[okhttp-3.5.0.jar:?]
    at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:120) ~[okhttp-3.5.0.jar:?]
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92) ~[okhttp-3.5.0.jar:?]
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67) ~[okhttp-3.5.0.jar:?]
    at okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:179) ~[okhttp-3.5.0.jar:?]
    at okhttp3.RealCall.execute(RealCall.java:63) ~[okhttp-3.5.0.jar:?]

After I got this error, none of my other requests succeeded.

Code:

KeyStore ks = KeyStore.getInstance("PKCS12");
       ks.load(new ByteArrayInputStream("/foo/bar/mycert"), password.toCharArray());

       KeyManagerFactory kmf = KeyManagerFactory.getInstance(KeyManagerFactory.getDefaultAlgorithm());
       kmf.init(ks, password.toCharArray());
       KeyManager[] keyManagers = kmf.getKeyManagers();
       SSLContext sslContext = SSLContext.getInstance("TLS");

       final TrustManagerFactory tmf = TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
       tmf.init((KeyStore) null);
       sslContext.init(keyManagers, tmf.getTrustManagers(), null);

       TrustManager[] trustManagers = tmf.getTrustManagers();
       if (trustManagers != null && (trustManagers.length != 1 || !(trustManagers[0] instanceof X509TrustManager))) {
           throw new IllegalStateException("Unexpected default trust managers:"
                   + Arrays.toString(trustManagers));
       }
       final X509TrustManager trustManager = (X509TrustManager) trustManagers[0];
       final SSLSocketFactory sslSocketFactory = sslContext.getSocketFactory();

       OkHttpClient.Builder builder = new OkHttpClient.Builder();
       builder.connectTimeout(5, TimeUnit.SECONDS).writeTimeout(10, TimeUnit.SECONDS).readTimeout(10, TimeUnit.SECONDS);
       builder.connectionPool(new ConnectionPool(3, 10, TimeUnit.MINUTES));
       builder.sslSocketFactory(sslSocketFactory, trustManager);

           Proxy proxy = new Proxy(Proxy.Type.HTTP, new InetSocketAddress(proxyHost, proxyPort));
           builder.proxy(proxy);  

       OkHttpClient client = builder.build();
servlette commented 7 years ago

As socket timeout exception is an instance of IO exception, I am not sure if the following approach will work. Can one of you pls get back to me?

I am calling evictAll() in the catch block of IOException.

try {
          response = client.newCall(request).execute();
          statusCode = response.code();
          responseBody = response.body().string();
      } catch (IOException ioe) {
          client.connectionPool().evictAll();
      } finally {
          if (response != null) {
              response.body().close();
          }

      }

Also how do we check if a connection is stale or not?

With Apache HttpClient, there is a way to do it to set a flag for checking stale connections. Wondering how OkHttp3 checks for it internally before it uses the connection.

CloseableHttpClient client = HttpClients.custom().setDefaultRequestConfig(
    RequestConfig.custom().setStaleConnectionCheckEnabled(true).build()
).setConnectionManager(connManager).build();
b95505017 commented 7 years ago

Any updates? I have the same issue too. :(

teobaranga commented 7 years ago

Same issue here!

ekstro commented 7 years ago

We still experiencing the same issue :-(

jpearl commented 7 years ago

I think i'm seeing another manifestation of this on 3.5.0, when the server forcibly closes the connection.

We try to establish both a h2 and http1.1 connection. The server responds with 200 to both:

06-26 15:07:55.286 22094 22380 I okhttp3.OkHttpClient: --> GET<url> http/1.1
06-26 15:07:55.524 22094 22380 I okhttp3.OkHttpClient: --> GET<url> h2

06-26 15:07:55.596 22094 22380 I okhttp3.OkHttpClient: <-- 200  <url> (71ms)
06-26 15:07:55.597 22094 22380 I okhttp3.OkHttpClient: <-- 200  <url> (303ms)

Then at some point we try to read from the http2 connection, which fails in checkNotClosed and throws a StreamResetException

06-26 15:06:01.560 22094 22126 I MyProject: Caused by: okhttp3.internal.http2.StreamResetException: stream was reset: PROTOCOL_ERROR
06-26 15:06:01.560 22094 22126 I MyProject:     at okhttp3.internal.http2.Http2Stream$FramedDataSource.checkNotClosed(Http2Stream.java:428)
06-26 15:06:01.560 22094 22126 I MyProject:     at okhttp3.internal.http2.Http2Stream$FramedDataSource.read(Http2Stream.java:330)
06-26 15:06:01.560 22094 22126 I MyProject:     at okio.ForwardingSource.read(ForwardingSource.java:35)
06-26 15:06:01.560 22094 22126 I MyProject:     at okio.RealBufferedSource$1.read(RealBufferedSource.java:409)
06-26 15:06:01.560 22094 22126 I MyProject:     at com.google.android.exoplayer.upstream.HttpDataSource.read(HttpDataSourceImpl.java:699)
06-26 15:06:01.560 22094 22126 I MyProject:     at com.google.android.exoplayer.upstream.HttpDataSource.read(HttpDataSourceImpl.java:424)

Then, since this is media, we do something that causes a seek to 0 in the media, which needs to reopen the request from the beginning. At this point, we see the same exception as is posted above:

06-26 15:08:39.387 22094 22126 I MyProject: Caused by: java.net.SocketTimeoutException: timeout
06-26 15:08:39.387 22094 22126 I MyProject:     at okhttp3.internal.http2.Http2Stream$StreamTimeout.newTimeoutException(Http2Stream.java:587)
06-26 15:08:39.387 22094 22126 I MyProject:     at okhttp3.internal.http2.Http2Stream$StreamTimeout.exitAndThrowIfTimedOut(Http2Stream.java:595)
06-26 15:08:39.387 22094 22126 I MyProject:     at okhttp3.internal.http2.Http2Stream.getResponseHeaders(Http2Stream.java:140)
06-26 15:08:39.387 22094 22126 I MyProject:     at okhttp3.internal.http2.Http2Codec.readResponseHeaders(Http2Codec.java:115)
06-26 15:08:39.387 22094 22126 I MyProject:     at okhttp3.internal.http.CallServerInterceptor.intercept(CallServerInterceptor.java:54)
06-26 15:08:39.387 22094 22126 I MyProject:     at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
06-26 15:08:39.387 22094 22126 I MyProject:     at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
06-26 15:08:39.387 22094 22126 I MyProject:     at okhttp3.logging.HttpLoggingInterceptor.intercept(HttpLoggingInterceptor.java:212)
06-26 15:08:39.387 22094 22126 I MyProject:     at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
06-26 15:08:39.387 22094 22126 I MyProject:     at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:45)
06-26 15:08:39.387 22094 22126 I MyProject:     at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
06-26 15:08:39.387 22094 22126 I MyProject:     at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
06-26 15:08:39.387 22094 22126 I MyProject:     at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:93)
06-26 15:08:39.387 22094 22126 I MyProject:     at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
06-26 15:08:39.387 22094 22126 I MyProject:     at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
06-26 15:08:39.387 22094 22126 I MyProject:     at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93)
06-26 15:08:39.387 22094 22126 I MyProject:     at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
06-26 15:08:39.387 22094 22126 I MyProject:     at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:120)
06-26 15:08:39.387 22094 22126 I MyProject:     at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
06-26 15:08:39.387 22094 22126 I MyProject:     at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
06-26 15:08:39.387 22094 22126 I MyProject:     at okhttp3.logging.HttpLoggingInterceptor.intercept(HttpLoggingInterceptor.java:212)
06-26 15:08:39.387 22094 22126 I MyProject:     at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
06-26 15:08:39.387 22094 22126 I MyProject:     at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
06-26 15:08:39.387 22094 22126 I MyProject:     at okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:179)
06-26 15:08:39.387 22094 22126 I MyProject:     at okhttp3.RealCall.execute(RealCall.java:63)

this seems to be very similar to the other cases here, which seem to all be related to an ungraceful shutdown of the connection, and it remaining pooled.

I've also confirmed that disabling the ConnectionPool "works around" this issue:

OkHttpClient.Builder clientBuilder = new OkHttpClient.Builder()
            .connectTimeout(connectTimeoutMillis, TimeUnit.MILLISECONDS)
            .retryOnConnectionFailure(true)
            .readTimeout(readTimeoutMillis, TimeUnit.MILLISECONDS).connectionPool(new ConnectionPool(0, 1, TimeUnit.NANOSECONDS));
Kiran89kumar commented 7 years ago

is thr any update on this issue?

justinuang commented 7 years ago

Same issue here

natez0r commented 7 years ago

any idea if/when this will be fixed? We're seeing the same issue.

alessandrojp commented 7 years ago

@jpearl I can confirm that disabling ConnectionPool stops getting StreamResetException when it uses HTTP/2.

I'm also using ExoPlayer with OkHttp and in my case, it was happening when my app goes to background. Even if I turn off the Battery Optimizations it was being closed after few minutes in background getting SocketTimeoutException when the player tried to play the next track.

I was thinking to use DefaultHttpDataSource for the requests in the ExoPlayer because it also works without throwing SocketTimeoutException, but by disabling the ConnectionPool for me would better at this moment. I'll keep it disabled for now until I find a better solution.

OkHttp: 3.8.1
Model: Huawei P9 Lite
OS: 7.1.2

Thanks for sharing this!

natez0r commented 7 years ago

@alessandrojp do you know if the ExoPlayer team is aware of this issue? We've run into it only with exoplayer as well.

CoryCharlton commented 7 years ago

I'm also seeing SocketTimeout / dead client issues with OkHttp and Exoplayer.

RohitD07 commented 7 years ago

evict all connection from connection pool,resolve sockettimeout exception if(throwable instanceof SocketTimeoutException){ okHttpClient.connectionPool().evictAll(); }

samyakjain commented 6 years ago

I am also facing similar issue. When my app goes in background and if internet goes off and comes back and after if i come back to the app, nothing loads and even request doesn't go and i get timeout. After that all requests behave same. Please help me in solving this issue.

For now i have put a hack by reading the above thread is that whenever i get IOException i evict all connections from connection pool. This solves the problem but this happens at least once and user sees retry/reload screen.

NitroXenon commented 6 years ago

@swankjesse Any news on this?

natez0r commented 6 years ago

We have implemented the described a workaround, but it still causes us to hit the exceptions, which is not ideal. Would love to hear if this will be addressed soon

muhammadsr commented 6 years ago

Any updates for a fix?

auror commented 6 years ago

It would be great to get a fix for this. Any release date?

swankjesse commented 6 years ago

Will implement with this: https://github.com/square/okhttp/issues/3261

Danil4ever commented 6 years ago

Try THIS:

ConnectionSpec spec = new ConnectionSpec.Builder(ConnectionSpec.COMPATIBLE_TLS) .tlsVersions(TlsVersion.TLS_1_1) .build();

gatesn commented 6 years ago

@swankjesse could you clarify if this is fixed? The issue you link to references a couple of merged PRs, but it's not obvious if they directly address this issue or not.

kanongil commented 6 years ago

It is not fixed as far as I can tell.

kenyee commented 6 years ago

When I checked the fix, it seemed to leave the dead connection in the pool until you try using it which causes an exception to propagate into your code. So after the network comes back up, the user would get a bunch of "could not connect" errors in our app (because it didn't handle retries properly for this case) and then it would work again. We decided to just leave the pool size at zero for now until we can rewrite parts of the code to handle this case. But at least the zombies don't stay stuck in the pool....

williams98 commented 6 years ago

okhttp: 3.11 SocketTimeoutException is not fixed still its appears

kenyee commented 6 years ago

it clears out on the second call. It looks like what happens is the pool gets zombie connections. Next time you grab one of the zombies out of the pool, it throws that exception but is removed. The original bug was that the zombies got stuck in the pool. That said, this isn't great behavior either, so we've just left the pool size at zero...

c0dehunter commented 6 years ago

I can confirm this is still an issue.

swankjesse commented 6 years ago

@c0dehunter try setting a ping interval on your OkHttpClient?

https://square.github.io/okhttp/3.x/okhttp/okhttp3/OkHttpClient.Builder.html#pingInterval-long-java.util.concurrent.TimeUnit-

c0dehunter commented 6 years ago

@swankjesse Thanks for quick response. We tried setting pingInterval(1, TimeUnit.SECOND) and it seems it is behaving properly now. I don't want to say it's fixed yet as we need to do more testing, but will report back after a few days.

ahulyk commented 6 years ago

I also can confirm that that issue is reproducing on okhttp v3.11.0 Looks like setting protocols(listOf(Protocol.HTTP_1_1)) should fix it for now

c0dehunter commented 6 years ago

I'm just reporting back and my findings contradict others, so I am really not sure anymore. After updating to the latest version, we have not been able to reproduce this issue

roysG commented 6 years ago

I also reproduce this issue, please found a fix, any suggestion:

Kyson commented 6 years ago

tracking this issue...

Timothy-Duch commented 6 years ago

Tracking this issue as well. Here is my client in Android, still failing even after attempting the suggestions that worked for others:

@Provides
@Singleton
    static OkHttpClient provideOkHttpClient() {
        OkHttpClient.Builder client = new OkHttpClient.Builder()
                .protocols(Arrays.asList(Protocol.HTTP_1_1))
                .connectionPool(new ConnectionPool(0, 1,
                        TimeUnit.NANOSECONDS));
        if (!EnvironmentConstants.IS_PRODUCTION) {
            CurlLoggingInterceptor curlLoggingInterceptor = new CurlLoggingInterceptor();
            curlLoggingInterceptor.setCurlOptions("-i");
            client.addNetworkInterceptor(curlLoggingInterceptor)
                    .addInterceptor(chain -> {
                        Request request = chain.request().newBuilder()
                                .addHeader("Accept", "application/json").build();
                        return chain.proceed(request);
                    });
        }
        return client.build();
    }

It is simple to reproduce: Successfully complete POST request. Turn on airplane mode. Attempt POST request and fail. Turn off airplane mode. Attempt POST request. That fails with:

08-06 21:42:02.321 25955-25955/tech.duchess.luminawallet.staging.debug E/SendPresenter: Transaction failed java.net.SocketTimeoutException: timeout . . . 08-06 21:42:02.323 25955-25955/tech.duchess.luminawallet.staging.debug E/SendPresenter: Caused by: java.net.SocketException: socket is closed at com.android.org.conscrypt.ConscryptFileDescriptorSocket$SSLInputStream.read(ConscryptFileDescriptorSocket.java:546)

All subsequent transactions continue to fail with the same issue. Restarting my application works fine, presumably due to the new OkHttp client.

Forgot to mention, I'm also on 3.11.0

Caij commented 6 years ago

Which version can be fixed?

tucomel commented 6 years ago

You dont need to disable connectionPool, just insert inside your BroadcastReceiver when the network changes the following code

BroadcastReceiver networkStateReceiver = new BroadcastReceiver() {

 @Override
 public void onReceive(Context context, Intent intent) {
     final ConnectivityManager connectivityManager = (ConnectivityManager) context.getSystemService(Context.CONNECTIVITY_SERVICE);
     final NetworkInfo activeNetInfo = connectivityManager.getActiveNetworkInfo();
     if (activeNetInfo != null) {
          //clear here the offline pool when got online
          getInstance().okClient.connectionPool().evictAll(); 
     }
 }
 };
 IntentFilter filter = new IntentFilter(ConnectivityManager.CONNECTIVITY_ACTION);
 ApplicationLoader.appContext.registerReceiver(networkStateReceiver, filter);