square / okhttp

Square’s meticulous HTTP client for the JVM, Android, and GraalVM.
https://square.github.io/okhttp/
Apache License 2.0

SocketTimeoutException occurs all the time after a SocketTimeoutException occurs. #4981

Closed KaiXuan666 closed 4 years ago

KaiXuan666 commented 5 years ago

After the problem occurs, you must reconnect to Wi-Fi or restart the app to get back to normal.

swankjesse commented 5 years ago

Executable test case?

KaiXuan666 commented 5 years ago

This has nothing to do with our code. We tested projects from several companies that integrate OkHttp (completely different packages) and switched between many OkHttp versions; all of them have this problem. The steps are: run the app, leave it idle for a few minutes, then click a button that accesses the network. When the problem occurs, only the most commonly used OkHttpClient in the project stops working. When a project has multiple OkHttpClient instances, the others can still access the network.

swankjesse commented 5 years ago

What are our next steps? We could offer a service to evict the connection pool when the network changes?

yschimke commented 5 years ago

@swankjesse if your code is Android-aware, you can listen for networks coming and going and know whether to drop a certain set of connections, e.g. when Wi-Fi stops or 4G drops because of lack of traffic. Or did you mean just drop all of them?

KaiXuan666 commented 5 years ago

We have encountered the same problem in a project before, and finally solved it like this:

    if (e instanceof SocketTimeoutException) {
        OkHttp.getInstance().okHttpClient.dispatcher().cancelAll();
        OkHttp.getInstance().okHttpClient.connectionPool().evictAll();
    }

But there is a downside to this: one request must fail before the next one succeeds.
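One way to soften that downside might be to retry the request once, right after resetting the pool. The following is only a sketch under stated assumptions: `TimeoutRetry` and `callWithPoolReset` are hypothetical names, not part of OkHttp, and the simulated request in `main` stands in for a real call. Only JDK classes are used, so it runs without OkHttp on the classpath.

```java
import java.net.SocketTimeoutException;
import java.util.concurrent.Callable;

/** Hypothetical helper: retries a request once after resetting shared connection state. */
final class TimeoutRetry {
    private TimeoutRetry() {}

    /**
     * Runs the request. If it throws SocketTimeoutException, runs resetPool
     * and tries the request exactly one more time. A second timeout
     * propagates to the caller.
     */
    static <T> T callWithPoolReset(Callable<T> request, Runnable resetPool) throws Exception {
        try {
            return request.call();
        } catch (SocketTimeoutException firstTimeout) {
            // With OkHttp, resetPool would wrap the two calls from the workaround:
            //   client.dispatcher().cancelAll();
            //   client.connectionPool().evictAll();
            resetPool.run();
            return request.call();
        }
    }

    public static void main(String[] args) throws Exception {
        int[] attempts = {0};
        String body = callWithPoolReset(
            () -> {
                // Simulated request: times out once, then succeeds.
                if (attempts[0]++ == 0) {
                    throw new SocketTimeoutException("stale connection");
                }
                return "ok";
            },
            () -> System.out.println("pool reset"));
        // Prints "pool reset", then "ok after 2 attempts".
        System.out.println(body + " after " + attempts[0] + " attempts");
    }
}
```

Note that `cancelAll()` also cancels every other in-flight call on the same client, so a retry like this only hides the failure for the request that triggered the reset.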

Based on our current test results, this may be a bug in OkHttp's connection pool that appears very easily on particular devices. To reproduce, open the application, wait a few minutes, and then trigger an action that accesses the network. To be clear, the device's network is working at that point: other applications can access the network normally, and a newly created OkHttpClient in the same application can access it too.

yschimke commented 5 years ago

I have a test app that shows the network availability events https://github.com/yschimke/okhttp-testapp/blob/master/app/src/main/kotlin/com/squareup/okhttptestapp/network/NetworkListener.kt

It may be that the app always has some connectivity, but Wi-Fi and cellular come and go independently.

swankjesse commented 5 years ago

It'd be Really Cool if Android closed sockets when the corresponding radios went away. Really seems like we're the wrong layer. But it's still better for us to address than every single application.

yschimke commented 5 years ago

@swankjesse I think by this point, if you care about the behaviour, you probably also care about things like fetching video only when on Wifi etc. Plus the proxy to use will differ for each network also. So you might want to control for this etc.

KaiXuan666 commented 5 years ago

I recorded a video of this behavior. Clicking the update button uses the regular OkHttpClient; clicking the confirmation button uses a new OkHttpClient. Both send requests to the same server, which is available. At this point every connection from the first OkHttpClient times out, with repeated SocketTimeoutExceptions, while the second OkHttpClient still works. https://www.bilibili.com/video/av50721356/

KaiXuan666 commented 5 years ago

I added the NetworkListener from your test app, but when the problem occurs, I receive no event and nothing is logged.

yschimke commented 5 years ago

Can you show how you registered the listener?

KaiXuan666 commented 5 years ago

I registered it like this, and switching Wi-Fi does produce output. But the problem we encountered does not involve switching networks; rather, the connections in the connection pool stop working.

KaiXuan666 commented 5 years ago

The affected device runs Android 7.1.2.

KaiXuan666 commented 5 years ago

[image]

cosminstefanxp commented 5 years ago

I believe we have also stumbled upon this in production. We can't replicate it, but we've encountered a couple of situations in remote logs that can only be explained by this behaviour.

crossle commented 5 years ago

https://github.com/square/okhttp/issues/5186 appears to be the same problem.

crossle commented 5 years ago

If you share a single OkHttpClient instance, for example by providing it as a Dagger @Singleton, the problem is easy to reproduce.

yschimke commented 5 years ago

I’ve been able to reproduce this on an emulator, but not on a modern Nokia 7.1. So I suspect it is specific to certain phones or older Android versions. Using pings or force-closing sockets definitely works around this. Any stats on which phones or Android versions are affected would also help.

crossle commented 5 years ago

Here is some configuration from my project; setting a ping interval does not resolve the problem:

    builder.connectTimeout(10, TimeUnit.SECONDS)
    builder.writeTimeout(10, TimeUnit.SECONDS)
    builder.readTimeout(10, TimeUnit.SECONDS)
    builder.pingInterval(15, TimeUnit.SECONDS)
    builder.retryOnConnectionFailure(false)

yschimke commented 5 years ago

That ping interval is high compared to the timeouts. What if you allow retries on connection failure and use a 2-second ping interval?
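As a rough sketch, that suggestion applied to the configuration above might look like the following. The class name `PingingClientFactory` is hypothetical, and the 10-second timeouts are simply carried over from the earlier comment. Keep in mind that OkHttp's `pingInterval` only applies to HTTP/2 and WebSocket connections, so it should not be expected to help on plain HTTP/1.1 connections.

```java
import java.util.concurrent.TimeUnit;
import okhttp3.OkHttpClient;

/** Hypothetical factory; requires OkHttp on the classpath. */
final class PingingClientFactory {
    private PingingClientFactory() {}

    static OkHttpClient create() {
        return new OkHttpClient.Builder()
            .connectTimeout(10, TimeUnit.SECONDS)
            .writeTimeout(10, TimeUnit.SECONDS)
            .readTimeout(10, TimeUnit.SECONDS)
            // Ping well inside the 10-second timeouts so that a dead
            // HTTP/2 or WebSocket connection is detected early.
            .pingInterval(2, TimeUnit.SECONDS)
            // Let OkHttp retry when a connection attempt fails.
            .retryOnConnectionFailure(true)
            .build();
    }
}
```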

crossle commented 5 years ago

Same problem: once the first SocketTimeoutException occurs, every subsequent request also fails with SocketTimeoutException.

shijia7 commented 5 years ago

+1

MrYang12 commented 5 years ago

+1

z-chu commented 5 years ago

+1

yschimke commented 5 years ago

More helpful at this moment than thoughts and prayers would be either of the following:

1) Stats by Android OS version, phone model, network type (Wi-Fi only), etc.
2) Testing manually whether actively responding to network switch events clears this problem, e.g. https://github.com/yschimke/OkHttpAndroidApp

The downside to the 2nd option above is that if you have large unresumable uploads/downloads, the optimal thing might be to wait up to 30 seconds to see if the socket actually recovers.

swankjesse commented 5 years ago

@KaiXuan666 please give OkHttp 4.2.0 or newer a try. I suspect it fixes this problem. The fix is also backported to 3.14.3 and 3.12.5.

crossle commented 5 years ago

WebSocket over HTTP also has the problem.

vulpeszerda commented 4 years ago

I updated my OkHttp client to 3.12.6, but the socket timeout still occurs.

crossle commented 4 years ago

Any suggestions for the timeout?

swankjesse commented 4 years ago

Fixed in 4.3.

amitav13 commented 4 years ago

@swankjesse could the fix be backported to 3.x?

yschimke commented 4 years ago

For clarity, what do you mean by 3.x?

3.14.x, which requires Android 5.0+ (API level 21+) and Java 8+? Or 3.12.x, which supports older Android phones as well?

amitav13 commented 4 years ago

3.12.x, which supports older Android phones as well. Sorry, this is what I meant.

Easy-Ez commented 4 years ago

@swankjesse @yschimke hey, dude! Has this fix been backported to 3.12.x?

swankjesse commented 4 years ago

not yet dude, though I think we should do it bro

yschimke commented 4 years ago

The dude abides...

pbaiyy commented 3 years ago

Could you please merge it into 3.12.x?

yschimke commented 3 years ago

@swankjesse which fix was this in 4.3.0? I was guessing this one, but it's not clear, since that one is TaskRunner-specific. https://github.com/square/okhttp/commit/2a65bc7bf9c38d59c00408fe1aa2def3b542f47d

lzanzotto commented 3 years ago

@swankjesse has it been backported to 3.12.1? It could be useful because ksoap2-android uses that version