square / okhttp

Square’s meticulous HTTP client for the JVM, Android, and GraalVM.
https://square.github.io/okhttp/
Apache License 2.0
45.88k stars 9.16k forks source link

No callbacks: no successful callbacks and no failed callbacks #8387

Open Alexmego opened 6 months ago

Alexmego commented 6 months ago

No Reproducible! No Reproducible! No Reproducible

Environment:

okhttp version : 4.9.3 & the application protocol is Http2.0. okhttp timeout configure

截屏2024-04-26 上午10 35 45

Describe:

After the OkHttp network request is sent out, there were no callbacks, no successful callbacks, and no failed callbacks. The probability of this occurrence is approximately 0.5%.

Process:

Below is the complete event sequence printed by the EventListener for two requests.

OkHttpPrint.txt

loginfo:

9 results - 2 files

~/Downloads/A_log_0311_20240424_170747: 803283: 04-24 17:07:32.949266 3489 3489 I XX_APP_OkHttpPrint: callStart: https://xx.cn/api/search/haha/v1/rgeo?centerPoint=40.05594%2C116.26804 803349: 04-24 17:07:32.964995 3489 16482 I XX_APP_OkHttpPrint: connectStart: 805021: 04-24 17:07:32.983984 3489 16482 I XX_APP_OkHttpPrint: secureConnectStart:

~/Downloads/A_log_0312_20240424_170809: 709118: 04-24 17:07:51.892152 3489 16482 I XX_APP_OkHttpPrint: dnsStart: 709626: 04-24 17:07:51.894966 3489 16482 I XX_APP_OkHttpPrint: dnsEnd: 3 709657: 04-24 17:07:51.895477 3489 16482 I XX_APP_OkHttpPrint: connectStart: 713087: 04-24 17:07:51.929039 3489 16482 I XX_APP_OkHttpPrint: secureConnectStart: 719294: 04-24 17:07:51.967419 3489 16482 I XX_APP_OkHttpPrint: connectFailed: 719313: 04-24 17:07:51.967720 3489 16482 I XX_APP_OkHttpPrint: callFailed: https://xx.cn/api/search/haha/v1/rgeo?centerPoint=40.05607%2C116.26761

log.txt

Note:

Occasional, low-probability, cannot provide a reproducible program.

No Reproducible

yschimke commented 6 months ago

There is not enough to go on. Try upgrading to 4.12 or even 5 alphas, and see if it still happens.

But closing as we can't help with this without a repro.

Maybe try a site like stack overflow.

swankjesse commented 6 months ago

Seeing secureConnectStart() but no secureConnectEnd() or connectFailed() tells us a lot about where it’s getting stuck. I wonder if the TLS handshake is broken in a way that’s causing them to not return.

Could you capture a thread dump of a process that has demonstrated this behavior? That’d be amazing to diagnose this further.

Alexmego commented 6 months ago

Seeing secureConnectStart() but no secureConnectEnd() or connectFailed() tells us a lot about where it’s getting stuck. I wonder if the TLS handshake is broken in a way that’s causing them to not return.

Could you capture a thread dump of a process that has demonstrated this behavior? That’d be amazing to diagnose this further.

In a release environment, the probability is too low to effectively capture the thread state and actively attempt to reproduce it, but it will not occur.

However, in this case, I can be certain that the timeout is invalid, there are no callbacks, and no Exception information. If a network request is initiated again, everything is normal again.

In my app, there are over 40000 requests, and 10-20 of them are in this situation. Although the impact is not significant, QA has caught abnormal phenomena, and R&D needs to provide a reasonable explanation for this. Currently, I have not made effective progress.

Endeavour233 commented 6 months ago

secureConnectStart

The reason why connectTimeout does not work is that the time consumed in TLS handshake is not counted into connection time in the current implementation. And because the connection is not successfully established, readTimeout and writeTimeout don't take effect. Can you try to set callTimeout? I think your eventlistener would observe a canceled event then(though there was still no any callback - because when it's timeout, we just cancel the call, which would not trigger a CancelException in the case when the timeout is caused by a never-return method).

// RealCall.kt
  private val timeout =
    object : AsyncTimeout() {
      override fun timedOut() {
        this@RealCall.cancel()
      }
    }.apply {
      timeout(client.callTimeoutMillis.toLong(), MILLISECONDS)
    }