square / okhttp

Square’s meticulous HTTP client for the JVM, Android, and GraalVM.
https://square.github.io/okhttp/
Apache License 2.0

OkHttp periodically error No address associated with hostname #8200

Closed WilliamBankin closed 7 months ago

WilliamBankin commented 8 months ago

Hello,

I'm writing to ask about progress on this problem, which is currently causing us a lot of trouble in our application.

https://github.com/square/retrofit/issues/3915 https://github.com/square/okhttp/issues/7677

These issues have been closed and I have not found any other thread on the subject. We would like to know whether you were able to find a solution to this connection problem; our ratings on the Play Store are taking a hit.

Thanks for your feedback

yschimke commented 8 months ago

We need a reproduction, or at least more details, such as which devices/os versions it is happening on.

There is a similar error at https://github.com/TeamNewPipe/NewPipe/issues/8030, related to Samsung restrictions.

Are you using OkHttp 5.0.0-alpha.12? It's probably best if we agree on this version for any further testing.

WilliamBankin commented 8 months ago

Thanks for your feedback. We will look into updating the library so that you can have additional logs. Here is a list of all users reporting the issue to us:

yschimke commented 8 months ago

I was hoping to get some clear signal out of it, like Samsung only, or Android 13.

Regarding the switch to Ktor with Android client working, it's quite possible that the platform HTTP stack is handling things in a different way.

But I believe the default implementation is a very early fork of OkHttp.

https://cs.android.com/android/platform/superproject/main/+/main:external/okhttp/repackaged/okhttp-urlconnection/src/main/java/com/android/okhttp/internal/huc/HttpURLConnectionImpl.java

calling

https://cs.android.com/android/platform/superproject/main/+/main:external/okhttp/repackaged/okhttp/src/main/java/com/android/okhttp/OkHttpClient.java

So it should be doing basically the same thing. I doubt they are installing Cronet for example.

But without some additional signal, it's hard to understand why InetAddress.getAllByName(hostname) fails for OkHttp but not for Android. It's the same platform operation.

So we still need some way to reproduce.

WilliamBankin commented 8 months ago

This bug is quite strange; we can't reproduce it either. In this post someone tested with Ktor and it works: https://github.com/square/retrofit/issues/3915#issuecomment-1826368533. I think we should also migrate to this new library.

yschimke commented 8 months ago

I might write a logging event listener that outputs all relevant info when it happens. Too late, I guess, if you are switching to URLConnection.

prbprbprb commented 8 months ago

A stack trace would help too, if possible. The exception is being thrown from Inet6AddressImpl.lookupHostByName(), but there are multiple public APIs that reach that, and I'm not familiar enough with okhttp internals to figure out where name resolution is happening and/or retried... Looks like mostly in Route.

The retry logic (if any) is of interest because it looks like negative results are unconditionally cached at the Java level, but with a TTL of only two seconds.
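To make the consequence of that two-second TTL concrete, here is a toy model of a negative cache. It is only a sketch: the class name, the injectable clock and the map-based storage are all assumptions made for illustration, and it is not Android's actual implementation. It shows why a retry issued inside the window would fail without the resolver being consulted again.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

/** Toy model of a negative DNS cache; NOT Android's actual implementation. */
final class NegativeCacheModel {
    private final long ttlMillis;
    // Injectable clock (hypothetical) so the model can be exercised without sleeping.
    private final LongSupplier clockMillis;
    // Host name -> time at which the last failed lookup was recorded.
    private final Map<String, Long> failedAt = new HashMap<>();

    NegativeCacheModel(long ttlMillis, LongSupplier clockMillis) {
        this.ttlMillis = ttlMillis;
        this.clockMillis = clockMillis;
    }

    /** Remember that a lookup for this host just failed. */
    void recordFailure(String host) {
        failedAt.put(host, clockMillis.getAsLong());
    }

    /** True while a recorded failure is younger than the TTL. */
    boolean wouldFailFromCache(String host) {
        Long at = failedAt.get(host);
        return at != null && clockMillis.getAsLong() - at < ttlMillis;
    }

    public static void main(String[] args) {
        long[] now = {0};
        NegativeCacheModel cache = new NegativeCacheModel(2_000, () -> now[0]);
        cache.recordFailure("example.test");

        now[0] = 500;   // retry 0.5 s after the failure
        System.out.println(cache.wouldFailFromCache("example.test")); // true

        now[0] = 2_500; // retry 2.5 s after the failure
        System.out.println(cache.wouldFailFromCache("example.test")); // false
    }
}
```

In this model, whether a retry sees the cached failure depends only on how soon it follows the failed lookup, which is why the retry timing matters.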

Anyway, I don't think there's any material difference between ye olde Android platform okhttp implementation and okhttp3 as used by retrofit, and if there were, we'd be seeing this from other, non-Retrofit okhttp3 users.

However, it's also very weird that Retrofit should be influencing DNS resolution in any way!

@WilliamBankin is it possible there was something in your app's setup code for Retrofit which may have affected this but isn't in your setup code for Ktor?

yschimke commented 8 months ago

https://github.com/square/okhttp/issues/7677#issue-1565399720

Caused by java.net.UnknownHostException: Unable to resolve host <HOST_NAME>: No address associated with hostname
at java.net.Inet6AddressImpl.lookupHostByName(Inet6AddressImpl.java:124)
at java.net.Inet6AddressImpl.lookupAllHostAddr(Inet6AddressImpl.java:103)
at java.net.InetAddress.getAllByName(InetAddress.java:1152)
at okhttp3.Dns$Companion$DnsSystem.lookup(Dns.kt:49)
at okhttp3.internal.connection.RouteSelector.resetNextInetSocketAddress(RouteSelector.kt:164)
at okhttp3.internal.connection.RouteSelector.nextProxy(RouteSelector.kt:129)
at okhttp3.internal.connection.RouteSelector.next(RouteSelector.kt:71)
at okhttp3.internal.connection.ExchangeFinder.findConnection(ExchangeFinder.kt:205)
at okhttp3.internal.connection.ExchangeFinder.findHealthyConnection(ExchangeFinder.kt:106)
at okhttp3.internal.connection.ExchangeFinder.find(ExchangeFinder.kt:74)
at okhttp3.internal.connection.RealCall.initExchange$okhttp(RealCall.kt:255)
at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.kt:32)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109)
at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.kt:95)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109)
at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.kt:83)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109)
at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.kt:76)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109)
at okhttp3.internal.connection.RealCall.getResponseWithInterceptorChain$okhttp(RealCall.kt:201)
at okhttp3.internal.connection.RealCall$AsyncCall.run(RealCall.kt:517)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1137)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:637)
at java.lang.Thread.run(Thread.java:1012)

yschimke commented 8 months ago

@prbprbprb would an app making an attempt in the background, affected by doze restrictions, get this error, which then gets cached and also fails if the app is immediately brought to the foreground and retried?

Basically I'm wondering if the difference is OkHttp triggering a different set and order of network operations than URLConnection does.

prbprbprb commented 8 months ago

I don't think so but not 100% sure. Looks like a possibility though, e.g. https://stackoverflow.com/questions/73774716/android-unknownhostexception-even-there-is-an-active-and-working-internet-connec - that looks like a roundabout way to clear the cache as there doesn't seem to be a public API that can reach InetAddress.clearAddressCache()

Still curious as to what Retrofit can be doing to trigger this though.

Also worth noting: Inet6AddressImpl.lookupHostByName() is a pretty thin wrapper around getaddrinfo(3), but that hides a lot of complexity. The underlying resolver is part of core networking, which presents a pretty standard interface at the C level but in the background handles firewalling etc. to isolate apps from each other.

JakeWharton commented 8 months ago

Retrofit is not doing anything here. It only interacts with OkHttp through the Call.Factory API which does not offer the ability to configure behavior. It quite simply issues requests and parses responses as if OkHttp is a black box. And you can actually replace OkHttp with any other HTTP client for use with Retrofit as a result.

prbprbprb commented 8 months ago

Appreciate what you're saying but iff retrofit is the only common factor (still awaiting a reply above, e.g. about other things like app setup code) then something is going on, even if it's as esoteric as it's the trigger for an obscure platform caching bug.

yschimke commented 8 months ago

I think the difference is InetAddress.getAllByName via a) OkHttp and b) Android URLConnection.

Not Retrofit. Retrofit is just where it's surfaced.

yschimke commented 8 months ago

Thanks for the https://stackoverflow.com/questions/73774716/android-unknownhostexception-even-there-is-an-active-and-working-internet-connec link.

Very interesting.

yschimke commented 8 months ago

OK, this gives me something to repro with a strict device. And a possible workaround, unfortunately with non-public APIs.

yschimke commented 8 months ago

I don't have a real repro with doze mode or background restrictions. But it's pretty easy to simulate something similar with airplane mode.

https://github.com/square/okhttp/pull/8217

Basically any app recently experiencing a failed network request, responding to a change that would now allow it to succeed, will likely immediately fail.

a) An app listening for airplane mode being disabled and then making network requests.
b) An app under normal doze (https://developer.android.com/training/monitoring-device-state/doze-standby) or OEM restrictions (https://dontkillmyapp.com/) having those restrictions removed, such as by moving to the foreground.

23:20:25.352  I  looping(isFlightModeOn = true) java.net.UnknownHostException: Unable to resolve host "www.google.com": No address associated with hostname
23:20:25.608  I  looping(isFlightModeOn = true) java.net.UnknownHostException: Unable to resolve host "www.google.com": No address associated with hostname
23:20:25.866  I  looping(isFlightModeOn = true) android.system.GaiException: android_getaddrinfo failed: EAI_NODATA (No address associated with hostname)
23:20:26.120  I  looping(isFlightModeOn = true) java.net.UnknownHostException: Unable to resolve host "www.google.com": No address associated with hostname
23:20:26.376  I  looping(isFlightModeOn = false) java.net.UnknownHostException: Unable to resolve host "www.google.com": No address associated with hostname
23:20:26.632  I  looping(isFlightModeOn = false) java.net.UnknownHostException: Unable to resolve host "www.google.com": No address associated with hostname
23:20:26.885  I  looping(isFlightModeOn = false) java.net.UnknownHostException: Unable to resolve host "www.google.com": No address associated with hostname
23:20:27.140  I  looping(isFlightModeOn = false) java.net.UnknownHostException: Unable to resolve host "www.google.com": No address associated with hostname
23:20:27.392  I  looping(isFlightModeOn = false) java.net.UnknownHostException: Unable to resolve host "www.google.com": No address associated with hostname
23:20:27.644  I  looping(isFlightModeOn = false) java.net.UnknownHostException: Unable to resolve host "www.google.com": No address associated with hostname
23:20:27.911  I  looping(isFlightModeOn = false) android.system.GaiException: android_getaddrinfo failed: EAI_NODATA (No address associated with hostname)
23:20:28.164  I  looping(isFlightModeOn = false) java.net.UnknownHostException: Unable to resolve host "www.google.com": No address associated with hostname
23:20:28.418  I  looping(isFlightModeOn = false) java.net.UnknownHostException: Unable to resolve host "www.google.com": No address associated with hostname
23:20:28.673  I  looping(isFlightModeOn = false) java.net.UnknownHostException: Unable to resolve host "www.google.com": No address associated with hostname
23:20:28.929  I  looping(isFlightModeOn = false) java.net.UnknownHostException: Unable to resolve host "www.google.com": No address associated with hostname
23:20:29.201  I  looping(isFlightModeOn = false) [www.google.com/216.58.201.100]
23:20:29.457  I  looping(isFlightModeOn = false) [www.google.com/216.58.201.100]
23:20:29.737  I  looping(isFlightModeOn = false) [www.google.com/216.58.201.100]
23:20:29.992  I  looping(isFlightModeOn = false) [www.google.com/216.58.201.100]
23:20:30.249  I  looping(isFlightModeOn = false) [www.google.com/216.58.201.100]
23:20:30.505  I  looping(isFlightModeOn = false) [www.google.com/216.58.201.100]
23:20:30.759  I  looping(isFlightModeOn = false) [www.google.com/216.58.201.100]
23:20:31.069  I  looping(isFlightModeOn = false) [www.google.com/216.58.201.100]
23:20:31.325  I  looping(isFlightModeOn = false) [www.google.com/216.58.201.100]
23:20:31.707  I  looping(isFlightModeOn = false) [www.google.com/216.58.201.100]
23:20:32.079  I  looping(isFlightModeOn = false) [www.google.com/216.58.201.100, www.google.com/2a00:1450:4009:820::2004]

This doesn't prove the problem; it probably just takes some time to come out of airplane mode. But it proves I can tell a real failure from a cached repeat.

At the moment, I'm inclined to set a bar for any real repro of this bug: the failures must last for more than 2 seconds past when network connectivity is regained. Network calls can always fail. But it also leaves more room for OkHttp to be doing something observably different than URLConnection.
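A polling loop that yields output shaped like the log above could look roughly like this. It is a sketch only and is not the code from the linked pull request. The class name, the `Resolver` seam and the choice to print the root cause of each failure are assumptions, and the `isFlightModeOn` flag is left out because reading it needs Android APIs.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Arrays;

final class DnsProbe {
    /** Hypothetical seam over InetAddress.getAllByName so a fake can be substituted. */
    interface Resolver {
        InetAddress[] resolve(String host) throws UnknownHostException;
    }

    /** Walks the cause chain to its end. */
    static Throwable rootCause(Throwable t) {
        while (t.getCause() != null) {
            t = t.getCause();
        }
        return t;
    }

    /**
     * One probe: the resolved addresses on success, otherwise the root cause of the failure.
     * An UnknownHostException without a cause is reported as itself.
     */
    static String attempt(Resolver resolver, String host) {
        try {
            return Arrays.toString(resolver.resolve(host));
        } catch (UnknownHostException e) {
            return rootCause(e).toString();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Roughly four probes per second, similar to the spacing of the log timestamps.
        for (int i = 0; i < 20; i++) {
            System.out.println("looping " + attempt(InetAddress::getAllByName, "www.google.com"));
            Thread.sleep(250);
        }
    }
}
```

Toggling airplane mode while this runs on a device would be one way to see how long failures persist after connectivity returns.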

yschimke commented 7 months ago

I'm going to close this. It's an Android limitation, probably exacerbated by different OEM restrictions, or by app behaviour such as network traffic in the background or when restricted.

The simplest fix is to retry after 2 seconds when you know you now have network connectivity and the exception doesn't have a cause. The non-cached failures have a root cause of android.system.GaiException.

Caused by: java.net.UnknownHostException: Unable to resolve host "google.invalid": No address associated with hostname
at java.net.Inet6AddressImpl.lookupHostByName(Inet6AddressImpl.java:156)
at java.net.Inet6AddressImpl.lookupAllHostAddr(Inet6AddressImpl.java:103)
at java.net.InetAddress.getAllByName(InetAddress.java:1152)
at okhttp3.Dns$Companion$DnsSystem.lookup(Dns.kt:50)
...
Caused by: android.system.GaiException: android_getaddrinfo failed: EAI_NODATA (No address associated with hostname)
at libcore.io.Linux.android_getaddrinfo(Native Method)
at libcore.io.ForwardingOs.android_getaddrinfo(ForwardingOs.java:133)
at libcore.io.BlockGuardOs.android_getaddrinfo(BlockGuardOs.java:222)
at libcore.io.ForwardingOs.android_getaddrinfo(ForwardingOs.java:133)
at java.net.Inet6AddressImpl.lookupHostByName(Inet6AddressImpl.java:135)
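The retry rule described above can be sketched in plain Java as follows. This is a minimal illustration, not the workaround from the pull request linked below. The `RetryingLookup` class and its `Resolver` interface are hypothetical names, and the sketch assumes the caller has already confirmed that connectivity is back before calling `lookup`.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Arrays;
import java.util.List;

final class RetryingLookup {
    /** Hypothetical seam so the system resolver can be swapped out. */
    interface Resolver {
        List<InetAddress> resolve(String hostname) throws UnknownHostException;
    }

    /** Per the thread: a cached failure has no cause, a fresh one wraps a GaiException. */
    static boolean looksCached(UnknownHostException e) {
        return e.getCause() == null;
    }

    /**
     * Resolves once; if the failure looks like a cached repeat, waits and tries one more time.
     * Assumes the caller already knows connectivity has been regained.
     */
    static List<InetAddress> lookup(Resolver resolver, String hostname, long retryDelayMillis)
            throws UnknownHostException, InterruptedException {
        try {
            return resolver.resolve(hostname);
        } catch (UnknownHostException e) {
            if (!looksCached(e)) {
                throw e; // fresh failure from the resolver: report it as-is
            }
            Thread.sleep(retryDelayMillis); // let the negative cache entry expire
            return resolver.resolve(hostname);
        }
    }

    public static void main(String[] args) throws Exception {
        Resolver system = host -> Arrays.asList(InetAddress.getAllByName(host));
        System.out.println(lookup(system, "localhost", 2_000));
    }
}
```

The single retry is deliberate: if the second attempt also fails, the exception propagates so that the caller's normal error handling still applies.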

If you want to move this forward, you could try running this workaround Dns implementation and seeing if it helps.

https://github.com/square/okhttp/pull/8226/files

yschimke commented 7 months ago

Similar issue in Firebase

https://github.com/firebase/firebase-android-sdk/issues/2637

With fixes to avoid traffic while backgrounded, and then retry after foregrounding.

https://github.com/firebase/firebase-android-sdk/pull/2693 https://github.com/firebase/firebase-android-sdk/pull/2763

They use com.google.android.gms.common.api.internal.BackgroundDetector and ConnectivityManager.