I've set up a SOCKS proxy using Tor and am trying to scrape a Tor onion site:
Proxy proxy = new Proxy(Proxy.Type.SOCKS, new InetSocketAddress("127.0.0.1", 7150));
Jsoup.connect("http://uj3wazyk5u4hnvtk.onion").proxy(proxy).get();
It works fine on desktop (x64 Linux), but on Android it throws java.net.UnknownHostException: Unable to resolve host "uj3wazyk5u4hnvtk.onion": No address associated with hostname.
After a bit of research, I found a comment in this code that says:
// Perform explicit SOCKS4a connection request. SOCKS4a supports remote host name resolution
// (i.e., Tor resolves the hostname, which may be an onion address).
// The Android (Apache Harmony) Socket class appears to support only SOCKS4 and throws an
// exception on an address created using INetAddress.createUnresolved() -- so the typical
// technique for using Java SOCKS4a/5 doesn't appear to work on Android
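The "explicit SOCKS4a connection request" that the comment refers to can be sketched as follows. This is a minimal illustration of the SOCKS4a handshake, not code from Jsoup or from the project that comment came from; the class and method names are hypothetical, and the proxy address 127.0.0.1:7150 is assumed to be the Tor SOCKS port from the question. The key point is that the destination IP is sent as 0.0.0.1 followed by the hostname, so Tor resolves the .onion name and the device never performs a local DNS lookup.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: opens a SOCKS4a tunnel without resolving the target host locally.
public class Socks4aConnector {

    // Builds a SOCKS4a CONNECT request. DSTIP = 0.0.0.1 (first three bytes zero,
    // last byte non-zero) signals that a hostname follows, for the proxy to resolve.
    static byte[] buildConnectRequest(String host, int port) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(4);                   // VN: SOCKS version 4
        out.write(1);                   // CD: 1 = CONNECT
        out.write((port >> 8) & 0xFF);  // DSTPORT, big-endian
        out.write(port & 0xFF);
        out.write(0);                   // DSTIP = 0.0.0.1
        out.write(0);
        out.write(0);
        out.write(1);
        out.write(0);                   // empty USERID, NUL-terminated
        byte[] hostBytes = host.getBytes(StandardCharsets.US_ASCII);
        out.write(hostBytes, 0, hostBytes.length); // hostname, resolved by the proxy
        out.write(0);                   // NUL terminator for the hostname
        return out.toByteArray();
    }

    // Connects to the proxy (an IP literal, so no DNS lookup), sends the request
    // and returns the socket once the proxy reports the tunnel as granted.
    static Socket connect(String proxyHost, int proxyPort, String host, int port) throws IOException {
        Socket socket = new Socket();
        try {
            socket.connect(new InetSocketAddress(proxyHost, proxyPort));
            OutputStream os = socket.getOutputStream();
            os.write(buildConnectRequest(host, port));
            os.flush();
            byte[] reply = new byte[8];
            new DataInputStream(socket.getInputStream()).readFully(reply);
            // Byte 1 of the reply is the status; 90 (0x5A) means "request granted".
            if (reply[1] != 90) {
                throw new IOException("SOCKS4a request rejected, status " + (reply[1] & 0xFF));
            }
            return socket;
        } catch (IOException e) {
            socket.close();
            throw e;
        }
    }
}
```

A call such as `Socks4aConnector.connect("127.0.0.1", 7150, "uj3wazyk5u4hnvtk.onion", 80)` would return a raw socket; Jsoup cannot use it directly, so you would have to write the HTTP request over it yourself and parse the resulting HTML.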
Here is my stack trace:
W/System.err: java.net.UnknownHostException: Unable to resolve host "uj3wazyk5u4hnvtk.onion": No address associated with hostname
W/System.err: at java.net.Inet6AddressImpl.lookupHostByName(Inet6AddressImpl.java:125)
W/System.err: at java.net.Inet6AddressImpl.lookupAllHostAddr(Inet6AddressImpl.java:74)
at java.net.InetAddress.getAllByName(InetAddress.java:752)
at com.android.okhttp.internal.Network$1.resolveInetAddresses(Network.java:29)
at com.android.okhttp.internal.http.RouteSelector.resetNextInetSocketAddress(RouteSelector.java:187)
at com.android.okhttp.internal.http.RouteSelector.nextProxy(RouteSelector.java:156)
at com.android.okhttp.internal.http.RouteSelector.next(RouteSelector.java:98)
at com.android.okhttp.internal.http.HttpEngine.createNextConnection(HttpEngine.java:346)
at com.android.okhttp.internal.http.HttpEngine.connect(HttpEngine.java:329)
at com.android.okhttp.internal.http.HttpEngine.sendRequest(HttpEngine.java:247)
at com.android.okhttp.internal.huc.HttpURLConnectionImpl.execute(HttpURLConnectionImpl.java:457)
at com.android.okhttp.internal.huc.HttpURLConnectionImpl.connect(HttpURLConnectionImpl.java:126)
at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:746)
at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:722)
at org.jsoup.helper.HttpConnection.execute(HttpConnection.java:306)
at org.jsoup.helper.HttpConnection.get(HttpConnection.java:295)
at com.github.torrentfetcher.sources.ThePirateBay.parsePirateBay(ThePirateBay.java:117)
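That seems like an Android issue and out of the scope of Jsoup; we don't deal with name resolution. The trace shows Android's bundled OkHttp (com.android.okhttp.internal.http.RouteSelector) calling InetAddress.getAllByName() on the onion host itself, so it looks like it's not passing DNS resolution off to the SOCKS proxy.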
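Since the local lookup happens inside Android's HttpURLConnection, one possible workaround, outside Jsoup, is to open the proxy tunnel yourself and pass the hostname to Tor as a domain name. The sketch below assumes Tor accepts SOCKS5 on the port from the question and uses the RFC 1928 CONNECT request with address type 0x03 (DOMAINNAME), which leaves resolution to the proxy. The class and method names are hypothetical, and this is an untested illustration rather than a supported Jsoup or Android API.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: SOCKS5 CONNECT that lets the proxy resolve the hostname.
public class Socks5Tunnel {

    // CONNECT request with ATYP 0x03, so the hostname is sent as text, not resolved locally.
    static byte[] buildConnectRequest(String host, int port) {
        byte[] hostBytes = host.getBytes(StandardCharsets.US_ASCII);
        if (hostBytes.length > 255) throw new IllegalArgumentException("hostname too long");
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(5);                   // VER: SOCKS 5
        out.write(1);                   // CMD: CONNECT
        out.write(0);                   // RSV
        out.write(3);                   // ATYP: DOMAINNAME
        out.write(hostBytes.length);    // one-byte length prefix
        out.write(hostBytes, 0, hostBytes.length);
        out.write((port >> 8) & 0xFF);  // DST.PORT, big-endian
        out.write(port & 0xFF);
        return out.toByteArray();
    }

    static Socket connect(String proxyHost, int proxyPort, String host, int port) throws IOException {
        Socket socket = new Socket();
        try {
            socket.connect(new InetSocketAddress(proxyHost, proxyPort));
            OutputStream os = socket.getOutputStream();
            DataInputStream in = new DataInputStream(socket.getInputStream());

            // Greeting: version 5, one method offered, 0x00 = no authentication.
            os.write(new byte[] {5, 1, 0});
            os.flush();
            byte[] method = new byte[2];
            in.readFully(method);
            if (method[0] != 5 || method[1] != 0) throw new IOException("proxy refused no-auth method");

            os.write(buildConnectRequest(host, port));
            os.flush();

            // Reply header: VER, REP, RSV, ATYP. REP 0x00 means success.
            byte[] head = new byte[4];
            in.readFully(head);
            if (head[1] != 0) throw new IOException("SOCKS5 CONNECT failed, REP=" + (head[1] & 0xFF));
            // Skip BND.ADDR (length depends on ATYP) plus the 2-byte BND.PORT.
            int addrLen;
            switch (head[3]) {
                case 1: addrLen = 4; break;                      // IPv4
                case 4: addrLen = 16; break;                     // IPv6
                case 3: addrLen = in.readUnsignedByte(); break;  // domain name
                default: throw new IOException("unknown ATYP " + head[3]);
            }
            in.readFully(new byte[addrLen + 2]);
            return socket;
        } catch (IOException e) {
            socket.close();
            throw e;
        }
    }
}
```

Over the returned socket you would send a plain HTTP request (for example `GET / HTTP/1.0` with a `Host` header) and hand the HTML body to `Jsoup.parse(html, baseUri)`. On Android this must run off the main thread.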