Closed: nicolassmith closed this issue 11 years ago
Shoot, I was supposed to file that bug. Sorry about that.
Can you post a log of your telnet session?
leo@conservation:~> telnet ow.ly 80
Trying 204.15.172.228...
Connected to ow.ly.
Escape character is '^]'.
HEAD http://ow.ly/kxPwZ HTTP/1.1
HTTP/1.0 503 Service Unavailable
Cache-Control: no-cache
Connection: close
Content-Type: text/html
<html><body><h1>503 Service Unavailable</h1>
No server is available to handle this request.
</body></html>
Connection closed by foreign host.
Compared to including a Host header:
leo@conservation:~> telnet ow.ly 80
Trying 204.15.172.246...
Connected to ow.ly.
Escape character is '^]'.
HEAD http://ow.ly/kxPwZ HTTP/1.1
Host:ow.ly
HTTP/1.1 301 Moved Permanently
Date: Tue, 30 Apr 2013 23:02:34 GMT
Server: Apache/2.2.14 (Ubuntu)
X-Powered-By: PHP/5.3.2-1ubuntu4.18
Set-Cookie: OWLYSID=bfd4ee918a70da06f1d5c98e81dfde089c36fd74; path=/
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Pragma: no-cache
Location: http://engineering.quora.com/Continuous-Deployment-at-Quora
X-Gridnum: 61
Vary: Accept-Encoding
Connection: close
Content-Type: text/html
Connection closed by foreign host.
leo@conservation:~>
Update: this has nothing to do with setting the Host property; I am quite confident that the HttpURLConnection already sets this property by default. Instead, this has to do with responses which are gzip compressed. It seems that by default, HttpURLConnection requests compressed responses. When I turn that off, I am able to get the redirects from ow.ly and qr.ae. The bug is in the internal gzip decompression routines. Here is a partial stack trace (my line numbers are probably different, but it's on the line that reads responseCode = con.getResponseCode();):
04-30 22:27:07.915: W/System.err(17925): java.io.EOFException
04-30 22:27:07.915: W/System.err(17925): at java.util.zip.GZIPInputStream.readFully(GZIPInputStream.java:206)
04-30 22:27:07.915: W/System.err(17925): at java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:98)
04-30 22:27:07.925: W/System.err(17925): at java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:81)
04-30 22:27:07.925: W/System.err(17925): at libcore.net.http.HttpEngine.initContentStream(HttpEngine.java:528)
04-30 22:27:07.925: W/System.err(17925): at libcore.net.http.HttpEngine.readResponse(HttpEngine.java:836)
04-30 22:27:07.925: W/System.err(17925): at libcore.net.http.HttpURLConnectionImpl.getResponse(HttpURLConnectionImpl.java:274)
04-30 22:27:07.925: W/System.err(17925): at libcore.net.http.HttpURLConnectionImpl.getResponseCode(HttpURLConnectionImpl.java:486)
04-30 22:27:07.925: W/System.err(17925): at com.github.nicolassmith.urlevaluator.GeneralEvaluatorTask.evaluate(GeneralEvaluatorTask.java:34)
...
Did a bit more probing and found out that this occurs when a server lies about a response being gzip encoded. That is, if a server responds with plain text but claims that it's responding in gzip encoding, then the HttpEngine gets confused and tries to gunzip anyway ...
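For anyone who wants to see this failure mode without a device or a network, here is a rough sketch using only the standard java.util.zip classes. The class and method names (GzipMismatchDemo, gzip, tryGunzip) are made up for illustration, and it runs on a desktop JVM, so the exact exception type may not match what Android's libcore throws. The point is just that GZIPInputStream fails as soon as it is handed a body that is not really gzip data, which is where the stack trace above ends up.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical demo class, not part of urlevaluator.
public class GzipMismatchDemo {

    /** Gzip-compresses text so there is a genuine gzip body to compare against. */
    static byte[] gzip(String text) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buf)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    /** Returns "ok" if the body decodes as gzip, else the simple name of the exception thrown. */
    static String tryGunzip(byte[] body) {
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(body))) {
            while (in.read() != -1) {
                // drain the stream; only success or failure matters here
            }
            return "ok";
        } catch (IOException e) {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        byte[] plain = "<html>not gzip</html>".getBytes(StandardCharsets.UTF_8);
        System.out.println("real gzip body:  " + tryGunzip(gzip("hello")));
        System.out.println("plain-text body: " + tryGunzip(plain));
        System.out.println("empty body:      " + tryGunzip(new byte[0]));
    }
}
```

An empty body is included because a response to a HEAD request carries no body at all, so it is one more input that a decoder trusting the Content-Encoding header would choke on.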
The best solution I have for now is to just not request gzip encoding. This is addressed in https://github.com/duetosymmetry/urlevaluator/commit/18c26820deb1da4ccf760f68958ca24f02e34873
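A minimal sketch of what "not requesting gzip" can look like with HttpURLConnection, assuming the usual approach of sending Accept-Encoding: identity. This is not necessarily what the linked commit does; the class name NoGzipHead and the helper prepare are hypothetical, and turning off automatic redirect following is my own assumption so that the Location header can be read directly.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical helper, not the actual urlevaluator code.
public class NoGzipHead {

    /** Builds (but does not send) a HEAD request that asks for an uncompressed response. */
    static HttpURLConnection prepare(String address) {
        try {
            HttpURLConnection con = (HttpURLConnection) new URL(address).openConnection();
            con.setRequestMethod("HEAD");
            // Assumption: we want to see the 301 itself rather than follow it.
            con.setInstanceFollowRedirects(false);
            // "identity" asks the server not to compress, so no gzip decoding is attempted.
            con.setRequestProperty("Accept-Encoding", "identity");
            return con;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java NoGzipHead <short-url>");
            return;
        }
        HttpURLConnection con = prepare(args[0]);
        int responseCode = con.getResponseCode(); // the request is actually sent here
        System.out.println(responseCode + " -> " + con.getHeaderField("Location"));
        con.disconnect();
    }
}
```

Running it as `java NoGzipHead http://ow.ly/kxPwZ` should print the status code and the redirect target, provided the short link still resolves.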
Fixed in 8080ca3
Thanks, Leo.
1:56 PM Leo: bork bork bork something is borked in urlevaluator i haven't debugged it
1:57 PM but i imagine it has something to do with HEAD requests not working out? i should look into it
7 minutes
2:04 PM me: can you email me a url that is borked?
Leo: lemme see
2:06 PM check this one ... http://ow.ly/kxPwZ
2:07 PM except i'm not sure if that is the actual URL because on twitter sometimes people tweet short URLs and twitter reshortens them, stupidly
2:08 PM also http://qr.ae/TEhXB
5 minutes
2:13 PM me: yes, i get errors on those but i can't debug right now
2:14 PM Leo: no prob bob do you send any request headers?
2:15 PM it looks like ow.ly requires a Host:ow.ly request header i tried this with telnet by hand :P old school, yo if I leave off that header, it gives a 503. If I give just that one header, I get a 301 Moved
2:17 PM me: did owly work before we made request("HEAD")?
2:18 PM Leo: yeah
me: hmmm
Leo: oh maybe? idk i was just assuming ass u me
me: can you open an issue and put this info in it
Leo: haha good idea