nicolassmith / urlevaluator

URL evaluator for android - lengthens shortened URLs for correct handling in android
https://play.google.com/store/apps/details?id=com.github.nicolassmith.urlevaluator

Some URLs not evaluating #9

Closed nicolassmith closed 11 years ago

nicolassmith commented 11 years ago

1:56 PM Leo: bork bork bork something is borked in urlevaluator i haven't debugged it
1:57 PM but i imagine it has something to do with HEAD requests not working out? i should look into it
7 minutes
2:04 PM me: can you email me a url that is borked?
Leo: lemme see
2:06 PM check this one ... http://ow.ly/kxPwZ
2:07 PM except i'm not sure if that is the actual URL because on twitter sometimes people tweet short URLs and twitter reshortens them, stupidly
2:08 PM also http://qr.ae/TEhXB
5 minutes
2:13 PM me: yes, i get errors on those but i can't debug right now
2:14 PM Leo: no prob bob do you send any request headers?
2:15 PM it looks like ow.ly requires a Host:ow.ly request header i tried this with telnet by hand :P old school, yo if I leave off that header, it gives a 503. If I give just that one header, I get a 301 Moved
2:17 PM me: did owly work before we made request("HEAD")?
2:18 PM Leo: yeah
me: hmmm
Leo: oh maybe? idk i was just assuming ass u me
me: can you open an issue and put this info in it
Leo: haha good idea

duetosymmetry commented 11 years ago

Shoot, I was supposed to file that bug. Sorry about that.

nicolassmith commented 11 years ago

can you put a log of your telnet session?

duetosymmetry commented 11 years ago

leo@conservation:~> telnet ow.ly 80
Trying 204.15.172.228...
Connected to ow.ly.
Escape character is '^]'.
HEAD http://ow.ly/kxPwZ HTTP/1.1

HTTP/1.0 503 Service Unavailable
Cache-Control: no-cache
Connection: close
Content-Type: text/html

<html><body><h1>503 Service Unavailable</h1>
No server is available to handle this request.
</body></html>
Connection closed by foreign host.

Compared to including a Host header:

leo@conservation:~> telnet ow.ly 80
Trying 204.15.172.246...
Connected to ow.ly.
Escape character is '^]'.
HEAD http://ow.ly/kxPwZ HTTP/1.1
Host:ow.ly

HTTP/1.1 301 Moved Permanently
Date: Tue, 30 Apr 2013 23:02:34 GMT
Server: Apache/2.2.14 (Ubuntu)
X-Powered-By: PHP/5.3.2-1ubuntu4.18
Set-Cookie: OWLYSID=bfd4ee918a70da06f1d5c98e81dfde089c36fd74; path=/
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Pragma: no-cache
Location: http://engineering.quora.com/Continuous-Deployment-at-Quora
X-Gridnum: 61
Vary: Accept-Encoding
Connection: close
Content-Type: text/html

Connection closed by foreign host.
leo@conservation:~> 
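
For anyone who wants to repeat the telnet experiment from code, here is a minimal Java sketch of the second request, the one that includes the Host header. The class and method names (`RawHeadRequest`, `buildHeadRequest`, `fetchStatusLine`) are made up for illustration and are not part of urlevaluator. Two things differ from the session above and are assumptions on my part: the request line uses a path (`/kxPwZ`) rather than the absolute URL, and a `Connection: close` header is added so the server ends the exchange.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

public class RawHeadRequest {

    /** Builds a HEAD request that includes the Host header (hypothetical helper). */
    static String buildHeadRequest(String host, String path) {
        return "HEAD " + path + " HTTP/1.1\r\n"
             + "Host: " + host + "\r\n"
             + "Connection: close\r\n"   // assumption: not sent in the telnet session
             + "\r\n";                   // blank line ends the header block
    }

    /** Sends the request over a plain socket on port 80 and returns the status line. */
    static String fetchStatusLine(String host, String path) throws IOException {
        try (Socket socket = new Socket(host, 80)) {
            OutputStream out = socket.getOutputStream();
            out.write(buildHeadRequest(host, path).getBytes(StandardCharsets.US_ASCII));
            out.flush();
            BufferedReader in = new BufferedReader(
                new InputStreamReader(socket.getInputStream(), StandardCharsets.US_ASCII));
            return in.readLine();
        }
    }

    public static void main(String[] args) {
        // Only prints the request text; fetchStatusLine needs network access.
        System.out.print(buildHeadRequest("ow.ly", "/kxPwZ"));
    }
}
```

Calling `fetchStatusLine("ow.ly", "/kxPwZ")` would perform the actual round trip; what the server answers today may well differ from the log above.
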
duetosymmetry commented 11 years ago

Update: this has nothing to do with setting the Host property; I am quite confident that HttpURLConnection already sets it by default. Instead, this has to do with responses which are gzip compressed. It seems that by default, HttpURLConnection requests compressed responses. When I turn that off, I am able to get the redirects from ow.ly and qr.ae. The bug is in the internal gzip decompression routines. Here is a partial stack trace (my line numbers are probably different, but it's on the line that reads responseCode = con.getResponseCode();):

04-30 22:27:07.915: W/System.err(17925): java.io.EOFException
04-30 22:27:07.915: W/System.err(17925):    at java.util.zip.GZIPInputStream.readFully(GZIPInputStream.java:206)
04-30 22:27:07.915: W/System.err(17925):    at java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:98)
04-30 22:27:07.925: W/System.err(17925):    at java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:81)
04-30 22:27:07.925: W/System.err(17925):    at libcore.net.http.HttpEngine.initContentStream(HttpEngine.java:528)
04-30 22:27:07.925: W/System.err(17925):    at libcore.net.http.HttpEngine.readResponse(HttpEngine.java:836)
04-30 22:27:07.925: W/System.err(17925):    at libcore.net.http.HttpURLConnectionImpl.getResponse(HttpURLConnectionImpl.java:274)
04-30 22:27:07.925: W/System.err(17925):    at libcore.net.http.HttpURLConnectionImpl.getResponseCode(HttpURLConnectionImpl.java:486)
04-30 22:27:07.925: W/System.err(17925):    at com.github.nicolassmith.urlevaluator.GeneralEvaluatorTask.evaluate(GeneralEvaluatorTask.java:34)
...

duetosymmetry commented 11 years ago

Did a bit more probing and found out that this occurs when a server lies about a response being gzip encoded. That is, if a server responds with plain text but claims that it's responding in gzip encoding, then the HttpEngine gets confused and tries to gunzip anyway ...
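
The exception type in the trace can be reproduced off-device. The sketch below is an illustration on a standard JVM, not the Android `HttpEngine` code path: it hands `GZIPInputStream` a body that is not gzip data and reports whether the constructor fails with `EOFException`. `EmptyGzipDemo` and `failsWithEof` are hypothetical names. My assumption, not something confirmed in this thread, is that the body was empty or truncated, as it would be for a HEAD response; a non-empty plain-text body fails differently on the JDK.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;

public class EmptyGzipDemo {

    /** Returns true if wrapping the bytes in a GZIPInputStream fails with EOFException. */
    static boolean failsWithEof(byte[] body) {
        // The constructor reads the gzip header eagerly, so it can throw before any read().
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(body))) {
            in.read();
            return false;               // header parsed fine: the body really was gzip
        } catch (EOFException e) {
            return true;                // stream ended while reading the gzip header
        } catch (IOException e) {
            return false;               // some other failure, e.g. bad magic number
        }
    }

    public static void main(String[] args) {
        System.out.println("empty body -> EOFException: " + failsWithEof(new byte[0]));
        System.out.println("plain text -> EOFException: "
            + failsWithEof("hello".getBytes(StandardCharsets.US_ASCII)));
    }
}
```
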

The best solution I have for now is to just not request gzip encoding. This is addressed in https://github.com/duetosymmetry/urlevaluator/commit/18c26820deb1da4ccf760f68958ca24f02e34873
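
A minimal sketch of what "not requesting gzip" can look like with `HttpURLConnection`. I have not reproduced the linked commit here, so treat this as one plausible shape of the workaround rather than the actual change. `NoGzipHead`, `openHead`, and `resolve` are hypothetical names; the key line is the `Accept-Encoding: identity` request property, which asks the server for an uncompressed response.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

public class NoGzipHead {

    /** Prepares (but does not send) a HEAD request that asks for no compression. */
    static HttpURLConnection openHead(String shortUrl) throws IOException {
        HttpURLConnection con = (HttpURLConnection) new URL(shortUrl).openConnection();
        con.setRequestMethod("HEAD");
        con.setInstanceFollowRedirects(false);                  // we want the 3xx itself
        con.setRequestProperty("Accept-Encoding", "identity");  // opt out of gzip
        return con;
    }

    /** Returns the Location header of a redirect, or null if the response is not a 3xx. */
    static String resolve(String shortUrl) throws IOException {
        HttpURLConnection con = openHead(shortUrl);
        try {
            int code = con.getResponseCode();                   // the request is sent here
            return (code >= 300 && code < 400) ? con.getHeaderField("Location") : null;
        } finally {
            con.disconnect();
        }
    }
}
```

Usage would be something like `NoGzipHead.resolve("http://ow.ly/kxPwZ")`. The trade-off is that every response comes back uncompressed, which should matter little for HEAD requests since they carry no body.
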

nicolassmith commented 11 years ago

Fixed in 8080ca3

nicolassmith commented 11 years ago

Thanks, Leo.