rnewman / clj-apache-http

Clojure HTTP library using the Apache HttpClient.

Escaped spaces in URL breaks call #9

Open reuableahcim opened 13 years ago

reuableahcim commented 13 years ago

If I make a call with escaped spaces in the URL, it breaks when parsing into a full URI. For example, if I make a call like this:

(http/get "http://www.someurl.com/page/A%20Page/123" :query {:some_param "value"} :as :string)

I get the following error:

java.net.URISyntaxException: Illegal character in path at index 52: http://www.someurl.com/page/A Page/123?some_param=value

It looks like whatever merges the query string into the base URL decodes the %20 back into a literal space when it attaches the part after the ?, and the resulting string then fails the java.net.URI check.

When I pass in a full URL string, with the querystring already attached, everything is fine, e.g.:

(http/get "http://www.someurl.com/page/A%20Page/123?some_param=value" :as :string)
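Until the library handles this, one possible workaround is to build the full URL string yourself before calling http/get. The sketch below is only an illustration: `append-query` is a hypothetical helper, not part of clj-apache-http. It assumes the base URL has no query string yet, and that form-style encoding from java.net.URLEncoder (spaces become +) is acceptable for the parameter values.

```clojure
(require '[clojure.string :as string])
(import '(java.net URLEncoder))

(defn append-query
  "Hypothetical helper (not part of clj-apache-http): appends the map
  `params` to `url` as a query string. The path portion of `url` is left
  untouched, so escapes such as %20 survive as written."
  [^String url params]
  (let [enc   #(URLEncoder/encode (str %) "UTF-8")
        pairs (for [[k v] params]
                (str (enc (name k)) "=" (enc v)))]
    (str url "?" (string/join "&" pairs))))

(append-query "http://www.someurl.com/page/A%20Page/123" {:some_param "value"})
;; => "http://www.someurl.com/page/A%20Page/123?some_param=value"

;; The result can then be passed as a complete URL, in the same form as
;; the call that works above:
;; (http/get (append-query "http://www.someurl.com/page/A%20Page/123"
;;                         {:some_param "value"})
;;           :as :string)
```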

Here's the stack trace, starting from the point I make my call to http/get:

at java.net.URI$Parser.fail(URI.java:2809)
at java.net.URI$Parser.checkChars(URI.java:2982)
at java.net.URI$Parser.parseHierarchical(URI.java:3066)
at java.net.URI$Parser.parse(URI.java:3014)
at java.net.URI.<init>(URI.java:578)
at org.apache.http.client.utils.URIUtils.createURI(URIUtils.java:106)
at com.twinql.clojure.http$resolve_uri.invoke(http.clj:415)
at com.twinql.clojure.http$get.doInvoke(http.clj:495)
at clojure.lang.RestFn.invoke(RestFn.java:486)
at singularity.request$make_request.invoke(request.clj:37)

sbowman commented 13 years ago

The problem stems from lines 424-436 in http.clj. The URI gets parsed then reassembled with the query string. Unfortunately, java.net.URI is helpful enough that it decodes the path for you when you call .getPath on it:

user=> (def uri (java.net.URI. "http://localhost:4000/p/abc/d/Just%20For%20Fun/m"))
#'user/uri
user=> uri
#<URI http://localhost:4000/p/abc/d/Just%20For%20Fun/m>
user=> (.getPath uri)
"/p/abc/d/Just For Fun/m"

So when you call (.getPath u) on line 431, you assemble a decoded string back into the URI, which causes java.net.URI to barf. I'm pretty sure the problem will occur with UTF-8 paths too, since java.net.URI isn't automatically re-encoding the path.
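The failure can be reproduced with java.net.URI alone, without the library. This is a minimal sketch: `rebuild-with-decoded-path` is a hypothetical stand-in for the reassembly step described above, not the actual code in http.clj. It glues the decoded path and a query string back together and reparses the result.

```clojure
(import '(java.net URI URISyntaxException))

(defn rebuild-with-decoded-path
  "Hypothetical stand-in for the reassembly step: rebuilds a URI string
  from the *decoded* path (.getPath) plus `query`, then reparses it."
  [^String url ^String query]
  (let [u (URI. url)]
    (URI. (str (.getScheme u) "://" (.getRawAuthority u)
               (.getPath u) "?" query))))

(defn fails-to-parse?
  "Calls the zero-argument function `f`; returns true if it throws
  URISyntaxException, false otherwise."
  [f]
  (try
    (f)
    false
    (catch URISyntaxException _ true)))

;; The decoded path contains a literal space, which the single-string
;; URI constructor rejects.
(fails-to-parse?
  #(rebuild-with-decoded-path "http://www.someurl.com/page/A%20Page/123"
                              "some_param=value"))
;; => true
```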

sbowman commented 13 years ago

Instead of (.getPath u), you should be using (.getRawPath u):

user=> (def uri (java.net.URI. "http://localhost:4000/p/abc/d/Just%20For%20Fun/m"))
#'user/uri
user=> uri
#<URI http://localhost:4000/p/abc/d/Just%20For%20Fun/m>
user=> (.getPath uri)
"/p/abc/d/Just For Fun/m"
user=> (.getRawPath uri)
"/p/abc/d/Just%20For%20Fun/m"
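As a rough sketch of how the fix could look, the reassembly can use the raw path so the escapes are never decoded. `with-query` below is a hypothetical helper, not the code in http.clj. It assumes `query` is already encoded, and for brevity it ignores any existing query string or fragment on the input URL.

```clojure
(import '(java.net URI))

(defn with-query
  "Hypothetical helper: returns a java.net.URI for `url` with the
  already-encoded `query` attached. Uses .getRawPath so percent-escapes
  in the path are carried over unchanged."
  [^String url ^String query]
  (let [u (URI. url)]
    (URI. (str (.getScheme u) "://" (.getRawAuthority u)
               (.getRawPath u) "?" query))))

(str (with-query "http://www.someurl.com/page/A%20Page/123" "some_param=value"))
;; => "http://www.someurl.com/page/A%20Page/123?some_param=value"
```

The decoded form is still available from the result via .getPath when it is needed for display; only the string fed back into the URI constructor has to stay encoded.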