joshbuddy / http_router

A kick-ass HTTP router for use in Rack
MIT License

Path#uri_escape! does not work for multibyte strings in ruby 1.9.2 #16

Closed Burgestrand closed 13 years ago

Burgestrand commented 13 years ago

Since Ruby 1.9 is string-encoding aware, there is an issue with multibyte characters in Path#url. Assuming UTF-8 encoding (for example), "ä".size returns 1 in Ruby 1.9 because it counts characters, whereas in Ruby 1.8 it returns 2 because it counts bytes.
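To make the difference concrete, here is a minimal sketch, assuming Ruby 1.9 or later and a UTF-8 string. The variable names are illustrative only and are not taken from http_router.

```ruby
# Illustrative only: compare character count and byte count for "ä".
s = "\u00E4"                               # "ä" as a UTF-8 string

char_count = s.size                        # counts characters on Ruby 1.9+
byte_count = s.bytesize                    # counts bytes
hex_bytes  = s.unpack('H2' * byte_count)   # one hex pair per byte

puts char_count         # 1
puts byte_count         # 2
puts hex_bytes.inspect  # ["c3", "a4"]
```

Building the unpack format from the character count would request only one hex pair here, so the second byte would never be read.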

So, a demonstration of the issue (Ruby 1.9): uri_escape!("ä") => "%C3" (expected: "%C3%A4")

Method source: https://github.com/joshbuddy/http_router/blob/acf7b158e0449fcd54fc02d97b0e7721e34986a8/lib/http_router/path.rb#L58

I propose changing $1.size to $1.bytesize, which ought to work in both 1.8 and 1.9:

def uri_escape!(s)
  s.to_s.gsub!(/([^:\/?\[\]\-_~\.!\$&'\(\)\*\+,;=@a-zA-Z0-9]+)/n) { "%#{$1.unpack('H2'*$1.bytesize).join('%').upcase}" }
end
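For comparison, here is a rough side-by-side sketch of the current and proposed behaviour on Ruby 1.9 or later. The helpers escape_with_size and escape_with_bytesize and the constant UNSAFE are hypothetical names made up for this example. They use the non-mutating gsub and omit the /n flag so the snippet runs standalone, which differs from the library method.

```ruby
# Hypothetical standalone helpers, not the actual http_router method.
# Assumptions: non-mutating gsub instead of gsub!, and no /n regexp flag.
UNSAFE = /([^:\/?\[\]\-_~\.!\$&'\(\)\*\+,;=@a-zA-Z0-9]+)/

# Current behaviour: the unpack format is sized by character count.
def escape_with_size(s)
  s.to_s.gsub(UNSAFE) { "%#{$1.unpack('H2' * $1.size).join('%').upcase}" }
end

# Proposed behaviour: the unpack format is sized by byte count.
def escape_with_bytesize(s)
  s.to_s.gsub(UNSAFE) { "%#{$1.unpack('H2' * $1.bytesize).join('%').upcase}" }
end

puts escape_with_size("\u00E4")          # %C3
puts escape_with_bytesize("\u00E4")      # %C3%A4
puts escape_with_bytesize("/caf\u00E9")  # /caf%C3%A9
```

Assertions along these lines might also serve as a starting point for a regression test, though how they would fit into the project's existing test suite is not something this sketch covers.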

The reason I didn’t fork, patch and pull request is that I couldn’t figure out how to write tests for this issue.

PS: What is the reason for escaping a URI this way instead of using URI.escape from the Ruby stdlib?

joshbuddy commented 13 years ago

why not indeed?