NO-BREAK SPACE is Unicode code point U+00A0. In UTF-8, it's encoded as the two bytes C2 A0. When those bytes come at the end of a URL, Rinku splits them apart: C2 becomes part of the URL and A0 becomes part of the text after the link, resulting in invalid UTF-8.
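To make the byte-level problem concrete, here is a small sketch in plain Ruby. It does not call Rinku; it only simulates the split described above by hand, and `http://example.com/` is a placeholder URL chosen for illustration.

```ruby
# NO-BREAK SPACE (U+00A0) is a two-byte sequence in UTF-8.
nbsp_hex = "\u00A0".bytes.map { |b| format("%02X", b) }   # ["C2", "A0"]

# Simulate the split by hand (Rinku itself is not called here):
# the lead byte lands at the end of the URL, the continuation byte
# at the start of the text that follows the link.
url_part  = ("http://example.com/".bytes + [0xC2]).pack("C*").force_encoding(Encoding::UTF_8)
text_part = [0xA0].pack("C*").force_encoding(Encoding::UTF_8)

url_ok  = url_part.valid_encoding?    # false: truncated two-byte sequence
text_ok = text_part.valid_encoding?   # false: stray continuation byte

# Joining the raw bytes back together restores a valid string.
rejoined    = (url_part.b + text_part.b).force_encoding(Encoding::UTF_8)
rejoined_ok = rejoined.valid_encoding?   # true

puts nbsp_hex.join(" ")
puts [url_ok, text_ok, rejoined_ok].inspect
```

Neither half is valid UTF-8 on its own, which is why the output breaks as soon as the link and the trailing text are handled as separate strings.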
Oddly, this does not happen with INVERTED EXCLAMATION MARK, the very next Unicode code point (A1):
Here, Rinku has included the INVERTED EXCLAMATION MARK as part of the URL. I think it would be better to treat it as text after the URL, but regardless, it doesn't split the bytes apart.
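For comparison, this short Ruby sketch prints the UTF-8 bytes of the two characters side by side. It only inspects the encodings and makes no assumption about how Rinku handles them internally.

```ruby
# Both characters are two-byte UTF-8 sequences that share the lead byte C2
# and differ only in the continuation byte.
encodings = {
  "NO-BREAK SPACE (U+00A0)"            => "\u00A0",
  "INVERTED EXCLAMATION MARK (U+00A1)" => "\u00A1",
}.map { |name, char| [name, char.bytes.map { |b| format("%02X", b) }.join(" ")] }

encodings.each { |name, hex| puts "#{name}: #{hex}" }
```

Since the lead byte is identical, the difference in behavior would have to come from how the second byte (A0 versus A1) is treated.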
This is environment-dependent. I'm running Ruby 2.0.0-p353, and I'm running the same version on Heroku, where I don't see this issue: