rsl / stringex

Some [hopefully] useful extensions to Ruby’s String class. It is made up of three libraries: ActsAsUrl [permalink solution with better character translation], Unidecoder [Unicode to Ascii transliteration], and StringExtensions [miscellaneous helper methods for the String class].
MIT License

handle control characters #174

Closed telzul closed 8 years ago

telzul commented 9 years ago
"test\u0003test".to_url
=> "test\x03test"

I have crawled data that for some reason has \u0003 within its text, and it also shows up in titles. Since the characters \u0000 - \u001f are control characters, not human-readable ones, could they be removed from the result?
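
Until the library handles this itself, one possible workaround is to strip those characters before building the URL. The sketch below is only an illustration: `strip_control_chars` is a hypothetical helper name, not part of stringex, and it assumes that deleting the characters outright (rather than replacing them with a space) is what you want.

```ruby
# Hypothetical helper, NOT part of stringex: deletes the control
# characters in the range U+0000 through U+001F from a string.
def strip_control_chars(text)
  text.gsub(/[\x00-\x1F]/, "")
end

puts strip_control_chars("test\u0003test").inspect
# => "testtest"

puts strip_control_chars("Jörg Immendor\u0014. Les théâtres de la peinture").inspect
# => "Jörg Immendor. Les théâtres de la peinture"

# With the stringex gem loaded, the cleaned string could then be
# passed on, e.g. strip_control_chars(title).to_url
```

Note that this range also covers tab (\u0009) and newline (\u000A), so for multi-line input it may be better to substitute a space instead of deleting.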

senotrusov commented 9 years ago

This is indeed the issue. Recently I had a case with surprising input:

"Jörg Immendor\u0014. Les théâtres de la peinture"

and with the help of stringex it was turned into:

"jorg-immendor\x14-les-theatres-de-la-peinture"

rsl commented 9 years ago

sure. pull request welcome. sorry for slow reply. my inbox is... yeah. that.

tamaloa commented 8 years ago

any chance of PR #178 being merged and a new gem version release any time soon?

rsl commented 8 years ago

new gem out there now. thanks for reminding me