hannesm / jackline

minimalistic secure XMPP client in OCaml
BSD 2-Clause "Simplified" License

Change validate_utf8 to iterate over strings by UChar instead of by char #43

Closed by cfcs 9 years ago

cfcs commented 9 years ago

Changes validate_utf8 in src/xmpp_callbacks.ml to check UChars instead of chars, to use 0xFFFD (the Unicode replacement character) for unknown characters, and to ignore CRs completely.
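For readers without the diff at hand, here is a minimal sketch of what such a function might look like. It is not the code from this PR: it uses only the standard library (OCaml 4.14 or later, for String.get_utf_8_uchar) rather than whatever decoder jackline uses, and it assumes that "unknown characters" means malformed byte sequences.

```ocaml
(* Sketch only, not the PR's implementation.
   Assumption: "unknown" = malformed UTF-8 input.
   Requires OCaml >= 4.14 for String.get_utf_8_uchar. *)
let validate_utf8 (s : string) : string =
  let len = String.length s in
  let buf = Buffer.create len in
  let rec go i =
    if i < len then begin
      (* Decode one Unicode scalar value starting at byte offset i. *)
      let d = String.get_utf_8_uchar s i in
      let u =
        if Uchar.utf_decode_is_valid d then Uchar.utf_decode_uchar d
        else Uchar.rep (* U+FFFD REPLACEMENT CHARACTER *)
      in
      (* Drop carriage returns; re-encode everything else as UTF-8. *)
      if Uchar.to_int u <> 0x0D then Buffer.add_utf_8_uchar buf u;
      (* Advance by the number of bytes the decoder consumed (1 to 4). *)
      go (i + Uchar.utf_decode_length d)
    end
  in
  go 0;
  Buffer.contents buf
```

Working per decoded character rather than per byte means a multi-byte sequence is either kept whole or replaced once, instead of being mangled byte by byte.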

I'd like to see something that took care of all non-printable characters, especially things like vertical tabs, right-to-left-markers and the like, perhaps someone can help out with that.

dbuenzli commented 9 years ago

> I'd like to see something that took care of all non-printable characters, especially things like vertical tabs, right-to-left-markers and the like, perhaps someone can help out with that.

A broad classification is given by the Unicode general category property. You can look it up for an up-to-date version of Unicode using Uucp.Gc. But I'm not sure it's a good idea to remove bidi control characters.
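To make the suggestion concrete, here is a hedged sketch of a per-character filter driven by the general category. The names is_bidi_control and replace_nonprintable are hypothetical, and the policy (replace Cc and Cf characters with U+FFFD, but keep line feed and the bidi controls) is only one possible reading of this thread. The category lookup is passed in as a parameter so the block has no dependencies; with uucp installed, Uucp.Gc.general_category should fit that parameter.

```ocaml
(* Sketch only; names and policy are illustrative, not from jackline. *)

(* Code points carrying the Unicode Bidi_Control property. *)
let is_bidi_control (u : Uchar.t) : bool =
  let c = Uchar.to_int u in
  c = 0x061C || c = 0x200E || c = 0x200F
  || (c >= 0x202A && c <= 0x202E)
  || (c >= 0x2066 && c <= 0x2069)

(* [general_category] maps a character to its general category, given as a
   polymorphic variant such as `Cc (control) or `Cf (format). *)
let replace_nonprintable ~general_category (u : Uchar.t) : Uchar.t =
  match general_category u with
  (* Control characters other than line feed, e.g. vertical tab U+000B. *)
  | `Cc when Uchar.to_int u <> 0x0A -> Uchar.rep
  (* Format characters, except bidi controls, which are left intact. *)
  | `Cf when not (is_bidi_control u) -> Uchar.rep
  | _ -> u
```

Assuming uucp is available, a call would look like `replace_nonprintable ~general_category:Uucp.Gc.general_category u`. Keeping the bidi controls follows the reservation above: stripping them can change how legitimate right-to-left text is displayed.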

hannesm commented 9 years ago

Thanks; merged via #45.