The wrapping functionality previously counted the number of bytes
in a string when deciding whether to wrap a line. As the TODO
comment right above it already noted, this breaks line wrapping
for character encodings that may use more than a single byte to
encode a character, such as UTF-8. More specifically, it breaks line
wrapping for the w3m dumps of The Register's On Call column that
are delivered to my email inbox every Saturday, since these often
contain Unicode quotation marks, which take three bytes each in
UTF-8. Using the Unicode-aware count_codepoints predicate to
determine line lengths solves the issue.
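
To illustrate the difference, here is a minimal Python sketch. It is
not the code touched by this commit: byte_length, codepoint_length,
needs_wrap and the sample line are hypothetical names chosen for the
example, and a width of 46 is an arbitrary assumption.

```python
# Illustrative stand-ins only; none of these names come from the real code.

def byte_length(line: str) -> int:
    """Length of the line's UTF-8 encoding in bytes (the old measure)."""
    return len(line.encode("utf-8"))


def codepoint_length(line: str) -> int:
    """Number of Unicode code points in the line (the new measure)."""
    return len(line)


def needs_wrap(line: str, width: int, measure) -> bool:
    """Return True if the line is longer than width under the given measure."""
    return measure(line) > width


# A line wrapped in curly quotes (U+201C and U+201D, three bytes each in UTF-8).
sample = "\u201cHave you tried turning it off and on again?\u201d"

print(codepoint_length(sample))                 # 45
print(byte_length(sample))                      # 49
print(needs_wrap(sample, 46, byte_length))      # True: wrapped too early
print(needs_wrap(sample, 46, codepoint_length)) # False: the line fits
```

With the byte-based measure the sample line is treated as four
columns longer than it is, so it gets wrapped even though it fits.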