Use a unicode-aware count function for string wrapping

seifferth commented 1 year ago

The wrapping functionality previously counted the number of bytes in a string when deciding whether to wrap a line. As the TODO comment right above it already noted, this breaks line wrapping for any character encoding that may use more than a single byte per character. More specifically, it breaks line wrapping for the w3m dumps of The Register's On Call column that are delivered to my email inbox every Saturday, since these often contain Unicode quotation marks, which use up to three bytes each. Using the Unicode-aware count_codepoints predicate to determine line lengths solves the issue.
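To make the difference concrete, here is a minimal sketch in Python rather than in the project's own code. The helper names `byte_length`, `codepoint_length` and `needs_wrap`, the sample line, and the width of 46 are all hypothetical and chosen only for illustration; the sketch assumes the text is encoded as UTF-8.

```python
# Illustration only: this is not the project's code, and every name
# below is hypothetical.

def byte_length(s: str) -> int:
    """Length of the string in bytes when encoded as UTF-8."""
    return len(s.encode("utf-8"))


def codepoint_length(s: str) -> int:
    """Length of the string in Unicode code points."""
    return len(s)


def needs_wrap(line: str, width: int, count) -> bool:
    """Return True if the line is longer than width according to count."""
    return count(line) > width


# A line enclosed in curly quotation marks (U+201C and U+201D).
# Each of them takes three bytes in UTF-8 but is one code point.
line = "\u201cHave you tried turning it off and on again?\u201d"

# With a width that the line fits into by code points but not by bytes,
# the two counting strategies disagree about whether to wrap.
wrap_by_bytes = needs_wrap(line, 46, byte_length)
wrap_by_codepoints = needs_wrap(line, 46, codepoint_length)
print(wrap_by_bytes, wrap_by_codepoints)
```

Counting code points is still only an approximation of the width on screen, since combining characters and double-width characters occupy zero or two columns, but it handles the quotation mark case described above.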

wangp commented 1 year ago

Thanks, I wrote a more efficient version on master.

seifferth commented 1 year ago

That's nice. Thanks.

wangp / bower

Use a unicode-aware count function for string wrapping #114