Closed danielbachhuber closed 7 years ago
Yay! Character encoding!
This is due to combining marks in the string. mb_strlen() counts code points, but when a combining mark is printed it gets combined with the preceding character to form a grapheme with the width of one character. That's what we actually want to count in a situation like this, where the number of printed characters is important. mb_strlen() overcounts by one for every combining mark present in the string, so padding computed from it comes out one column short per mark.
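To make the miscount concrete, here's a small sketch using "é" written as "e" plus a combining acute accent. It assumes PHP 7+ (for the "\u{...}" escape) and the mbstring extension; the variable names and the 5-column width are only for illustration.

```php
<?php
// "e" followed by U+0301 COMBINING ACUTE ACCENT: prints as a single "é".
$str = "e\u{0301}";

$bytes      = strlen( $str );                  // 3: "e" is 1 byte, U+0301 is 2 bytes in UTF-8
$codepoints = mb_strlen( $str, 'UTF-8' );      // 2: the combining mark is counted separately
$graphemes  = preg_match_all( '/\X/u', $str ); // 1: \X matches one whole grapheme cluster

// Padding to a 5-column cell based on each count:
$pad_from_codepoints = str_repeat( ' ', 5 - $codepoints ); // 3 spaces, cell prints 4 columns wide
$pad_from_graphemes  = str_repeat( ' ', 5 - $graphemes );  // 4 spaces, cell prints 5 columns wide
```

The cell padded from the code point count ends up one column narrower than intended, which is the misalignment seen in the table output.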
You can count the number of graphemes in a string with grapheme_strlen(), but this requires the intl extension, and I've no idea how widespread that is. Stack Overflow tells me that preg_match_all( '/\X/u', $str ) is an alternative.
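A minimal sketch of how the two could be combined, preferring intl when it's loaded and falling back to the regex otherwise. The name count_graphemes() is hypothetical and not part of php-cli-tools, and the byte-length fallback for invalid UTF-8 is my own assumption about what a sensible last resort would be.

```php
<?php
/**
 * Hypothetical helper: count the graphemes in a UTF-8 string.
 */
function count_graphemes( $str ) {
	// Prefer the intl extension when it is available.
	if ( function_exists( 'grapheme_strlen' ) ) {
		$len = grapheme_strlen( $str );
		// grapheme_strlen() returns false or null when it cannot process the string.
		if ( null !== $len && false !== $len ) {
			return $len;
		}
	}

	// Fallback: \X matches one grapheme cluster, /u treats the subject as UTF-8.
	$count = preg_match_all( '/\X/u', $str );
	if ( false !== $count ) {
		return $count;
	}

	// Assumption: for invalid UTF-8, the byte length is better than nothing.
	return strlen( $str );
}
```

Checking with function_exists() rather than extension_loaded() keeps the test tied to the one function that is actually called.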
The character counting issue in php-cli-tools is in \cli\safe_strlen().
This might be a solution, but it's untested. I'll take a proper look later:
function safe_strlen( $str ) {
	// \X matches one grapheme cluster; the /u modifier treats $str as UTF-8.
	return preg_match_all( '/\X/u', $str );
}
Yay! Character encoding!
I think I detect sarcasm here but I'm not quite sure... ;)
This might be a solution, but it's untested. I'll take a proper look later
Sounds good, thanks!
Wow. Turns out that the font makes a difference to how some combined characters appear. Here's the same output in two different fonts:
SF Mono:
Menlo:
Turns out that the font makes a difference to how some combined characters appear.
:(
I don't think there's much one can do about fonts not displaying stuff correctly, but I got the original Nepali example working (on Ubuntu at least) by using the suggested grapheme_strlen() (with preg_match_all( '/\X/u' ) backup) in a new function strwidth(), to be called by safe_str_pad(), with adjustments for East Asian Width.
PR to follow.
Edit: just noticed the padding for post_title is off, so pushed a fix for that.
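For anyone following along, here's a rough sketch of the idea described above: count graphemes, then add one extra column for each East Asian wide character. This is not the code from the PR. The names sketch_strwidth() and sketch_str_pad() are hypothetical, and the character ranges are a coarse approximation of the wide and fullwidth blocks rather than the full Unicode East Asian Width data.

```php
<?php
/**
 * Hypothetical sketch: approximate display width of a UTF-8 string in columns.
 */
function sketch_strwidth( $str ) {
	// One column per grapheme cluster.
	$graphemes = preg_match_all( '/\X/u', $str );
	if ( false === $graphemes ) {
		// Assumption: fall back to byte length for invalid UTF-8.
		return strlen( $str );
	}

	// One extra column per wide character. Approximate ranges only:
	// Hangul Jamo, CJK and kana blocks, Hangul syllables, CJK compatibility
	// ideographs and forms, fullwidth forms.
	$wide = preg_match_all(
		'/[\x{1100}-\x{115F}\x{2E80}-\x{303E}\x{3041}-\x{A4CF}\x{AC00}-\x{D7A3}'
		. '\x{F900}-\x{FAFF}\x{FE30}-\x{FE4F}\x{FF00}-\x{FF60}\x{FFE0}-\x{FFE6}]/u',
		$str
	);

	return $graphemes + (int) $wide;
}

/**
 * Hypothetical sketch: right-pad with spaces to a target display width.
 */
function sketch_str_pad( $str, $width ) {
	$missing = max( 0, $width - sketch_strwidth( $str ) );
	return $str . str_repeat( ' ', $missing );
}
```

With this, a two-character CJK string padded to six columns gets two trailing spaces instead of four, since each character already occupies two columns.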
Resolved (as well as possible) through #107.
From https://github.com/wp-cli/wp-cli/issues/3038#issuecomment-230158804