masaccio closed this 4 months ago
Looks like 0.2.9 is where the change happened.
@masaccio which terminal?
Related issue and comment about the change: https://github.com/jquast/wcwidth/pull/91#issuecomment-1785693243
This was iTerm2 on a Mac with TERM=xterm-256color
This is the text I expect to see aligned - https://raw.githubusercontent.com/masaccio/compact-json/main/tests/data/test-issue-4.ref-1.json
Though in my browser it's not aligned so I don't know what the right answer is.
I will say that I also use iTerm2, and that it is not a great indicator of multilanguage support. I have since authored a testing and reporting tool, ucs-detect, and have published results for ~27 terminals.
The following terminals match this library's measurements for Hindi:
The other ~23 terminals, including iTerm2, do not. iTerm2 gets an overall "B" rating for its LANG score, while the ones listed above get A's.
Some of the differences are systematic errors, and I may create bug reports for the respective projects. However, languages like Hindi, written in the Devanagari script, make very heavy use of combining characters (category codes Mc and Mn), and strictly following the Unicode specifications, as these 4 terminals and this library do, may result in so much "squeezing" that the text becomes totally illegible!
Regarding your findings in the browser: I have found that browsers do not make the effort to align by column as a terminal is expected to (see screenshots in https://github.com/jquast/wcwidth/issues/123#issuecomment-2028115594)
I have authored a dummy "check" function to display a sequence where '|' should align,
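import wcwidth
def check(n, phrase):
    # Row 1 pads with spaces to the measured width; row 2 is the phrase. The closing '|' should line up.
    print('|'+(' '*wcwidth.wcswidth(phrase))+'|'+'\n'+'|'+phrase+'|\n')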
And these are the results for iTerm (left) and WezTerm (right)
I don't know Devanagari well enough to say for sure, but I would say that iTerm2 appears to fail to correctly combine characters of category Mc and Mn, while WezTerm does combine them but also sometimes reduces the font size to accommodate their expected width, and some combining characters may also be poorly aligned.
Thanks for the comprehensive debug. I can see I'm staring down a large rabbit hole of encodings I don't understand, so I'll step away! WezTerm does indeed agree with your library (though not when editing in vim) and that is enough for me.
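For readers who want to see the Mc and Mn categories mentioned above, here is a minimal sketch using only Python's standard unicodedata module. The sample word (the Hindi word for "Hindi") and the helper names categories and count_combining are chosen for illustration and do not come from this thread or from wcwidth. The sketch only reports Unicode general categories; it does not reproduce how wcwidth or any terminal computes width.

```python
import unicodedata

# Illustrative sample: the Hindi word for "Hindi", written with escapes
# so each code point is explicit.
word = "\u0939\u093f\u0928\u094d\u0926\u0940"


def categories(text):
    """Return (code point, general category, character name) per character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.category(ch), unicodedata.name(ch))
        for ch in text
    ]


def count_combining(text):
    """Count characters whose general category is Mc or Mn."""
    return sum(1 for ch in text if unicodedata.category(ch) in ("Mc", "Mn"))


for row in categories(word):
    print(*row)
print("Mc/Mn characters:", count_combining(word), "of", len(word))
```

Half of the code points in this short word are combining marks, which gives a sense of why terminals that handle Mc and Mn differently can disagree on alignment.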
I recently updated from 0.2.6 to 0.2.13 and I have some tests breaking in a package that uses wcswidth. The following test fails every check in 0.2.13 but passes in 0.2.6:
Aligning some ASCII text in my terminal, I believe that the check lengths are correct: