shlomif / Text-Table

CPAN Distribution to render text / ASCII-art / Unicode tables
https://metacpan.org/release/Text-Table
ISC License

Unicode Combining Chars' length not handled #11

Open jaggzh opened 3 years ago

jaggzh commented 3 years ago

G'day! I found that we're not using Unicode::GCString, so table columns become misaligned whenever length() overestimates a string's displayed width. I forked the repo and began going through the code, but length() is used in so many places that I wasn't sure where to start fixing it.

https://metacpan.org/pod/Unicode::GCString gives us the actual visible length (i.e. the number of "grapheme clusters") of a string.

Here are some tests where you can see the differences caused by the little vowel marks on the letters:

[image: Screenshot_20201203_230730]

```perl
#!/usr/bin/perl
use Text::Table;
use utf8;

# Explicit bidirectional marks (zero-width, but counted by length())
my %unich = (
    "lrm" => "\x{200e}",
    "rlm" => "\x{200f}",
);
my $tb = Text::Table->new(0,1,2,3);
binmode STDOUT, ":encoding(UTF-8)";

# Rows mix ANSI color escapes, Arabic with and without combining
# vowel marks, and RLM/LRM marks, to show where alignment breaks.
$tb->load(
    ["[ 5:90:13]", "\033[31;1mfa{jotanibuwhu\033[0m", "فَٱجْتَنِبُوهُ", "so avoid\nit"],
    [ $tb->rule ],
    ["[21:19:13]", "\033[31;1myasotaHosiruwna\033[0m", "حسر", "they\ntire"],
    [ $tb->rule ],
    ["[21:19:12]", "walA", "$unich{rlm}وَلَا$unich{lrm}", "$unich{lrm}and\nnot$unich{lrm}"],
    [ $tb->rule ],
    ["[21:19:12]", "walA", "$unich{rlm}وَلَا$unich{lrm}", "and\nnot"],
    [ $tb->rule ],
    ["[21:19:12]", "walA", "وَلَا", "and\nnot"],
    [ $tb->rule ],
    ["[21:19:12]", "walA", "ولَا", "and\nnot"],
);
print $tb;
```

Grinnz commented 3 years ago

Note that the length you want from Unicode::GCString is the "columns" count, not the number of grapheme clusters (since a grapheme cluster is not necessarily rendered to one column).
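To make the distinction concrete, here is a minimal sketch comparing the three measurements. It assumes Unicode::GCString is installed; `measure` is a hypothetical helper written for this illustration and is not part of Text::Table. It relies only on the module's `new`, `length` (grapheme clusters) and `columns` methods.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;
use Unicode::GCString;

binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical helper (not part of Text::Table): measure one string
# three different ways.
sub measure {
    my ($str) = @_;
    my $gcs = Unicode::GCString->new($str);
    return {
        codepoints => length($str),    # what Perl's built-in length() counts
        graphemes  => $gcs->length,    # number of grapheme clusters
        columns    => $gcs->columns,   # estimated display columns
    };
}

# Plain ASCII, a base letter plus combining accent, two CJK ideographs,
# and the vowelled Arabic word from the test script above.
for my $s ("walA", "e\x{301}", "\x{65E5}\x{672C}", "وَلَا") {
    my $m = measure($s);
    printf "%s: %d code points, %d graphemes, %d columns\n",
        $s, @{$m}{qw(codepoints graphemes columns)};
}
```

Padding a cell would then use the `columns` value rather than `length()`. Whether a terminal actually renders a given string in that many columns still depends on the terminal and font, so treat the number as an estimate.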

amutiso commented 1 year ago

Is there any hope this will be addressed one day, or is there a recommended way to get columns aligned for rows with mixed ASCII and Unicode values?