jquast / wcwidth

Python library that measures the width of unicode strings rendered to a terminal

Bugfixes for zero-width characters #91

Closed jquast closed 10 months ago

jquast commented 11 months ago

Major

Bugfix zero-width characters, closes #57, #47, #45, #39, #26, #25, #24, #22, #8, wow!

This is mostly achieved by replacing ZERO_WIDTH_CF with dynamic parsing of Unicode category codes in bin/update-tables.py, and putting the matching characters in the zero-width tables.
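
As a rough illustration of what deriving a zero-width table from category codes can look like, here is a minimal sketch using Python's standard `unicodedata` module. The function name `zero_width_ranges` and the category set `Mn`, `Me`, `Cf` are assumptions made for this example; the set actually used by bin/update-tables.py may differ, and that script reads the Unicode data files rather than the interpreter's built-in database.

```python
import unicodedata

# Assumed category set, for illustration only: nonspacing marks (Mn),
# enclosing marks (Me) and format characters (Cf).
ZERO_WIDTH_CATEGORIES = frozenset({"Mn", "Me", "Cf"})


def zero_width_ranges(categories=ZERO_WIDTH_CATEGORIES, start=0, stop=0x110000):
    """Return inclusive (first, last) code point ranges whose general
    category is in `categories`, merging adjacent code points."""
    ranges = []
    for cp in range(start, stop):
        if unicodedata.category(chr(cp)) in categories:
            if ranges and ranges[-1][1] == cp - 1:
                # Extend the current range by one code point.
                ranges[-1] = (ranges[-1][0], cp)
            else:
                # Begin a new range at this code point.
                ranges.append((cp, cp))
    return ranges


# The Combining Diacritical Marks block is entirely category Mn,
# so it collapses into a single range.
print(zero_width_ranges(start=0x0300, stop=0x0370))
```

Building the ranges from category codes means newly assigned combining marks are picked up when the Unicode data is updated, instead of relying on a hand-maintained list.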

Tests

codecov[bot] commented 11 months ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Files                    Coverage Δ
wcwidth/table_zero.py    100.00% <ø> (ø)
wcwidth/wcwidth.py       100.00% <100.00%> (ø)


jquast commented 10 months ago

Using the latest release of the 'ucs-detect' tool, https://pypi.org/project/ucs-detect/, to verify this branch:

kitty

Results using the kitty terminal emulator on macOS, with the master branch of wcwidth, version 0.2.8:

Displaying success results of wide and zero-width characters by language
Language Support: 25 of 41
                 Failed Language:  Total, Success Pct
                         Sinhala:     93,   46.2 %
     Mongolian, Halh (Mongolian):     33,   90.9 %
                       Malayalam:     57,   12.3 %
                        Gujarati:     68,   26.5 %
                          Telugu:     68,   26.5 %
                Panjabi, Eastern:     93,   46.2 %
                         Bengali:     56,   10.7 %
                           Tamil:     54,    7.4 %
                  Khmer, Central:     58,   13.8 %
                            Khün:     52,    3.8 %
                         Burmese:     61,   18.0 %
                          Chakma:    118,   57.6 %
             Javanese (Javanese):     53,    5.7 %
                           Hindi:     73,   31.5 %
              Sanskrit (Grantha):     53,    5.7 %
                         Kannada:     56,   10.7 %

With this branch, version 0.2.9:

Language Support: 38 of 41
                 Failed Language:  Total, Success Pct
                         Bengali:    201,   99.5 %
                         Sinhala:    220,   90.9 %
                       Malayalam:    214,   93.5 %

mlterm

Results using the mlterm terminal emulator on Linux, with the master branch of wcwidth, version 0.2.8:

Language Support: 24 of 41
                 Failed Language:  Total, Success Pct
                Arabic, Standard:  1,099,   91.0 %
                         Bengali:    116,   13.8 %
                         Burmese:    115,   13.0 %
                          Chakma:    248,   59.7 %
                        Gujarati:    143,   30.1 %
                           Hindi:    146,   31.5 %
             Javanese (Javanese):    114,   12.3 %
                         Kannada:    109,    8.3 %
                  Khmer, Central:    114,   12.3 %
                            Khün:    121,   17.4 %
                       Malayalam:    118,   15.3 %
                Panjabi, Eastern:    173,   42.2 %
              Sanskrit (Grantha):    107,    6.5 %
                         Sinhala:    201,   50.2 %
                           Tamil:    105,    4.8 %
                          Telugu:    141,   29.1 %
                Tibetan, Central:    297,   99.7 %

With this branch, version 0.2.9:

Language Support: 33 of 41
                 Failed Language:  Total, Success Pct
                Arabic, Standard:  1,099,   91.0 %
                         Bengali:  1,061,   94.3 %
             Javanese (Javanese):    178,   43.8 %
                         Kannada:  1,001,   99.9 %
                       Malayalam:    375,   73.3 %
     Mongolian, Halh (Mongolian):     33,   90.9 %
                         Sinhala:  1,057,   94.6 %
                Tibetan, Central:    292,   99.3 %