haskell / haskeline

A Haskell library for line input in command-line programs.
https://hackage.haskell.org/package/haskeline
BSD 3-Clause "New" or "Revised" License
223 stars 75 forks source link

Incorrect layout of list completion for double-width characters #142

Closed matil019 closed 4 years ago

matil019 commented 4 years ago

Nice to meet you!

As of haskeline-0.8.0.0, the layout of the list completion breaks if candidates contains double-width characters such as CJK characters.

Example:

wrong

Expected:

correct

Currently I workaround this by prepending an appropriate number of zero-width spaces (\x200b) to Completion.display after determining the visible width of a string using wcwidth in the package wcwidth.

I guess System.Console.Haskeline.Command.Completion.makeLines can be fixed to add support for double-width characters, like:

-    maxLength = min printWidth (maximum (map length ws) + minColPad)
+    wcswidth = sum . map wcwidth
+    maxLength = min printWidth (maximum (map wcswidth ws) + minColPad)

I'm happy to write a PR myself but I doubt that it's OK to add wcwidth to the dependencies of haskeline considering it is bundled with ghc.

judah commented 4 years ago

Thank you for the report! Is this issue with Windows? If so, I believe we have a fix in #125. We're due for another release; I'll look into doing that in the next few days.

Also, if you look at the line above the one you linked (l.122) you'll see we call the width function which actually does use wcwidth already: https://github.com/judah/haskeline/blob/master/System/Console/Haskeline/Backend/WCWidth.hs (We link in the C library explicitly by including its source in the Haskeline distribution.)

matil019 commented 4 years ago

I can reproduce it on every environment I have; not only on Windows (cmd/powershell/msys2) but also on Linux (Arch Linux, with/without tmux, locale=en_US.UTF-8 and ja_JP.UTF-8).

It looks like the width function is just a field of Layout; I think it represents the width of the terminal but has nothing to do with each width of the completions. Do you mean gWidth/gsWidth?

By the way, CJK letters (2E00-9FFF) are part of BMP characters (0000-FFFF). AFAICS the PR mainly addresses surrogate pairs, not character widths.

matil019 commented 4 years ago

Also reproduces on master (5f16b76).

judah commented 4 years ago

Ah, sorry, you're correct; I was mixing up width and gsWidth.

If you're willing to make a PR, that would be very appreciated! I think we could fix the issue by using System.Console.Haskeline.Backend.WCWidth.gsWidth instead of length in the Completion module. Note also this line: https://github.com/judah/haskeline/blob/5f16b76168f13c6413413386efc44fb1152048d5/System/Console/Haskeline/Command/Completion.hs#L143 Where we should probably be using gsWidth . stringToGraphemes instead of length . stringToGraphemes. The same fix seems appropriate for the code you originally pointed to.

matil019 commented 4 years ago

Thank you for reply! Opened a PR.