quchen / prettyprinter

A modern, extensible and well-documented prettyprinter.
BSD 2-Clause "Simplified" License

Support for wide characters and emojis #103

Open ony opened 5 years ago

ony commented 5 years ago

Currently Data.Text.length is used to determine the width of a piece of text, measured in characters. But that count is not accurate for Unicode wide characters and emojis, which can occupy 2 cells on the terminal. In addition, some characters may be invisible, or may trigger a vertical tab etc.
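To make the mismatch concrete, here is a minimal sketch comparing Data.Text.length with an approximate cell count. The names approxCharWidth and approxTextWidth are hypothetical, and the ranges are a deliberately incomplete hand-picked subset; a real implementation would need the full Unicode East Asian Width data.

```haskell
import qualified Data.Text as T

-- Hypothetical helper: approximate number of terminal cells for one character.
-- The ranges are an incomplete illustration, not a full width table.
approxCharWidth :: Char -> Int
approxCharWidth c
  | c < ' '                          = 0  -- C0 controls (simplification: tabs etc. really move the cursor)
  | c >= '\x0300' && c <= '\x036F'   = 0  -- combining diacritical marks
  | c == '\x200B' || c == '\x200D'   = 0  -- zero-width space, zero-width joiner
  | c == '\xFE0F'                    = 0  -- variation selector 16
  | c >= '\x1100' && c <= '\x115F'   = 2  -- Hangul Jamo initial consonants
  | c >= '\x3040' && c <= '\x30FF'   = 2  -- Hiragana and Katakana
  | c >= '\x4E00' && c <= '\x9FFF'   = 2  -- CJK Unified Ideographs
  | c >= '\xAC00' && c <= '\xD7A3'   = 2  -- Hangul syllables
  | c >= '\xFF01' && c <= '\xFF60'   = 2  -- fullwidth forms
  | c >= '\x1F300' && c <= '\x1F64F' = 2  -- a range containing many emoji
  | otherwise                        = 1

-- Sum of the approximate cell widths of all characters.
approxTextWidth :: T.Text -> Int
approxTextWidth = T.foldl' (\acc c -> acc + approxCharWidth c) 0

main :: IO ()
main = do
  let samples = map T.pack
        [ "abc"                  -- plain ASCII
        , "\x65E5\x672C\x8A9E"   -- three CJK ideographs
        , "e\x0301"              -- 'e' followed by a combining acute accent
        , "\x1F600"              -- a single emoji
        ]
  -- Prints (character count, approximate cell count) for each sample.
  mapM_ (\t -> print (T.length t, approxTextWidth t)) samples
```

With these assumed ranges the pairs printed are (3,3), (3,6), (2,1) and (1,2): the character count can either undershoot or overshoot the number of cells.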

Most likely this should be handled on the rendering side (e.g. HTML may use different ways to align text than spaces), in which case it is worth treating indent/column as just a guideline based on the other characters in the layout. I.e.

|W|i|d|e|W|
|0 |1 |2 |3 |4|5 - virtual columns during layout
|0 |2 |4 |6 |8|9 - actual terminal columns during render

(depending on the font, this may be rendered unaligned in a browser, but it should be fine in a terminal)
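The two rows of column numbers can be reproduced as running sums over per-character widths. This is only a sketch: it assumes the first four characters are fullwidth letters, and columnsWith and demoWidth are hypothetical names, not part of prettyprinter.

```haskell
-- Start column of every character, followed by the end column of the
-- whole string, for a given per-character width function.
columnsWith :: (Char -> Int) -> String -> [Int]
columnsWith width = scanl (+) 0 . map width

-- Layout view: every character advances the column by one.
virtualColumns :: String -> [Int]
virtualColumns = columnsWith (const 1)

-- Hypothetical width rule for this example only: fullwidth forms
-- (U+FF01..U+FF60) take two cells, everything else takes one.
demoWidth :: Char -> Int
demoWidth c
  | c >= '\xFF01' && c <= '\xFF60' = 2
  | otherwise                      = 1

-- Render view: columns advance by the assumed cell width.
terminalColumns :: String -> [Int]
terminalColumns = columnsWith demoWidth

main :: IO ()
main = do
  -- Four fullwidth letters followed by an ASCII 'W'.
  let s = "\xFF37\xFF49\xFF44\xFF45W"
  print (virtualColumns s)   -- [0,1,2,3,4,5]
  print (terminalColumns s)  -- [0,2,4,6,8,9]
```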

It should mostly work fine for right alignment as long as people don't pad with characters but use proper indentation. The last character might not always sit tightly against the right side, though.

More examples can be found in https://github.com/simonmichael/hledger/issues/895

sjakobi commented 4 years ago

This sounds like a nice feature!

I believe this would be easier to implement in the layouter than the renderer. The layouter is already somewhat output-environment-aware, via PageWidth:

https://github.com/quchen/prettyprinter/blob/7da3b1d32f8a74efb3bcf1b2d062f8f24a64c918/prettyprinter/src/Data/Text/Prettyprint/Doc/Internal.hs#L1631-L1632

Otherwise I believe we'd need to make some big changes to SimpleDocStream.

As a first step, we'd need a way to get the correct character widths – is there a nice, well-maintained Haskell library for that? How does hledger address the problem?

sjakobi commented 4 years ago

As a first step, we'd need a way to get the correct character widths – is there a nice, well-maintained Haskell library for that?

I see that tasty relies on wcwidth: https://github.com/feuerbach/tasty/blob/072ecb1cd4f6755f3b974b1c00a36fbd66266181/core/Test/Tasty/Ingredients/ConsoleReporter.hs#L598-L613

ony commented 4 years ago

As a first step, we'd need a way to get the correct character widths – is there a nice, well-maintained Haskell library for that?

You can see a bit of summary about usages in this comment.

I see that tasty relies on wcwidth: https://github.com/feuerbach/tasty/blob/072ecb1cd4f6755f3b974b1c00a36fbd66266181/core/Test/Tasty/Ingredients/ConsoleReporter.hs#L598-L613

It would be nice if it were that simple. Unfortunately, that wcwidth library is not actively maintained at the moment. See https://github.com/solidsnack/wcwidth/issues/2 .

sjakobi commented 3 years ago

I noticed that doclayout includes some logic for wide characters via its realLength function: http://hackage.haskell.org/package/doclayout-0.3/docs/src/Text.DocLayout.html#realLength

I suspect it's too simple for your needs, @ony, but it might be worth investigating how much it would take to make it good enough.

quchen commented 3 years ago

A simple fix would be adding an unsafeTextWithLength :: Text -> Int -> Doc ann function, where the text length can be specified by the programmer. I don’t expect emojis etc. are the main use case for this library. Determining char width is something nontrivial (and font-dependent!) that’s out of scope here, but we could offer a keyhole to plug other utilities in that offer something like charWidth :: Char -> Font -> Float.
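A rough sketch of how such a keyhole could look, using a toy document type rather than prettyprinter's actual Doc. Everything here (ToyDoc, toyTextWithLength, toyTextWith) is hypothetical; it uses String instead of Text and whole cells instead of the Float and Font from the signature above, purely to keep the example self-contained.

```haskell
-- Toy stand-in for a document type; this is NOT prettyprinter's Doc.
data ToyDoc
  = ToyText Int String    -- cached layout width, then the text itself
  | ToyCat ToyDoc ToyDoc  -- horizontal concatenation
  deriving Show

-- Default constructor: the width is simply the number of characters.
toyText :: String -> ToyDoc
toyText s = ToyText (length s) s

-- Analogue of the proposed unsafeTextWithLength: the caller supplies the
-- width. "Unsafe" because nothing checks it against the actual rendering.
toyTextWithLength :: String -> Int -> ToyDoc
toyTextWithLength s w = ToyText w s

-- The keyhole: plug in any externally supplied character width function.
toyTextWith :: (Char -> Int) -> String -> ToyDoc
toyTextWith charWidth s = toyTextWithLength s (sum (map charWidth s))

-- The width a layouter would see for a document.
toyWidth :: ToyDoc -> Int
toyWidth (ToyText w _) = w
toyWidth (ToyCat a b)  = toyWidth a + toyWidth b

main :: IO ()
main = do
  -- Hypothetical width rule: CJK Unified Ideographs take two cells.
  let wide c = if c >= '\x4E00' && c <= '\x9FFF' then 2 else 1
  print (toyWidth (toyText "\x4E2D\x6587"))           -- 2
  print (toyWidth (toyTextWith wide "\x4E2D\x6587"))  -- 4
```

The point of the design is that the library itself never has to decide how wide a character is; that decision stays with whoever supplies the width function.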