muesli4 / table-layout

Layout data in grids and pretty tables. Provides a lot of tools to get the cell formatting right (positional alignment, alignment on specific characters and limiting of cell width)..
BSD 3-Clause "New" or "Revised" License
37 stars 11 forks source link

Add WideString and WideText. #23

Closed Xitian9 closed 2 years ago

Xitian9 commented 2 years ago

These calculate width differently than string and text, taking into account that some characters are wider than 1 space (e.g. Han characters) and some are narrower (e.g. unicode combining characters).

Also adds some instances for Text and Text Builder.

Fixes #8.

muesli4 commented 2 years ago

I will probably need more time to review this.

Xitian9 commented 2 years ago

I think text is widespread enough that if you're writing a text handling-library, you're pretty safe incorporating it.

By unifying with a type class, I presume you mean:

newtype WideString a = WideString a

instance Cell (WideString String) where …
instance Cell (WideString Text) where…

I considered unifying them in a type class. I would be happy to do that, but I initially decided against it for a couple of reasons:

  1. There are really only three types that we would want instances for: String, Text, and lazy Text
  2. It would help type inference and be shorter to refer to WideString and WideText instead of WideString String and WideString Text.
  3. The most general definition would be instance HasChars a => Cell (WideString a), but the typeclass methods actually wouldn't let you short-circuit within dropLeft, so you would probably want separate definitions for each anyway.

If you want to break out the wide stuff into a separate package I would be happy with that too. The only transitive dependencies that doclayout adds are doclayout itself and safe.

Xitian9 commented 2 years ago

I've made the requested changes (I think).

I also corrected a bug and added a test. The bug occurred where, when dropping from the left, zero-width characters which immediately followed characters of width 2 or greater were not dropped when the wide character would have been cut in half (and so dropped and padded with space).

To demonstrate, let XX be a character of width 2, 0 a character of width 0, and Y another character. dropLeft 1 "XX0Y" should be " Y", but previously it would have been " 0Y".

muesli4 commented 2 years ago

I resolved the conflicts locally and rebased manually.

elldritch commented 2 years ago

I'm really looking forward to seeing this release! I've been working on a quick project that displays Twitter profile names, and some profile names with emojis or other Unicode seem to cause misalignment. In case you want to test whether this fix addresses that, here are some examples:

+------+--------------+-----------------+---------------------------------------------------+-------------------------+-----------+-----------+-----------+--------+
| 14   | 36           | rakyll          | Jaana Dogan ヤナ ドガン                                | 2007-12-03 13:08:02 UTC | True      | 73260     | 1084      | 86769  |
+------+--------------+-----------------+---------------------------------------------------+-------------------------+-----------+-----------+-----------+--------+
+------+--------------+-----------------+---------------------------------------------------+-------------------------+-----------+-----------+-----------+--------+
| 32   | 20           | ashleyloob      | 🍍ashley🍍                                          | 2020-01-17 07:07:17 UTC | False     | 21505     | 400       | 8536   |
+------+--------------+-----------------+---------------------------------------------------+-------------------------+-----------+-----------+-----------+--------+
+------+--------------+-----------------+---------------------------------------------------+-------------------------+-----------+-----------+-----------+--------+
| 52   | 13           | ryantakesoff    | Ryan Breslow 🕺                                    | 2011-09-07 21:18:44 UTC | True      | 25770     | 525       | 3669   |
+------+--------------+-----------------+---------------------------------------------------+-------------------------+-----------+-----------+-----------+--------+
Xitian9 commented 2 years ago

It looks like your examples don't quite align properly. I'll do some more testing to see if the bug can be fixed here or if it's an upstream issue with doclayout.

Xitian9 commented 2 years ago

I get the katakana aligning properly on my screen, but the emoji are incorrectly detected as a single space wide, leading to misalignment. I think this is an upstream bug. (Reported here)