muesli4 / table-layout

Layout data in grids and pretty tables. Provides a lot of tools to get the cell formatting right (positional alignment, alignment on specific characters and limiting of cell width)..
BSD 3-Clause "New" or "Revised" License
37 stars 11 forks source link

Cell formatting does not measure visible length in backend #4

Closed wbyrnetx closed 4 years ago

wbyrnetx commented 4 years ago

If you output text that contains ANSI color characters to a cell, the right border line isn't drawn in the correct place.

Screen Shot 2019-09-11 at 9 01 40 PM
muesli4 commented 4 years ago

My guess is that this is because it doesn't calculate the text width properly (I guess too long). By providing a custom length function this issue could be solved. Since that also effects how the text type is cut and centered, a type class is probably the best approach. Then an instance for a colored text type is easy to provide. However, this might involve changes that take a bit of time.

ony commented 4 years ago

Don't forget about emojis and wide characters.... hledger hit this in https://github.com/simonmichael/hledger/pull/905

simonmichael commented 4 years ago

Possibly of interest: wide character utilities at https://github.com/simonmichael/hledger/blob/master/hledger-lib/Hledger/Utils/String.hs#L257 and https://github.com/simonmichael/hledger/blob/master/hledger-lib/Hledger/Utils/Text.hs

muesli4 commented 4 years ago

Thanks for the input. I have some good news. I managed to rewrite and refactor the relevant parts of the formatting to be independent of String. As a bonus I also simplified some of the code drastically. That means, not only will the next release support every possible string type that is able to implement the type class Cell but it will also be easy to implement support for colored text and wide characters. Most likely it will also be possible to specify the builder with which the table is constructed.

class Cell a where
    -- | Drop a number of characters from the left side. Treats negative numbers
    -- as zero.
    dropLeft :: Int -> a -> a

    -- | Drop a number of characters from the right side. Treats negative
    -- numbers as zero.
    dropRight :: Int -> a -> a

    -- | Drop characters from both sides. Treats negative numbers as zero.
    dropBoth :: Int -> Int -> a -> a
    dropBoth l r = dropRight r . dropLeft l

    -- | Returns the length of the visible characters as displayed on the
    -- output medium.
    visibleLength :: a -> Int

    -- | Measure the preceeding and following characters
    measureAlignment :: (Char -> Bool) -> a -> AlignInfo

    buildCell :: StringBuilder b => a -> b

class Semigroup a => StringBuilder a where
    stringB :: String -> a
    charB :: Char -> a
    replicateCharB :: Int -> Char -> a
    replicateCharB i = StringB . replicate i

instance StringBuilder String where
    stringB = id
    charB = (: [])
    replicateCharB = replicate

instance StringBuilder (Endo String) where
    stringB s = Endo (s ++)
    charB = Endo . (:)
    replicateCharB i c = Endo $ \s -> foldr ($) s $ replicate i (c :) 

instance Cell String where
    dropLeft = drop
    dropRight n = reverse . drop n . reverse
    visibleLength = length
    measureAlignment p xs = case break p xs of
        (ls, s : rs) -> AlignInfo (length ls) $ Just (length rs)
        (ls, []) -> AlignInfo (length ls) Nothing

    buildCell = stringB

instance Cell T.Text where
    dropLeft = T.drop
    dropRight = T.dropEnd
    visibleLength = T.length
    measureAlignment p t = case T.break p t of
        (lt, rt) | T.length rt == 0 -> AlignInfo (T.length lt) Nothing
                 | otherwise        -> AlignInfo (T.length lt) $ Just $ T.length $ T.tail rt

Coloration can then be implemented with a rose tree. For wide char support a newtype wrapper should be enough.

The column modifier functions usually have the following signature:

pad :: (Cell a, StringBuilder b) => Position o -> Int -> a -> b

It should be similar for building tables. I'm not sure yet where is the best place to put the instances. Maybe one or several separate packages.

ony commented 4 years ago

Ehhh.... Basically problem of identifying length of the char in "cells" were delegated to users :) . This makes this library more attractive for projects like hledger to benefit from table layouts.

... For wide char support a newtype wrapper should be enough.

The thing is that default String and Text both include wide characters. Thus I'd say it is opposite. I think, to support only non-wide chars you need to create newtype wrapper with hidden constructor and function that expose it via validation that String contains no unsupported characters.
Instances of Cell you provided are valid only for subset of String and T.Text.

P.S. Somehow I remembered mono-traversable package that is used in conduit package to go through types like Text, ByteString.

muesli4 commented 4 years ago

Ehhh.... Basically problem of identifying length of the char in "cells" were delegated to users :) . This makes this library more attractive for projects like hledger to benefit from table layouts.

Not necessarily. After all the library will provide most of the instances but you always have the flexibility to use your own. I have not decided yet what and how. But I have the feature already integrated and specialized to String to provide the existing functionality.

... For wide char support a newtype wrapper should be enough.

The thing is that default String and Text both include wide characters. Thus I'd say it is opposite. I think, to support only non-wide chars you need to create newtype wrapper with hidden constructor and function that expose it via validation that String contains no unsupported characters. Instances of Cell you provided are valid only for subset of String and T.Text.

We're in full agreement.

P.S. Somehow I remembered mono-traversable package that is used in conduit package to go through types like Text, ByteString.

I actually thought of that while developing. Let's see, maybe I can derive a very general instance based on MonoTraversable that catches a lot of the cases.

muesli4 commented 4 years ago

@ony @simonmichael I'm a bit confused that there is basically nothing commented in that function. Where does it get the ranges from? I tried reading myself into the topic of multi-width unicode characters and it seems to me that this issue is not solved in general. I'm also not particularly fond of tinkering with this.

Anyway, I will open a separate issue for this. See #8.

muesli4 commented 4 years ago

example

Unfortunately, at the moment all cell types have to be the same. So there is a bit of boilerplate involved. And Formatted only supports one color for the whole cell. If there is demand for it then I will support more. Let me know if that works for your use case @wbyrnetx.

See here.

simonmichael commented 4 years ago

As it says: "From Pandoc." I guess Pandoc gets it from the official Unicode standard.