muesli4 / table-layout

Layout data in grids and pretty tables. Provides a lot of tools to get the cell formatting right (positional alignment, alignment on specific characters and limiting of cell width).
BSD 3-Clause "New" or "Revised" License

Add support for column groups #20

Closed by Xitian9 1 year ago

Xitian9 commented 2 years ago

The venerable tabular has support for column groups, which allows you to have different separators between different columns. We use this feature in hledger to give more or less clear separation between different kinds of columns. For example:

    || memtest 1 | memtest 2 ||  time test  | time test 2
====++===========+===========++=============+============
A 1 ||       hog |  terrible ||        slow |      slower
A 2 ||       pig |   not bad ||        fast |     slowest
----++-----------+-----------++-------------+------------
B 1 ||      good |     awful || intolerable |    bearable
B 2 ||    better | no chance ||    crawling |     amazing
B 3 ||       meh |   well... ||  worst ever |          ok

Adding support for this would involve a bit of a rethink to how TableStyles works. Currently it works by specifying the character to use for each role (e.g. Header top left corner, group vertical separator, header group separator). Supporting this would require different answers for different kinds of group separators. I can think of two ways to do this:

  1. Define a smallish number of acceptable kinds of column separators (tabular uses NoLine, SingleLine, and DoubleLine) and give the character for each of them. Pros: Complete. Cons: Verbose.
  2. Define a function which returns the join to use for different vertical/horizontal intersections (e.g. joint '-' '|' is Just '+'). It would also need to return the joins in different contexts (in the header, at the top, in the middle, etc.). Pros: More flexible. Cons: Can create incomplete table styles.
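
A minimal sketch of what option 2 might look like. The name joint is taken from the example above; the Context type, its constructors, and jointOr are assumptions made up for illustration and are not part of table-layout or tabular.

```haskell
import Data.Maybe (fromMaybe)

-- Hypothetical: where in the table an intersection occurs.
data Context = AtTop | InHeader | InBody | AtBottom
    deriving (Eq, Show)

-- | Join character for a horizontal and a vertical line character.
-- 'Nothing' means the style does not define this combination, which is
-- how an incomplete table style can arise.
joint :: Context -> Char -> Char -> Maybe Char
joint AtTop    '-' '|' = Just '.'
joint AtBottom '-' '|' = Just '\''
joint _        '-' '|' = Just '+'
joint _        '=' '|' = Just '+'
joint _        _   _   = Nothing

-- | Fall back to the horizontal character when no join is defined.
jointOr :: Context -> Char -> Char -> Char
jointOr ctx h v = fromMaybe h (joint ctx h v)
```

The catch-all clause is where the "incomplete style" downside shows up: every caller has to decide what to draw when the answer is Nothing.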

An even better solution than tabular's column groups would be to allow column groupings with joint headers. For example:

    || memtest columns       ||               
----++-----------+-----------++-------------+------------
    || memtest 1 | memtest 2 ||  time test  | time test 2
====++===========+===========++=============+============
A 1 ||       hog |  terrible ||        slow |      slower
A 2 ||       pig |   not bad ||        fast |     slowest
----++-----------+-----------++-------------+------------
B 1 ||      good |     awful || intolerable |    bearable
B 2 ||    better | no chance ||    crawling |     amazing
B 3 ||       meh |   well... ||  worst ever |          ok

Xitian9 commented 2 years ago

I see two issues/things to be done to move in this direction.

The type of HeaderSpec would need to be changed. It is currently

data HeaderSpec
    = NoneHS 
    | HeaderHS [HeaderColSpec] [String]

For the more basic solution above it could just be (modulo concerns with NoneHS being allowed as more than just a top-level value).

data HeaderSpec
    = NoneHS
    | Group DividerStyle [HeaderSpec]
    | HeaderHS HeaderColSpec String

The more complicated solution would be something like

data HeaderSpec
    = NoneHS
    | PhantomGroup DividerStyle [HeaderSpec]
    | LabelledGroup DividerStyle HeaderColSpec String [HeaderSpec]
    | HeaderHS HeaderColSpec String

The next issue would be making sure that HeaderSpec and ColSpec are compatible. Currently this is just a matter of making sure both lists have the same length, but under either proposed system you would instead want ColSpec to be a list of the same length as the leaves of HeaderSpec. This is more complicated, but perhaps not unreasonable. We could avoid it by including a ColSpec with the constructor HeaderHS (and possibly also NoneHS under some systems).
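
As a rough sketch of the leaf-counting check, using the "more basic" HeaderSpec from this comment: DividerStyle, HeaderColSpec and ColSpec are reduced to placeholder types here, and leafCount and compatible are hypothetical names. It also assumes that a NoneHS leaf stands for one untitled column, which is one of the open questions mentioned above.

```haskell
-- Placeholder stand-ins for the real table-layout types (assumptions).
data DividerStyle  = NoLine | SingleLine | DoubleLine deriving (Eq, Show)
data HeaderColSpec = HeaderColSpec deriving (Eq, Show)
data ColSpec       = ColSpec deriving (Eq, Show)

-- The "more basic" proposal from the comment above.
data HeaderSpec
    = NoneHS
    | Group DividerStyle [HeaderSpec]
    | HeaderHS HeaderColSpec String

-- | Number of columns described by a header tree.
-- Assumption: a NoneHS leaf counts as one untitled column.
leafCount :: HeaderSpec -> Int
leafCount NoneHS         = 1
leafCount (HeaderHS _ _) = 1
leafCount (Group _ hs)   = sum (map leafCount hs)

-- | A header tree and a list of column specs are compatible when there is
-- exactly one ColSpec per leaf.
compatible :: HeaderSpec -> [ColSpec] -> Bool
compatible h cs = leafCount h == length cs
```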

muesli4 commented 2 years ago

An even better solution than tabular's column groups would be to allow column groupings with joint headers.

I like that idea a lot. Although that adds extra logic on top for generating headers. It would certainly be an interesting task.

Adding support for this would involve a bit of a rethink to how TableStyles works.

We could give a function as an argument that modifies the table style accordingly and provide some predefined function (e.g., doubleVerticalSeperators).

The more complicated solution would be something like

Could you give an example of how you would model the table from your first comment with it in code? As far as I understand it, the separators are separating the groups themselves. Thus, the separator would be a property of the constructor holding a list of groups (i.e. Concat or whatever you want to call it).

The next issue would be making sure that HeaderSpec and ColSpec are compatible.

What do you mean by compatible? As far as the length of the lists is concerned, this makes sense to me. I did not spend any time thinking about the details but I would assume you have to flatten the tree anyway to extract the HeaderColSpec and the separators. Then you can still have a mismatch with the data rows.

At some point I had the idea that both the rows and the specifications share the same container. Then you can enforce the same length on the type level. For example, for a fixed width table you can use a vector type that encodes the length with Peano numbers. But in the end the time investment was not worth it for me. However, introducing a tree-like structure probably makes this a little bit more complex. I guess you could encode the length of a tree in its type as well and then provide a type class that converts to the dynamic representation. See also #13 which is somewhat related. But I think we can ignore this.
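
For reference, the length-indexed container idea can be sketched as follows. This is a generic illustration of the technique, not code from table-layout; Vec, Table and width are hypothetical names.

```haskell
{-# LANGUAGE DataKinds, GADTs, KindSignatures #-}

import Data.Kind (Type)

-- | Peano naturals, promoted to the type level by DataKinds.
data Nat = Z | S Nat

-- | A list whose length is recorded in its type.
data Vec (n :: Nat) (a :: Type) where
    Nil  :: Vec 'Z a
    (:>) :: a -> Vec n a -> Vec ('S n) a
infixr 5 :>

-- | Forget the length again (the "dynamic representation").
toList :: Vec n a -> [a]
toList Nil       = []
toList (x :> xs) = x : toList xs

-- | Hypothetical table type: the specs and every row share the width n,
-- so a row of the wrong length is rejected by the type checker.
data Table n spec cell = Table (Vec n spec) [Vec n cell]

-- | Column count, recovered from the specs.
width :: Table n spec cell -> Int
width (Table specs _) = length (toList specs)
```

A tree-shaped header would need its leaf count tracked in the type in a similar way, which is where the extra complexity mentioned above comes from.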

Xitian9 commented 2 years ago

Adding support for this would involve a bit of a rethink to how TableStyles works.

We could give a function as an argument that modifies the table style accordingly and provide some predefined function (e.g., doubleVerticalSeperators).

Could you give an example of this? I'm not sure I understand.

Could you give an example of how you would model the table from your first comment with it in code? As far as I understand it, the separators are separating the groups themselves. Thus, the separator would be a property of the constructor holding a list of groups (i.e. Concat or whatever you want to call it).

For this table:

    || memtest columns       ||               
----++-----------+-----------++-------------+------------
    || memtest 1 | memtest 2 ||  time test  | time test 2
====++===========+===========++=============+============
A 1 ||       hog |  terrible ||        slow |      slower
A 2 ||       pig |   not bad ||        fast |     slowest
----++-----------+-----------++-------------+------------
B 1 ||      good |     awful || intolerable |    bearable
B 2 ||    better | no chance ||    crawling |     amazing
B 3 ||       meh |   well... ||  worst ever |          ok

I would model the header this way, for some choice of HeaderColSpecs:

PhantomGroup DoubleLine
  [ NoneHS
  , LabelledGroup SingleLine hcspec1 "memtest columns"
      [ HeaderHS hcspec2 "memtest 1"
      , HeaderHS hcspec3 "memtest 2"
      ]
  , PhantomGroup SingleLine
      [ HeaderHS hcspec4 "time test"
      , HeaderHS hcspec5 "time test 2"
      ]
  ]
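
To show how the separators fall out of such a tree, here is a hedged sketch of flattening it. It assumes that a group's DividerStyle separates that group's direct children and that NoneHS stands for one untitled column; Item, flatten and separators are hypothetical names, HeaderColSpec is a placeholder, and the extra header row for group labels is ignored.

```haskell
import Data.List (intercalate)

-- Placeholder stand-ins for the real types (assumptions).
data DividerStyle  = NoLine | SingleLine | DoubleLine deriving (Eq, Show)
data HeaderColSpec = HeaderColSpec deriving (Eq, Show)

-- The "more complicated" proposal from earlier in the thread.
data HeaderSpec
    = NoneHS
    | PhantomGroup DividerStyle [HeaderSpec]
    | LabelledGroup DividerStyle HeaderColSpec String [HeaderSpec]
    | HeaderHS HeaderColSpec String

-- | One element of the flattened header row.
data Item = Column (Maybe String) | Separator DividerStyle
    deriving (Eq, Show)

-- | Flatten the tree left to right, putting each group's divider between
-- that group's direct children. Group labels are dropped in this sketch.
flatten :: HeaderSpec -> [Item]
flatten NoneHS                   = [Column Nothing]
flatten (HeaderHS _ title)       = [Column (Just title)]
flatten (PhantomGroup d hs)      = intercalate [Separator d] (map flatten hs)
flatten (LabelledGroup d _ _ hs) = intercalate [Separator d] (map flatten hs)

-- | Just the separators, in column order.
separators :: HeaderSpec -> [DividerStyle]
separators h = [d | Separator d <- flatten h]
```

Under these assumptions the header above yields the separators DoubleLine, SingleLine, DoubleLine, SingleLine, which matches the `||`, `|`, `||`, `|` pattern in the table.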

The next issue would be making sure that HeaderSpec and ColSpec are compatible.

What do you mean by compatible? As far as the length of the lists is concerned, this makes sense to me. I did not spend any time thinking about the details but I would assume you have to flatten the tree anyway to extract the HeaderColSpec and the separators. Then you can still have a mismatch with the data rows.

Yes, good point. Probably there's nothing that needs to be done here.

At some point I had the idea that both the rows and the specifications share the same container. Then you can enforce the same length on the type level. For example, for a fixed width table you can use a vector type that encodes the length with Peano numbers. But in the end the time investment was not worth it for me. However, introducing a tree-like structure probably makes this a little bit more complex. I guess you could encode the length of a tree in its type as well and then provide a type class that converts to the dynamic representation. See also #13 which is somewhat related. But I think we can ignore this.

Agreed. I think this falls into ‘maybe useful, but not necessary and will slow us down’.

Xitian9 commented 2 years ago

This may be what you already had in mind, but how about this?

(Names chosen for clarity of example, not because they're best)

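{-# LANGUAGE DeriveFunctor #-}

import Data.Functor.Contravariant (Contravariant (..))

-- Position of a horizontal line: a table border or an interior line of kind a.
data TopBottom a = TopBorder | BottomBorder | MiddleTB a
    deriving (Functor)

-- Position of a vertical line, analogously.
data LeftRight a = LeftBorder | RightBorder | MiddleLR a
    deriving (Functor)

data TableStyle a = TableStyle
   { horizontal   :: TopBottom a -> String
   , vertical     :: LeftRight a -> String
   , intersection :: TopBottom a -> LeftRight a -> String
   }

-- The instance is for TableStyle itself, not for TableStyle a.
instance Contravariant TableStyle where
    contramap f (TableStyle h v i) =
        TableStyle (h . fmap f) (v . fmap f) (\a b -> i (fmap f a) (fmap f b))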

This would not quite capture the difference between headerL/groupL etc. (further work to add this in?), but otherwise all the existing TableStyles could be re-encoded this way. It would be extensible, people could define their own categories, and would have a convenient way to map their custom styles into the standard styles using contramap :: (a -> b) -> TableStyle b -> TableStyle a.
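
To illustrate how such a style might be consumed, here is a small self-contained sketch that draws one horizontal rule. It repeats the definitions so that it compiles on its own; rule, interleave, plain, Division and toLine are hypothetical names, and plain is a toy style rather than one of the existing TableStyles.

```haskell
{-# LANGUAGE DeriveFunctor #-}

import Data.Functor.Contravariant (Contravariant (..))

data TopBottom a = TopBorder | BottomBorder | MiddleTB a deriving (Functor)
data LeftRight a = LeftBorder | RightBorder | MiddleLR a deriving (Functor)

data TableStyle a = TableStyle
    { horizontal   :: TopBottom a -> String
    , vertical     :: LeftRight a -> String
    , intersection :: TopBottom a -> LeftRight a -> String
    }

instance Contravariant TableStyle where
    contramap f (TableStyle h v i) =
        TableStyle (h . fmap f) (v . fmap f) (\a b -> i (fmap f a) (fmap f b))

-- | Alternate elements of two lists, starting with the first list.
interleave :: [x] -> [x] -> [x]
interleave (x : xs) ys = x : interleave ys xs
interleave []       ys = ys

-- | Hypothetical helper: one horizontal rule for columns of the given
-- widths. Assumes there is one separator fewer than there are widths.
rule :: TableStyle a -> TopBottom a -> [Int] -> [a] -> String
rule sty tb widths seps =
    intersection sty tb LeftBorder
        ++ concat (interleave segments joins)
        ++ intersection sty tb RightBorder
  where
    segments = [concat (replicate w (horizontal sty tb)) | w <- widths]
    joins    = [intersection sty tb (MiddleLR s) | s <- seps]

data Line = SingleLine | DoubleLine

-- | Toy style: only the kind of vertical line affects the output.
plain :: TableStyle Line
plain = TableStyle
    { horizontal   = const "-"
    , vertical     = \lr -> case lr of { MiddleLR DoubleLine -> "||"; _ -> "|" }
    , intersection = \_ lr -> case lr of { MiddleLR DoubleLine -> "++"; _ -> "+" }
    }

-- A user-defined category, mapped onto the toy style with contramap.
data Division = Primary | Secondary

toLine :: Division -> Line
toLine Primary   = DoubleLine
toLine Secondary = SingleLine

example :: String
example = rule (contramap toLine plain) (MiddleTB Secondary) [3, 3, 3] [Primary, Secondary]
-- example == "+---++---+---+"
```

The renderer only ever calls the three functions in the record, so adding a new category of separator needs no change to rule itself.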

Edited to add: further implementation thoughts.

Xitian9 commented 2 years ago

And here's a somewhat expanded example of implementing asciiRoundS in the above way.

data AsciiLines = NoLine | SingleLine | DoubleLine

asciiHorizontal NoLine     = ""
asciiHorizontal SingleLine = "-"
asciiHorizontal DoubleLine = "="

roundTopBottom TopBorder    = "-"
roundTopBottom BottomBorder = "-"
roundTopBottom (MiddleTB a) = asciiHorizontal a

asciiVertical NoLine     = ""
asciiVertical SingleLine = "|"
asciiVertical DoubleLine = "||"

roundLeftRight LeftBorder   = "|"
roundLeftRight RightBorder  = "|"
roundLeftRight (MiddleLR a) = asciiVertical a

roundIntersection NoLine     _      = ""
roundIntersection _          NoLine = ""
roundIntersection SingleLine SingleLine = "+"
roundIntersection SingleLine DoubleLine = "++"
roundIntersection DoubleLine SingleLine = ":"
roundIntersection DoubleLine DoubleLine = "::"

roundFullIntersections (MiddleTB a)          (MiddleLR b)          = roundIntersection a b
roundFullIntersections (MiddleTB NoLine)     _                     = ""
roundFullIntersections _                     (MiddleLR NoLine)     = ""
roundFullIntersections TopBorder             (MiddleLR DoubleLine) = ".."
roundFullIntersections TopBorder             _                     = "."
roundFullIntersections BottomBorder          (MiddleLR DoubleLine) = "''"
roundFullIntersections BottomBorder          _                     = "'"
roundFullIntersections (MiddleTB SingleLine) _                     = "."
roundFullIntersections (MiddleTB DoubleLine) _                     = ":"

asciiRoundS :: TableStyle AsciiLines
asciiRoundS = TableStyle
    { horizontal   = roundTopBottom
    , vertical     = roundLeftRight
    , intersection = roundFullIntersections
    }

This may look a little bit more verbose than the current definition of asciiRoundS, but note that:

  1. This allows for double vertical lines as well as double horizontal lines.
  2. Most of this can be reused to define asciiS and asciiDoubleS with only a couple more lines.
  3. You can easily use contramap to define semantically-named styles.

As an example of the last point, consider:

data ContentDivisions
    = PrimaryDivisions
    | SecondaryDivisions
    | TertiaryDivisions

divisionLines PrimaryDivisions   = DoubleLine
divisionLines SecondaryDivisions = SingleLine
divisionLines TertiaryDivisions  = SingleLine

myStyle :: TableStyle ContentDivisions
myStyle = contramap divisionLines asciiRoundS

Xitian9 commented 2 years ago

It's debatable whether this is an advantage, but an illustration of the power of this approach is that we can be fully general about unicode box-drawing characters.

data UnicodeLines
    = NoLine
    | SingleLine
    | HeavyLine
    | DoubleLine
    | DashLine
    | HeavyDashLine
    | Dash4Line
    | HeavyDash4Line
    | Dash2Line
    | HeavyDash2Line

Now for the low low price of us manually writing a case match of size 10^2 (can be cut significantly with some clever thought, I'm sure), people can use any of the box drawing characters in their tables.
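
One possible way to shrink that case match, sketched under assumptions: as far as I know the Unicode Box Drawing block has no junction characters for dashed lines and none that mix heavy with double lines, so each line style can first be reduced to a weight. Weight, weight and cross are hypothetical names, and only interior crossings are covered; corners and T-junctions would need similar tables.

```haskell
data UnicodeLines
    = NoLine
    | SingleLine
    | HeavyLine
    | DoubleLine
    | DashLine
    | HeavyDashLine
    | Dash4Line
    | HeavyDash4Line
    | Dash2Line
    | HeavyDash2Line
    deriving (Eq, Show)

-- | What matters when picking a junction character.
data Weight = None | Light | Heavy | Dbl
    deriving (Eq, Show)

-- | Dashed lines are treated like solid lines of the same weight.
weight :: UnicodeLines -> Weight
weight NoLine     = None
weight DoubleLine = Dbl
weight l
    | l `elem` [HeavyLine, HeavyDashLine, HeavyDash4Line, HeavyDash2Line] = Heavy
    | otherwise = Light

-- | Interior crossing of a horizontal line (first argument) and a
-- vertical line (second argument). 'Nothing' where no crossing is drawn
-- or no matching character exists.
cross :: UnicodeLines -> UnicodeLines -> Maybe Char
cross h v = case (weight h, weight v) of
    (Light, Light) -> Just '┼'
    (Heavy, Light) -> Just '┿'  -- horizontal heavy, vertical light
    (Light, Heavy) -> Just '╂'  -- horizontal light, vertical heavy
    (Heavy, Heavy) -> Just '╋'
    (Dbl,   Light) -> Just '╪'  -- horizontal double, vertical single
    (Light, Dbl)   -> Just '╫'  -- horizontal single, vertical double
    (Dbl,   Dbl)   -> Just '╬'
    _              -> Nothing
```

This covers the 100 combinations with seven interesting cases plus a fallback, at the cost of returning Nothing for combinations that have no character.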

Xitian9 commented 2 years ago

I've implemented some of these ideas in #27. This does not yet allow different column groups, but it sets the stage by allowing tables to handle different kinds of vertical (and horizontal) separators. This now allows for different column and row groups within tables. See the sample styles!

It also allows for easy generation of TableStyles which can handle any of the 10 different line styles listed above. There are still some kinks to work out, but I think this is a reasonable start.

muesli4 commented 2 years ago

@Xitian9, this will probably take a bit of time to review. I am not sure I am convinced of all the ideas but then again I don't have any opinion yet. You do not seem to have a shortage of time. ;)

Xitian9 commented 2 years ago

That's fine. The PR was created in a somewhat exploratory fashion, since I was still ironing out certain details of the specification as I went along. If it looks like something is introduced and then eliminated later, it probably is. I can clean this up later if it is desired, but I didn't want to go overboard polishing something that might be thrown away.

muesli4 commented 2 years ago

The changes from #27 have been merged. I just wanted to say thank you again to @Xitian9 for the great work. Overall, I am very happy with how it turned out and I think it brings a lot of value to users.

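Work that still needs to be done:

  • Limit interaction of HeaderColSpec in Pandoc tables. (In progress.) For this I was planning to use Default for separators. Were there any issues with those? (Or was the perceived reluctance to use Default based on the author not publishing it on stackage?)
  • Allowing titles in row headers when this feature is not implemented leads to unexpected behavior. My suggestion would be to limit the cell type for row headers to some dummy type that has only one value. For example: newtype EmptyCell = EmptyCell () and a Default and Cell instance.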

There also seems to be a bug with row headers:

> putStrLn $  tableString [def, def] unicodeS noneH (titlesH ["c", "d"]) [rowG ["1", "2"]]
┌───┬───┐
│ c │ d │
╞═══╪═══╡
│ 1 │ 2 │
└───┴───┘
> putStrLn $  tableString [def, def] unicodeS (titlesH ["a", "b"]) (titlesH ["c", "d"]) [rowG ["1", "2"]]
┌───┬───┐
│ c │ d │
╞═══╪═══╡
│ 1 │ 2 │
├───┼───┤
└───┴───┘

I am also very sorry about the delay. I am having a rough time at the moment (or more like the last one and a half years) since I have some low frequency noise issues with my apartment and sometimes I barely sleep 3 hours.

Xitian9 commented 2 years ago

The changes from #27 have been merged. I just wanted to say thank you again to @Xitian9 for the great work. Overall, I am very happy with how it turned out and I think it brings a lot of value to users.

I'm glad you're happy. I've felt a bit guilty making so much work for you.

  • Limit interaction of HeaderColSpec in Pandoc tables. (In progress.) For this I was planning to use Default for separators. Were there any issues with those? (Or was the perceived reluctance to use Default based on the author not publishing it on stackage?)

That sounds fine to me. I've added a small PR (#26) which makes defaults a bit more concrete, which I think addresses most of the problems with them (mostly being too general and confusing type inference). It also adds a default instance for LineStyle: I'm not sure if that would solve the issue you describe.

  • Allowing titles in row headers when this feature is not implemented leads to unexpected behavior. My suggestion would be to limit the cell type for row headers to some dummy type that has only one value. For example: newtype EmptyCell = EmptyCell () and a Default and Cell instance.

That sounds reasonable.

There also seems to be a bug with row headers:

> putStrLn $  tableString [def, def] unicodeS noneH (titlesH ["c", "d"]) [rowG ["1", "2"]]
┌───┬───┐
│ c │ d │
╞═══╪═══╡
│ 1 │ 2 │
└───┴───┘
> putStrLn $  tableString [def, def] unicodeS (titlesH ["a", "b"]) (titlesH ["c", "d"]) [rowG ["1", "2"]]
┌───┬───┐
│ c │ d │
╞═══╪═══╡
│ 1 │ 2 │
├───┼───┤
└───┴───┘

I agree that this is somewhat counter-intuitive, but it is consistent with how extra column headers are handled when there is not enough data:

> putStrLn $  tableString [def, def] unicodeS (titlesH ["a", "b"]) (titlesH ["c", "d", "e"]) [rowG ["1", "2"]]
┌───┬───┬──┐
│ c │ d │  │
╞═══╪═══╪══╡
│ 1 │ 2 │  │
├───┼───┼──┤
└───┴───┴──┘ 

This may or may not be related, but there is some confusion regarding the overlap of RowGroup and the new row header infrastructure. I think part of the issue is that RowGroup is basically a simple version of row groupings, where the internal separator is NoLine and the external separator is SingleLine. In order to preserve backwards compatibility I left the RowGroup stuff in there, but now the same thing can be done in two different ways. It's not clear if we should leave this open for backwards compatibility or simplify things by eliminating RowGroup, letting rows be treated the same way as columns.

> putStrLn $  tableString [def, def] unicodeS (titlesH ["a"]) (titlesH ["c", "d"]) [rowsG [["1", "2"], ["3", "4"]]]
┌───┬───┐
│ c │ d │
╞═══╪═══╡
│ 1 │ 2 │
│ 3 │ 4 │
└───┴───┘
> putStrLn $  tableString [def, def] unicodeS (fullSepH NoLine (repeat def)  ["a", "b"]) (titlesH ["c", "d"]) [rowG ["1", "2"], rowG ["3", "4"]]
┌───┬───┐
│ c │ d │
╞═══╪═══╡
│ 1 │ 2 │
│ 3 │ 4 │
└───┴───┘

I am also very sorry about the delay. I am having a rough time at the moment (or more like the last one and a half years) since I have some low frequency noise issues with my apartment and sometimes I barely sleep 3 hours.

Don't worry at all about the delay. Thank you for all the work you've put into this.

The noise issues sound terrible. I hope they clear up soon.