html-to-text / node-html-to-text

Advanced html to text converter

Should a table outputted using the dataTable formatter be longer than the wordwrap limit? #251

Open scorgn opened 2 years ago

scorgn commented 2 years ago

I am setting wordwrap: 80 in the config (just as an example) so that all of the output is wrapped at 80 characters. It works beautifully, except for tables formatted with dataTable. A table is as wide as it needs to be, and an individual cell is only wrapped once its content reaches 80 characters.
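For reference, the setup described above might look like the following. This is a sketch that assumes html-to-text version 8 or later, where per-tag formatting is configured through `selectors`; the `table` selector is an assumption and should be adjusted to match the tables that are meant to use the dataTable formatter.

```javascript
// Options as described in the comment above: global wrapping at 80 characters,
// with tables rendered by the dataTable formatter.
// Assumes html-to-text v8+ (the `selectors` option).
const options = {
  wordwrap: 80,
  selectors: [
    { selector: 'table', format: 'dataTable' },
  ],
};

// Usage (requires the html-to-text package to be installed):
//   const { convert } = require('html-to-text');
//   const text = convert(html, options);
```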

Looking at the documentation, that is technically what it says about the dataTable maxColumnWidth option:

maxColumnWidth: Data table cell content will be wrapped to fit this width instead of global wordwrap limit. Set this to undefined in order to fall back to wordwrap limit.

This appears to directly say that the max column width will default to the global wordwrap limit. Each column individually won't be wider than the wordwrap limit, but all of the columns side by side will be. That also seems to be how it works. But it seems so counterintuitive that I thought it was worth asking whether I'm doing or understanding something wrong.

I can't imagine a scenario where you would want all output lines to be wrapped at 80 characters except for those lines that are part of a table. Is that how the dataTable formatter is supposed to work?
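To make the arithmetic concrete, here is a rough sketch of why the table as a whole can exceed the limit. It is not the library's code: `tableWidth` is a hypothetical helper, and the three-space gap between columns is an assumed value (the dataTable formatter has a `colSpacing` option for this).

```javascript
// Hypothetical helper: total line width of a table whose columns are each
// capped independently, with `colSpacing` spaces between adjacent columns.
function tableWidth(contentWidths, maxColumnWidth, colSpacing = 3) {
  const capped = contentWidths.map((w) => Math.min(w, maxColumnWidth));
  const gaps = colSpacing * Math.max(0, capped.length - 1);
  return capped.reduce((sum, w) => sum + w, 0) + gaps;
}

// Three columns whose longest lines are 50, 120 and 40 characters,
// each capped at 80 because maxColumnWidth falls back to wordwrap.
const total = tableWidth([50, 120, 40], 80);
console.log(total); // 176
```

Only the middle column is wrapped (120 down to 80), so every column respects the limit while the row as a whole is more than twice as wide as it.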

KillyMXI commented 2 years ago

Sorry for the late reply.

Yes, you got it right. Existing behavior is motivated more by the ease of implementation than by any specific usage scenario.

Trying to fit an arbitrary table into an overall width limit will require more complicated logic. In the current implementation the width of each column is computed independently in a single pass (and it is designed to work nicely with colspan/rowspan). Adding an overall budget will introduce an optimization problem: which columns should be shrunk when the available width is not enough, and how much any column can be shrunk...
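As an illustration of the kind of decision involved, here is a minimal sketch of one possible policy: repeatedly take one character from the currently widest column that is still above its minimum. This is not how the library works and not a proposal from the thread. `shrinkToBudget`, `minWidths` and `budget` are hypothetical names, and the budget is assumed to exclude column spacing. It also ignores colspan/rowspan entirely, which is a large part of what makes the real problem harder.

```javascript
// One possible policy (an assumption, not the library's behavior):
// shrink the widest column one character at a time until the total fits.
function shrinkToBudget(widths, minWidths, budget) {
  const result = widths.slice();
  let total = result.reduce((sum, w) => sum + w, 0);
  while (total > budget) {
    // Find the widest column that may still be shrunk.
    let widest = -1;
    for (let i = 0; i < result.length; i++) {
      const canShrink = result[i] > minWidths[i];
      if (canShrink && (widest === -1 || result[i] > result[widest])) {
        widest = i;
      }
    }
    if (widest === -1) break; // all columns are at their minimum; budget cannot be met
    result[widest] -= 1;
    total -= 1;
  }
  return result;
}

// Columns of 50, 80 and 40 characters squeezed into 74 characters of content
// (80 minus two assumed 3-space gaps), with a minimum of 10 per column.
const shrunk = shrinkToBudget([50, 80, 40], [10, 10, 10], 74);
console.log(shrunk);
```

Other policies (proportional shrinking, protecting narrow columns, preferring columns with breakable text) are just as defensible, which is the point of calling it an optimization problem.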

I have plans for other dataTable formatter improvements. They may or may not make it easier to approach this one later. I would love to make it possible to stretch smaller tables to a certain width as well. I think I will have to rewrite the table formatting code a couple of times before I can get through this. I still have higher-priority goals, so no ETA.

scorgn commented 2 years ago

Ah okay, I see, that makes sense. I am using this to format emails for NeoMutt, and a lot of emails use tables to align things a certain way. I created a JavaScript formatter that formats the table the same way the browser does (as described here). I tested it and validated that the output matches what Chrome renders for the same table when I use a monospace font and set the width of the table in ch (e.g. 100 characters wide).

I took a stab at it from an email viewpoint based on a few emails I found. These are some things that I took into consideration with my implementation.

"Invalid" tables have each cell displayed just as a normal block element.

The way I have it implemented plays nicely with colspan/rowspan (it was a little messy when finding empty rows/columns), and it would be fairly easy to add an option for minimum table widths (stretching them) as well. If you're open to it, once you make the other dataTable formatter changes I can adjust my formatter to those changes, write some tests, and open up an MR.
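The formatter mentioned above is not shown in the thread, and the linked description is not reproduced here, so the following is only a guess at the general shape of a browser-like approach. It is modeled loosely on the automatic table layout outlined in CSS 2.1: each column has a minimum width (its longest unbreakable word) and a maximum width (its longest line when nothing wraps), and the space between those two extremes is shared out. `layoutColumns` and its `min`/`max` fields are hypothetical names, and colspan/rowspan handling is left out.

```javascript
// Hypothetical sketch of min/max-content column sizing; not the formatter
// described in the comment above. Assumes max >= min for every column.
function layoutColumns(columns, available) {
  const sum = (xs) => xs.reduce((a, b) => a + b, 0);
  const mins = columns.map((c) => c.min);
  const maxs = columns.map((c) => c.max);

  if (sum(maxs) <= available) return maxs; // everything fits without wrapping
  if (sum(mins) >= available) return mins; // cannot fit; overflow is unavoidable

  // Share the space above the minimums in proportion to each column's slack.
  const extra = available - sum(mins);
  const slack = sum(maxs) - sum(mins);
  const widths = columns.map(
    (c) => c.min + Math.floor((extra * (c.max - c.min)) / slack)
  );

  // Hand out the characters lost to rounding, left to right.
  let leftover = available - sum(widths);
  for (let i = 0; leftover > 0; i = (i + 1) % widths.length) {
    if (widths[i] < maxs[i]) {
      widths[i] += 1;
      leftover -= 1;
    }
  }
  return widths;
}

const widths = layoutColumns(
  [{ min: 8, max: 50 }, { min: 12, max: 120 }, { min: 6, max: 40 }],
  74
);
console.log(widths);
```

Under this scheme a column with long unbreakable content keeps its minimum, while columns with a lot of wrappable text absorb most of the shrinking.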