pandoc-ext / list-table

Write tables as list of cells and rows.
MIT License
25 stars 2 forks source link

If some cells are undefined, the filter can generate an invalid AST #2

Open wlupton opened 1 year ago

wlupton commented 1 year ago

If some cells are undefined, the filter can generate an invalid AST. For example, with this markdown:

::: list-table
* - A
  - B

* - 1
:::

...we get the following (edited to show just the offending rows; rep.lua reports the AST using the logging module; see below):

% pandoc-3.1.8 -L list-table.lua list-table-bug.md -L rep.lua -t html -o /dev/null
(#) blocks Blocks[1] {
  [1] Table {
    ...
    bodies: List[1] {
      [1] {
        ...
        body: List[1] {
          [1] pandoc.Row {
            ...
            cells: List[1] {
              [1] pandoc.Cell {
                alignment: "AlignDefault"
                attr: Attr {
                  attributes: AttributeList {}
                  classes: List {}
                  identifier: ""
                }
                col_span: 1
                contents: Blocks[1] {
                  [1] Plain {
                    content: Inlines[1] {
                      [1] Str "1"
                    }
                  }
                }
                row_span: 1
              }
            }
          }
        }
        ...

The row only contains a single cell, so this is an invalid AST and isn't handled correctly by all writers (see jgm/pandoc#9096 for further discussion).

The filter should be updated to add empty cells for any missing ones.

Here's rep.lua

local logging = require 'logging'
function Pandoc(pandoc)
    if logging.loglevel > 0 then
        logging.temp('meta', pandoc.meta)
    end
    logging.temp('blocks', pandoc.blocks)
end
wlupton commented 1 week ago

Looking at this again, I'm not sure that the filter should be responsible for adding any missing cells.

Updating the filter to do this would add considerable complexity, because it currently just copies list items to table cells without having to worry about whether any are missing and, in particular, without having to do rowspan and colspan processing.

So I propose that this should be the responsibility of the writer. If we agree on this then this issue can be closed, but #3 would still be relevant.

Any thoughts on this? @jgm? @tarleb?

tarleb commented 1 week ago

In general it seems okay for a writer not to render malformed tables, but it should only affect that table, not the whole document. At the same time, I think it would be appropriate for pandoc to provide a function that turns a malformed table into a well-formed one, and that such a function should be exported to Lua.

wlupton commented 1 day ago

Thanks @tarleb. Should I create a jgm/pandoc issue for this, or does such an issue already exist?

I propose making this issue depend on the above jgm/pandoc issue.

jgm commented 1 day ago

I thought the existing table builder in Text.Pandoc.Builder already did something like this?

wlupton commented 1 day ago

In this case the list-table lua filter created the Table object directly. Does this bypass Text.Pandoc.Builder?

tarleb commented 1 day ago

I thought the existing table builder in Text.Pandoc.Builder already did something like this?

I forgot about that. The filters bypass the builder functions, but that would offer a good way to normalize the AST. I'll look into it.

tarleb commented 1 day ago

Please do open a new issue

wlupton commented 20 hours ago

I've created jgm/pandoc#10356.

bpj commented 15 hours ago

FWIW my (no longer working) list-table filter did go out of its way to make sure that all rows were the same length simply by padding short rows with empty cells towards the end. It contained code like

-- NB at this stage rows[1] is the header row
local col_count = 0
-- First loop: get column count
for _, row in ipairs(rows) do
  if #row > col_count then
    col_count = #row
  end
end
-- Second loop: adjust row lengths
for _, row in ipairs(rows) do
  while #row < col_count do
    row[#row + 1] = { }
  end
end
-- Adjust widths list
while #widths < col_count do
  widths[#widths + 1] = 0.0
end
-- Adjust the alignments list
-- by simply copying the last explicit alignment!
if 0 == #aligns then
  aligns[1] = 'AlignDefault'
end
while #aligns < col_count do
  aligns[#aligns + 1] = aligns[#aligns]
end

(This wasn’t the actual code since my filter was written in MoonScript, but it compiled to Lua code equivalent to the above.)

My filter generated a SimpleTable, so things like colspan weren’t any concern.

It did require two loops over the rows list but neither loop is very complicated.

Instead of setting missing alignments to AlignDefault — unless there were no alignments — I copied the last alignment actually present. The rationale for that is that often all colums have the same aligment or the stub (leftmost) column is left aligned and all the rest are right or center aligned. The div around the list (which of course contained the class telling the filter that this was a list-table, but also some other formatting attributes) took an attribute aligns. In the common case where all cols had the same alignment you could just say aligns=c and in the second commonest case you could say aligns="lc" rather than aligns="ccccc" or aligns="lcccc". Of course there was a mapping with fields like d = 'AlignDefault', somewhere.

Nowadays I would use an re pattern {| ( %s* [dlcr] -> align_for %s* )* !. |} where align_for is the above-mentioned mapping, to parse the attribute value and create the list variable in one go, and similarly for the widths {| ( %s* %d+ -> pcnt2dec %s* )* !. |} where

pcnt2dec = function(p)
  return tonumber(p / 100)
end

(since I prefer using percentages for the widths!)