jgm / pandoc

Universal markup converter
https://pandoc.org

org-mode tables overflow the page #6669

Closed jeromenerf closed 3 years ago

jeromenerf commented 4 years ago

Hello pandoc!

Problem

I use an org-mode feature (capturing column view) that generates a table automatically, using the section titles and calculating sums and the like from properties. It's a regular org pipe table.

Input

| Task                                                                                                                                          | TODO | Foo | Bar |  Boz | Sarge |
|-----------------------------------------------------------------------------------------------------------------------------------------------+------+-----+-----+------+-------|
| This is a long line, actually one of the document's titles, usually 60-100 characters, but it can also be much longer, say 200-300 characters | TODO |  10 |   5 | 1000 |   500 |
| It's auto generated by org mode                                                                                                               | TODO |  10 |   5 | 1000 |   500 |
| This is a long line, actually one of the document's titles, usually 60-100 characters                                                         | TODO |  10 |   5 | 1000 |   500 |
| This is a long line, actually one of the document's titles, usually 60-100 characters                                                         | TODO |  10 |   5 | 1000 |   500 |
| This is a long line, actually one of the document's titles, usually 60-100 characters                                                         | TODO |  10 |   5 | 1000 |   500 |
| This is a long line, actually one of the document's titles, usually 60-100 characters                                                         | TODO |  10 |   5 | 1000 |   500 |

Emacs auto-generates the formatting of the table. It doesn't support multiple lines in a cell, so the source table can get very wide.

Build command

Using a fresh cabal build (2.10.1) or the Debian stable version:

~/.cabal/bin/pandoc -f org -o test.pdf test.org

NB: I have tried many variants, including --columns values from 10 up to 400, to no avail.

Output

The table overflows on the right. Whatever build command I use, I can never get the column contents to wrap.

[Screenshot: the PDF table running off the right edge of the page]

Expected output:

Something similar to the HTML table output, where cell contents wrap as needed so that the table fits the page width:

[Screenshot: the HTML output, with cell contents wrapped to fit the page]

Tests

I have read the manual sections on --columns and pipe tables, but still can't figure out how to solve my problem. Nothing I change seems to affect the final output; it always overflows.

I'd rather not tweak the auto-generated table itself if I can avoid it; I would much prefer to alter the build command or the template.

Org mode allows adding some metadata before and after the table, as documented here: https://orgmode.org/manual/Capturing-column-view.html#Capturing-column-view, but as far as I can tell it has no effect.

So far I have tried many combinations, without any improvement:

  1. passing --columns=40 (up to 400)
  2. changing the number of - characters under the first column
  3. using different PDF engines
  4. using different templates

Thanks for all the good work with pandoc!

Delanii commented 4 years ago

The default LaTeX template uses the longtable package, if I am not mistaken. Have you tried generating only the .tex file and compiling it multiple times? Another option, from what I see in the docs, would be to use either multiline_tables or grid_tables; have you tried that? Otherwise, you would have to change the LaTeX template to use p{width} columns for your table. For that, the longtable or tabularx package documentation may help.

tarleb commented 4 years ago

The use of longtable for tables is hard-wired into pandoc. It currently cannot be changed.

This problem should go away if all columns were given a relative width: pandoc wraps cells in minipage environments if a column width has been defined. Unfortunately, that is not possible with org (as far as I know; I'd love to be corrected there). Using a filter might work.

Is there a way to make this work when exporting via Emacs? I tried it, but got a page-overflowing table there as well.

jeromenerf commented 4 years ago

@Delanii thanks, but neither extension is supported for org.

@tarleb thanks for your input, I was starting to think I was missing something obvious.

jgm commented 4 years ago

As @tarleb says, this is a "simple table" which doesn't encode relative width information. One solution could be to use a Lua filter to add relative widths to simple tables.
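In the pre-2.10 table format that the filters in this thread target, a "simple table" shows up with every entry of elem.widths equal to 0 (pandoc's "use the default" value). A minimal sketch, assuming that old format, of a guard a filter could use so it only fills in widths for tables that have none; the helper name has_no_widths is hypothetical:

```lua
-- Hypothetical helper: true when a table carries no relative width
-- information, i.e. every width is 0 (the default value that simple
-- tables get in the pre-2.10 Table element).
local function has_no_widths(widths)
  for _, w in ipairs(widths) do
    if w ~= 0 then
      return false
    end
  end
  return true
end

-- Possible use inside a filter (old AST only):
-- function Table(elem)
--   if has_no_widths(elem.widths) then
--     -- assign relative widths here
--   end
--   return elem
-- end
```

Checking first means tables that already carry widths (for instance from a markdown pipe table in the same run) are left alone.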

jeromenerf commented 4 years ago

OK, so going by the docs, and without prior Lua experience, a dumb but working filter is:

function Table(elem)
    local colnum = #elem.widths
    -- give every column the same share of the text width
    for i = 1, colnum do
        elem.widths[i] = 1 / colnum
    end
    return elem
end

There is still quite some work needed to get it right. Maybe rank the columns by size, then weight them by triangle number...
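The "rank by size, then weight by triangle number" idea can be checked in isolation. In this minimal pure-Lua sketch (the function name rank_weights is hypothetical, and ties are ranked arbitrarily), the column with the r-th smallest length gets weight r / T(n), where T(n) = n(n+1)/2 is the n-th triangle number, so the weights always sum to 1 because 1 + 2 + ... + n = T(n):

```lua
-- n-th triangle number: 1 + 2 + ... + n
local function triangle(n)
  return n * (n + 1) / 2
end

-- Hypothetical helper: given per-column text lengths, return relative
-- widths where the shortest column gets 1/T(n), the next 2/T(n), ...,
-- and the longest n/T(n). Ties are ranked in arbitrary order.
local function rank_weights(lengths)
  local n = #lengths
  local order = {}
  for i = 1, n do
    order[i] = i
  end
  -- sort column indices by ascending length
  table.sort(order, function(a, b) return lengths[a] < lengths[b] end)

  local widths = {}
  for rank, col in ipairs(order) do
    widths[col] = rank / triangle(n)
  end
  return widths
end

-- Example: lengths {30, 10, 20} give widths {3/6, 1/6, 2/6}
local w = rank_weights({30, 10, 20})
```

Since only the ordering matters, the longest column never gets more than n/T(n) = 2/(n+1) of the width, e.g. 2/7 (about 29%) for a six-column table, however long its text is.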

jeromenerf commented 4 years ago

@jgm @tarleb I went a little further, trying to weight the column sizes according to the length of the first row's cell contents, but the AST is pretty deep and I am getting lost in the weeds.

Could you give me a hint on getting a cell's content from Table.rows?

I can use either the Debian (2.2.1) or the latest cabal (2.10.1) pandoc version, which seem quite different.

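-- n-th triangle number; it must be defined before Table, otherwise
-- Table looks up a global named triangle, which is nil
local function triangle(i)
    return i*(i+1)/2
end

function Table(elem)
    local ncols = #elem.widths
    local lengths = {}

    for i = 1, ncols do
        -- FIXME: get the cell content length
        lengths[i] = {i, math.random(1,100)}
    end

    -- sort {column, length} pairs by ascending length
    table.sort(lengths, function(l, r)
        return l[2] < r[2]
    end)

    -- shortest column gets 1/T(n), longest gets n/T(n)
    for rank, col in ipairs(lengths) do
        elem.widths[col[1]] = rank/triangle(ncols)
    end
    return elem
end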

I guess the best way to deal with all this would be for the --columns switch to behave as it does for markdown pipe tables.

tarleb commented 4 years ago

You are running into pandoc/lua-filters#109. Sorry about that. I'd recommend sticking with 2.2.1 for now (or any pandoc version before 2.10); the old table format is much simpler and easier to handle.

jeromenerf commented 4 years ago

@tarleb thanks for the advice. I couldn't find any examples of extracting this kind of information from tables, so I iterated over the list of lists of cells, only to find one entry per word... I guess I am not using the AST the way it's meant to be used.

Is there an obvious higher-level method I'm supposed to call (I have seen .content, .text, .map...) that I'm missing?

tarleb commented 4 years ago

You are probably looking for pandoc.utils.stringify.
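pandoc.utils.stringify turns a cell (a list of blocks in the pre-2.10 format) into plain text, and # on that string gives its length in bytes, which overcounts titles containing accented or other non-ASCII characters. A small sketch, assuming the filter runs under Lua 5.3 or later (which provides the utf8 library); the name text_length is hypothetical:

```lua
-- Hypothetical helper: length of a string in characters rather than bytes.
-- Lua's # operator counts bytes, so "é" counts as 2. utf8.len (Lua 5.3+)
-- counts code points; it returns nil for invalid UTF-8, in which case we
-- fall back to the byte count.
local function text_length(s)
  return utf8.len(s) or #s
end

-- In a filter this would wrap stringify, e.g.
--   text_length(pandoc.utils.stringify(cell))
```

Code points are still only an approximation of printed width, but they are closer than bytes for most European text.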

tarleb commented 4 years ago

Closing, because I believe that there is nothing we could (easily) change in pandoc to fix this. But we are still here to help. Either post in this issue, or (preferably) on the pandoc-discuss mailing list.

jeromenerf commented 3 years ago

@tarleb I believe the --columns behavior that is implemented for markdown pipe tables should be implemented for org in the long run.

Meanwhile, maybe update the documentation to make it clear that it is not.

In the short term, thanks to your advice, I got a Lua filter working that emulates the above-mentioned behavior:

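require 'pandoc.utils'

-- --columns value as seen by the reader; 30 is only a fallback
-- in case the field is absent
local columns = (PANDOC_READER_OPTIONS.columns or 30)

function Table(elem)
    local ncols = #elem.widths
    local lengths = {}
    local tl = 0
    local first = elem.rows[1]  -- nil if the table has no body rows

    for i = 1, ncols do
        -- length of the first row's cell and of the header cell
        local lr = first and #pandoc.utils.stringify(first[i]) or 0
        local lh = #pandoc.utils.stringify(elem.headers[i])
        -- at least 1, so the total can never be 0
        local l = math.max(lr, lh, 1)
        if l > columns then l = columns end
        tl = tl + l
        lengths[i] = l
    end

    -- relative widths proportional to the (capped) lengths
    for i = 1, ncols do
        elem.widths[i] = lengths[i]/tl
    end

    return elem
end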

It can be used with or without --columns, e.g.: pandoc --columns=20 --lua-filter=org-table-width-fix.lua -f org -t latex -o test.pdf test.org
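To see what the filter does with the example table: for each column it takes the longer of the header text and the first-row text, caps it at the columns value, and divides by the total. A pure-Lua sketch of just that arithmetic, runnable without pandoc; the function name proportional_widths is hypothetical and the 143-character title length is illustrative:

```lua
-- Hypothetical pure-Lua version of the width computation in the filter:
-- cap each column's text length at `columns` (and at least 1),
-- then divide by the total so the widths sum to 1.
local function proportional_widths(lengths, columns)
  local capped, total = {}, 0
  for i, l in ipairs(lengths) do
    capped[i] = math.min(math.max(l, 1), columns)
    total = total + capped[i]
  end
  local widths = {}
  for i, l in ipairs(capped) do
    widths[i] = l / total
  end
  return widths
end

-- Example: a 143-character title next to five short columns, columns = 30.
-- Capped lengths {30, 4, 3, 3, 4, 5} sum to 49, so the title column
-- gets 30/49 (about 61%) of the text width.
local w = proportional_widths({143, 4, 3, 3, 4, 5}, 30)
```

Lowering the cap (e.g. --columns=15) shrinks the title column's share, since the short columns keep their lengths while the long one is cut to the cap.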

If you use ox-pandoc to export your org files, you could then set up Emacs with:

  (use-package ox-pandoc
    :init
    (setq org-pandoc-options-for-latex-pdf '(
                                             (columns . 25)
                                             (lua-filter . "/path/to/org-table-width-fix.lua")
                                             (template . "/path/to/default.latex")
                                             ))
  )

or add a header to your org file when needed:

#+PANDOC_OPTIONS: lua-filter:/path/to/filter.lua
#+PANDOC_OPTIONS: columns:15

o/

tarleb commented 3 years ago

A paragraph in docs/org.md would be a good idea. Reopening, so we won't forget. Would you be ok if I included your code there?

I believe there might be a misunderstanding about --columns. Citing the manual (emphasis mine):

--columns=NUMBER

Specify length of lines in characters. This affects text wrapping in the generated source code (see --wrap). It also affects calculation of column widths for plain text tables (see Tables below).

It has no effect on PDF output. (Edit: the misunderstanding was actually on my part.)

jgm commented 3 years ago

To clarify, --columns can have an effect on PDF output, e.g. if you have a markdown pipe table or grid table or multiline table (for which relative column widths are calculated). The issue here is that relative column widths are not calculated for the org tables, so --columns is irrelevant.
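For comparison, the pandoc manual describes the pipe-table rule roughly like this: if any line of the markdown source is longer than the --columns value, the table takes the full text width and each column's relative width is determined by the number of dashes in its part of the separator line; otherwise no widths are set. A minimal sketch of that rule as I understand it (dash count divided by total dashes); the function name pipe_table_widths is hypothetical and this is not pandoc's actual code:

```lua
-- Hypothetical sketch of the markdown pipe-table rule:
--   dashes  = number of '-' in each column of the separator line
--   maxlen  = length of the longest source line of the table
--   columns = value of --columns
-- Returns relative widths, or nil when the table fits (a simple table).
local function pipe_table_widths(dashes, maxlen, columns)
  if maxlen <= columns then
    return nil
  end
  local total = 0
  for _, d in ipairs(dashes) do
    total = total + d
  end
  local widths = {}
  for i, d in ipairs(dashes) do
    widths[i] = d / total
  end
  return widths
end

-- Separator "|----------|--|--|" with a 100-character line and --columns=72
local w = pipe_table_widths({10, 2, 2}, 100, 72)
```

This is why both --columns and the number of - characters matter for markdown pipe tables, while the org reader computes no widths at all.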

tarleb commented 3 years ago

Oh, I didn't know! I misunderstood the manual then.

jeromenerf commented 3 years ago

@tarleb @jgm thanks. You can include this snippet if it's decent enough for you. I only just picked up Lua and the pandoc AST/libraries, so it's a bit rough, to say the least.

Here's a more useful variant that uses the average cell length per column instead of the first row's values.

require 'pandoc.utils'

local columns = (PANDOC_READER_OPTIONS.columns or 30)

-- debug output goes to stderr so it cannot end up in the document
io.stderr:write('columns: ', columns, '\n')

-- average stringified length of column i over all body rows
-- (0 for a table without body rows)
local function rowavgl(rows, i)
    if #rows == 0 then return 0 end
    local tl = 0
    for r = 1, #rows do
        tl = tl + #pandoc.utils.stringify(rows[r][i])
    end
    return tl/#rows
end

function Table(elem)
    local ncols = #elem.widths
    local lengths = {}
    local tl = 0

    for i = 1, ncols do
        local lr = rowavgl(elem.rows, i)
        local lh = #pandoc.utils.stringify(elem.headers[i])
        -- at least 1, so the total can never be 0
        local l = math.max(lr, lh, 1)
        if l > columns then l = columns end
        tl = tl + l
        lengths[i] = l
    end

    for i = 1, ncols do
        elem.widths[i] = lengths[i]/tl
        io.stderr:write(lengths[i], ' ', elem.widths[i], '\n')
    end

    return elem
end
tarleb commented 3 years ago

As it happens, I had implemented a way to specify column widths in org mode some time ago: the reader respects column widths set with Org's <N> width cookies. This is actually not what we should be doing, as we aim for full Emacs compatibility, and Emacs only uses that info when displaying the org file. Nobody has complained about it yet (and I had forgotten about it), so we can probably leave it in. It should be documented though.
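For reference, Org width cookies are cells such as <10>, optionally combined with an alignment as in <r10>, placed in a row of the table. A small sketch of recognizing such a cell and extracting the number; the pattern and the name cookie_width are my assumptions, not the org reader's actual code:

```lua
-- Hypothetical helper: return the width N from an Org column cookie
-- like "<10>", "<l10>" or "<r15>", or nil if the cell is not a width
-- cookie (plain text, or an alignment-only cookie such as "<c>").
local function cookie_width(cell)
  local digits = cell:match("^%s*<[lrc]?(%d+)>%s*$")
  if digits then
    return tonumber(digits)
  end
  return nil
end
```

Alignment-only cookies are deliberately ignored here, since they say nothing about width.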

jeromenerf commented 3 years ago

@tarleb this is good to know indeed. However, in my case the table is generated by Org mode itself, so I can't add a <n> cookie to fix the column width. When using Org mode for interactive programming (think Jupyter notebooks, literate programming, etc.) and technical documents, most tables are auto-generated via Babel.

tarleb commented 3 years ago

I added a short section on table handling to doc/org.md in 29baaa2ac.