quarto-dev / quarto-cli

Open-source scientific and technical publishing system built on Pandoc.
https://quarto.org

PDF output of tables using `multirow` too wide #9046

Open rmnldwg opened 6 months ago

rmnldwg commented 6 months ago

Bug description

When using pandas to create tables with a multi-index, Quarto uses the `\multirow` LaTeX command to display that in the PDF. However, the resulting columns are far too wide.

I think the reason is that the LaTeX code contains multirow commands that look like this: `\multirow{2}{=}{foo}`, because when I use `\multirow{2}*{foo}` instead, the table looks nice.

Steps to reproduce

With an installation of Python 3.10 and pandas 2.2.1, compile the document below to PDF via `quarto render test.qmd --to pdf`. The command runs without any errors or warnings.

---
title: "Test Title"
authors:
  - name: Testing Tester
format:
  html:
    code-fold: true
  pdf:
    keep-tex: true
    include-in-header:
    - text: |
        \usepackage{multirow}
jupyter: python3
execute: 
  cache: true
---

# Table Test

```{python}
#| echo: false
#| label: tbl-contra
#| tbl-cap: This table is far too wide for its content.

import pandas as pd
import numpy as np

data = pd.DataFrame({
  'a': np.random.uniform(size=8),
  'b': 100 * np.random.uniform(size=8),
  'c': np.random.randint(low=0, high=1000, size=8),
})
idx = pd.MultiIndex.from_product([['x', 'y'], ['foo', 'bar'], ['1', '2']])
data.index = idx

data.style \
  .format(precision=2)
```

In my case, because I set `keep-tex: true`, I get an intermediate LaTeX file, which looks like this:

```tex
% Options for packages loaded elsewhere
\PassOptionsToPackage{unicode}{hyperref}
\PassOptionsToPackage{hyphens}{url}
\PassOptionsToPackage{dvipsnames,svgnames,x11names}{xcolor}
%
\documentclass[
  letterpaper,
  DIV=11,
  numbers=noendperiod]{scrartcl}

\usepackage{amsmath,amssymb}
\usepackage{iftex}
\ifPDFTeX
  \usepackage[T1]{fontenc}
  \usepackage[utf8]{inputenc}
  \usepackage{textcomp} % provide euro and other symbols
\else % if luatex or xetex
  \usepackage{unicode-math}
  \defaultfontfeatures{Scale=MatchLowercase}
  \defaultfontfeatures[\rmfamily]{Ligatures=TeX,Scale=1}
\fi
\usepackage{lmodern}
\ifPDFTeX\else  
    % xetex/luatex font selection
\fi
% Use upquote if available, for straight quotes in verbatim environments
\IfFileExists{upquote.sty}{\usepackage{upquote}}{}
\IfFileExists{microtype.sty}{% use microtype if available
  \usepackage[]{microtype}
  \UseMicrotypeSet[protrusion]{basicmath} % disable protrusion for tt fonts
}{}
\makeatletter
\@ifundefined{KOMAClassName}{% if non-KOMA class
  \IfFileExists{parskip.sty}{%
    \usepackage{parskip}
  }{% else
    \setlength{\parindent}{0pt}
    \setlength{\parskip}{6pt plus 2pt minus 1pt}}
}{% if KOMA class
  \KOMAoptions{parskip=half}}
\makeatother
\usepackage{xcolor}
\setlength{\emergencystretch}{3em} % prevent overfull lines
\setcounter{secnumdepth}{-\maxdimen} % remove section numbering
% Make \paragraph and \subparagraph free-standing
\ifx\paragraph\undefined\else
  \let\oldparagraph\paragraph
  \renewcommand{\paragraph}[1]{\oldparagraph{#1}\mbox{}}
\fi
\ifx\subparagraph\undefined\else
  \let\oldsubparagraph\subparagraph
  \renewcommand{\subparagraph}[1]{\oldsubparagraph{#1}\mbox{}}
\fi

\providecommand{\tightlist}{%
  \setlength{\itemsep}{0pt}\setlength{\parskip}{0pt}}\usepackage{longtable,booktabs,array}
\usepackage{calc} % for calculating minipage widths
% Correct order of tables after \paragraph or \subparagraph
\usepackage{etoolbox}
\makeatletter
\patchcmd\longtable{\par}{\if@noskipsec\mbox{}\fi\par}{}{}
\makeatother
% Allow footnotes in longtable head/foot
\IfFileExists{footnotehyper.sty}{\usepackage{footnotehyper}}{\usepackage{footnote}}
\makesavenoteenv{longtable}
\usepackage{graphicx}
\makeatletter
\def\maxwidth{\ifdim\Gin@nat@width>\linewidth\linewidth\else\Gin@nat@width\fi}
\def\maxheight{\ifdim\Gin@nat@height>\textheight\textheight\else\Gin@nat@height\fi}
\makeatother
% Scale images if necessary, so that they will not overflow the page
% margins by default, and it is still possible to overwrite the defaults
% using explicit options in \includegraphics[width, height, ...]{}
\setkeys{Gin}{width=\maxwidth,height=\maxheight,keepaspectratio}
% Set default figure placement to htbp
\makeatletter
\def\fps@figure{htbp}
\makeatother

\usepackage{multirow}
\KOMAoption{captions}{tableheading}
\makeatletter
\@ifpackageloaded{caption}{}{\usepackage{caption}}
\AtBeginDocument{%
\ifdefined\contentsname
  \renewcommand*\contentsname{Table of contents}
\else
  \newcommand\contentsname{Table of contents}
\fi
\ifdefined\listfigurename
  \renewcommand*\listfigurename{List of Figures}
\else
  \newcommand\listfigurename{List of Figures}
\fi
\ifdefined\listtablename
  \renewcommand*\listtablename{List of Tables}
\else
  \newcommand\listtablename{List of Tables}
\fi
\ifdefined\figurename
  \renewcommand*\figurename{Figure}
\else
  \newcommand\figurename{Figure}
\fi
\ifdefined\tablename
  \renewcommand*\tablename{Table}
\else
  \newcommand\tablename{Table}
\fi
}
\@ifpackageloaded{float}{}{\usepackage{float}}
\floatstyle{ruled}
\@ifundefined{c@chapter}{\newfloat{codelisting}{h}{lop}}{\newfloat{codelisting}{h}{lop}[chapter]}
\floatname{codelisting}{Listing}
\newcommand*\listoflistings{\listof{codelisting}{List of Listings}}
\makeatother
\makeatletter
\makeatother
\makeatletter
\@ifpackageloaded{caption}{}{\usepackage{caption}}
\@ifpackageloaded{subcaption}{}{\usepackage{subcaption}}
\makeatother
\ifLuaTeX
  \usepackage{selnolig}  % disable illegal ligatures
\fi
\usepackage{bookmark}

\IfFileExists{xurl.sty}{\usepackage{xurl}}{} % add URL line breaks if available
\urlstyle{same} % disable monospaced font for URLs
\hypersetup{
  pdftitle={Test Title},
  colorlinks=true,
  linkcolor={blue},
  filecolor={Maroon},
  citecolor={Blue},
  urlcolor={Blue},
  pdfcreator={LaTeX via pandoc}}

\title{Test Title}
\author{Testing Tester}
\date{}

\begin{document}
\maketitle

\section{Table Test}\label{table-test}

\begin{longtable}[]{@{}llllll@{}}

\caption{\label{tbl-contra}This table is far too wide for its content.}

\tabularnewline

\caption{}\label{T_331e1}\tabularnewline
\toprule\noalign{}
~ & ~ & ~ & a & b & c \\
\midrule\noalign{}
\endfirsthead
\toprule\noalign{}
~ & ~ & ~ & a & b & c \\
\midrule\noalign{}
\endhead
\bottomrule\noalign{}
\endlastfoot
\multirow{4}{=}{x} & \multirow{2}{=}{foo} & 1 & 0.50 & 39.44 & 817 \\
& & 2 & 0.44 & 34.31 & 655 \\
& \multirow{2}{=}{bar} & 1 & 0.47 & 72.27 & 454 \\
& & 2 & 0.40 & 87.38 & 332 \\
\multirow{4}{=}{y} & \multirow{2}{=}{foo} & 1 & 0.24 & 28.72 & 254 \\
& & 2 & 0.20 & 89.23 & 599 \\
& \multirow{2}{=}{bar} & 1 & 0.03 & 26.64 & 619 \\
& & 2 & 0.52 & 10.13 & 323 \\

\end{longtable}

\end{document}
```

Expected behavior

The table should fit on the page, as in this screenshot:

image

Actual behavior

All columns are stretched wide and the table extends beyond the right side of the page. See this screenshot:

image

Your environment

Quarto check output

Quarto 1.5.24
[✓] Checking versions of quarto binary dependencies...
      Pandoc version 3.1.11: OK
      Dart Sass version 1.70.0: OK
      Deno version 1.41.0: OK
[✓] Checking versions of quarto dependencies......OK
[✓] Checking Quarto installation......OK
      Version: 1.5.24
      Path: /opt/quarto/bin

[✓] Checking tools....................OK
      TinyTeX: v2024.03
      Chromium: (not installed)

[✓] Checking LaTeX....................OK
      Using: TinyTex
      Path: /home/rmnldwg/.TinyTeX/bin/x86_64-linux
      Version: 2023

[✓] Checking basic markdown render....OK

[✓] Checking Python 3 installation....OK
      Version: 3.10.12
      Path: /home/rmnldwg/repos/test/.venv/bin/python3
      Jupyter: 5.7.1
      Kernels: python3

[✓] Checking Jupyter engine render....OK

[✓] Checking R installation...........(None)

      Unable to locate an installed version of R.
      Install R from https://cloud.r-project.org/

cderv commented 6 months ago

Thanks for the report.

> I think the reason is that the LaTeX code contains multirow commands that look like this: `\multirow{2}{=}{foo}`, because when I use `\multirow{2}*{foo}`, then the table looks nice.

I don't know the Quarto specifics exactly, but I believe this generated LaTeX comes directly from Pandoc: https://github.com/jgm/pandoc/blob/32280fd919ca2bdba27b5beff9182d8691eb6c1d/src/Text/Pandoc/Writers/LaTeX/Table.hs#L359C1-L364C53

Pandoc only ever emits `=` there, with no option for `*`.

Here is an example with an HTML table that has a `rowspan` attribute (like the one created by the Python code above):

```
> quarto pandoc -t latex -f html
<table>
  <tr>
    <th>Month</th>
    <th>Savings</th>
    <th>Savings for holiday!</th>
  </tr>
  <tr>
    <td>January</td>
    <td>$100</td>
    <td rowspan="2">$50</td>
  </tr>
  <tr>
    <td>February</td>
    <td>$80</td>
  </tr>
</table>
^Z
\begin{longtable}[]{@{}lll@{}}
\toprule\noalign{}
Month & Savings & Savings for holiday! \\
\midrule\noalign{}
\endhead
\bottomrule\noalign{}
\endlastfoot
January & \$100 & \multirow{2}{=}{\$50} \\
February & \$80 \\
\end{longtable}
```

In LaTeX, this produces `\multirow{2}{=}{\$50}`, with `=` as the width argument.

Using this in a document also makes the table spread across the page:

---
title: "Test Title"
format:
  pdf:
    keep-tex: true
keep-md: true
---

```{=html}
<table>
  <tr>
    <th>Month</th>
    <th>Savings</th>
    <th>Savings for holiday!</th>
  </tr>
  <tr>
    <td>January</td>
    <td>$100</td>
    <td rowspan="2">$50</td>
  </tr>
  <tr>
    <td>February</td>
    <td>$80</td>
  </tr>
</table>
```

![image](https://github.com/quarto-dev/quarto-cli/assets/6791940/f6e52a6a-4bb8-4f09-a74d-1947a26f8c58)

I think they use `=` to get the column width (from https://texlive.mycozy.space/macros/latex/contrib/multirow/multirow.pdf):
> (width) is the width to which the text is to be set. Special values are * to indicate that the text parameter's natural width is to be used, and = to indicate that the specified width of the column in which the \multirow entry is set should be used.

So this is probably related either to how Pandoc converts to LaTeX or to our own processing.

- If the former, this should be reported to Pandoc so that they allow `*` in some cases.
- If the latter, I'm not sure what to do yet 🤔

It seems to me there could be a problem in Pandoc's column width handling for multiline tables when `ColWidthDefault` is used with the writer.

JBroihan commented 3 weeks ago

I am facing the same issue and would appreciate a fix or workaround.
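
Until this is resolved upstream, one possible workaround is to post-process the intermediate `.tex` file kept by `keep-tex: true`, following the reporter's observation that `\multirow{2}*{foo}` renders well. The sketch below is a minimal, unofficial example: the function name `use_natural_width` and the regular expression are assumptions made for illustration, and it only handles the exact `\multirow{<n>}{=}{...}` form shown in the LaTeX above.

```python
import re

# Matches "\multirow{<rows>}{=}" and captures the row count.
# Assumption: the width argument is exactly "=", as in the LaTeX in this issue.
MULTIROW_EQ = re.compile(r"\\multirow\{(\d+)\}\{=\}")


def use_natural_width(tex: str) -> str:
    """Replace the '=' width argument of \\multirow with '*' (natural width)."""
    # Hypothetical helper, not part of Quarto or Pandoc.
    return MULTIROW_EQ.sub(r"\\multirow{\1}*", tex)


if __name__ == "__main__":
    # One row taken from the generated longtable above.
    row = r"\multirow{4}{=}{x} & \multirow{2}{=}{foo} & 1 & 0.50 & 39.44 & 817 \\"
    print(use_natural_width(row))
```

Applied to the contents of the kept `.tex` file, this should turn each `\multirow{4}{=}{x}` into `\multirow{4}*{x}`; the patched file then has to be recompiled with a LaTeX engine. Note that `*` sets the text at its natural width, so long cell contents will no longer wrap.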