Open tuetenk0pp opened 2 days ago
@tuetenk0pp - can you include a reproducible example. We cannot reproduce your example without your input.xlsx
file. It would be preferable to have your example not depend on external files at all, e.g.
df = pd.DataFrame(...)
@rhshadrach, here is a more in depth example of what is happening:
>>> import pandas as pd
>>> d = {'location': ['Supermarket', 'Supermarket', 'Supermarket', 'Supermarket', 'Drugstore', 'Drugstore', 'Drugstore', 'Drugstore', 'Farmers Market', 'Farmers Market', 'Farmers Market', 'Farmers Market'], 'category': ['Produce', 'Produce', 'Dry Goods', 'Dry Goods', 'Personal Care', 'Personal Care', 'Medicine', 'Medicine', 'Vegetables', 'Vegetables', 'Fruits', 'Fruits'], 'item': ['Apples', 'Bananas', 'Rice', 'Pasta', 'Shampoo', 'Toothpaste', 'Pain Reliever', 'Cough Sirup', 'Carrots', 'Spinach', 'Strawberrys', 'Oranges'], 'count': [6, 12, 2, 1, 1, 1, 1, 1, 1, 2, 1, 6]}
>>> df = pd.DataFrame(data=d)
>>> df = df.set_index(df.columns[:3].tolist())
>>> df = df.sort_index()
The DataFrame now looks like this:
>>> df
count
location category item
Drugstore Medicine Cough Sirup 1
Pain Reliever 1
Personal Care Shampoo 1
Toothpaste 1
Farmers Market Fruits Oranges 6
Strawberrys 1
Vegetables Carrots 1
Spinach 2
Supermarket Dry Goods Pasta 1
Rice 2
Produce Apples 6
Bananas 12
Now I want to export the DataFrame to LaTeX:
>>> latex = r"""
... \documentclass[border=0.5cm]{standalone}
...
... \usepackage{array}
... \usepackage{multirow}
... \usepackage{booktabs}
...
... \begin{document}
... """
>>> column_format: str = r'p{2cm}<{\raggedright} p{2cm}<{\raggedright} p{2cm}<{\raggedright} r'
>>> latex += '\n' + df.to_latex(column_format=column_format, multirow=True) + '\n' + r'\end{document}'
The LaTeX Code now looks like this:
>>> print(latex)
\documentclass[border=0.5cm]{standalone}
\usepackage{array}
\usepackage{multirow}
\usepackage{booktabs}
\begin{document}
\begin{tabular}{p{2cm}<{\raggedright} p{2cm}<{\raggedright} p{2cm}<{\raggedright} r}
\toprule
& & & count \\
location & category & item & \\
\midrule
\multirow[t]{4}{*}{Drugstore} & \multirow[t]{2}{*}{Medicine} & Cough Sirup & 1 \\
& & Pain Reliever & 1 \\
\cline{2-4}
& \multirow[t]{2}{*}{Personal Care} & Shampoo & 1 \\
& & Toothpaste & 1 \\
\cline{1-4} \cline{2-4}
\multirow[t]{4}{*}{Farmers Market} & \multirow[t]{2}{*}{Fruits} & Oranges & 6 \\
& & Strawberrys & 1 \\
\cline{2-4}
& \multirow[t]{2}{*}{Vegetables} & Carrots & 1 \\
& & Spinach & 2 \\
\cline{1-4} \cline{2-4}
\multirow[t]{4}{*}{Supermarket} & \multirow[t]{2}{*}{Dry Goods} & Pasta & 1 \\
& & Rice & 2 \\
\cline{2-4}
& \multirow[t]{2}{*}{Produce} & Apples & 6 \\
& & Bananas & 12 \\
\cline{1-4} \cline{2-4}
\bottomrule
\end{tabular}
\end{document}
Here I have a screenshot of the compiled document:
As you can see, the multicolumn
does not respect the fixed column width. The Farmers Market
and Personal Care
strings should wrap. This can be fixed with replacing the *
with =
in \multirow{<nrows>}{<width>}{<text>}
:
latex_wrap = latex.replace('{*}', '{=}')
The LaTeX Code now looks like this:
>>> print(latex_wrap)
\documentclass[border=0.5cm]{standalone}
\usepackage{array}
\usepackage{multirow}
\usepackage{booktabs}
\begin{document}
\begin{tabular}{p{2cm}<{\raggedright} p{2cm}<{\raggedright} p{2cm}<{\raggedright} r}
\toprule
& & & count \\
location & category & item & \\
\midrule
\multirow[t]{4}{=}{Drugstore} & \multirow[t]{2}{=}{Medicine} & Cough Sirup & 1 \\
& & Pain Reliever & 1 \\
\cline{2-4}
& \multirow[t]{2}{=}{Personal Care} & Shampoo & 1 \\
& & Toothpaste & 1 \\
\cline{1-4} \cline{2-4}
\multirow[t]{4}{=}{Farmers Market} & \multirow[t]{2}{=}{Fruits} & Oranges & 6 \\
& & Strawberrys & 1 \\
\cline{2-4}
& \multirow[t]{2}{=}{Vegetables} & Carrots & 1 \\
& & Spinach & 2 \\
\cline{1-4} \cline{2-4}
\multirow[t]{4}{=}{Supermarket} & \multirow[t]{2}{=}{Dry Goods} & Pasta & 1 \\
& & Rice & 2 \\
\cline{2-4}
& \multirow[t]{2}{=}{Produce} & Apples & 6 \\
& & Bananas & 12 \\
\cline{1-4} \cline{2-4}
\bottomrule
\end{tabular}
\end{document}
As you can see, the issue no longer exists in the compiled document. The Farmers Market
and Personal Care
strings now wrap into the next line.
[!NOTE] Finding
*
and replacing them with=
becomes much harder, once not every column containing multirows has a fixed width. Therefore the workaround as presented would no longer work. This is why I think this should acutally become a feature. Manipulating other portions oflatex
, e.g. replacing thetabular
environment with atabularx
environment remains simple.
Here is the summarized code for reproduction:
import pandas as pd
import subprocess
d = {'location': ['Supermarket', 'Supermarket', 'Supermarket', 'Supermarket', 'Drugstore', 'Drugstore', 'Drugstore', 'Drugstore', 'Farmers Market', 'Farmers Market', 'Farmers Market', 'Farmers Market'], 'category': ['Produce', 'Produce', 'Dry Goods', 'Dry Goods', 'Personal Care', 'Personal Care', 'Medicine', 'Medicine', 'Vegetables', 'Vegetables', 'Fruits', 'Fruits'], 'item': ['Apples', 'Bananas', 'Rice', 'Pasta', 'Shampoo', 'Toothpaste', 'Pain Reliever', 'Cough Sirup', 'Carrots', 'Spinach', 'Strawberrys', 'Oranges'], 'count': [6, 12, 2, 1, 1, 1, 1, 1, 1, 2, 1, 6]}
df = pd.DataFrame(data=d)
df = df.set_index(df.columns[:3].tolist())
df = df.sort_index()
print(df)
latex = r"""
\documentclass[border=0.5cm]{standalone}
\usepackage{array}
\usepackage{multirow}
\usepackage{booktabs}
\begin{document}
"""
column_format: str = r'p{2cm}<{\raggedright} p{2cm}<{\raggedright} p{2cm}<{\raggedright} r'
latex += '\n' + df.to_latex(column_format=column_format, multirow=True) + '\n' + r'\end{document}'
latex_flex = latex.replace('{*}', '{=}')
with open('multirow.tex', 'w') as f:
f.write(latex)
with open('multirow_flex.tex', 'w') as f:
f.write(latex_flex)
for file in ['multirow.tex', 'multirow_flex.tex']:
cmd = ['latexmk', '-lualatex', '-interaction=nonstopmode', '-file-line-error', '-silent', '-auxdir=./tmp', file]
try:
subprocess.run(cmd)
except:
print(f'could not run command: {cmd}')
Thanks for the additional information! From the OP, you have the logic:
IF columntype NOT ONE OF l, c, r
I do not understand branching on this condition, can you elaborate here?
The multirow
documentation says:
The width can also be given as = when the \multirow entry is given in a column that has a defined width, for example in a p{} column, an X column in tabularx or a L, C, R or J column in a tabulary environment. The text will be set in a \parbox of that width. If you give “=” in other situations, you will get strange results (usually a too wide column).
So this actually means that multirow
is able to adjust the width of the cell if the column has a defined width. This is generally the case for the LaTeX column types that are mentioned in the quote; so not for the standard column types l
, c
and r
although the column type S
from the siunitx
package does not define a column width as well.
Thats why I thought it would be a good idea to check the columntype (from the column_format
argument of to_latex()
) and return the \multirow ...
string accordingly.
With the added S
column type the logic would look like this:
IF columntype NOT ONE OF l, c, r, S
RETURN f"\\multirow[{multirow_align}]{{{rowspan}}}{{*}}{{{display_val}}}"
ELSE
RETURN f"\\multirow[{multirow_align}]{{{rowspan}}}{{=}}{{{display_val}}}"
Or a more safe approach but with no support for any other column types than mentioned in the quote above:
IF columntype ONE OF p{}, m{}, b{}, X, L, C, R, J
RETURN f"\\multirow[{multirow_align}]{{{rowspan}}}{{=}}{{{display_val}}}"
ELSE
RETURN f"\\multirow[{multirow_align}]{{{rowspan}}}{{*}}{{{display_val}}}"
Feature Type
[X] Adding new functionality to pandas
[X] Changing existing functionality in pandas
[ ] Removing existing functionality in pandas
Problem Description
It would be nice to be able to use the
=
option with multirow. Maybe this is something that can be decided automatically according to the column type.Feature Description
Add logic to
_parse_latex_header_span()
to check for cloumn types.Alternative Solutions
One could also find/replace the return value of
pandas.DataFrame.to_latex
manually:Additional Context
https://github.com/pandas-dev/pandas/blob/156e67e1c6f32d2a794b61604a9f89d411e5e915/pandas/io/formats/style_render.py#L2451
See this snippet from the multirow documentation: