Closed toobaz closed 2 years ago
I'm really not happy with the weird syntax to squeeze LaTeX commands into CSS... It gets worse if a user wants to use siunitx's S columns. (I believe this is a pretty standard setup.)
Here, I made a demonstration of which steps are required to format tables correctly when using siunitx.
Basically, a cell has to be formatted the following way:
markup = protected + unprotected
protected = ["{\cellcolor{...}}"]
unprotected = ["\color{...}"] + ["\bfseries"] + ["\textit"]
It would be really annoying to get this using the proposed CSS syntax.
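To make the scheme above concrete, here is a minimal sketch in plain Python. It is not pandas code: the function name `format_cell` and its parameters are hypothetical, and it uses `\itshape` as the declaration-style italic switch, which is my assumption (the list above names `\textit`). Whether siunitx accepts the result in every configuration is not verified here.

```python
def format_cell(value, background=None, color=None, bold=False, italic=False):
    """Build LaTeX markup for one cell as protected + unprotected + value.

    Hypothetical helper for illustration only, not part of pandas.
    """
    # Protected part: wrapped in braces so an S column does not try to parse it.
    protected = []
    if background is not None:
        protected.append("{\\cellcolor{%s}}" % background)

    # Unprotected part: declaration-style switches placed before the value.
    unprotected = []
    if color is not None:
        unprotected.append("\\color{%s}" % color)
    if bold:
        unprotected.append("\\bfseries")
    if italic:
        unprotected.append("\\itshape")  # assumption: declaration form of italic

    markup = "".join(protected) + " ".join(unprotected)
    return f"{markup} {value}".strip()


print(format_cell(1.5, background="red", bold=True))
```

The point of the split is that only the four common cases (background colour, colour, bold, italic) need to be known to the formatter; anything else could be appended to one of the two lists.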
ExcelFormatter has a CSSToExcelConverter that uses a CSSResolver to parse the markup and generate the markup required by Excel. Styler.to_latex should do the same.
While flexibility is great, I think it is more important to cover the common cases with ease (background-color, color, font-weight: bold, font-style: italic). (Also, it will not be very hard to extend the existing protected and unprotected formatting commands.)
@moi90 allowing Styler to operate as CSS ('attribute', 'value') pairs or as LaTeX ('command', 'options') creates a non-duplicated and maintainable codebase that is flexible in both formats. Another advantage is that the unit tests and formatting methods are available in both versions.
You should not eliminate all of this flexibility for the sake of covering the common cases, which, as I have said, are already very easy to translate into the LaTeX format. See this unpublished PR (not included in my original PR because it is an extension that would only complicate things for the PR reviewers).
So far the parameters that you have raised that have been solved are:
- automatic alignment of columns r for numeric and l for non-numeric (in .to_latex())
- automatic alignment of columns r for numeric and l for non-numeric (in .render(latex=True) - this is harder to implement without code duplication)
- --wrap to change LaTeX \<command><ops>{<display>} to {\<command><ops> <display>}
- siunitx (probably fairly easy to incorporate)
Very much this discussion has followed the line of you challenging my development's capabilities. This has been valuable since it has driven the development of these options. However, without a suitable challenger model it is difficult for me to question how you plan to deal with some of these items you have raised and others, if you still believe that working with a LateXFormatter is preferable (I don't)?
- How do you differentiate between bfseries and textbf, or textit and emph i.e. the cases where there is not a 1-1 direct CSS translation?
- How do you plan to deal with the positioning of braces which may be different for siunitx or other packages?
- How do you plan on translating font-size for which Large and Huge are not valid CSS values?
Here is an example of the extension module from the above PR:
@attack68 I greatly appreciate your skills as a developer and am impressed that you have always skillfully addressed my "challenges". :+1:
I'm sorry I haven't come up with a challenger model. I tried to dig into the existing code from multiple angles, but I currently lack the time to get to a solution that would be worth discussing. Also, you convinced me that using Jinja is not so bad after all.
@moi90 allowing Styler to operate as CSS ('attribute', 'value') pairs or as LaTeX ('command', 'options') creates a non-duplicated and maintainable codebase that is flexible in both formats. Another advantage is that the unit tests and formatting methods are available in both versions.
I don't see how a translation between CSS properties and LaTeX commands would lead to more code duplication or less maintainability.
automatic alignment of columns r for numeric and l for non-numeric (in .to_latex())
Maybe we can have an additional option for siunitx that uses the S column type for numeric columns?
automatic alignment of columns r for numeric and l for non-numeric (in .render(latex=True) - this is harder to implement without code duplication)
Maybe we can drop render(latex=True) altogether? What is the advantage over to_latex?
How do you differentiate between bfseries and textbf, or textit and emph i.e. the cases where there is not a 1-1 direct CSS translation?
- textbf or textit (as it leads to problems with siunitx)
- emph (as it is semantic markup, not a certain style)
How do you plan to deal with the positioning of braces which may be different for siunitx or other packages?
I answered here in detail. Basically: Always be compatible with siunitx; this does not harm other setups.
How do you plan on translating font-size for which Large and Huge are not valid CSS values?
I would not make this a problem. On the CSS side, there is large, x-large, xx-large, and xxx-large. On the LaTeX side, there is \large, \Large, \LARGE, \huge, \Huge. These can be matched nicely (when excluding \Huge). (Same for the small sizes.)
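A minimal sketch of that matching as a lookup table. The large sizes follow the pairing described above; the pairing of the small sizes is my own assumption, since it is not spelled out, and the names `CSS_TO_LATEX_FONT_SIZE` and `css_font_size_to_latex` are hypothetical.

```python
# CSS font-size keywords mapped to LaTeX size switches.
CSS_TO_LATEX_FONT_SIZE = {
    # Large sizes, matched in order; \Huge is left without a CSS counterpart.
    "large": "\\large",
    "x-large": "\\Large",
    "xx-large": "\\LARGE",
    "xxx-large": "\\huge",
    # Small sizes: assumed pairing, not taken from the discussion.
    "medium": "\\normalsize",
    "small": "\\small",
    "x-small": "\\footnotesize",
    "xx-small": "\\scriptsize",
}


def css_font_size_to_latex(value):
    """Translate a CSS font-size keyword into a LaTeX size switch."""
    key = value.strip().lower()  # CSS keywords are case-insensitive
    if key not in CSS_TO_LATEX_FONT_SIZE:
        raise ValueError(f"no LaTeX equivalent known for font-size: {value!r}")
    return CSS_TO_LATEX_FONT_SIZE[key]


print(css_font_size_to_latex("x-large"))
```

Absolute lengths such as `12pt` are deliberately rejected here; handling them would need a separate rule.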
Here is an example of the extension module from the above PR
Cool! (Albeit not yet siunitx-compatible.)
If anyone would like to review, try out, or comment on the PR https://github.com/pandas-dev/pandas/pull/40422, that would be appreciated.
As an end user: when I have a Styler object that renders fine in a notebook, but becomes a class display (....Styler) when exported to PDF, do I need to explicitly call .to_latex(), or will that be done behind the scenes?
Thank you for working on this issue!
Could you please provide some more advanced examples in documentation of usage of to_latex()?
What, specifically, would you like to see?
Building more complex tables, with multicols and multirows, other LaTeX elements such as \cmidrule and, if possible (I can't tell from the documentation whether it is), a way to include logic in the table building (e.g. include \midrule after a specific row, etc.). I have to publish lots of tables in LaTeX, but I'm unable to explore all possibilities with the current examples.
Building more complex tables, with multicols and multirows,
You need a MultiIndex and to use the multirow_align, multicol_align, sparse_index, sparse_columns arguments. The rest is handled by default. Data values will never be multi-columned or multi-rowed, only indexes.
other Latex elements such as \cmidrule and if possible (can't understand from the documentation if that's possible or not) if there's way to include logic in the table building (e.g. include \midrule after a specific row, etc.).
No, you can't add conditional logic outside of cells. The only custom commands you can add are those described on the docs page, akin to the example for \rowcolors{1}{pink}{red}
Thanks for your feedback. But for a novice user like me, it's been impossible to make it work from the examples (both in Styler.to_latex() and DataFrame.to_latex()).
include \midrule after a specific row
For that, I usually split the generated output into individual lines, insert the extra rules at pre-defined locations and concatenate the result. This breaks easily but it is better than nothing.
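A minimal sketch of that split/insert/join approach, using only string handling. The helper name `insert_rules` is hypothetical. It assumes booktabs-style output where the body sits between the first \midrule and \bottomrule and every body row ends with `\\` on its own line, so it breaks as soon as the generated layout differs.

```python
def insert_rules(latex, after_rows, rule="\\midrule"):
    """Insert `rule` after the given 0-based body rows of a tabular.

    Hypothetical post-processing helper; not part of pandas.
    """
    out = []
    in_body = False
    row = -1
    for line in latex.splitlines():      # split the output into lines
        out.append(line)
        stripped = line.strip()
        if stripped == "\\midrule" and not in_body:
            in_body = True               # header ends, body starts
            continue
        if stripped == "\\bottomrule":
            in_body = False
        if in_body and stripped.endswith("\\\\"):
            row += 1
            if row in after_rows:
                out.append(rule)         # extra rule after this body row
    return "\n".join(out)                # concatenate the result


table = "\n".join([
    "\\begin{tabular}{lr}",
    "\\toprule",
    "name & value \\\\",
    "\\midrule",
    "a & 1 \\\\",
    "b & 2 \\\\",
    "c & 3 \\\\",
    "\\bottomrule",
    "\\end{tabular}",
])
print(insert_rules(table, {1}))
```

Note that joining the lines again drops a trailing newline if the input had one.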
include \midrule after a specific row
For that, I usually split the generated output into individual lines, insert the extra rules at pre-defined locations and concatenate the result. This breaks easily but it is better than nothing.
@moi90: How do you split the generated output into individual lines?
(Sorry to put this here, but I'm finding several questions on StackOverflow (e.g. this or this) unanswered, and my time is running out, so I bring this question into where I know there is knowledge to solve this. Please excuse me)
Why does Styler.to_latex() not produce the same output (namely the \cline) as DataFrame.to_latex()?
My original dataframe after aggregating results (dfg) is this:
                    F-1    F-2
dataset Model
G       Baseline  5.825  5.804
        Version2  5.825  5.804
H       Baseline  4.677  4.571
        Version2  4.802  4.660
S       Baseline  2.406  1.921
        Version2  2.719  2.189
T       Baseline  5.284  4.949
        Version2  5.931  5.909
Then I use the following code:
pd.options.display.float_format = '{:,.3f}'.format
styler_latex = dfg.style.to_latex(position="H", hrules=True, multirow_align="c", multicol_align="r", sparse_index=True)
dfg_latex = dfg.to_latex(position='H', escape=False, sparsify=True, multirow=True, multicolumn=True)
print('styler:', styler_latex)
print('dfg_latex', dfg_latex)
Output from Styler.to_latex() (Latex Code and Image)
\begin{table}[H]
\centering
\begin{tabular}{llrr}
\toprule
{} & {} & {F-1} & {F-2} \\
{dataset} & {Model} & {} & {} \\
\midrule
\multirow[c]{2}{*}{G} & Baseline & 5.824811 & 5.804303 \\
& Version2 & 5.824811 & 5.804303 \\
\multirow[c]{2}{*}{H} & Baseline & 4.677066 & 4.570626 \\
& Version2 & 4.801857 & 4.660115 \\
\multirow[c]{2}{*}{S} & Baseline & 2.406244 & 1.921260 \\
& Version2 & 2.719123 & 2.189293 \\
\multirow[c]{2}{*}{T} & Baseline & 5.284241 & 4.949087 \\
& Version2 & 5.931376 & 5.909215 \\
\bottomrule
\end{tabular}
\end{table}
Output from DataFrame.to_latex() (Latex Code and Image)
\begin{table}[H]
\centering
\begin{tabular}{llrr}
\toprule
& & F-1 & F-2 \\
dataset & Model & & \\
\midrule
\multirow{2}{*}{G} & Baseline & 5.825 & 5.804 \\
& Version2 & 5.825 & 5.804 \\
\cline{1-4}
\multirow{2}{*}{H} & Baseline & 4.677 & 4.571 \\
& Version2 & 4.802 & 4.660 \\
\cline{1-4}
\multirow{2}{*}{S} & Baseline & 2.406 & 1.921 \\
& Version2 & 2.719 & 2.189 \\
\cline{1-4}
\multirow{2}{*}{T} & Baseline & 5.284 & 4.949 \\
& Version2 & 5.931 & 5.909 \\
\bottomrule
\end{tabular}
\end{table}
Questions:
- Why does Styler.to_latex() not include \cline, unlike DataFrame.to_latex()? Is there any way to "force" this behaviour in Styler.to_latex()?
I tried to do
my_dfstyle = my_dfstyle.set_table_styles([
    {'selector': 'toprule', 'props': ':toprule;'},
    {'selector': 'midrule', 'props': ':midrule;'},
    {'selector': 'bottomrule', 'props': ':bottomrule;'},
], overwrite=False)
but I was unsuccessful. Is there any way to accomplish this type of control (e.g. force \midrule between multirows)?
pandas is a volunteer library. cline is not (yet) implemented in Styler.to_latex; no one has volunteered the time to develop it. toprule, midrule, and bottomrule will not help you here.
No, data cell merging is not possible. Not sure what you mean by "two lines", but irrespective, I am confident the features you are looking for have not been developed.
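Until something like that exists in Styler.to_latex, one workaround is to post-process the generated string. The sketch below is hypothetical (the helper `add_clines` is not a pandas function) and assumes only the outermost index level produces \multirow cells, each starting its own line, as in the Styler output shown earlier. It puts a \cline before every \multirow group except the first, which mirrors where DataFrame.to_latex() places them in that example.

```python
def add_clines(latex, n_columns):
    """Separate \\multirow groups with a full-width \\cline.

    Hypothetical post-processing helper; assumes one \\multirow per group,
    at the start of a line.
    """
    cline = "\\cline{1-%d}" % n_columns
    out = []
    seen_group = False
    for line in latex.splitlines():
        if line.lstrip().startswith("\\multirow"):
            if seen_group:
                out.append(cline)  # rule between the previous group and this one
            seen_group = True
        out.append(line)
    return "\n".join(out)


styler_output = "\n".join([
    "\\midrule",
    "\\multirow[c]{2}{*}{G} & Baseline & 5.825 & 5.804 \\\\",
    " & Version2 & 5.825 & 5.804 \\\\",
    "\\multirow[c]{2}{*}{H} & Baseline & 4.677 & 4.571 \\\\",
    " & Version2 & 4.802 & 4.660 \\\\",
    "\\bottomrule",
])
print(add_clines(styler_output, 4))
```

The column count is passed in explicitly rather than parsed from the column specification, because specifications such as `p{3cm}` make counting letters unreliable.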
@attack68: Just to be clear I know that pandas is a volunteer library and I'm extremely grateful for pandas!! I was just checking if there was any option to accomplish what I wanted. About the 2 lines, I meant these 2 lines in the header:
Those lines are the toprule and midrule. They are visible in both the DF and Styler versions.
In the Styler version you can take them both away with hrules=False. You can also take just the midrule away by adding table styles for toprule and bottomrule and not including the midrule (and keeping hrules=False).
@attack68: sorry, I misled you: I understand what you explained about the lines, but what I meant was "is there any way I can put the header in one row (instead of 2 rows) without losing the ability of the multiindex grouping under the "dataset" column?"
I have created a branch adding a to_latex method to Styler.
It is nowhere near readiness, and in particular:
- .astype(str), which is plain wrong because it discards features such as float formatting. For a good result, a better integration is needed with the code that currently does cell formatting in DataFrame.to_latex.
This said, if anyone feels like experimenting, it might be slightly better than starting from scratch.
One aspect which I think is useful, even aside from this branch, and might benefit from some discussion, is the pair of methods _latex_preserve and _latex_restore, which basically replace LaTeX commands so that they are not disturbed by escaping, and then restore them. There might be a better way to code this, but I really think this is something we need to implement, and to offer to users who happen to nest LaTeX code in their cells.
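To illustrate the idea (this is my own minimal reimplementation for discussion, not the code in the branch): commands are swapped for placeholders, the remaining text is escaped, and the commands are put back. The function names, the command regex and the set of escaped characters are all assumptions made for this sketch.

```python
import re

# A command name followed by any number of simple (non-nested) brace arguments.
_COMMAND = re.compile(r"\\[A-Za-z]+(?:\{[^{}]*\})*")
# Characters escaped in ordinary cell text (assumed subset).
_ESCAPES = {"&": r"\&", "%": r"\%", "$": r"\$", "#": r"\#", "_": r"\_"}


def latex_preserve(text):
    """Replace LaTeX commands with NUL-delimited placeholders."""
    saved = []

    def stash(match):
        saved.append(match.group(0))
        return "\x00%d\x00" % (len(saved) - 1)

    return _COMMAND.sub(stash, text), saved


def latex_escape(text):
    """Escape special characters; placeholders contain none of them."""
    return "".join(_ESCAPES.get(ch, ch) for ch in text)


def latex_restore(text, saved):
    """Put the stashed commands back in place of their placeholders."""
    return re.sub(r"\x00(\d+)\x00", lambda m: saved[int(m.group(1))], text)


def escape_keeping_commands(text):
    protected, saved = latex_preserve(text)
    return latex_restore(latex_escape(protected), saved)


print(escape_keeping_commands(r"\textbf{bold} 50% & a_b"))
```

A known limitation of this sketch: special characters inside a command's arguments are preserved unescaped along with the command, and nested braces are not matched.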