pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License

ENH: allow `convert_css` in `Styler.to_latex` to parse css classes set with `set_td_classes` and `set_table_styles` #48038

Open vnmabus opened 2 years ago

vnmabus commented 2 years ago

Feature Type

Problem Description

I wanted to make a function to create and format a table with experiment results. However I am facing some problems due to the way Pandas actually works:

Feature Description

Alternative Solutions

The only "solution" I was able to implement was to take the desired output format as a parameter and handle each format differently, without using CSS classes in the LaTeX case, and even manually changing the text of cells (which is not ideal).

Additional Context

No response

attack68 commented 2 years ago

There is a lot of stuff going on here, mixing CSS and LaTeX etc. Can you give a definitive, minimalist example of a simple feature and the LaTeX output you are trying to achieve?

As for other comments:

Allow to set the styler object for a particular DataFrame

Unless I completely misunderstand this is an entire mechanics change and won't happen. What is wrong with df.style, which is available?

It is impossible to actually return a styled DataFrame as far as I know, one must return a Styler object instead. Thus, users must know about the Styler class, and it also has a bad text representation in terminal.

Correct, and a documentation plan is being written, but you are welcome to make suggestions about a better console representation. Note that in Jupyter notebook the styler is automatically rendered as either HTML or LaTeX at the end of a cell.

Allow to set per-CSS-class styles that work for Latex output (ideally, also allow for styles that work for both HTML and Latex). Allow to append/prepend text (or maybe something more general) to both HTML and Latex outputs based on the CSS class.

These are very complicated to implement mechanically into the existing format and I am not convinced it is worth the added code complexity. However, will understand more with an example I think.

vnmabus commented 2 years ago

Ok, so I will be more concrete:

There is a lot of stuff going on here, mixing CSS and LaTeX etc. Can you give a definitive, minimalist example of a simple feature and the LaTeX output you are trying to achieve?

I think the most important part is to be able to use CSS classes to add format also in Latex. This way I can set the classes, maybe provide a default formatting style, and then let the user override that.

For example in my context I have a table with classification scores for different classifiers and datasets. Some of them are significant, calculated with a statistical test. Thus, I can set a specific class for significant results and additional classes for the ranking, letting the user specify how they should be formatted (for example, best results in bold, with an asterisk if they are significant). Unfortunately, this is currently not possible for Latex output, AFAIK.

Allow to set the styler object for a particular DataFrame

Unless I completely misunderstand this is an entire mechanics change and won't happen. What is wrong with df.style, which is available?

You can't set df.style, so you must return a Styler object. Not terrible, as the data can be recovered from an attribute, but not transparent for the end user.

It is impossible to actually return a styled DataFrame as far as I know, one must return a Styler object instead. Thus, users must know about the Styler class, and it also has a bad text representation in terminal.

Correct, and a documentation plan is being written, but you are welcome to make suggestions about a better console representation. Note that in Jupyter notebook the styler is automatically rendered as either HTML or LaTeX at the end of a cell.

Just the default DataFrame representation without styling at all is better than no user-readable representation. I knew that it rendered as HTML in Jupyter, but I do not know how to render it as Latex in Jupyter without explicitly saying it.

Allow to set per-CSS-class styles that work for Latex output (ideally, also allow for styles that work for both HTML and Latex). Allow to append/prepend text (or maybe something more general) to both HTML and Latex outputs based on the CSS class.

These are very complicated to implement mechanically into the existing format and I am not convinced it is worth the added code complexity. However, will understand more with an example I think.

The first one (setting styles per CSS class) is very interesting for flexibly changing the style.

Styles that work for both HTML and Latex is a nice-to-have feature. Alternatively, if one could specify that a style should be used for a particular output only, that would be enough.

I managed to append/prepend text using HTML (I just had to include embedded quotes). For Latex I have to check more, but it probably is doable (in the worst case you could output a custom command and define that in the document).

I forgot to mention again that set_th_classes is also interesting to have, and far easier than any of the above.

attack68 commented 2 years ago

Forget using CSS to achieve your ends. Can you give a LaTeX example of what you are trying to achieve from a DataFrame (minimalist)? I just want to be certain that there is not currently a way of achieving it that you are not aware of.

attack68 commented 2 years ago

OK, what I think you are trying to achieve is possible. But I don't want to support converting HTML to LaTeX, as that is not the point of Styler. The convert_css argument of Styler.to_latex provides minimal support; it was reasonably basic to code and for that reason was permitted. There is currently the ability to convert CSS that is added directly to cells. But there is no support for parsing CSS classes which are attached to cells, and this will not be supported, because it is too complicated to build a CSS parsing engine and it offers minimal benefit when all the features are already available without it.

So this is what you can do...

i) create a DataFrame

from pandas import DataFrame
df = DataFrame({"a": [1, 3], "b": [2, 4]})

ii) create pseudo CSS classes, i.e. in python variables. Here are two that will convert from CSS and one that is LaTeX-only.

cls1 = "color:red;font-weight:bold;"
cls2 = "color:green;"
cls3 = "Huge:--latex--rwrap;"

iii) apply these css classes to your data or indexes using appropriate conditional functions.

styler = df.style\
  .applymap(lambda v: cls1 if v > 2 else None)\
  .applymap(lambda v: cls2 if v > 3 else None)\
  .applymap_index(lambda v: cls3)\
  .applymap_index(lambda v: cls2 if v == "b" else None, axis="columns")

iv) if you want to change the actual data, you need a formatting method.

styler.format_index(lambda v: f"{str(v) + '**' if v == 0 else v}", axis="index")

v) render the latex

styler.to_latex(convert_css=True)

\begin{tabular}{lrr}
 & a & \color{green} b \\
\Huge{0**} & 1 & 2 \\
\Huge{1} & \color{red} \bfseries 3 & \color{red} \bfseries \color{green} 4 \\
\end{tabular}


It is my opinion that this is already a significant amount of flexibility in being able to create LaTeX documents, much of which is fairly new to pandas. It was difficult to get sign-off on including this in the codebase, but I managed to squeeze it in without adding too much complexity.

vnmabus commented 2 years ago

Ok, let me try to be even more explicit.

I would want to make a function that:

1) Receives some data (results from experiments)
2) Computes rankings and significance levels
3) Creates a DataFrame with the appropriate data
4) Applies a default style to the DataFrame
5) Returns the DataFrame to the user
6) The user can easily change the style (for example different styles for different rankings, and additionally for significant results or a combination of both, such as marking significant results in first position in a special way)
7) The user can choose the output format

From the previous discussion my conclusions are:
a) 4 is not really possible without knowing in advance the output that the user will choose, unless we restrict ourselves to the small subset of CSS attributes available in Latex. There is no way to apply HTML-only or Latex-only attributes.
b) 5 is not possible. We must return a Styler instead.
c) 6 is only possible for HTML, using CSS classes and selectors that use them. As classes are stored in a public attribute, it could be possible to create a function for styling objects with a particular class combination and expose that to the user, though.
d) As mentioned before, 7 is incompatible with 4 for now.

attack68 commented 2 years ago

Yes, broadly.

You are describing a pipeline of actions between a server and a client. You are referring to a server API that responds to input from a client, where the output received by the client should be customisable. This is achieved by one of the following:
a) Designing an API for the server's original function with arguments for the client's specific output formats. This has the advantage of being a one-time call method, but every time the user wants to change the output they need to re-submit the data, which, for large data, might be a network drag.
b) Your server has to operate dynamically, which for LaTeX seems unlikely anyway; HTML is often dynamic, with JavaScript in a web browser.
c) Your client has a python environment which can receive outputs and make its own function calls, including self-manipulation of a Styler object.

1-3 are DataFrame creation and array operations. They have nothing to do with LaTeX styling, or output format of any kind.

  1. A DataFrame is an in memory array store. It has no concept of a "style". A style is only appropriate in the context of output formats. The Styler is one form of output styling, which has the greatest flexibility. DataFrame.to_latex is another output format method which returns a string.
  2. So this suggests your client is operating within a python environment if it is returned a python object, i.e. a DataFrame.
  3. If your client is in a python environment you can provide a DataFrame, a Styler and any prebuilt rendering functions that code specific formatting relevant to your use case. Additionally, the Styler is a class in Python which you can subclass and add your own methods to, so that your client can use it with an API that your server has built and distributed.
  4. Depending upon the environment this is implemented as above.

I do not believe that this issue is about improving styling compatibility with LaTeX. As demonstrated, complete control of the LaTeX render is achievable by a single user in a python environment, as far as I can tell according to your specifications.

Your issue seems to be orienting and establishing this in a user defined server API pipeline. Pandas is designed to provide the core building blocks, and therefore it is not immediately obvious to me what, if any, enhancements should be made here.

vnmabus commented 2 years ago

I think there has been a misunderstanding... I am not talking about servers or anything like that. The function mentioned is a regular function in a Python library. The user is a programmer using the library.

I want to return a DataFrame (or a Styler if that is not possible), with a default styling (ideally, one working for both HTML and Latex) and allow the programmer to easily customize the style. To that end, I wanted to set some CSS classes so that the user can use them, or combinations of them, as a selector for applying a particular style. This works in HTML but not in Latex.

Thus, I miss:

attack68 commented 2 years ago

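So I would suggest that you:

* do not use CSS classes, or the `set_td_classes` method. It is not appropriate for your use case and will not work with LaTeX.

* use CSS as a string in a python variable and use the `apply`, `applymap`, `apply_index`, `applymap_index` methods. This does work with LaTeX, and allows styling of column and index headers. (The fact that you are not mentioning these methods suggests you are not aware of them - it seems to me to be the exact solution to your problem, and if not please explain why)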

Suppose you design the following method:

import numpy as np
import pandas as pd

def my_function(data, max_style=None):
    df = pd.DataFrame(data)
    styler = df.style
    if max_style is not None:
        # highlight_max modifies the styler in place
        styler.highlight_max(axis=None, props=max_style)
    return styler

so that

data = np.random.randn(2,2)
user_max_css = "font-weight:bold; color:red;"
user_styler = my_function(data, max_style=user_max_css)
user_styler.to_html()
user_styler.to_latex(convert_css=True)


\begin{tabular}{lrr}
 & 0 & 1 \\
0 & -0.649920 & -0.544141 \\
1 & \bfseries \color{red} 0.636417 & -0.760168 \\
\end{tabular}

Or what is the specific aspect of this solution that does not satisfy your purpose? Please specifically reference the given example.

vnmabus commented 2 years ago

So I would suggest that you:

* do not use CSS classes, or the `set_td_classes` method. It is not appropriate for your use case and will not work with LaTeX.

Yes, I know that this currently does not work. However, from the last paragraph of this comment, I would expect that setting classes for styling could be considered also for non-HTML outputs.

* use CSS as a string in a python variable and use the `apply`, `applymap`, `apply_index`, `applymap_index` methods. This does work with LaTeX, and allows styling of column and index headers. (The fact that you are not mentioning these methods suggests you are not aware of them - it seems to me to be the exact solution to your problem, and if not please explain why)

I already knew about them. These work if I am the one styling the DataFrame for the first time, but don't work if the user has to change it afterwards, as he doesn't know which cells contain significant results, for example. With CSS classes I can set a separate class for all cells that have a significant result, for styling purposes.

Or what is the specific aspect of this solution that does not satisfy your purpose? Please specifically reference the given example.

This assumes that the user will pass a set of formatting parameters to the function, which is not flexible. What if he also wants to style negative numbers (just in the case of your example, not something I am interested in)? Should I add another parameter? And what if he wants a specific formatting for when data is both the max and negative?

With classes, I could provide the user a set of meaningful classes to style the data, and then he can easily set a style for them, without the need of creating string arrays for apply.

For now I think that I will include the info as part of the DataFrame, using a custom struct with an object dtype, and use a custom formatter to hide it from the output representation. That would allow the user to use apply himself, but I still think that the optimal way is with style classes.

I still don't have a way to apply specific advanced styles to HTML and Latex, though.

attack68 commented 2 years ago

I would expect that setting classes for styling could be considered also for non-HTML outputs.

This won't be supported (at least by me). I don't want to develop a CSS parser to convert to other formats (and using classes is very complicated to implement into the existing mechanics, not least because you need a parser). The onus is on the user to understand how to use the library to output in the correct format where available.

These work if I am the one styling the DataFrame for the first time, but don't work if the user has to change it afterwards, as he doesn't know which cells contain significant results, for example. With CSS classes I can set a separate class for all cells that have a significant result, for styling purposes.

Correct. Whilst you might be able to circumvent this in HTML by using classes, it is not the point of Styler to be dynamically stylable after computation for another end user. The point of Styler is for this to work if one is styling the DataFrame for the first time. Classes were designed to be incorporated into the CSS styles of a user's existing HTML website, where their own CSS already influences that design.

What if he also wants to style negative numbers (just in the case of your example, not something I am interested in)? Should I add another parameter? And what if he wants a specific formatting for when data is both the max and negative?

Yes, your method is operating as an API layer on top of pandas, so if you want to give that function to your user (as a simplification of the pandas API) then your own method has to code that logic and require additional input arguments.

Alternatively, the user creates a Styler and applies the styler methods in chain for the precise styles he wants to apply, as is designed.

I still don't have a way to apply specific advanced styles to HTML and Latex, though

You haven't provided an example of what you consider an "advanced" style in HTML or LaTeX so it remains unclear to me as to whether this could be applied or not. But I expect some specific CSS HTML styles or Latex commands will not be valid within this scope.

vnmabus commented 2 years ago

You haven't provided an example of what you consider an "advanced" style in HTML or LaTeX so it remains unclear to me as to whether this could be applied or not. But I expect some specific CSS HTML styles or Latex commands will not be valid within this scope.

I meant providing two styles, one for HTML and another for Latex, so that both outputs could be used and we are not limited to the few CSS->Latex conversions available.

attack68 commented 2 years ago

Wasn't this in my example above, which allows conversion and LaTeX-specific styles?

cls1 = "color:red;font-weight:bold;"
cls2 = "color:green;"
cls3 = "Huge:--latex--rwrap;"

and there is another example of that at https://pandas.pydata.org/docs/reference/api/pandas.io.formats.style.Styler.to_latex.html ?

vnmabus commented 2 years ago

Yes, but you need to know that the user wants to output in Latex beforehand. What there isn't is a way to apply styles specific for HTML and styles specific for Latex. If that were possible, I could use that functionality to let the user choose afterwards the appropriate output (or even use both if he/she wants).

attack68 commented 2 years ago

Correct, while there is a design objective to make certain functions generalisable over output, such as hide and format (in most capacities), this will not be possible for other styles. A user must know in advance what their intended output format is.

vnmabus commented 2 years ago

I made this function to set a style on all cells of some class, in case it is useful to anyone else:

import numpy as np
from pandas.io.formats.style import Styler


def set_style_from_class(
    styler: Styler,
    class_name: str,
    style: str,
) -> Styler:
    # Start with the style in every cell, then blank the cells lacking the class
    style_matrix = np.full(styler.data.shape, style)

    for row in range(style_matrix.shape[0]):
        for column in range(style_matrix.shape[1]):
            # cell_context maps (row, column) to the class string of that cell
            classes = styler.cell_context.get(
                (row, column),
                "",
            ).split()

            if class_name not in classes:
                style_matrix[row, column] = ""

    return styler.apply(lambda x: style_matrix, axis=None)