pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License

Followup to Conditional HTML Styling #11610

Closed TomAugspurger closed 3 years ago

TomAugspurger commented 8 years ago

Follows #10250

For 0.17.1

For 0.18.0 / Future

will add more as we go.

jreback commented 8 years ago

@TomAugspurger I had to slightly change style.rst to get this to appear in the index. You may want to further modify.

jankatins commented 8 years ago

A comment on the API of the highlighter:

def color_negative_red(val):
    """
    Takes a scalar and returns a string with
    the css property `'color: red'` for negative
    values, black otherwise.
    """
    color = 'red' if val < 0 else 'black'
    return 'color: %s' % color

This basically only works for the HTML representation ("color: red" is CSS speak) and assumes that the return value should go to the CSS.

In LaTeX (see e.g. this example), you prefix/surround the value with a command:

[...]
\usepackage[table]{xcolor}% http://ctan.org/pkg/xcolor
  Some & \cellcolor{blue!25}coloured & contents \\

would render the second cell blue (by using a command from a special package).

So for LaTeX you probably need templates à la \cellcolor{blue!25}%s or \textbf{%s} (which makes the value bold).
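The template idea above can be sketched in a few lines of plain Python. This is only an illustration, not part of the pandas API: the names latex_color_negative_red and render_latex_row are hypothetical, and \textcolor assumes the xcolor package is loaded in the LaTeX document.

```python
def latex_color_negative_red(val):
    """Return a LaTeX template with one %s slot for the formatted value.

    Hypothetical counterpart of color_negative_red: instead of a CSS
    property, the style function returns a wrapper around the value.
    """
    return r"\textcolor{red}{%s}" if val < 0 else "%s"


def render_latex_row(values, template_func, fmt="%.2f"):
    """Format each value, wrap it in its template, and join into a table row."""
    cells = [template_func(v) % (fmt % v) for v in values]
    return " & ".join(cells) + r" \\"


print(render_latex_row([1.5, -2.0], latex_color_negative_red))
```

The point of the sketch is that an output-agnostic styling API would need the style function to return a wrapper for the value, not just a property string.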

kynan commented 8 years ago

@TomAugspurger I think you made a great start on this! A few ideas for making this approach potentially more flexible:

  1. Allow extra attributes on the <table>.

    Common use case: You want to export an HTML table with sortable columns, e.g. using sortable. For that you would need the following opening tag:

    <table class="sortable-theme-bootstrap" data-sortable>
  2. Support "external" styling via existing custom CSS style sheets.

    For this to work, the current approach of unique ids per cell, each targeted by auto-generated CSS, doesn't really work. Instead it would be useful to be able to assign additional classes (or data attributes) to cells based on the Styler rules.

    Example use case: assign class positive to all values > 0 and negative to all values < 0

    A potential advantage of this would be much smaller size of generated code (since we don't need a custom CSS block for each cell) and better performance when rendering in the browser (fewer rules to apply).

  3. Allow setting arbitrary attributes on table cells based on rules.

    This extends the previous suggestion and would allow using exported html tables with any kind of JavaScript library using specific attributes.
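A rough sketch of suggestions 1 and 2, written without pandas so that it stands alone. The helpers sign_class and render_table are hypothetical names used only for illustration; they show what "class per rule" output could look like, not how Styler is implemented.

```python
from html import escape


def sign_class(val):
    """Rule from the example use case: 'positive' for > 0, 'negative' for < 0."""
    if val > 0:
        return "positive"
    if val < 0:
        return "negative"
    return ""


def render_table(rows, classify, table_attributes=""):
    """Render rows as an HTML table, tagging each cell with a rule-based class.

    table_attributes is inserted verbatim into the opening <table> tag.
    """
    open_tag = "<table %s>" % table_attributes if table_attributes else "<table>"
    lines = [open_tag]
    for row in rows:
        cells = []
        for val in row:
            cls = classify(val)
            attr = ' class="%s"' % escape(cls, quote=True) if cls else ""
            cells.append("<td%s>%s</td>" % (attr, escape(str(val))))
        lines.append("<tr>" + "".join(cells) + "</tr>")
    lines.append("</table>")
    return "\n".join(lines)


print(render_table([[1, -2, 0]], sign_class,
                   'class="sortable-theme-bootstrap" data-sortable'))
```

With output like this, an existing stylesheet only needs rules for .positive and .negative, and no per-cell CSS block is generated.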

Sorry for being so late in making these suggestions - I didn't manage to read through all of #10250 before it was merged.

TomAugspurger commented 8 years ago

@kynan fantastic, thanks for the feedback. I'll go through it in more detail later.

Your item 1. sounds pretty simple. Is attribute the correct term for the items in the opening tag? <table class="sortable-theme-bootstrap" data-sortable> We could include a method like .set_table_attributes for that.

kynan commented 8 years ago

@TomAugspurger I believe attribute is the common term and also the one the W3C uses. set_table_attributes sounds sensible to me.

jreback commented 8 years ago

@TomAugspurger merged your PR; I'll leave you to close when you are ready.

TomAugspurger commented 8 years ago

Thanks, I want to get a better solution in place for including notebooks in the sphinx build, but that works for now.

@kynan, for your second item, assigning classes to cells. My current thinking is to have a method on Styler (would it be terrible to call it .classify?) that takes a function to be evaluated and a class to assign to the cells where that function evaluates to True. The limitation here is that the class that's assigned doesn't get to refer to the data. So you couldn't (easily) do something like our .background_gradient, which is why I discarded this approach originally. But it might make sense to have in addition to the one-class-per cell approach we have. This will probably need to wait for the next release though.
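A minimal sketch of what such a .classify could compute, assuming a standalone helper rather than a Styler method. The function name classify and its signature are hypothetical; the result is a DataFrame of class names with the same shape as the input.

```python
import numpy as np
import pandas as pd


def classify(frame, func, css_class):
    """Hypothetical helper: apply a boolean function to the whole frame and
    return a same-shaped DataFrame holding css_class where it is True and
    an empty string elsewhere."""
    mask = func(frame).astype(bool)
    return pd.DataFrame(
        np.where(mask, css_class, ""),
        index=frame.index,
        columns=frame.columns,
    )


df = pd.DataFrame({"a": [1.0, np.nan], "b": [np.nan, 2.0]})
print(classify(df, pd.isnull, "null"))
```

As noted above, the class name here is fixed per rule and cannot depend on the cell value, which is the limitation that rules out something like .background_gradient.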

TomAugspurger commented 8 years ago

docs are up if anyone sees anything. It does still have the [In] and [Out] tags and ¶ markers I might be able to hide.

jreback commented 8 years ago

@TomAugspurger yep, looks great!

on the css side, I think it's possible if we tag with the SAME names, e.g.

df.style.highlight_null(css='null_class').background_gradient(css='gradients_class')

then as long as you tag THOSE cells with that class it would work. we could have default class names (based on the function name), and have this kw to override.

TomAugspurger commented 8 years ago

The df.style.highlight_null(css='null_class') I could see working like that, since that fits the binary "if this condition is true, apply this style".

This could be my limited understanding of CSS, but I don't see how you could accomplish df.style.background_gradient(css='gradients_class') in CSS, just knowing that these columns have this class. (I think) you'd need a class per color you want to assign.

I suppose we could add a data attribute to each cell with the value of that cell... You might be able to pull off some CSS wizardry to accomplish it in that case, but I don't see the average python user being able to write or customize that.

jreback commented 8 years ago

@TomAugspurger I think you would actually construct the classes with, in this case, the level embedded (for some you wouldn't need to do this), maybe something like 'gradient_level_0_class' (e.g. say you have ten levels of gradient). But this is a refinement.
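One way to read the "level embedded in the class name" idea is to bucket values into a fixed number of levels and emit one class per level. The sketch below is an assumption about how that could work; gradient_classes and the equal-width bucketing are illustrative choices, not something settled in this thread.

```python
def gradient_classes(values, n_levels=10, template="gradient_level_%d_class"):
    """Map each value to one of n_levels equal-width buckets between the
    minimum and maximum, and return the class name for its bucket."""
    lo, hi = min(values), max(values)
    span = hi - lo
    classes = []
    for v in values:
        if span == 0:
            level = 0  # all values equal: put everything in the first bucket
        else:
            # The maximum would land on n_levels, so clamp to the last bucket.
            level = min(int((v - lo) / span * n_levels), n_levels - 1)
        classes.append(template % level)
    return classes


print(gradient_classes([0, 5, 10]))
```

A stylesheet would then need one rule per level (ten rules here), which is the "class per color" cost mentioned above.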

mattilyra commented 8 years ago

I've been playing around with the new styling features and have a few comments, overall this is a great new addition.

The highlight_min, highlight_max and highlight_null would be a lot better if instead of taking a color argument they would actually take the css format string or **kwargs that correspond to css style names - this would allow:

  1. using inverted colors (background-color: black; color: white)
  2. other formatting options than background shading (font-weight: bold)
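The **kwargs variant could look roughly like the sketch below. Both css_from_kwargs and highlight_max_with are hypothetical names, and translating underscores to hyphens is an assumed convention, since Python keyword names cannot contain hyphens.

```python
def css_from_kwargs(**props):
    """Turn keyword arguments into a CSS declaration string, mapping
    underscores in names to hyphens (background_color -> background-color)."""
    return "; ".join(
        "%s: %s" % (name.replace("_", "-"), value) for name, value in props.items()
    )


def highlight_max_with(values, **props):
    """Return the CSS string for the maximum value(s) and '' for the rest."""
    css = css_from_kwargs(**props)
    top = max(values)
    return [css if v == top else "" for v in values]


print(css_from_kwargs(background_color="black", color="white"))
print(highlight_max_with([1, 3, 2], font_weight="bold"))
```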

The documentation is also a little confusing in terms of debugging the styling functions.

Debugging Tip: If you're having trouble writing your style function, try just passing it into df.apply. Styler.apply uses that internally, so the result should be the same.

The full stop and space between apply and Styler is somewhat difficult to spot (I only noticed it when pasting the quote here) - I was looking for df.apply.Styler.apply which obviously doesn't exist, a little rewording would fix this. (You need to look at the text at https://pandas-docs.github.io/pandas-docs-travis/style.html to spot it)

Debugging Tip: If you're having trouble writing your style function, try passing it into df.apply. Internally Styler.apply uses that, so the result should be the same.
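For readers unsure what the tip means in practice, here is a small sketch. The style function highlight_max_col is an illustrative example, not the built-in; it returns a Series so that the result of df.apply is a DataFrame of CSS strings that can be inspected directly.

```python
import pandas as pd


def highlight_max_col(s):
    """Column-wise style function: CSS for the maximum, '' elsewhere."""
    css = ["background-color: yellow" if v == s.max() else "" for v in s]
    return pd.Series(css, index=s.index)


df = pd.DataFrame({"a": [1, 3], "b": [4, 2]})

# Debugging step: call the style function through plain DataFrame.apply
# and look at the strings it produces before handing it to df.style.apply.
print(df.apply(highlight_max_col))
```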

TomAugspurger commented 8 years ago

On your first point, I agree that would be useful. If we end up going with a .classify that takes a function returning booleans (pd.isnull) and assigns classes where that's True, we should be able to handle that pretty easily. I think we should hold off adding more keywords to .highlight_null until we decide what to do there.

I just pushed a PR to clarify the documentation. That was confusing, thanks.

jorisvandenbossche commented 8 years ago

@TomAugspurger to repeat, awesome work!

A question on the 'provisional status'. First, as I said on gitter, I think it is a good idea to put the same provisional note from the notebook in the whatsnew note (experimental = can still change + feedback wanted). Secondly, we could also emit a warning about this on first usage to be even more explicit? (but maybe that's a bit too intrusive). But if it is only on the first import of the style module, and not each time you use it, maybe it is OK?

Question for the docs: the built notebook in html form is still in the source code. Is this on purpose? (as eg in the latest PR you only updated the notebook and not the html file)

TomAugspurger commented 8 years ago

Having the generated HTML is not intended, I thought I deleted that. I'll remove it in my PR adding the provisional note.

For the warning. I never was a fan of always getting the warnings when using IPy widgets. I can go either way though. At the very least I'm going to add a note to the docstring for Styler.

jorisvandenbossche commented 8 years ago

Another small note on the docs: maybe it would be good to include a link to the notebook on nbviewer? As this actually still looks better than the one included in the docs (the table styling (the borders) is 'uglier')

TomAugspurger commented 8 years ago

I was trying to figure out how to include a link that points to the same version of the notebook, but adding the link changes the notebook :) I suppose we just link to https://github.com/pydata/pandas/blob/master/doc/source/html-styling.ipynb, understanding that the contents at that URL can change?

jreback commented 8 years ago

@TomAugspurger we can host a rendered version on pandas.pydata.org easily in the doc directory and just link to it (from the docs). IIRC this was your original suggestion :)

github doesn't render these properly AFAICT

jorisvandenbossche commented 8 years ago

I think putting a link 'See this notebook on nbviewer' that points to the one in master is OK (to be fully correct, it should point to the version in the version tag, but that is a bit difficult as that does not yet exist :-))

TomAugspurger commented 8 years ago

Just iterate on the links until they converge :)

Just pushed https://github.com/pydata/pandas/pull/11664 with an NBviewer link pointing to master until we get the version uploaded to pandas.pydata.org as part of the doc build.

jreback commented 8 years ago

hmm, so this is the argument then for including the nbconvert outputted .html, which we can then directly link as a file

jorisvandenbossche commented 8 years ago

A possible advantage of using the link to nbviewer, is that it is then easier to download the notebook to run things yourself

TomAugspurger commented 8 years ago

Yeah I think we should definitely link to nbviewer since they already have the stuff in place to download as a link. I'm not sure how including the rendered HTML in the doc build helps (or hurts) with this.

jorisvandenbossche commented 8 years ago

OK, merging the PR then!

jorisvandenbossche commented 8 years ago

The first SO questions appear! :-) http://stackoverflow.com/questions/33875937/apply-number-formatting-to-pandas-html-css-styling

jreback commented 8 years ago

@TomAugspurger let's try to link all relevant HTML issues at the top of the tracker (as most/maybe all can be accomplished via .style).

jreback commented 8 years ago

xref #11700

kynan commented 8 years ago

@TomAugspurger Sorry for the delay. classifier sounds good to me. Is there a reason it could only return a boolean and not a string with the class name?

121onto commented 8 years ago

This feature looks really promising. Thanks all who worked on it!

Wouldn't it be great if I could render DataFrames in html outside of ipython notebooks!? I'm not a fan of developing inside ipython notebook, and working with matplotlib entails a lot of overhead.

Two workflows that come to mind are as follows. First, if you are working on a mac, keep a quicklook window open on a PDF file that you use to store current output. Then, define a my_print function that render()s an html string and prints it to your PDF file:

from weasyprint import HTML

def my_style(frame):
    return frame.style.highlight_null(null_color='red') # or whatever

def my_print_pdf(frame, styler, filename='/Users/username/temp/frame_viewer.pdf'):
    style = styler(frame)
    # Pass the markup via string=; a bare positional argument is treated as a filename or URL.
    html = HTML(string=style.render())
    html.write_pdf(filename)
    return None

The Quicklook should update each time you my_print a DataFrame.

Second, use something like Browsersync to watch an HTML file. To watch a file with Browsersync, you'd type the following in your terminal:

cd ~/temp
browser-sync start --server --index "frame_viewer.html" --files "*.html"

With this approach, you'd write a my_print that dumps the output from render() to an html file. Because Browsersync expects body tags, you'll need to append those to the output from render:

def my_print_html(frame, styler, filename='/Users/username/temp/frame_viewer.html'):
    style = styler(frame)
    html = "<html><head></head><body>" + style.render() + "</body></html>"
    with open(filename, 'w') as f:
        print(html, file=f)

    return None

Each time you call my_print_html Browsersync will refresh your browser automatically.

Notes: code not tested.

Edits 1: tested Browsersync example and updated code so it works on my machine. Edits 2: updated browsersync command.

TomAugspurger commented 8 years ago

Those are both possible now right? You'd just need to write the code / startup the server? Something like your code would fit well in the cookbook documentation.

Adding "builtin" support for rendering in the notebook had a very favorable cost-benefit ratio. Two lines of code for something used by so many. I don't anticipate adding other backends.


121onto commented 8 years ago

@TomAugspurger yes, possible right now. The Browsersync example should work now.

jreback commented 8 years ago

@TomAugspurger just came across this from xlsxwriter here might be a nugget we could steal....

jankatins commented 8 years ago

This is an interesting approach in R: https://github.com/renkun-ken/formattable -> see the last example

joekane3 commented 8 years ago

Hi,

This feature is great - thanks.

I wanted a way to style a column based on data in another column. I couldn't see a way to do this so made a change to the .bar() styler method. Suggestions on how else to perform such a thing would be appreciated.


Apologies if I have not done this correctly.

TomAugspurger commented 8 years ago

@joekane3 something like that should be possible through the .apply method. Your style function will get the entire DataFrame, so you can use the values in column A to apply styles in column B. Your function should return a DataFrame with background-color: <color> for column B and empty strings everywhere else.

Of course, you'll have to do all the conversion from values to colors on your own. Pandas just uses matplotlib internally, so that's probably your best bet.
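A minimal sketch of such a style function, assuming columns named "A" and "B". The name style_b_from_a is hypothetical, and the hand-rolled white-to-red ramp stands in for a matplotlib colormap to keep the example dependency-free.

```python
import pandas as pd


def style_b_from_a(frame, source="A", target="B"):
    """Return a same-shaped DataFrame of CSS strings in which column
    `target` is shaded by the values of column `source` (white for the
    minimum, red for the maximum) and every other cell is ''."""
    styles = pd.DataFrame("", index=frame.index, columns=frame.columns)
    lo, hi = frame[source].min(), frame[source].max()
    span = (hi - lo) or 1  # avoid dividing by zero when all values are equal
    for label, val in frame[source].items():
        share = (val - lo) / span
        fade = int(round(255 * (1 - share)))  # green/blue channels fade out
        styles.loc[label, target] = "background-color: #ff%02x%02x" % (fade, fade)
    return styles


df = pd.DataFrame({"A": [0, 10], "B": [5, 6]})
print(style_b_from_a(df))
```

To use it with the Styler, pass it with axis=None so the function receives the whole DataFrame: df.style.apply(style_b_from_a, axis=None).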

monkh commented 8 years ago

I'm new to github, sorry if this is the wrong place to post this. I don't think there is, but is there a way to hide the index column when outputting a styled DataFrame through the .render() function?

Also it seems like the Styler is going to (in the future) make .to_html() obsolete.

TomAugspurger commented 8 years ago

Correct, the index is unstyleable right now. I plan to fix that in the future, including an option to hide it.

We'll always have to_html, but the implementation might reuse the code here.

mjmyers commented 8 years ago

Correct, the index is unstyleable right now. I plan to fix that in the future, including an option to hide it.

@TomAugspurger : Was there ever any headway on this? I can't find mention of it in the docs. I've had to do some pretty hacky css to hide the index while using the styler.

jreback commented 8 years ago

actually a bit of work here: https://github.com/pydata/pandas/issues/11655

TomAugspurger commented 8 years ago

@mjmyers nothing for styling the index yet though. The big thing is finding an API that's nice to work with. Some possibilities

But I haven't thought too much about it yet.

attack68 commented 3 years ago

since this tracker is quite old and many items on it have been addressed or evolved I will close it in favour of more recent discussion. pls re open if you consider it useful.