pola-rs / polars

Dataframes powered by a multithreaded, vectorized query engine, written in Rust
https://docs.pola.rs

Tweak data frame HTML repr #18100

Open cbrnr opened 1 month ago

cbrnr commented 1 month ago

Description

Data frames are printed as a nice HTML table when working in a Jupyter notebook:

import polars as pl

df = pl.DataFrame({
    "a": [1, 2, 3, 4, 5],
    "b": ["A", "B", "C", "D", "E"],
    "c": [1.1, -2.8, 3.4, 4.7, -5.9],
})

df

This currently results in the following output:

[Screenshot 2024-08-08: current HTML rendering of df]

I find the styling of the column types a bit distracting, though: they look exactly like column headers. I therefore suggest printing the column types in italics, which would look like this:

[Screenshot 2024-08-08: mockup with the column types in italics]
deanm0000 commented 1 month ago

Assuming this is acceptable (it looks good to me), I think it's just a matter of updating this line: https://github.com/pola-rs/polars/blob/7f3c636b71b65d5a83ac08e59bc6578c1dc7471b/py-polars/polars/dataframe/_html.py#L106

to

# Wrap the dtype string in an <i> element instead of appending it bare
with Tag(self.elements, "i"):
    self.elements.append(dtypes[c])
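For illustration, here is a self-contained sketch of what that nesting produces. The `Tag` class below is a simplified stand-in written for this example (an assumption, not polars' actual helper from `_html.py`): it appends an opening tag on entry and the matching closing tag on exit, so nesting an "i" tag inside a cell wraps the dtype string in `<i>` and `</i>`. The `dtype_row` function is hypothetical.

```python
class Tag:
    """Simplified stand-in (assumption) for the Tag helper in polars' _html.py.

    Appends "<tag>" on enter and "</tag>" on exit to a shared list of strings.
    """

    def __init__(self, elements: list[str], tag: str) -> None:
        self.elements = elements
        self.tag = tag

    def __enter__(self) -> None:
        self.elements.append(f"<{self.tag}>")

    def __exit__(self, *exc_info: object) -> None:
        self.elements.append(f"</{self.tag}>")


def dtype_row(dtypes: list[str], italic: bool = True) -> str:
    """Hypothetical helper: render the dtype row, optionally wrapping each dtype in <i>."""
    elements: list[str] = []
    with Tag(elements, "tr"):
        for dtype in dtypes:
            with Tag(elements, "td"):
                if italic:
                    # The proposed change: nest an <i> tag around the dtype string.
                    with Tag(elements, "i"):
                        elements.append(dtype)
                else:
                    elements.append(dtype)
    return "".join(elements)


print(dtype_row(["i64", "str", "f64"]))
# <tr><td><i>i64</i></td><td><i>str</i></td><td><i>f64</i></td></tr>
```

Because the closing tags are emitted by the context manager, the markup stays balanced no matter how deeply the tags are nested.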
cbrnr commented 1 month ago

Will this also get rid of the bold formatting?

deanm0000 commented 1 month ago

No, I hadn't really noticed it. I also don't see where the bold formatting comes from; maybe it's because the row is part of thead.

cbrnr commented 1 month ago

It's probably some predefined style of the header row. Would this be a change that everyone would like to see?
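If the bold weight does come from a notebook stylesheet rule for the header (an assumption; the exact rule depends on the Jupyter frontend), an inline style attribute on the dtype cells should override it, because inline declarations take precedence over stylesheet rules that are not marked `!important`. A minimal sketch; `styled_cell` is a hypothetical helper, not part of polars:

```python
import html


def styled_cell(text: str, tag: str = "td", **css: str) -> str:
    """Hypothetical helper: render one table cell with an inline style attribute.

    Keyword arguments use underscores (font_weight) and are converted to CSS
    property names with hyphens (font-weight).
    """
    style = "; ".join(f"{key.replace('_', '-')}: {value}" for key, value in css.items())
    attr = f' style="{html.escape(style)}"' if style else ""
    return f"<{tag}{attr}>{html.escape(text)}</{tag}>"


# Cancel the inherited bold weight and italicize the dtype instead.
print(styled_cell("i64", font_weight="normal", font_style="italic"))
# <td style="font-weight: normal; font-style: italic">i64</td>
```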

mcrumiller commented 1 month ago

I would prefer either italics or 10-pt Consolas (or another fixed-width font) in grey rather than black, to slightly subdue it. A fixed-width font helps indicate that it's an official label.

cbrnr commented 1 month ago

I definitely like using gray to slightly subdue it. I'm not sure about a fixed-width font; I slightly prefer the regular font in italics, but here's what these variants look like:

[Screenshot 2024-08-08: mockups of the proposed variants]
mcrumiller commented 1 month ago

I said 10 pt assuming everything else was 12 pt, since Consolas renders larger characters. What about a fixed-width font at 8 or 9 pt?

From the screenshots above, I too prefer the italics, but not by much: the slanted italics break up the flow of the columns a bit, and I think the smaller grey font may look best.
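The variants discussed so far differ only in a few CSS properties. A minimal sketch that keeps them side by side as inline styles; the sizes and colors are placeholders taken from the comments above, and `DTYPE_STYLES` and `dtype_cell` are hypothetical names, not polars code:

```python
import html

# Hypothetical style variants for the dtype row, as discussed in the thread.
# Every variant resets font-weight so the header's bold style does not apply.
DTYPE_STYLES: dict[str, dict[str, str]] = {
    "italic": {"font-weight": "normal", "font-style": "italic"},
    "italic-grey": {"font-weight": "normal", "font-style": "italic", "color": "grey"},
    "mono-grey": {
        "font-weight": "normal",
        "font-family": "Consolas, monospace",  # fall back to any fixed-width font
        "font-size": "9pt",
        "color": "grey",
    },
}


def dtype_cell(dtype: str, variant: str) -> str:
    """Render a single dtype cell using one of the style variants above."""
    style = "; ".join(f"{prop}: {value}" for prop, value in DTYPE_STYLES[variant].items())
    return f'<td style="{html.escape(style)}">{html.escape(dtype)}</td>'


for name in DTYPE_STYLES:
    print(name, dtype_cell("f64", name))
```

Keeping the variants in one table makes it cheap to generate mockups from the real renderer instead of hand-drawn screenshots.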

mcrumiller commented 1 month ago

Adding a bit more nitpicking: could the height of the dtype row also be reduced somewhat, maybe by 25%?
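A 25% reduction could come from trimming only the vertical padding of the dtype cells, leaving line-height alone so the text is not clipped. A rough sketch of the arithmetic, treating the row height as one line box plus top and bottom padding (borders ignored); the base values (line-height 1.2, 0.5em padding) are placeholders rather than Jupyter's actual defaults, and the helper names are hypothetical:

```python
def row_height_em(line_height: float, padding_em: float) -> float:
    """Approximate cell height in em: one line box plus top and bottom padding.

    line_height is the unitless CSS multiplier; borders are ignored.
    """
    return line_height + 2 * padding_em


def compact_padding(line_height: float, padding_em: float, factor: float = 0.75) -> float:
    """Vertical padding (em) that brings the row to `factor` times its original
    height while leaving line-height unchanged, so the text is not clipped."""
    target = factor * row_height_em(line_height, padding_em)
    new_padding = (target - line_height) / 2
    if new_padding < 0:
        raise ValueError("padding alone cannot shrink the row this much")
    return new_padding


# Placeholder defaults (assumption): line-height 1.2, 0.5em vertical padding.
padding = compact_padding(1.2, 0.5)
print(f"padding-top: {padding:.3f}em; padding-bottom: {padding:.3f}em")
```

With these placeholder values the row shrinks from 2.2em to 1.65em by cutting the padding from 0.5em to about 0.225em.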

mcrumiller commented 1 month ago

Another nit: I believe the str columns are center-aligned (I can't tell for sure from your screenshots), but the header is clearly right-aligned. Header and column alignment should be consistent.
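One way to keep header and body alignment from drifting apart is to derive both from a single per-column setting. A minimal sketch; `render_table` is a hypothetical renderer, not polars' formatter, and string values are passed pre-quoted to mimic the added quotes in the repr:

```python
import html


def render_table(columns: dict[str, list[str]], align: dict[str, str]) -> str:
    """Hypothetical renderer: each column uses one text-align value for both
    its header cell (<th>) and its body cells (<td>), so they always match."""
    names = list(columns)
    n_rows = len(next(iter(columns.values()), []))
    parts = ["<table>", "<thead><tr>"]
    for name in names:
        parts.append(f'<th style="text-align: {align[name]}">{html.escape(name)}</th>')
    parts.append("</tr></thead><tbody>")
    for i in range(n_rows):
        parts.append("<tr>")
        for name in names:
            value = html.escape(columns[name][i])
            parts.append(f'<td style="text-align: {align[name]}">{value}</td>')
        parts.append("</tr>")
    parts.append("</tbody></table>")
    return "".join(parts)


print(render_table({"b": ['"A"', '"B"']}, {"b": "right"}))
```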

cbrnr commented 1 month ago

I can add more mockups; let me know what you would like to see. Note that I'm not generating them with the actual HTML renderer, so they might not look exactly like the real thing.

Re alignment: the entire str column is in fact right-aligned. I think it's the added quotes (and the fact that the strings are single characters) that make it look centered.

cbrnr commented 6 days ago

I think it would also be useful to show row numbers. Here's a suggestion:

[Screenshot 2024-09-10: mockup with row numbers]

Or with the same background color for the two header rows:

[Screenshot 2024-09-10: mockup with row numbers and the same background color for both header rows]
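A minimal sketch of how row numbers could be emitted in the table body. `render_rows` is hypothetical, not polars' formatter; it uses `<th scope="row">` so the number is marked up as a row header rather than as data:

```python
import html


def render_rows(rows: list[list[str]], show_row_numbers: bool = True, start: int = 0) -> str:
    """Hypothetical sketch: render <tbody> rows, optionally prefixing each row
    with a row-number cell marked up as a row header."""
    parts = ["<tbody>"]
    for i, row in enumerate(rows, start=start):
        parts.append("<tr>")
        if show_row_numbers:
            parts.append(f'<th scope="row">{i}</th>')
        parts.extend(f"<td>{html.escape(cell)}</td>" for cell in row)
        parts.append("</tr>")
    parts.append("</tbody>")
    return "".join(parts)


print(render_rows([["1", '"A"', "1.1"], ["2", '"B"', "-2.8"]]))
```

The table head would need a matching leading cell (for example an empty `<th>` in both header rows) so the columns stay aligned. As a user-side workaround in the meantime, polars' `DataFrame.with_row_index()` adds an explicit `index` column, starting at 0 by default, which the existing repr already displays.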