Closed jrycw closed 3 weeks ago
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 86.08%. Comparing base (
e3649d2
) to head (a683850
). Report is 20 commits behind head on main.:exclamation: Current head a683850 differs from pull request most recent head 610542e
Please upload reports for the commit 610542e to get more accurate results.
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
Please hold off on merging this PR for now. I've discovered some personal notes in my Jupyter playbook explaining why I wrote the code this way. It appears that the logic between these two implementations may differ. I'll conduct further investigation soon.
I think both approaches should yield the same result. However, a more interesting question arises: what should users expect if we include the rows
parameter? For instance, if rows = [1, 3, 4]
, should we first calculate the palette for the entire column and then retrieve the corresponding colors for those rows, or should we calculate the palette based solely on the selected rows? Both cases might not look aesthetically pleasing, and I doubt their practical application.
Perhaps what we really need is an option like include_last_row=True
or include_last_row=False
(the parameter name might not be optimal, but you get the idea). I've implemented alternative logic in the last commit, although this logic might need adjustments in the future if we introduce something like GT.summary_rows()
, which could potentially affect more than one row.
import polars as pl
import polars.selectors as cs
from great_tables import GT
df = pl.DataFrame(
{
"a": [1, 2, 3, 4, 5, sum([1, 2, 3, 4, 5])],
"b": [51, 52, 53, 54, 55, sum([51, 52, 53, 54, 55])],
}
)
GT(df).data_color(columns=cs.numeric(), include_last_row=True, palette="PuBu")
GT(df).data_color(columns=cs.numeric(), include_last_row=False, palette="PuBu")
should we first calculate the palette for the entire column and then retrieve the corresponding colors for those rows, or should we calculate the palette based solely on the selected rows?
@rich-iannone do you know what happens in the gt R package? AFAICT data_color there doesn't use excluded rows, and the rationale is that this let'd you exclude extremely large values etc..
I would lean toward not using excluded in the calculations. Then, if people want to use all rows, they can always set the domain=
argument, to specify the exact range for the color values.
should we first calculate the palette for the entire column and then retrieve the corresponding colors for those rows, or should we calculate the palette based solely on the selected rows?
@rich-iannone do you know what happens in the gt R package? AFAICT data_color there doesn't use excluded rows, and the rationale is that this let'd you exclude extremely large values etc..
I would lean toward not using excluded in the calculations. Then, if people want to use all rows, they can always set the
domain=
argument, to specify the exact range for the color values.
The R package doesn't use the excluded rows in the calculations or colorization. You're right that the domain
argument approach is better for specifying the exact range. If we did include all the rows in the calculation, it might be surprising to users and I don't think it would be generally desired behavior through an extra argument.
Thanks @machow and @rich-iannone . Would 43b0ce5 achieve our goal, or are there any other modifications I should make?
I think so. Just reverting the last commit would make this right.
Related to #248.
This PR aims to support specifying a subset of rows in
GT.data_color()
. Based on the discussions, it appears that the most common use case involves excluding the last row, which is typically the total row. Consequently, rows that are not selected should not be included in the color "calculation." In this PR, I first select only the specified columns and rows, then "re-map" the "calculated color" back to the original table.A simple example might illustrate the concept well:
Bonus
In addition, I have added support for using Polars-style column and row selection. This allows for more flexible operations, such as: