posit-dev / great-tables

Make awesome display tables using Python.
https://posit-dev.github.io/great-tables/
MIT License
1.44k stars, 50 forks

feat: remove mizani as dependency, re-implement logic internally #271

Closed machow closed 2 months ago

machow commented 2 months ago

This PR removes mizani as a dependency, so that we no longer transitively depend on scipy and pandas. Note that our internal implementation does not depend on numpy, so we can also drop numpy as a dependency in a later PR.

This implementation is not as elegant as mizani's, or quite as optimized, but should do well for our purposes.

Speed

Our default implementation was about 50% slower than mizani in a simple test, but both are very fast for simple table displays (1.52 ms per 1,000 points for ours, 1.03 ms for mizani).

from great_tables._data_color.palettes import GradientPalette, CoeffSequence
from mizani.palettes import gradient_n_pal

palette = GradientPalette(["red", "orange", "blue", "grey", "yellow", "red", "orange"])
palette2 = GradientPalette(["red", "orange", "blue", "grey", "yellow", "red", "orange"], cls_coeff_sequence=CoeffSequence)
palette3 = gradient_n_pal(["red", "orange", "blue", "grey", "yellow", "red", "orange"])

%%timeit
# internal + bisect lookup: 1.52 ms ± 5.85 µs per loop
palette([x / 1000. for x in range(1000)])

%%timeit
# internal + CoeffSequence: 1.76 ms ± 6.27 µs per loop
palette2([x / 1000. for x in range(1000)])

%%timeit
# mizani: 1.03 ms ± 9.82 µs per loop
palette3([x / 1000. for x in range(1000)])

The main difference is that we use a simple bisect lookup to find the cutoff band corresponding to each value (in order to get the coefficients for transforming within that band). Mizani uses a table lookup, which cuts the input/response space into 256 bins.
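To make the two strategies concrete, here is a minimal stdlib-only sketch. It is not the code in this PR and not mizani's implementation: the helper names `make_bisect_palette` and `make_table_palette`, the evenly spaced cutoffs, and the nearest-entry rounding are all assumptions made for illustration.

```python
from bisect import bisect_right
from typing import Callable, List, Tuple

RGB = Tuple[float, float, float]


def make_bisect_palette(colors: List[RGB]) -> Callable[[float], RGB]:
    """Hypothetical helper: piecewise-linear gradient with a bisect lookup."""
    n = len(colors)
    if n < 2:
        raise ValueError("need at least two color stops")
    # Assumption: color stops are evenly spaced over [0, 1].
    cutoffs = [i / (n - 1) for i in range(n)]

    def lookup(x: float) -> RGB:
        x = min(max(x, 0.0), 1.0)  # clamp into the palette's domain
        # Find the band whose left cutoff is <= x; x == 1.0 falls in the last band.
        i = min(bisect_right(cutoffs, x) - 1, n - 2)
        # Position of x within the band, from 0 (left stop) to 1 (right stop).
        t = (x - cutoffs[i]) / (cutoffs[i + 1] - cutoffs[i])
        lo, hi = colors[i], colors[i + 1]
        return tuple(a + (b - a) * t for a, b in zip(lo, hi))

    return lookup


def make_table_palette(colors: List[RGB], n_bins: int = 256) -> Callable[[float], RGB]:
    """Hypothetical helper: precompute n_bins colors, then index by rounding."""
    exact = make_bisect_palette(colors)
    table = [exact(k / (n_bins - 1)) for k in range(n_bins)]

    def lookup(x: float) -> RGB:
        x = min(max(x, 0.0), 1.0)
        # Constant-time lookup: snap x to the nearest precomputed entry.
        return table[round(x * (n_bins - 1))]

    return lookup


stops = [(255.0, 0.0, 0.0), (255.0, 255.0, 255.0), (0.0, 0.0, 255.0)]
exact = make_bisect_palette(stops)
approx = make_table_palette(stops)

# Largest per-channel gap between the two lookups over 1,000 sample points.
max_err = max(
    abs(a - b)
    for x in (i / 1000.0 for i in range(1000))
    for a, b in zip(exact(x), approx(x))
)
```

The bisect version does an O(log n) search per value and interpolates exactly; the table version trades a small quantization error (`max_err` above) for a constant-time index into precomputed colors.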

Fixes: https://github.com/posit-dev/great-tables/issues/7

codecov-commenter commented 2 months ago

Codecov Report

Attention: Patch coverage is 93.02326%, with 6 lines in your changes missing coverage. Please review.

Project coverage is 81.87%. Comparing base (ece39e8) to head (05578a6). Report is 125 commits behind head on main.

| Files | Patch % | Lines |
|---|---|---|
| great_tables/_data_color/palettes.py | 92.77% | 6 Missing :warning: |

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main     #271      +/-   ##
==========================================
+ Coverage   77.01%   81.87%   +4.86%
==========================================
  Files          40       41       +1
  Lines        4229     4310      +81
==========================================
+ Hits         3257     3529     +272
+ Misses        972      781     -191
```


rich-iannone commented 2 months ago

Thank you for the huge amount of work you put into this and the related PR! Will review shortly.

machow commented 2 months ago

@abstractqqq in case it's useful for polars_ds, this PR should remove pandas as a dependency (except for importing from great_tables.data, which we'll tackle later!). We should be able to release soon. Definitely let us know if you run into any issues with it.