posit-dev / great-tables

Make awesome display tables using Python.
https://posit-dev.github.io/great-tables/
MIT License

The `GT.fmt_number()` method fails when formatting includes missing values in Polars DataFrames #314

Closed rich-iannone closed 2 months ago

rich-iannone commented 2 months ago

Formatting with fmt_number() fails when the table data is a Polars DataFrame and the formatter encounters a missing value. Here is a failing case:

from great_tables import GT, exibble
import polars as pl

exibble_pl = pl.from_pandas(exibble)

GT(exibble_pl).fmt_number(columns="num")

The error is:

TypeError                                 Traceback (most recent call last)
File .../great-tables/env/lib/python3.9/site-packages/IPython/core/formatters.py:344, in BaseFormatter.__call__(self, obj)
    342     method = get_real_method(obj, self.print_method)
    343     if method is not None:
--> 344         return method()
    345     return None
    346 else:

File .../great-tables/great_tables/gt.py:270, in GT._repr_html_(self)
    267 make_page = defaults["make_page"]
    268 all_important = defaults["all_important"]
--> 270 rendered = self.as_raw_html(
    271     make_page=make_page,
    272     all_important=all_important,
    273 )
    275 return rendered

File .../great-tables/great_tables/_export.py:33, in as_raw_html(self, make_page, all_important)
     12 def as_raw_html(
     13     self: GT,
     14     make_page: bool = False,
     15     all_important: bool = False,
     16 ) -> str:
     17     """
...
--> 297     x = x * scale_by
    299     # Determine whether the value is positive
    300     is_negative = _has_negative_value(value=x)

TypeError: unsupported operand type(s) for *: 'NoneType' and 'int'

The error comes from the scaling step in fmt_number(): every value, missing or not, is multiplied by the scale factor (default is 1). With a Polars DataFrame the missing value arrives as None, so the multiplication raises the TypeError above. Excluding the missing value lets the table render without error (e.g., try using GT(exibble_pl.head(4)).fmt_number(columns="num")).
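For illustration, here is a minimal sketch of the kind of guard that avoids the failure. It does not show the actual great_tables code or the fix that was merged; the helper name `scale_value` and the sample values are hypothetical. The idea is simply to pass missing values through untouched instead of multiplying them.

```python
import math
from typing import Optional


def scale_value(x: Optional[float], scale_by: float = 1) -> Optional[float]:
    """Hypothetical helper: scale a number, passing missing values through."""
    # A Polars null arrives as None; a pandas missing value is typically NaN.
    if x is None or (isinstance(x, float) and math.isnan(x)):
        return x
    return x * scale_by


# Unguarded multiplication reproduces the TypeError from the traceback.
try:
    None * 1
    unguarded_error = ""
except TypeError as err:
    unguarded_error = str(err)

# Guarded scaling leaves the missing entry in place.
values = [0.1111, 2.222, None, 777000.0]
scaled = [scale_value(v, scale_by=1) for v in values]
```

With a guard like this, the missing entry stays as None and can be handled later by whatever the formatter does for missing values, rather than failing during scaling.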

I've tested variations of a Polars DF column with missing values with every other formatting method (e.g., fmt_integer(), fmt_bytes(), etc.) and it appears that this issue is isolated to fmt_number().