posit-dev / great-tables

Make awesome display tables using Python.
https://posit-dev.github.io/great-tables/
MIT License

feat: basic sub_missing and sub_zero #244

Closed machow closed 6 months ago

machow commented 6 months ago

This PR adds support for substitution functions. It addresses #182 by adding the following methods:

Currently, it just uses the formatter machinery. As I understand it, substitutions should always be applied after formatters (e.g. you should be able to call .sub_zero() right away in a chain and expect later fmt_*() calls not to override it).
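
To illustrate the ordering described above, here is a minimal toy model, not the Great Tables implementation. `ToyTable` and its internals are hypothetical names made up for this sketch, and the `"nil"` default is just the toy's choice. It assumes formatters and substitutions are stored separately and that substitutions are applied last at render time, so the order of calls in the chain does not matter.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class ToyTable:
    """Hypothetical stand-in for a table object; not the real GT class."""

    values: List[float]
    formatters: List[Callable[[float], str]] = field(default_factory=list)
    substitutions: List[Callable[[float], Optional[str]]] = field(default_factory=list)

    def fmt_number(self, decimals: int = 2) -> "ToyTable":
        # Register a formatter; nothing is rendered yet.
        self.formatters.append(lambda v: f"{v:.{decimals}f}")
        return self

    def sub_zero(self, zero_text: str = "nil") -> "ToyTable":
        # Register a substitution that only fires for zero values.
        self.substitutions.append(lambda v: zero_text if v == 0 else None)
        return self

    def render(self) -> List[str]:
        out = []
        for v in self.values:
            text = str(v)
            # Formatters run first; the last registered one wins.
            for fmt in self.formatters:
                text = fmt(v)
            # Substitutions run afterwards, so they override any formatter.
            for sub in self.substitutions:
                replacement = sub(v)
                if replacement is not None:
                    text = replacement
            out.append(text)
        return out


# sub_zero() is called before fmt_number(), yet still takes effect.
cells = ToyTable([0, 1.5]).sub_zero().fmt_number(decimals=2).render()
```

In this sketch the zero cell keeps its substituted text even though the formatter was registered later, which is the behavior the description asks for.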

Note these important pieces:

rich-iannone commented 6 months ago

From a quick test of sub_missing(), something worth fixing is allowing the missing_text= to work with the md() and html() helper functions. This currently fails:

from great_tables import GT, md, exibble

GT(exibble).sub_missing(missing_text=md("*MISSING*"))
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
File ~/py_projects/great-tables/env/lib/python3.9/site-packages/IPython/core/formatters.py:344, in BaseFormatter.__call__(self, obj)
    342     method = get_real_method(obj, self.print_method)
    343     if method is not None:
--> 344         return method()
    345     return None
    346 else:

File ~/py_projects/great-tables/great_tables/gt.py:195, in GT._repr_html_(self)
    194 def _repr_html_(self):
--> 195     return self.render(context="html")

File ~/py_projects/great-tables/great_tables/gt.py:298, in GT.render(self, context)
    297 def render(self, context: str) -> str:
--> 298     html_table = self._build_data(context=context)._render_as_html()
    299     return html_table

File ~/py_projects/great-tables/great_tables/gt.py:277, in GT._build_data(self, context)
    274 def _build_data(self, context: str) -> Self:
    275     # Build the body of the table by generating a dictionary
    276     # of lists with cells initially set to nan values
--> 277     built = self._render_formats(context)
    278     # built._body = _migrate_unformatted_to_output(body)
    279 
...
--> 421 if len(value) and not lib.is_string_array(value, skipna=True):
    422     raise TypeError("Must provide strings.")
    424 mask = isna(value)

TypeError: len() of unsized object
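
The traceback ends in a pandas string-array check, which suggests the object returned by md() reached the cell data without first being turned into a plain string. One possible direction, sketched below under that assumption, is to resolve such wrappers to text before writing them into cells. `Md`, `Html`, and `to_cell_text` are hypothetical names for this sketch and do not reflect how Great Tables represents these helpers internally; the markdown handling is a toy that only understands `*emphasis*`.

```python
import re
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Md:
    """Hypothetical stand-in for the value produced by md()."""

    text: str


@dataclass(frozen=True)
class Html:
    """Hypothetical stand-in for the value produced by html()."""

    text: str


def to_cell_text(value: Union[str, Md, Html]) -> str:
    """Resolve a replacement value to a plain string for an HTML cell."""
    if isinstance(value, Html):
        # Already HTML: pass the text through unchanged.
        return value.text
    if isinstance(value, Md):
        # Toy conversion only: turn *emphasis* into <em>...</em>.
        return re.sub(r"\*(.+?)\*", r"<em>\1</em>", value.text)
    # Plain strings are used as-is in this sketch.
    return value


resolved = to_cell_text(Md("*MISSING*"))
```

With the wrapper resolved up front, only `str` values would be handed to the string column, which is what the pandas check in the traceback appears to require.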

rich-iannone commented 6 months ago

Having "---" as the default for missing_text= doesn't seem ideal. What do you think of having None here as a default, then we'll use an em-dash in each output type (right now just "—" for HTML) for that default case? Then we don't have to worry about an AsIs()-type helper function at all (like we discussed before).
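
A rough sketch of the proposal above: `None` acts as a sentinel meaning "use the default for this output type", and anything else is passed through untouched. The names `DEFAULT_MISSING_TEXT` and `resolve_missing_text` are hypothetical, and only the HTML context is filled in, since that is the only one mentioned here.

```python
from typing import Dict, Optional

# Per-context defaults; only HTML is defined in this sketch.
DEFAULT_MISSING_TEXT: Dict[str, str] = {
    "html": "\u2014",  # U+2014, the em dash character
}


def resolve_missing_text(missing_text: Optional[str], context: str = "html") -> str:
    """Return the text to show for a missing value in the given context."""
    if missing_text is None:
        # No explicit value supplied: fall back to the context's default.
        return DEFAULT_MISSING_TEXT[context]
    # An explicit value is never converted, so "---" stays "---".
    return missing_text


default_text = resolve_missing_text(None)
literal_text = resolve_missing_text("---")
```

Because explicit strings are passed through unchanged, no AsIs()-style escape hatch is needed in this design: the only special value is `None`.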

machow commented 6 months ago

Using None as the flag for using some default, such as an emdash, seems reasonable! I wonder if we could keep the surface area of Great Tables down, by avoiding automatic conversions of plaintext to other characters (e.g. "---" to emdash)?

In the future, some kind of enum might make it easy to separate things like raw "---" from emdash?

E.g.

from enum import Enum

class Chars(Enum):
    emdash = "emdash"
    endash = "endash"

(But I'm really spitballing here! May be some better way to represent? Or could people use unicode?!)

rich-iannone commented 6 months ago

> Using None as the flag for using some default, such as an emdash, seems reasonable! I wonder if we could keep the surface area of Great Tables down, by avoiding automatic conversions of plaintext to other characters (e.g. "---" to emdash)?

I’m proposing skipping that conversion altogether. The default emdash is enough for this purpose. You’re right in that users could just supply Unicode characters for virtually anything here!