machow closed this 6 months ago
From a quick test of `sub_missing()`, something worth fixing is allowing `missing_text=` to work with the `md()` and `html()` helper functions. This currently fails:
from great_tables import GT, md, exibble
GT(exibble).sub_missing(missing_text=md("*MISSING*"))
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
File ~/py_projects/great-tables/env/lib/python3.9/site-packages/IPython/core/formatters.py:344, in BaseFormatter.__call__(self, obj)
    342 method = get_real_method(obj, self.print_method)
    343 if method is not None:
--> 344 return method()
    345 return None
    346 else:
File ~/py_projects/great-tables/great_tables/gt.py:195, in GT._repr_html_(self)
    194 def _repr_html_(self):
--> 195 return self.render(context="html")
File ~/py_projects/great-tables/great_tables/gt.py:298, in GT.render(self, context)
    297 def render(self, context: str) -> str:
--> 298 html_table = self._build_data(context=context)._render_as_html()
    299 return html_table
File ~/py_projects/great-tables/great_tables/gt.py:277, in GT._build_data(self, context)
    274 def _build_data(self, context: str) -> Self:
    275 # Build the body of the table by generating a dictionary
    276 # of lists with cells initially set to nan values
--> 277 built = self._render_formats(context)
    278 # built._body = _migrate_unformatted_to_output(body)
    279
...
--> 421 if len(value) and not lib.is_string_array(value, skipna=True):
    422 raise TypeError("Must provide strings.")
    424 mask = isna(value)
TypeError: len() of unsized object
Having `"---"` as the default for `missing_text=` doesn't seem ideal. What do you think of having `None` here as a default, then we'll use an em-dash in each output type (right now just `"—"` for HTML) for that default case? Then we don't have to worry about an `AsIs()`-type helper function at all (like we discussed before).
Using `None` as the flag for using some default, such as an emdash, seems reasonable! I wonder if we could keep the surface area of Great Tables down, by avoiding automatic conversions of plaintext to other characters (e.g. `"---"` to emdash)?
In the future, some kind of enum might make it easy to separate things like raw `"---"` from emdash?
E.g.
from enum import Enum

class Chars(Enum):
    emdash = "emdash"
    endash = "endash"
(But I'm really spitballing here! May be some better way to represent? Or could people use unicode?!)
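To make the enum idea concrete, here is a minimal sketch of how such members could be told apart from raw strings. The `_UNICODE` table and the `resolve_text()` helper are hypothetical names for illustration only and are not part of Great Tables.

```python
from enum import Enum
from typing import Union


class Chars(Enum):
    emdash = "emdash"
    endash = "endash"


# Hypothetical lookup table (not part of Great Tables): enum member -> Unicode character.
_UNICODE = {
    Chars.emdash: "\u2014",  # em dash
    Chars.endash: "\u2013",  # en dash
}


def resolve_text(value: Union[str, Chars]) -> str:
    """Return the character for a Chars member; pass plain strings through untouched."""
    if isinstance(value, Chars):
        return _UNICODE[value]
    return value
```

With this shape, `resolve_text("---")` stays a literal three-hyphen string, while `resolve_text(Chars.emdash)` yields the em dash character, so no implicit plaintext conversion is needed.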
> Using `None` as the flag for using some default, such as an emdash, seems reasonable! I wonder if we could keep the surface area of Great Tables down, by avoiding automatic conversions of plaintext to other characters (e.g. `"---"` to emdash)?
I’m proposing skipping that conversion altogether. The default emdash is enough for this purpose. You’re right in that users could just supply Unicode characters for virtually anything here!
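A rough sketch of the behavior agreed on here, assuming a per-context lookup: `None` selects the default for the output type, and any user-supplied string is used verbatim with no conversion. The helper name `resolve_missing_text()` and the `"latex"` entry are assumptions for illustration; only the HTML default comes from this thread.

```python
from typing import Optional

# Per-context defaults. Only the HTML entry comes from the discussion above;
# the LaTeX entry is an assumed placeholder for a future output type.
_DEFAULT_MISSING_TEXT = {
    "html": "\u2014",  # em dash
    "latex": "---",    # LaTeX source that typesets as an em dash
}


def resolve_missing_text(missing_text: Optional[str], context: str = "html") -> str:
    """Hypothetical helper: None selects the context default; strings are used verbatim."""
    if missing_text is None:
        return _DEFAULT_MISSING_TEXT[context]
    return missing_text
```

A user who really wants three hyphens in HTML output can pass `"---"` and get exactly that, and anyone who wants another symbol can supply the Unicode character directly.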
This PR adds support for substitution functions. It addresses #182 by adding the following methods:
- `sub_missing`: substitutes any missing values. This includes both null and nan.
  - Polars: `.is_null() | .is_nan()`, since Polars distinguishes between the two.
  - Pandas: `.isna()`, which flags both kinds of missingness.
- `sub_zero`
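As a plain-Python illustration of the two kinds of missingness (not the pandas or Polars code path in this PR), the sketch below treats both `None` and float NaN as missing. The names `is_missing()` and `sub_missing_values()` are hypothetical.

```python
import math
from typing import Any, List


def is_missing(x: Any) -> bool:
    """Hypothetical scalar check: True for None (null) and for float NaN."""
    if x is None:
        return True
    # NaN never compares equal to itself, so an equality test cannot find it.
    return isinstance(x, float) and math.isnan(x)


def sub_missing_values(values: List[Any], missing_text: str = "\u2014") -> List[Any]:
    """Replace missing entries with missing_text; leave everything else alone."""
    return [missing_text if is_missing(v) else v for v in values]
```

The explicit `math.isnan()` call matters because `float("nan") == float("nan")` is `False`, which is why a null check alone would miss NaN values.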
Currently, it is just using the formatter machinery. As I understand, substitutions should always go after formatters (e.g. you should be able to call `.sub_zero()` right away in a chain, and expect later `fmt_*()` calls to not override that).

Note these important pieces:
- `FormatterSkipElement`, which, when returned from a format call, indicates that no change should be made.
- `.is_na`, to properly detect `float("nan")`.
- An `is_substitution=` argument to `fmt()`. This deviates from gt. Happy to change to be more similar.
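The ordering and skip behavior described above can be sketched roughly as follows. This is a simplified model under stated assumptions, not the PR's implementation: `SkipElement` stands in for `FormatterSkipElement`, and `FormatInfo`, `render_cell()`, `sub_zero()`, and `two_decimals()` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


class SkipElement:
    """Hypothetical stand-in for FormatterSkipElement: 'leave the cell as it is'."""


@dataclass
class FormatInfo:
    func: Callable[[Any], Any]
    is_substitution: bool = False


def sub_zero(x: Any) -> Any:
    # Only zeros are replaced; every other value is skipped.
    return "nil" if x == 0 else SkipElement()


def two_decimals(x: Any) -> Any:
    return f"{x:.2f}"


def render_cell(value: Any, infos: List[FormatInfo]) -> str:
    # Stable sort: formatters (False) first, substitutions (True) last,
    # each group keeping its original call order.
    ordered = sorted(infos, key=lambda info: info.is_substitution)
    out = str(value)
    for info in ordered:
        result = info.func(value)
        if not isinstance(result, SkipElement):
            out = result
    return out


# sub_zero is registered first, yet the later formatter does not override it.
infos = [FormatInfo(sub_zero, is_substitution=True), FormatInfo(two_decimals)]
```

Here `render_cell(0, infos)` gives `"nil"` even though the formatter was registered after the substitution, and `render_cell(1.5, infos)` gives `"1.50"` because the substitution returns the skip sentinel for non-zero values.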