Open s-celles opened 7 years ago
@scls19fr : Thanks for the report! I think in the interest of modularity, we would probably want to move away from incorporating giant chunks of code into our codebase.
That being said, we could add functions that call those functions from something like tabulate or PrettyPandas (assuming they're installed). However, we would probably need to choose one or the other (or make a uniform API to be able to call both).
RStudio adds color to the output of a data frame in the console:
Since IPython now supports syntax highlighting in a normal console, I was wondering if something similar was possible in pandas. This would likely not require any additional packages; adding color codes to the __str__ or __repr__ methods could be all that needs to be done.
Yes, this certainly could be done. It would likely take a separate formatting method (just to avoid having string / HTML / color-string code in one place).
Great! What do you mean by a separate method? Ideally, color strings should be used when I type df in IPython, so this has to be in __repr__, right?
ANSI escape color codes work, so it is pretty straightforward to get colorized output, e.g.:
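For illustration, here is a minimal sketch of ANSI-colored console output using only the standard library. The helper name `ansi_wrap` and the sample strings are assumptions made for this example; none of this is pandas API.

```python
# ANSI SGR sequences: ESC[<code>m turns a style on, ESC[0m resets all styles.
RED = "\x1b[31m"
BOLD = "\x1b[1m"
RESET = "\x1b[0m"


def ansi_wrap(text, *codes):
    """Hypothetical helper: prefix text with ANSI codes and reset afterwards."""
    return "".join(codes) + text + RESET


print(ansi_wrap("column_a", BOLD))   # bold header on ANSI-capable terminals
print(ansi_wrap("-1.5", RED))        # red value
print(ansi_wrap("NaN", BOLD, RED))   # styles can be combined
```

Whether the colors actually show up depends on the terminal; consoles without ANSI support will print the raw escape characters instead.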
The bigger questions are:
pandas.io.formats.format.to_string (before strcols get combined into text).
colorize)?
Where do we want to add this functionality?
Is this going to be doable with how to_string is currently implemented? From what I recall, that's a bit of a minefield.
Styler used jinja, but I'm not sure what our appetite is for adopting that as a full dependency.
I guess it's doable in to_string, but carrying around defaults and arguments might require some thinking. I wouldn't call it a minefield, but the way the final string is constructed forced me to apply colors in the previous step (in the list of str_cols).
If there is an external tool that can do this coloring for us, we should use it if it means less work implementing it. It could be an optional dependency, in the sense that if it's not installed there will only be plain non-colored output.
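The optional-dependency idea could look roughly like the sketch below: the coloring package is imported lazily, and plain text is returned when it is missing. The function name `red` is hypothetical, and colorama is used here only as an example of such a package.

```python
def red(text):
    """Return text in red if colorama is installed, otherwise unchanged."""
    try:
        # Optional dependency: imported lazily so plain output still works.
        from colorama import Fore, Style
    except ImportError:
        return text
    return Fore.RED + text + Style.RESET_ALL


print(red("negative value"))
```

With this pattern the caller never needs to know whether the extra package is available.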
@cbrnr wrote:
ANSI escape color codes work, so it is pretty straightforward to get colorized output
Say I wanted to colorize just specific table cells upon output to a terminal to highlight them, based on a list of iloc-compatible cell locations. How would I do that? I just need some pointers on what Pandas functions to tweak to allow this. Thanks.
I could do something like
from colorama import Fore, Back, Style
# c is the label of the column to colorize
df[c] = Fore.RED + Style.BRIGHT + df[c].astype(str) + Style.RESET_ALL
print(df)
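For the cell-level case asked about above, the same idea could be applied by position. This is only a sketch: `highlight_cells` is a hypothetical helper, not a pandas function, and it works on a string copy of the frame. The escape characters still count toward the column width, so the printed columns may look misaligned.

```python
import pandas as pd

RED = "\x1b[31m"
RESET = "\x1b[0m"


def highlight_cells(df, locations, color=RED):
    """Hypothetical helper: return a string copy of df with given cells colored.

    locations is an iterable of (row, column) integer positions, as for iloc.
    """
    out = df.astype(str)  # work on a string copy; the original stays numeric
    for row, col in locations:
        out.iloc[row, col] = color + out.iloc[row, col] + RESET
    return out


df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
print(highlight_cells(df, [(0, 1), (1, 0)]))
```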
Interesting. This was already suggested in #459, but never implemented.
Thanks for pointing me to this. It would be nice to have this as a core Pandas capability but at least terminal styling is possible with this package.
-- Bob
On Jun 12, 2019, at 10:55 PM, Rohan Machado notifications@github.com wrote:
I could do something like
from colorama import Fore, Back, Style
df[c] = Fore.RED + Style.BRIGHT + df[c].astype(str) + Style.RESET_ALL
print(df)
I was looking for a way to use ANSI color codes in the terminal or qtconsole for a long time, and finally put this together. The main problem was that ANSI codes were being incorporated into the print width calculation, which messed up registration of the columns.
This is too hacky for a pull request, but it solves my problem, so I thought I'd post it in case it helps anyone else.
You can replace the "TextAdjustment" class with the version below in this file: site-packages/pandas/io/formats/format.py
class TextAdjustment(object):

    def __init__(self):
        import re
        self.ansi_regx = re.compile(r'\x1B[@-_][0-?]*[ -/]*[@-~]')
        self.encoding = get_option("display.encoding")

    def len(self, text):
        return compat.strlen(self.ansi_regx.sub('', text),
                             encoding=self.encoding)

    def justify(self, texts, max_len, mode='right'):
        jfunc = str.ljust if (mode == 'left') else \
            str.rjust if (mode == 'right') else str.center
        out = []
        for s in texts:
            escapes = self.ansi_regx.findall(s)
            if len(escapes) == 2:
                out.append(escapes[0].strip() +
                           jfunc(self.ansi_regx.sub('', s), max_len) +
                           escapes[1].strip())
            else:
                out.append(jfunc(s, max_len))
        return out

    def _join_unicode(self, lines, sep=''):
        try:
            return sep.join(lines)
        except UnicodeDecodeError:
            sep = compat.text_type(sep)
            return sep.join([x.decode('utf-8') if isinstance(x, str) else x
                             for x in lines])

    def adjoin(self, space, *lists, **kwargs):
        # Add space for all but the last column:
        pads = ([space] * (len(lists) - 1)) + [0]
        max_col_len = max([len(col) for col in lists])
        new_cols = []
        for col, pad in zip(lists, pads):
            width = max([self.len(s) for s in col]) + pad
            c = self.justify(col, width, mode='left')
            # Add blank cells to end of col if needed for different col lens:
            if len(col) < max_col_len:
                c.extend([' ' * width] * (max_col_len - len(col)))
            new_cols.append(c)
        rows = [self._join_unicode(row_tup) for row_tup in zip(*new_cols)]
        return self._join_unicode(rows, sep='\n')
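The width problem that this class works around can be seen in isolation with the standard library alone. The sketch below is not taken from pandas: `visible_len` and `ljust_visible` are hypothetical helper names, and the regular expression is a narrower pattern (color/style sequences only) than the one used in the class above.

```python
import re

# Matches SGR color/style sequences such as "\x1b[31m" (assumption: only
# these sequences appear in the text being measured).
ANSI_SGR = re.compile(r"\x1b\[[0-9;]*m")


def visible_len(text):
    """Length of text as displayed, ignoring ANSI color sequences."""
    return len(ANSI_SGR.sub("", text))


def ljust_visible(text, width):
    """Left-justify based on the visible length rather than len()."""
    return text + " " * max(0, width - visible_len(text))


colored = "\x1b[31mabc\x1b[0m"
print(len(colored), visible_len(colored))   # 12 3
print(repr(colored.ljust(5)))               # no padding: len() already exceeds 5
print(repr(ljust_visible(colored, 5)))      # two spaces of padding appended
```

Because `str.ljust` counts the nine invisible escape characters, it pads too little (or not at all), which is what shifts the columns.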
I had a similar requirement to colorize a specific column, and here's my workaround:
from colorama import Fore, Back, Style

def color_red_green(val):
    if val < 0:
        color = Fore.GREEN
    else:
        color = Fore.RED
    return color + str('{0:.2%}'.format(val)) + Style.RESET_ALL

# apply to specific column
dfs["percent"] = dfs["percent"].apply(color_red_green)
thanks @texxronn
Trying to use colorama on a pandas dataframe, but running into the same problem with misaligned printing mentioned above: https://github.com/pandas-dev/pandas/issues/18066#issuecomment-522192922
Does anyone know of a way to patch this behavior in without modifying pandas itself?
I see pandas.DataFrame.to_string has a formatters parameter, but it's not clear to me how to use it.
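As a rough sketch of how the formatters parameter can be used: it accepts a dict that maps a column label to a one-argument function returning a string. The sample data below is invented for illustration. A formatter that returns ANSI-colored strings would presumably still run into the width calculation issue discussed in this thread, so this example sticks to plain text.

```python
import pandas as pd

df = pd.DataFrame({"name": ["a", "b"], "percent": [0.5, -0.25]})

# formatters maps a column label to a one-argument function returning a string
text = df.to_string(formatters={"percent": "{:.2%}".format})
print(text)
```

Columns without an entry in the dict keep their default formatting.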
@cscanlin I think pandas will need to be updated to handle this. https://github.com/pandas-dev/pandas/pull/30778 had a start, which allows things like
I'm not planning to return to that anytime soon, so feel free to take over if you want.
I've updated the workaround from @ghost711 to work with pandas 1.2.4. Hopefully this feature will be supported in the main branch somehow, but in the meantime:
def monkeypatch_pandas():
    """
    References:
        https://github.com/pandas-dev/pandas/issues/18066
    """
    import pandas.io.formats.format as format_
    from six import text_type

    # Made wrt pd.__version__ == '1.2.4'

    class TextAdjustmentMonkey(object):

        def __init__(self):
            import re
            self.ansi_regx = re.compile(r'\x1B[@-_][0-?]*[ -/]*[@-~]')
            self.encoding = format_.get_option("display.encoding")

        def len(self, text):
            return len(self.ansi_regx.sub('', text))

        def justify(self, texts, max_len, mode='right'):
            jfunc = str.ljust if (mode == 'left') else \
                str.rjust if (mode == 'right') else str.center
            out = []
            for s in texts:
                escapes = self.ansi_regx.findall(s)
                if len(escapes) == 2:
                    out.append(escapes[0].strip() +
                               jfunc(self.ansi_regx.sub('', s), max_len) +
                               escapes[1].strip())
                else:
                    out.append(jfunc(s, max_len))
            return out

        def _join_unicode(self, lines, sep=''):
            try:
                return sep.join(lines)
            except UnicodeDecodeError:
                sep = text_type(sep)
                return sep.join([x.decode('utf-8') if isinstance(x, str) else x
                                 for x in lines])

        def adjoin(self, space, *lists, **kwargs):
            # Add space for all but the last column:
            pads = ([space] * (len(lists) - 1)) + [0]
            max_col_len = max([len(col) for col in lists])
            new_cols = []
            for col, pad in zip(lists, pads):
                width = max([self.len(s) for s in col]) + pad
                c = self.justify(col, width, mode='left')
                # Add blank cells to end of col if needed for different col lens:
                if len(col) < max_col_len:
                    c.extend([' ' * width] * (max_col_len - len(col)))
                new_cols.append(c)
            rows = [self._join_unicode(row_tup) for row_tup in zip(*new_cols)]
            return self._join_unicode(rows, sep='\n')

    format_.TextAdjustment = TextAdjustmentMonkey
For what it's worth, I found it pretty frustrating that even something basic like representing NaNs differently on the terminal is impossible. For HTML and LaTeX output, there's the Styler and its na_rep argument, but there doesn't seem to be something equivalent for terminal output.
I'm currently working around this by replacing NaNs with math.inf (using fillna) and providing a custom float_format function:
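from math import isinf
import pandas as pd

def float_format(value: float):
    # inf stands in for NaN here (after fillna), so print it as a blank cell
    if isinf(value):
        return ''
    return str(value)

pd.options.display.float_format = float_format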
A simple solution would be to also pass NaNs through the float_format function, but I guess that's not an option considering backward compatibility.
I still think that rendering the output with Rich would be the best option. Basically, pd.DataFrames could have a __rich_repr__() method, which is responsible for the repr when Rich is available (https://rich.readthedocs.io/en/stable/pretty.html). Unfortunately, I don't have time to implement it myself at the moment.
Hello,
Since v0.17.1, Pandas has had an interesting API for styling Jupyter HTML output for Series / DataFrames: https://pandas.pydata.org/pandas-docs/stable/style.html
It would be nice to have styling for console/terminal output as well. I wonder whether such a feature should be part of Pandas core or live in another package (like tabulate from @astanin, which was mentioned in #11052).
There is also PrettyPandas from @HHammond.
Kind regards