pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License

Styling console/terminal output #18066

Open s-celles opened 7 years ago

s-celles commented 7 years ago

Hello,

Since v0.17.1, pandas has had an interesting API for styling Jupyter HTML output of Series / DataFrames: https://pandas.pydata.org/pandas-docs/stable/style.html

It would be a nice feature to have styling for console/terminal output as well. I wonder whether such a feature should be part of pandas core or live in another package (like tabulate from @astanin, which was mentioned in #11052).

There is also PrettyPandas from @HHammond

Kind regards

gfyoung commented 7 years ago

@scls19fr : Thanks for the report! I think in the interest of modularity, we would probably want to move away from incorporating giant chunks of code into our codebase.

That being said, we could add functions that call into something like tabulate or PrettyPandas (assuming they're installed). However, we would probably need to choose one or the other (or make a uniform API to be able to call both).

cbrnr commented 6 years ago

RStudio adds color to the output of a data frame in the console:

[Image: screenshot of colored data frame output in the RStudio console]

Since IPython now supports syntax highlighting in a normal console, I was wondering if something similar was possible in pandas. This would likely not require any additional packages, but instead adding color codes to the __str__ or __repr__ methods could be all there needs to be done.
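As a rough illustration of the idea of adding color codes to the text output, here is a minimal sketch using only the standard library. The helpers `colorize` and `color_header` and the sample table text are hypothetical; none of this is pandas API, and it only shows what wrapping part of a rendered table in ANSI codes could look like.

```python
# Minimal sketch: wrap text in ANSI SGR escape sequences.
# `colorize` and `color_header` are hypothetical helpers, not pandas functions.
BOLD_BLUE = "\x1b[1;34m"
RESET = "\x1b[0m"


def colorize(text, code):
    """Return text wrapped in an ANSI color code followed by a reset."""
    return f"{code}{text}{RESET}"


def color_header(table_text, code=BOLD_BLUE):
    """Color only the first line (the column header) of a rendered table."""
    header, sep, body = table_text.partition("\n")
    return colorize(header, code) + sep + body


# Hand-written stand-in for the plain-text rendering of a small table.
plain = "   a  b\n0  1  3\n1  2  4"
print(color_header(plain))
```

Coloring a whole line after rendering sidesteps any width calculation, which is why the header is used here rather than individual cells.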

jreback commented 6 years ago

Yes, this certainly could be done. It would likely take a separate formatting method (just to avoid having the string / HTML / color-string code in one place).

cbrnr commented 6 years ago

Great! What do you mean by a separate method? Ideally, color strings should be used when I type df in IPython, so this has to be in __repr__, right?

cbrnr commented 6 years ago

ANSI escape color codes work, so it is pretty straightforward to get colorized output, e.g.:

[Image: screenshot of the colorized output]

The bigger questions are:

TomAugspurger commented 6 years ago

Where do we want to add this functionality?

Is this going to be doable with how to_string is currently implemented? From what I recall, that's a bit of a minefield.

Styler used jinja, but I'm not sure what our appetite is for adopting that as a full dependency.

cbrnr commented 6 years ago

I guess it's doable in to_string, but carrying around defaults and arguments might require some thinking. I wouldn't call it a minefield, but the way the final string is constructed forced me to apply colors in the previous step (in the list of str_cols).

If there is an external tool that can do this coloring for us, we should use it if it means less work implementing it. It could be an optional dependency, in the sense that if it's not installed there will only be plain non-colored output.
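The optional-dependency idea could look roughly like the sketch below. It assumes colorama as the external tool purely for illustration (the comment does not name one), and `maybe_red` is a hypothetical helper.

```python
# Sketch of an optional dependency: use colorama if present, else plain text.
try:
    from colorama import Fore, Style  # optional third-party package

    RED, RESET = Fore.RED, Style.RESET_ALL
except ImportError:
    RED, RESET = "", ""  # fall back to uncolored output


def maybe_red(text):
    """Color text red when colorama is installed; otherwise return it unchanged."""
    return f"{RED}{text}{RESET}"


print(maybe_red("-0.25"))
```

The caller never needs to know whether the package is installed; the only difference is whether escape codes end up in the string.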

rswgnu commented 5 years ago

@cbrnr wrote:

ANSI escape color codes work, so it is pretty straightforward to get colorized output

Say I wanted to colorize just specific table cells upon output to a terminal to highlight them, based on a list of iloc-compatible cell locations. How would I do that? I just need some pointers on what Pandas functions to tweak to allow this. Thanks.
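One possible approach, sketched here without touching pandas internals: convert the frame to strings and wrap the chosen cells in ANSI codes before printing. `highlight_cells` is a hypothetical helper, and because the escape characters still count toward the column width, alignment in the printed output may be off.

```python
import pandas as pd

RED = "\x1b[31m"
RESET = "\x1b[0m"


def highlight_cells(df, locs, code=RED):
    """Return a string copy of df with the cells at (row, col) positions colored."""
    out = df.astype(str)  # work on a string copy; the original is left untouched
    for row, col in locs:
        out.iloc[row, col] = f"{code}{out.iloc[row, col]}{RESET}"
    return out


df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
print(highlight_cells(df, [(0, 1), (1, 0)]))
```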

texxronn commented 5 years ago

I could do something like

from colorama import Fore, Style

# c is the name of the column to color; the column is converted to strings
df[c] = Fore.RED + Style.BRIGHT + df[c].astype(str) + Style.RESET_ALL
print(df)
cbrnr commented 5 years ago

Interesting. This was already suggested in #459, but never implemented.

rswgnu commented 5 years ago

Thanks for pointing me to this. It would be nice to have this as a core Pandas capability but at least terminal styling is possible with this package.

-- Bob

ghost711 commented 5 years ago

I was looking for a way to use ANSI color codes in the terminal or qtconsole for a long time, and finally put this together. The main problem was that ANSI codes were being incorporated into the print width calculation, which messed up registration of the columns.
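To make the width problem concrete, here is a small standard-library sketch. The regex is the same pattern used in the workaround in this thread; `visible_len` and `ljust_visible` are hypothetical helper names.

```python
import re

# Matches ANSI escape sequences such as "\x1b[31m".
ANSI_RE = re.compile(r"\x1B[@-_][0-?]*[ -/]*[@-~]")


def visible_len(text):
    """Length of text as displayed, ignoring ANSI escape sequences."""
    return len(ANSI_RE.sub("", text))


def ljust_visible(text, width):
    """Left-justify by displayed width rather than raw string length."""
    return text + " " * max(width - visible_len(text), 0)


cell = "\x1b[31mabc\x1b[0m"
print(len(cell), visible_len(cell))  # raw length vs. displayed length
print(repr(cell.ljust(6)))           # str.ljust adds no padding: raw length is already > 6
print(repr(ljust_visible(cell, 6)))  # padded to 6 displayed characters
```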

This is too hacky for a pull request, but it solves my problem, so I thought I'd post it in case it helps anyone else.

You can replace the "TextAdjustment" class with the version below in this file: site-packages/pandas/io/formats/format.py

class TextAdjustment(object): 
    def __init__(self):
        import re
        self.ansi_regx = re.compile(r'\x1B[@-_][0-?]*[ -/]*[@-~]')
        self.encoding  = get_option("display.encoding")

    def len(self, text):  
        return compat.strlen(self.ansi_regx.sub('', text), 
                             encoding=self.encoding) 

    def justify(self, texts, max_len, mode='right'):
        # Pick the str justification function for the requested mode.
        jfunc = str.ljust if (mode == 'left')  else \
                str.rjust if (mode == 'right') else str.center
        out = []
        for s in texts:
            escapes = self.ansi_regx.findall(s)
            if len(escapes) == 2:
                # One opening and one closing code: pad only the visible text.
                out.append(escapes[0].strip() +
                           jfunc(self.ansi_regx.sub('', s), max_len) +
                           escapes[1].strip())
            else:
                out.append(jfunc(s, max_len))
        return out

    def _join_unicode(self, lines, sep=''):
        try:
            return sep.join(lines)
        except UnicodeDecodeError:
            sep = compat.text_type(sep)
            return sep.join([x.decode('utf-8') if isinstance(x, str) else x
                                                            for x in lines])

    def adjoin(self, space, *lists, **kwargs): 
        # Add space for all but the last column: 
        pads = ([space] * (len(lists) - 1)) + [0] 
        max_col_len = max([len(col) for col in lists])
        new_cols = []
        for col, pad in zip(lists, pads): 
            width = max([self.len(s) for s in col]) + pad
            c     = self.justify(col, width, mode='left')
            # Add blank cells to end of col if needed for different col lens: 
            if len(col) < max_col_len:
                c.extend([' ' * width] * (max_col_len - len(col)))
            new_cols.append(c)

        rows = [self._join_unicode(row_tup) for row_tup in zip(*new_cols)] 
        return self._join_unicode(rows, sep='\n') 
Beanking77 commented 5 years ago

I had a similar requirement to colorize a specific column, and here's my workaround:

from colorama import Fore, Style

def color_red_green(val):
    # Negative values are shown in green, everything else in red.
    if val < 0:
        color = Fore.GREEN
    else:
        color = Fore.RED
    return color + '{0:.2%}'.format(val) + Style.RESET_ALL

# apply to specific column
dfs["percent"] = dfs["percent"].apply(color_red_green)

thanks @texxronn

cscanlin-kwh commented 4 years ago

I'm trying to use colorama on a pandas DataFrame, but I'm running into the same problem with misaligned printing mentioned above: https://github.com/pandas-dev/pandas/issues/18066#issuecomment-522192922

Does anyone know of a way to patch this behavior in without modifying pandas itself?

I see pandas.DataFrame.to_string has a formatters parameter, but it's not clear to me how to use it.
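For reference, a minimal sketch of how the formatters parameter can be used: it takes a mapping from column name to a one-argument function that returns the string for each cell. The column names and data below are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"name": ["x", "y"], "percent": [0.125, -0.5]})

# `formatters` maps a column name to a function applied to each cell value.
text = df.to_string(formatters={"percent": "{:.2%}".format})
print(text)
```

A formatter could also add ANSI codes, but on its own that would not address the alignment problem, since the escape characters are still counted in the width.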

TomAugspurger commented 4 years ago

@cscanlin I think pandas will need to be updated to handle this. https://github.com/pandas-dev/pandas/pull/30778 had a start, which allows things like

I'm not planning to return to that anytime soon, so feel free to take over if you want.

Erotemic commented 3 years ago

I've updated the workaround from @ghost711 to work with pandas 1.2.4. Hopefully this feature will be supported in the main branch somehow, but in the meantime:

def monkeypatch_pandas():
    """
    References:
        https://github.com/pandas-dev/pandas/issues/18066
    """
    import pandas.io.formats.format as format_
    text_type = str  # on Python 3, six.text_type is just str, so six is not needed

    # Made wrt pd.__version__ == '1.2.4'

    class TextAdjustmentMonkey(object):
        def __init__(self):
            import re
            self.ansi_regx = re.compile(r'\x1B[@-_][0-?]*[ -/]*[@-~]')
            self.encoding  = format_.get_option("display.encoding")

        def len(self, text):
            return len(self.ansi_regx.sub('', text))

        def justify(self, texts, max_len, mode='right'):
            jfunc = str.ljust if (mode == 'left')  else \
                    str.rjust if (mode == 'right') else str.center
            out = []
            for s in texts:
                escapes = self.ansi_regx.findall(s)
                if len(escapes) == 2:
                    out.append(escapes[0].strip() +
                               jfunc(self.ansi_regx.sub('', s), max_len) +
                               escapes[1].strip())
                else:
                    out.append(jfunc(s, max_len))
            return out

        def _join_unicode(self, lines, sep=''):
            try:
                return sep.join(lines)
            except UnicodeDecodeError:
                sep = text_type(sep)
                return sep.join([x.decode('utf-8') if isinstance(x, str) else x
                                                                for x in lines])

        def adjoin(self, space, *lists, **kwargs):
            # Add space for all but the last column:
            pads = ([space] * (len(lists) - 1)) + [0]
            max_col_len = max([len(col) for col in lists])
            new_cols = []
            for col, pad in zip(lists, pads):
                width = max([self.len(s) for s in col]) + pad
                c     = self.justify(col, width, mode='left')
                # Add blank cells to end of col if needed for different col lens:
                if len(col) < max_col_len:
                    c.extend([' ' * width] * (max_col_len - len(col)))
                new_cols.append(c)

            rows = [self._join_unicode(row_tup) for row_tup in zip(*new_cols)]
            return self._join_unicode(rows, sep='\n')

    format_.TextAdjustment = TextAdjustmentMonkey
brechtm commented 2 years ago

For what it's worth, I found it pretty frustrating that even something basic like representing NaNs differently on the terminal is impossible. For HTML and LaTeX output, there's the Styler and its na_rep argument, but there doesn't seem to be something equivalent for terminal output.

I'm currently working around this by replacing NaNs with math.inf (using fillna) and providing a custom float_format function:

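from math import isinf

import pandas as pd

def float_format(value: float):
    # NaNs were replaced with math.inf beforehand, so inf marks a missing value.
    if isinf(value):
        return ''
    return str(value)

pd.options.display.float_format = float_format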

A simple solution would be to also pass NaNs through the float_format function, but I guess that's not an option considering backward compatibility.
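For explicit printing (as opposed to the default repr), DataFrame.to_string does accept an na_rep argument. A minimal sketch with made-up data; it does not change what typing df in a console shows.

```python
import pandas as pd

df = pd.DataFrame({"a": [1.5, float("nan")]})

# na_rep controls how missing values are rendered by to_string().
text = df.to_string(na_rep="-")
print(text)
```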

cbrnr commented 2 years ago

I still think that rendering the output with Rich would be the best option. Basically, pd.DataFrames could have a __rich_repr__() method, which is responsible for the repr when Rich is available (https://rich.readthedocs.io/en/stable/pretty.html). Unfortunately, I don't have time to implement it myself at the moment.
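A rough sketch of what that hook looks like, assuming Rich's repr protocol as described on the linked page: the method yields positional values or (name, value) pairs. `FrameSummary` is a hypothetical stand-in class, not a pandas object, and the block does not import Rich.

```python
class FrameSummary:
    """Hypothetical stand-in for a table-like object; not a pandas class."""

    def __init__(self, columns, n_rows):
        self.columns = list(columns)
        self.n_rows = n_rows

    def __rich_repr__(self):
        # Rich's repr protocol: yield positional values or (name, value) pairs.
        yield "columns", self.columns
        yield "n_rows", self.n_rows


summary = FrameSummary(["a", "b"], 2)
# Without Rich installed, the method is still an ordinary generator:
print(list(summary.__rich_repr__()))
```

With Rich installed, its pretty printer should pick this method up. A fully colored table would need more than this; the sketch only shows the hook itself.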