pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License

Series not honoring class __repr__ or __str__ #18843

Open achapkowski opened 6 years ago

achapkowski commented 6 years ago

Code Sample, a copy-pastable example if possible

import pandas as pd


class foo(dict):
    """dict subclass whose repr/str should show only the keys."""

    def __init__(self, iterable=None, **kwargs):
        if iterable is None:
            iterable = ()
        super(foo, self).__init__(iterable)
        self.update(kwargs)

    def __repr__(self):
        # Comma-separated keys, e.g. "alpha,beta"
        return ",".join(self.keys())

    def __str__(self):
        return ",".join(self.keys())


f = foo({'alpha': 'b',
         'beta': 'c'})

# Column G holds the custom object; its cell should read "alpha,beta"
print(pd.DataFrame(data=[['A', 1, f]], columns=['D', 'F', 'G']))

Problem description

For a given Series containing a custom object, I want to control what is displayed via print or in IPython notebooks. The object foo is a simple class that has __str__ and __repr__ overridden, but pandas still displays the object's dictionary content rather than the view I want to show end users. How do I control that?

Expected Output

alpha,beta

What I get is:

{'alpha': 'b', 'beta': 'c'}

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.5.4.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 94 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.21.1
pytest: 3.3.1
pip: 9.0.1
setuptools: 38.2.4
Cython: None
numpy: 1.11.2
scipy: 0.18.1
pyarrow: None
xarray: None
IPython: 5.3.0
sphinx: 1.6.3
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.3
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 1.5.3
openpyxl: None
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: None
lxml: None
bs4: 4.6.0
html5lib: 0.999
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

TomAugspurger commented 6 years ago

This is probably the same as https://github.com/pandas-dev/pandas/issues/17695 (you inherit from dict, so your objects are iterable). It's difficult for pandas to support formatting arbitrary objects.

Your simple example could be solved by not subclassing dict, and just storing your iterable on an internal ._data attribute. But that likely isn't a solution for your real problem.
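The composition approach suggested above can be sketched as follows. This is a minimal sketch, assuming callers only need key access and a custom repr; the class name Foo and the helper looks_like_sequence are hypothetical, and looks_like_sequence is only an approximation of the iter()/len() test that pandas' is_sequence performs. Because the wrapper is neither a dict nor sized, pandas' printer should (assuming the logic discussed in this thread) fall back to plain str() for it.

```python
class Foo:
    """Hypothetical wrapper: keeps the mapping on ._data instead of subclassing dict."""

    def __init__(self, iterable=None, **kwargs):
        # Same construction rules as the original dict subclass.
        self._data = dict(iterable or (), **kwargs)

    def __getitem__(self, key):
        # Forward item access so existing code like obj["alpha"] keeps working.
        return self._data[key]

    def keys(self):
        return self._data.keys()

    def __repr__(self):
        return ",".join(self._data.keys())

    __str__ = __repr__


def looks_like_sequence(obj):
    """Assumed approximation of pandas' is_sequence: iterable, sized, not a string."""
    try:
        iter(obj)  # succeeds here because __getitem__ is defined
        len(obj)   # fails here: Foo has no __len__
        return not isinstance(obj, (str, bytes))
    except (TypeError, AttributeError):
        return False


f = Foo({"alpha": "b", "beta": "c"})
print(repr(f))  # alpha,beta
print(isinstance(f, dict), looks_like_sequence(f))  # False False
```

The trade-off is that any code relying on isinstance(obj, dict) no longer sees the wrapper as a dict, which is why this may not suit an established class hierarchy.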

achapkowski commented 6 years ago

@TomAugspurger Not subclassing dict is not an option, since the other classes are already established. Is there a way to override the print function (not ideal), or to set something on the class that tells pandas to use __repr__?

TomAugspurger commented 6 years ago

I don't believe so.


jamesmyatt commented 5 years ago

In pprint_thing, why is hasattr(thing, '__next__') a special case? https://github.com/pandas-dev/pandas/blob/cfd65e98e694b2ad40e97d06ffdd9096a3dea909/pandas/io/formats/printing.py#L207

rajeee commented 4 years ago

Bumping @jamesmyatt's question: why is hasattr(thing, '__next__') a special case in pprint_thing? The __next__ attribute is defined on iterators, and it isn't clear why iterators should be printed directly with str while all other objects are passed through the as_escaped_unicode function before printing.
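To make the branch order concrete, here is a hypothetical, simplified model of the dispatch in pprint_thing, based on reading the linked printing.py. It is not pandas code: the function names are made up, it returns the name of the branch taken instead of formatted text, and it leaves out the nesting-depth option and quoting. Details may differ between pandas versions.

```python
from collections import namedtuple


def is_sequence(obj):
    """Approximation of pandas' is_sequence: iterable, sized, and not a string."""
    try:
        iter(obj)
        len(obj)
        return not isinstance(obj, (str, bytes))
    except (TypeError, AttributeError):
        return False


def simplified_pprint(thing):
    """Hypothetical model of pprint_thing's branch order; returns the branch name."""
    if hasattr(thing, "__next__"):
        return "str"  # anything with __next__ goes straight to str()
    if isinstance(thing, dict):
        return "dict"  # rendered as {key: value, ...}, ignoring a custom __repr__
    if is_sequence(thing):
        return "sequence"  # rendered element by element, e.g. (7, 3)
    return "escaped str"  # str(thing) with escape characters replaced


class KeysOnly(dict):
    def __repr__(self):
        return ",".join(self.keys())


Point = namedtuple("Point", "x,y")

print(simplified_pprint(KeysOnly(alpha="b")))  # dict
print(simplified_pprint(Point(7, 3)))          # sequence
print(simplified_pprint(iter([1, 2])))         # str
print(simplified_pprint(object()))             # escaped str
```

In this model, a dict subclass never reaches its own __repr__, while any object that merely has a __next__ attribute skips every pretty-printing branch.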

lgharibashvili commented 3 years ago

A quick and dirty patch for those who cannot wait for the fix:

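from pandas.io.formats import printing as pd_printing

# Monkeypatch the private printing module so nothing is treated as a sequence.
# Caveat: this is global, so lists and tuples in cells are also rendered with plain str().
pd_printing.is_sequence = lambda obj: False
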
mzeitlin11 commented 3 years ago

@rajeee, @jamesmyatt I'm not sure why that check is there. A well-tested PR that fixes this issue by removing it would be the next step here, if you (or anyone else) are interested!

danking commented 11 months ago

This is no longer a bug:

Out[48]: 
   D  F                            G
0  A  1  {'alpha': 'b', 'beta': 'c'}

But a similar issue arises when you subclass Mapping (but not dict). A Mapping is a Collection, which is both an Iterable and a Sized; these define __iter__ and __len__ respectively, and together they trigger pandas' special sequence-formatting logic.
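The following sketch shows why a Mapping subclass lands in the sequence path. The class Config is hypothetical; the point is only that it is not a dict, yet it is both Iterable and Sized, and iterating a Mapping yields just its keys, which is what a sequence-style formatter would then print.

```python
from collections.abc import Iterable, Mapping, Sized


class Config(Mapping):
    """Hypothetical read-only mapping with its own repr (not a dict subclass)."""

    def __init__(self, **items):
        self._items = dict(items)

    def __getitem__(self, key):
        return self._items[key]

    def __iter__(self):
        return iter(self._items)

    def __len__(self):
        return len(self._items)

    def __repr__(self):
        return "Config(" + ",".join(self._items) + ")"


c = Config(alpha="b", beta="c")
print(isinstance(c, dict))                            # False: a dict-only check skips it
print(isinstance(c, Iterable), isinstance(c, Sized))  # True True: a sequence check matches
print(list(c))                                        # ['alpha', 'beta']: keys only
print(repr(c))                                        # Config(alpha,beta)
```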

We could change the isinstance check to isinstance(value, Mapping), but then our custom mappings would look like dicts. That still seems like an improvement over seeing only the keys.
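A minimal sketch of the proposed Mapping check, assuming a formatter that renders every Mapping the way it renders a dict. format_value is hypothetical and not pandas' API. types.MappingProxyType stands in for a custom mapping: it is a Mapping but not a dict, and the example shows the trade-off, since its own repr is replaced by dict-style output.

```python
from collections.abc import Mapping
from types import MappingProxyType


def format_value(thing):
    """Hypothetical formatter: treat any Mapping like a dict when rendering."""
    if isinstance(thing, Mapping):
        # Build "{key: value, ...}" from the items, as for a plain dict.
        items = ", ".join(f"{k!r}: {v!r}" for k, v in thing.items())
        return "{" + items + "}"
    return str(thing)


proxy = MappingProxyType({"alpha": "b", "beta": "c"})  # a Mapping that is not a dict
print(isinstance(proxy, dict))  # False
print(repr(proxy))              # mappingproxy({'alpha': 'b', 'beta': 'c'})
print(format_value(proxy))      # {'alpha': 'b', 'beta': 'c'}
```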

It does seem a lot easier to just override is_sequence to ignore our custom classes.
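A more targeted variant of the quick patch shown earlier can be sketched by wrapping the existing predicate so that only selected classes opt out of sequence formatting. make_is_sequence and Tags are hypothetical names, and the local is_sequence is a stand-in so the sketch runs without pandas. Applying it to pandas relies on patching a private module, which is an untested assumption and may break between versions.

```python
def is_sequence(obj):
    """Stand-in for pandas' predicate: iterable, sized, and not a string."""
    try:
        iter(obj)
        len(obj)
        return not isinstance(obj, (str, bytes))
    except (TypeError, AttributeError):
        return False


def make_is_sequence(original, excluded_types):
    """Wrap an is_sequence-style predicate so excluded_types are never sequences."""

    def patched(obj):
        if isinstance(obj, excluded_types):
            return False  # let these fall through to plain str() formatting
        return original(obj)

    return patched


class Tags:
    """Hypothetical custom container with its own __str__."""

    def __init__(self, *names):
        self._names = list(names)

    def __iter__(self):
        return iter(self._names)

    def __len__(self):
        return len(self._names)

    def __str__(self):
        return "|".join(self._names)


patched = make_is_sequence(is_sequence, (Tags,))
print(is_sequence(Tags("a", "b")))  # True
print(patched(Tags("a", "b")))      # False
print(patched([1, 2]))              # True: ordinary lists are unaffected

# Applying it to pandas (untested assumption; private module):
# from pandas.io.formats import printing as pd_printing
# pd_printing.is_sequence = make_is_sequence(pd_printing.is_sequence, (Tags,))
```

Unlike the global lambda, this keeps pandas' normal formatting for real lists and tuples.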

ruema commented 2 weeks ago

I have a similar problem: pandas tries to interpret my custom class either as a list or as a dict, and does not let me use a simple string representation.

My workaround was to define a __next__ method, because that seems to short-circuit directly to str.

achapkowski commented 2 weeks ago

@ruema can you post a full example?

ruema commented 2 weeks ago

Here's an example of an object with and without the __next__ method.

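from collections import namedtuple
import pandas as pd

# Plain namedtuple: pandas treats it as a sequence and prints only the values
Point = namedtuple("Point", "x,y")
series = pd.Series({'pos': Point(7, 3)})
print(series)
# pos    (7, 3)
# dtype: object

# Subclass with a __next__ attribute: hasattr(thing, '__next__') is True,
# so pandas falls back to str(), which uses the namedtuple's own repr
class Point(namedtuple("Point", "x,y")):
    __next__ = None

series = pd.Series({'pos': Point(7, 3)})
print(series)
# pos    Point(x=7, y=3)
# dtype: object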

I wonder whether pandas' pretty printing is helpful for anything other than lists and tuples.

achapkowski commented 2 weeks ago

100%, they should honor the __repr__ or __str__ dunders.