pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License

no warning with invalid pd.set_option('display.max_colwidth') #16097

Open jkornblum opened 7 years ago

jkornblum commented 7 years ago

Code Sample, a copy-pastable example if possible

import numpy as np
import pandas as pd

df = pd.DataFrame(np.array([['foo', 'bar', 'bim', 'uncomfortably long string'], ['horse', 'cow', 'banana', 'apple']]))
df
#       0    1       2                          3
# 0    foo  bar     bim  uncomfortably long string
# 1  horse  cow  banana                      apple

pd.set_option('max_colwidth', 2)
df
#       0    1       2                          3
# 0    foo  bar     bim  uncomfortably long string
# 1  horse  cow  banana                      apple

pd.set_option('max_colwidth', 6)
df
#       0    1      2      3
# 0    foo  bar    bim  un...
# 1  horse  cow  ba...  apple

Problem description

When I set the max_colwidth option to 3 or less, it has no effect and gives no warning. I assume the ellipsis takes three characters, so setting the option to 3 or less has little meaning. I think pandas should either warn the user about the invalid setting or display an ellipsis-filled DataFrame.

Maybe the documentation could be updated to show the valid range (3 to xx).
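The reasoning about the ellipsis can be made concrete with a toy model of cell truncation. `truncate_cell` is a hypothetical helper, not pandas' formatter: it assumes an over-long cell is cut to `max_colwidth - 3` characters plus `"..."`, and that widths of 3 or less are silently ignored, as observed above.

```python
from typing import Optional


def truncate_cell(text: str, max_colwidth: Optional[int]) -> str:
    """Hypothetical model of cell truncation (not pandas' real code)."""
    # None means "no limit".
    if max_colwidth is None:
        return text
    # "..." needs 3 characters, so a width of 3 or less leaves no room for
    # real content; mimic the observed behaviour of ignoring such widths.
    if max_colwidth <= 3:
        return text
    if len(text) > max_colwidth:
        return text[: max_colwidth - 3] + "..."
    return text


for width in (2, 3, 6):
    print(width, truncate_cell("uncomfortably long string", width))
# 2 uncomfortably long string
# 3 uncomfortably long string
# 6 unc...
```

The model is only illustrative: pandas' real formatter also pads cells, so its exact cut points (for example `un...` in the width-6 output above) can presumably differ from the model by a character.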

Expected Output

pd.set_option('max_colwidth', 2)
# raise ValueError (or similar) .... 
# Invalid argument 2 specified; the minimum is 4

## OR just respect the new setting and display a useless DataFrame

df
#       0   1   2   3
# 0    ..  ..  ..  ..
# 1    ..  ..  ..  ..
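The first option above (failing at set time) could look roughly like the following sketch. `validate_max_colwidth` and `MIN_COLWIDTH` are hypothetical names; the sketch only assumes that an option validator is a callable that receives the new value and raises `ValueError` to reject it, which matches the `ValueError` messages quoted later in this thread.

```python
from typing import Optional

# Assumption: 4 is the smallest width that leaves room for one character plus "...".
MIN_COLWIDTH = 4


def validate_max_colwidth(value: Optional[int]) -> None:
    """Hypothetical validator that fails loudly, as the expected output suggests."""
    if value is None:  # None means "no limit" and stays valid
        return
    # bool is a subclass of int, so reject it explicitly.
    if not isinstance(value, int) or isinstance(value, bool):
        raise ValueError(f"max_colwidth must be an int or None, got {value!r}")
    if value < MIN_COLWIDTH:
        raise ValueError(
            f"Invalid argument {value} specified; the minimum is {MIN_COLWIDTH}"
        )


try:
    validate_max_colwidth(2)
except ValueError as err:
    print(err)  # Invalid argument 2 specified; the minimum is 4
```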

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.0.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 42 Stepping 7, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.19.2
nose: 1.3.7
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.25.2
numpy: 1.11.3
scipy: 0.18.1
statsmodels: 0.6.1
xarray: None
IPython: 5.1.0
sphinx: 1.5.1
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2016.10
blosc: None
bottleneck: 1.2.0
tables: 3.2.2
numexpr: 2.6.1
matplotlib: 2.0.0
openpyxl: 2.4.1
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: 0.9.6
lxml: 3.7.2
bs4: 4.5.3
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.1.5
pymysql: None
psycopg2: None
jinja2: 2.9.4
boto: 2.45.0
pandas_datareader: None

jreback commented 7 years ago

Yeah, all of these validators are pretty simple; IOW, this one only checks that the value is an int and does nothing more. We could enhance them by a) checking that the value is >= 0 (0 is valid) and b) maybe emitting a warning for certain ones.

This should be handled purely at the config level (and not at the actual display level).

ghost commented 7 years ago

I'm working on a commit and I've noticed quite a few more of these: all options for which negative values would be meaningless. Since I'm just starting to contribute, I don't know if it's bad form to improve validation for these without being asked, but in order to help out I'll do these too. I could always provide a more limited commit if needed.

Some options (which would also be changed) currently raise exceptions such as "ValueError: Value must be an instance of <type 'int'>|<type 'NoneType'>", which seems inelegant.
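For comparison with the `<type 'int'>|<type 'NoneType'>` message quoted above, here is a sketch of a validator that states the accepted values in plain words and also covers the `>= 0` check suggested earlier. The function name and message are illustrative assumptions; pandas later added its own `is_nonnegative_int` validator (mentioned further down), whose wording may differ.

```python
from typing import Any


def check_nonnegative_int_or_none(value: Any) -> None:
    """Hypothetical validator whose error names the accepted values."""
    if value is None:  # None keeps meaning "no limit"
        return
    # Accept plain ints >= 0 (0 is valid); bool is excluded even though it subclasses int.
    if isinstance(value, int) and not isinstance(value, bool) and value >= 0:
        return
    raise ValueError(f"Value must be a nonnegative integer or None, got {value!r}")


for candidate in (None, 0, 50, -1, "5"):
    try:
        check_nonnegative_int_or_none(candidate)
        print(f"{candidate!r}: ok")
    except ValueError as err:
        print(f"{candidate!r}: {err}")
```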

jreback commented 7 years ago

Why don't you put up a prototype of what you're thinking? Once one is good, we can do the rest.

ghost commented 7 years ago

Just happened to PR it before I read your comment, apologies. I'll adjust as needed, let me know what you think.

usman98789 commented 4 years ago

take

kevinanker commented 2 years ago

This issue seems stale. What approach is preferred to solve it? A quick solution could be to emit a UserWarning for values of 3 or less. This could be implemented in pandas/core/config_init.py as follows:

import warnings

# Proposed patch to pandas/core/config_init.py; cf, find_stack_level,
# is_instance_factory and max_colwidth_doc come from that module's context.
def _deprecate_negative_int_max_colwidth(key):
    value = cf.get_option(key)
    if value is not None and value < 0:
        warnings.warn(
            "Passing a negative integer is deprecated in version 1.0 and "
            "will not be supported in future version. Instead, use None "
            "to not limit the column width.",
            FutureWarning,
            stacklevel=find_stack_level(),
        )
#<-- insert this
    # Guard against None ("no limit"), which cannot be compared with an int.
    elif value is not None and value < 4:
        warnings.warn(
            "Passing a value of 3 or less may have no effect on output.",
            UserWarning,
            stacklevel=find_stack_level(),
        )
#-->
cf.register_option(
    # TODO(2.0): change `validator=is_nonnegative_int` see GH#31569
    "max_colwidth",
    50,
    max_colwidth_doc,
    validator=is_instance_factory([type(None), int]),
    cb=_deprecate_negative_int_max_colwidth,
)
cyenyxe commented 2 months ago

After the introduction of the validator is_nonnegative_int and recent changes to config_init.py, the previous suggestion could be adapted as follows:

import warnings

def _warn_max_colwidth_too_small(key):
    value = cf.get_option(key)
    # is_nonnegative_int still accepts None ("no limit"), so guard before comparing.
    if value is not None and value < 4:
        warnings.warn(
            "A max_colwidth of 3 or less may have no effect on the output.",
            UserWarning,
            stacklevel=find_stack_level(),
        )

cf.register_option(
    "max_colwidth",
    50,
    max_colwidth_doc,
    validator=is_nonnegative_int,
    cb=_warn_max_colwidth_too_small,
)

Unfortunately, this makes displaying a DataFrame noisy and redundant: the warning is emitted again every time the DataFrame is shown. Using the initial example:

>>> pd.set_option('max_colwidth', 3)
<stdin>:1: UserWarning: Passing a value of 3 or less may have no effect on output.

>>> df
/usr/local/lib/python3.10/contextlib.py:135: UserWarning: Passing a value of 3 or less may have no effect on output.
  return next(self.gen)
/usr/local/lib/python3.10/contextlib.py:142: UserWarning: Passing a value of 3 or less may have no effect on output.
  next(self.gen)
       0    1       2                          3
0  foo    bar  bim     uncomfortably long string
1  horse  cow  banana  apple

With other operations, such as selecting a single column, no warning appears and the display option is again ignored:

>>> df[0]
0    foo
1    horse
Name: 0, dtype: object

Is there any other way to trigger warnings or tweak how they are displayed? Or would it make sense to just fall back to the suggestion from the initial message to update the documentation with the range?
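One possible answer to the first question, sketched under assumptions: since a callback only receives the key, it can remember the last value it saw and stay silent when the same value is simply re-applied, as happens when `df` is displayed above. `make_small_width_warner` and its `get_value` parameter are hypothetical; in pandas the callback would read the value with `cf.get_option(key)` as in the proposals above.

```python
import warnings
from typing import Any, Callable, Dict


def make_small_width_warner(
    get_value: Callable[[str], Any], threshold: int = 4
) -> Callable[[str], None]:
    """Hypothetical callback factory: warn only when the value actually changes."""
    last_seen: Dict[str, Any] = {}

    def callback(key: str) -> None:
        value = get_value(key)
        previous = last_seen.get(key)
        last_seen[key] = value
        if value == previous:
            return  # re-setting the same value (e.g. while displaying) stays silent
        if value is not None and value < threshold:
            warnings.warn(
                f"{key} of {threshold - 1} or less may have no effect on output.",
                UserWarning,
                stacklevel=2,
            )

    return callback


options = {"display.max_colwidth": 50}
cb = make_small_width_warner(options.__getitem__)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    for value in (3, 3, 3):  # the user's set, then a display re-applying it twice
        options["display.max_colwidth"] = value
        cb("display.max_colwidth")
print(len(caught))  # 1
```

The trade-off is hidden state inside the callback; a warning is still emitted whenever the user switches to a small width from a different value.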