pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License

DataFrame output too wide / not truncated properly #32461

Open emsems opened 4 years ago

emsems commented 4 years ago

Problem description

It seems to me that DataFrames are not always correctly truncated to fit the terminal width (using pandas 1.0.1). My pandas config is set so that it auto-detects the terminal width, and the representation of the DataFrame should fit within it.

Relevant pandas config settings

```text
display.width : int
    Width of the display in characters. In case python/IPython is running in
    a terminal this can be set to None and pandas will correctly auto-detect
    the width.
    Note that the IPython notebook, IPython qtconsole, or IDLE do not run in a
    terminal and hence it is not possible to correctly detect the width.
    [default: 80] [currently: 80]
display.max_columns : int
    If max_cols is exceeded, switch to truncate view. Depending on
    `large_repr`, objects are either centrally truncated or printed as
    a summary view. 'None' value means unlimited.

    In case python/IPython is running in a terminal and `large_repr`
    equals 'truncate' this can be set to 0 and pandas will auto-detect
    the width of the terminal and print a truncated object which fits
    the screen width. The IPython notebook, IPython qtconsole, or IDLE
    do not run in a terminal and hence it is not possible to do
    correct auto-detection.
    [default: 0] [currently: 0]
display.max_colwidth : int or None
    The maximum width in characters of a column in the repr of
    a pandas data structure. When the column overflows, a "..."
    placeholder is embedded in the output. A 'None' value means unlimited.
    [default: 50] [currently: 50]
display.expand_frame_repr : boolean
    Whether to print out the full DataFrame repr for wide DataFrames across
    multiple lines, `max_columns` is still respected, but the output will
    wrap-around across multiple "pages" if its width exceeds `display.width`.
    [default: True] [currently: True]
display.large_repr : 'truncate'/'info'
    For DataFrames exceeding max_rows/max_cols, the repr (and HTML repr) can
    show a truncated table (the default from 0.13), or switch to the view from
    df.info() (the behaviour in earlier versions of pandas).
    [default: truncate] [currently: truncate]
display.column_space No description available.
    [default: 12] [currently: 12]
```
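For anyone trying to reproduce this, the settings above can be read and temporarily overridden at runtime. The following is only a minimal sketch using `pd.get_option` and `pd.option_context`; the values 100 and 10 are arbitrary illustration values, not settings taken from this report.

```python
import pandas as pd

# Print the current values of the options discussed in this report.
for name in ("display.width", "display.max_columns", "display.expand_frame_repr"):
    print(name, "=", pd.get_option(name))

before = pd.get_option("display.width")

# Override options only inside the block (100 and 10 are arbitrary values).
with pd.option_context("display.width", 100, "display.max_columns", 10):
    inside = (pd.get_option("display.width"), pd.get_option("display.max_columns"))

# On leaving the block, the previous values are restored.
restored = pd.get_option("display.width")
```

Using `option_context` rather than `pd.set_option` keeps the experiment from leaking into the rest of the session.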

Code Sample

To reproduce, please resize the terminal to a width of 127 characters. You can check the width with shutil.get_terminal_size().

import pandas as pd
from io import StringIO
import shutil

# terminal width (in my case: columns=127, lines=40)
print('terminal width: {} characters'.format(shutil.get_terminal_size()[0]))

s = 'id,tstamp,00aaaaa,01a,02aaaaa,03a,04aaaaa,05,06aaaa,07aaaaaaaaaaaaa,08aaaaaa,09aaaaaaa,10aa,11aaaaaa,12aaaaaaaaa,13aaaaaaaa,14aaaaaaa,15aaaaaaaa,16a,17aaaaa,18aa,19aaaaaa,20,21,22aaaaaa,23aaaaaa,24aaaaa,25aa,26aaaaaa,27a,28aaaaa,29aaaa\r\n779491690,2019-02-01 00:00:02+00:00,,161.38538188324176,297.461393148902,,,,0.466667,False,False,3,,0.007,53.8849,,0.0323102,-0.4,,0.008,,0.0,,,,17.1,-1e-06,,0.024,,0.045,159.72512756708358\r\n779491691,2019-02-01 00:05:02+00:00,,162.2999981618803,299.5123814547798,,,,0.553571,True,False,3,,0.007,-85.1749,,0.0969305,-0.5,,0.008,,0.0,,,,17.0,-3e-06,,0.031,,0.049,160.6961983114413\r\n779491692,2019-02-01 00:10:02+00:00,,163.1754248277306,301.7498948568431,,,,0.530612,False,False,3,,0.007,,,-0.0646204,-0.4,,0.007,,0.0,,,,17.0,2e-06,,0.026,,0.049,161.6468595698913\r\n779491693,2019-02-01 00:15:02+00:00,,164.00520705009447,304.19960616946184,,,,0.466667,False,False,3,,0.007,,,-0.0323102,-0.4,,0.008,,0.0,,,,17.0,1e-06,,0.024,,0.045,162.5736942291059\r\n779491694,2019-02-01 00:20:02+00:00,,164.78185830034352,306.8905124087089,,,,0.511792,True,False,3,,0.007,128.438,,-0.0323102,-0.4,,0.008,,0.0,,,,17.0,1e-06,,0.031,,0.053,163.47261824257413\r\n'
# Parse the CSV text directly from an in-memory buffer; id and tstamp form the index
df = pd.read_csv(StringIO(s), index_col=[0, 1])
print(df)

The lines of the DataFrame's string representation are too long, so each one wraps onto a second line in the terminal. How far they overshoot depends on the terminal width; with the settings above, the output needs 129 characters instead of the available 127. To me this looks like a bug in the DataFrameFormatter, probably in the write_result method (line 839 ff.). Could that be?
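To put a number on the overshoot, the widest line of the repr can be compared against the terminal width. This is a rough sketch; `overflow` is a hypothetical helper name introduced here, not part of pandas.

```python
import shutil


def overflow(text, width=None):
    """Return by how many characters the widest line of `text` exceeds `width`.

    Hypothetical helper for this report. If `width` is None, the current
    terminal width reported by shutil.get_terminal_size() is used.
    Returns 0 when every line fits.
    """
    if width is None:
        width = shutil.get_terminal_size().columns
    widest = max((len(line) for line in text.splitlines()), default=0)
    return max(widest - width, 0)
```

Calling `overflow(repr(df))` in the affected terminal should then report the excess directly, which would be 2 for the 129-versus-127 case described above.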

By the way, I also don't quite understand the display.expand_frame_repr setting. It is currently set to True. When I set it to False, the DataFrame is not truncated; instead, the full representation is printed across multiple lines. Shouldn't it be exactly the other way round?
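A small experiment may help separate the two effects. It is a sketch under assumptions that differ from the report: the 30-column frame is made up, `display.width` is fixed at 80, and `display.max_columns` is set to None so that no truncation happens at all and only the wrapping behaviour of `display.expand_frame_repr` is visible.

```python
import pandas as pd

# Made-up wide frame: 3 rows, 30 integer columns (not the data from this report).
df = pd.DataFrame(
    [[r * 30 + c for c in range(30)] for r in range(3)],
    columns=[f"col_{i:02d}" for i in range(30)],
)


def render(expand):
    """Return repr(df) with truncation disabled and the given expand_frame_repr."""
    with pd.option_context(
        "display.width", 80,
        "display.max_columns", None,
        "display.expand_frame_repr", expand,
    ):
        return repr(df)


wrapped = render(True)   # columns split into several "pages"
single = render(False)   # one long line per row

print(wrapped)
print(single)
```

With the option set to False, pandas emits one long line per row, and any line breaks seen on screen then come from the terminal itself. That might be what produces the "printed across multiple lines" impression, although this sketch does not confirm it for the `max_columns = 0` case.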

The following issue seems to be related, but I did not find the exact same problem. Hope I didn't miss anything: https://github.com/pandas-dev/pandas/issues/16911

Expected Output

A DataFrame representation truncated so that each line fits within the terminal width.

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit           : None
python           : 3.7.6.final.0
python-bits      : 64
OS               : Windows
OS-release       : 10
machine          : AMD64
processor        : Intel64 Family 6 Model 94 Stepping 3, GenuineIntel
byteorder        : little
LC_ALL           : None
LANG             : None
LOCALE           : None.None
pandas           : 1.0.1
numpy            : 1.18.1
pytz             : 2019.3
dateutil         : 2.8.1
pip              : 20.0.2
setuptools       : 45.2.0.post20200209
Cython           : 0.29.15
pytest           : None
hypothesis       : None
sphinx           : 2.4.3
blosc            : None
feather          : 0.4.0
xlsxwriter       : 1.2.8
lxml.etree       : None
html5lib         : None
pymysql          : None
psycopg2         : 2.8.4 (dt dec pq3 ext lo64)
jinja2           : 2.11.1
IPython          : 7.13.0
pandas_datareader: None
bs4              : None
bottleneck       : 1.3.2
fastparquet      : 0.3.3
gcsfs            : None
lxml.etree       : None
matplotlib       : 3.2.0
numexpr          : 2.7.1
odfpy            : None
openpyxl         : 3.0.3
pandas_gbq       : None
pyarrow          : 0.15.1
pytables         : None
pytest           : None
pyxlsb           : None
s3fs             : None
scipy            : 1.4.1
sqlalchemy       : 1.3.13
tables           : 3.6.1
tabulate         : None
xarray           : 0.15.0
xlrd             : 1.2.0
xlwt             : 1.3.0
xlsxwriter       : 1.2.8
numba            : 0.48.0

alexklapheke commented 1 year ago

I'm having this problem too (can confirm it's still happening on main branch). The width of the DataFrame's repr exceeds terminal width by up to 4 characters. I think this function is not accounting for the added width of the ... that stands in for the missing columns:

https://github.com/pandas-dev/pandas/blob/2e218d10984e9919f0296931d92ea851c6a6faf5/pandas/io/formats/string.py#L159-L192
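To illustrate the kind of accounting error I mean, here is a simplified model. It is not the pandas code linked above: `choose_columns`, `rendered_width`, `SEP`, and `PLACEHOLDER` are hypothetical names, and the two-character column spacing is an assumption, so the exact overshoot will differ from what pandas produces.

```python
SEP = 2              # assumed spacing between adjacent columns
PLACEHOLDER = "..."  # marker printed in place of the dropped columns


def rendered_width(widths, kept, truncated):
    """Width of one printed row: kept columns, separators, and the marker if truncated."""
    total = sum(widths[i] for i in kept) + SEP * (len(kept) - 1)
    if truncated:
        total += SEP + len(PLACEHOLDER)
    return total


def choose_columns(widths, max_width, count_placeholder):
    """Drop middle columns until the estimated row width fits max_width.

    With count_placeholder=False the estimate ignores the "..." marker,
    which models the suspected bug; with True the marker is included.
    """
    kept = list(range(len(widths)))
    while len(kept) > 1:
        truncated = len(kept) < len(widths)
        estimate = sum(widths[i] for i in kept) + SEP * (len(kept) - 1)
        if truncated and count_placeholder:
            estimate += SEP + len(PLACEHOLDER)
        if estimate <= max_width:
            break
        del kept[len(kept) // 2]  # remove a column from the middle
    return kept


widths = [10] * 8  # eight made-up columns, 10 characters each
naive = choose_columns(widths, 60, count_placeholder=False)
fixed = choose_columns(widths, 60, count_placeholder=True)
print(len(naive), rendered_width(widths, naive, True))  # 5 columns, 63 characters
print(len(fixed), rendered_width(widths, fixed, True))  # 4 columns, 51 characters
```

In this toy model the naive version keeps one column too many and the printed row ends up wider than the 60-character limit, which is the same symptom as in the report.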

billziss-gh commented 9 months ago

Minimal example that reproduces this problem with Pandas 2.1.4:

>>> import os, pandas
>>> os.get_terminal_size()
os.terminal_size(columns=93, lines=47)
>>> df = pandas.DataFrame({"Date": "2023-08-31 00:00:00-04:00", "Open": 187.839996,
...     "High": 189.119995, "Low": 187.479996, "Close": 187.869995, "Volume": 60735600,
...     "Dividends": 0.0, "Stock Splits": 0.0}, index=[10769])
>>> df
                            Date        Open        High  ...    Volume  Dividends  Stock Spl
its
10769  2023-08-31 00:00:00-04:00  187.839996  189.119995  ...  60735600        0.0
0.0

[1 rows x 8 columns]