pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License

BUG: to-markdown with NA #50866

Open buhtz opened 1 year ago

buhtz commented 1 year ago

Reproducible Example

import pandas
pandas.DataFrame([[pandas.NA]]).to_markdown()

Issue Description

It seems that the markdown output has a problem with <NA>. But this only happens on Windows with pandas 1.5.3, not on Debian 11 (stable, Raspberry Pi) with pandas 1.5.3.

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\buhtzch\AppData\Roaming\Python\Python310\site-packages\pandas\core\frame.py", line 2843, in to_markdown
    result = tabulate.tabulate(self, **kwargs)
  File "C:\Users\buhtzch\AppData\Roaming\Python\Python310\site-packages\tabulate\__init__.py", line 2048, in tabulate
    list_of_lists, headers = _normalize_tabular_data(
  File "C:\Users\buhtzch\AppData\Roaming\Python\Python310\site-packages\tabulate\__init__.py", line 1471, in _normalize_tabular_data
    rows = list(map(lambda r: r if _is_separating_line(r) else list(r), rows))
  File "C:\Users\buhtzch\AppData\Roaming\Python\Python310\site-packages\tabulate\__init__.py", line 1471, in <lambda>
    rows = list(map(lambda r: r if _is_separating_line(r) else list(r), rows))
  File "C:\Users\buhtzch\AppData\Roaming\Python\Python310\site-packages\tabulate\__init__.py", line 107, in _is_separating_line
    (len(row) >= 1 and row[0] == SEPARATING_LINE)
  File "pandas\_libs\missing.pyx", line 382, in pandas._libs.missing.NAType.__bool__
TypeError: boolean value of NA is ambiguous

Expected Behavior

No exception.

Installed Versions

Sorry, I wasn't able to install main because building takes too long on a Pi4. ;) On my Windows machine I'm not able to install anything from git because I'm not the admin.

This is the Windows info where the error happens.

INSTALLED VERSIONS
------------------
commit : 2e218d10984e9919f0296931d92ea851c6a6faf5
python : 3.10.2.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19043
machine : AMD64
processor : Intel64 Family 6 Model 140 Stepping 1, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : de_DE.cp1252
pandas : 1.5.3
numpy : 1.23.5
pytz : 2022.6
dateutil : 2.8.2
setuptools : 58.1.0
pip : 22.3.1
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.9.2
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.2
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : None
brotli : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : 3.6.2
numba : None
numexpr : None
odfpy : None
openpyxl : 3.0.10
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : 1.9.3
snappy : None
sqlalchemy : None
tables : None
tabulate : 0.9.0
xarray : None
xlrd : None
xlwt : None
zstandard : None
tzdata : 2022.7

This is the GNU/Linux Debian 11 info where the error is not reproducible.

INSTALLED VERSIONS
------------------
commit : 2e218d10984e9919f0296931d92ea851c6a6faf5
python : 3.9.2.final.0
python-bits : 64
OS : Linux
OS-release : 5.10.0-20-arm64
Version : #1 SMP Debian 5.10.158-2 (2022-12-13)
machine : aarch64
processor :
byteorder : little
LC_ALL : None
LANG : de_DE.UTF-8
LOCALE : de_DE.UTF-8
pandas : 1.5.3
numpy : 1.23.2
pytz : 2021.1
dateutil : 2.8.1
setuptools : 66.0.0
pip : 22.3.1
Cython : None
pytest : 7.2.0
hypothesis : None
sphinx : 4.4.0
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.6.3
html5lib : 1.1
pymysql : None
psycopg2 : None
jinja2 : 2.11.3
IPython : None
pandas_datareader: None
bs4 : 4.9.3
bottleneck : 1.3.5
brotli : 1.0.9
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : 3.6.2
numba : None
numexpr : 2.8.3
odfpy : None
openpyxl : 3.0.10
pandas_gbq : None
pyarrow : None
pyreadstat : 1.1.4
pyxlsb : None
s3fs : None
scipy : 1.6.0
snappy : None
sqlalchemy : None
tables : 3.6.1
tabulate : 0.8.10
xarray : None
xlrd : None
xlwt : 1.3.0
zstandard : None
tzdata : 2022.1
MarcoGorelli commented 1 year ago

thanks @buhtz for the report

could you report to tabulate please?

In [5]: df = pandas.DataFrame([[pandas.NA]])

In [6]: tabulate.tabulate(df)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[6], line 1
----> 1 tabulate.tabulate(df)

File ~/pandas-dev/.311venv/lib/python3.11/site-packages/tabulate/__init__.py:2048, in tabulate(tabular_data, headers, tablefmt, floatfmt, intfmt, numalign, stralign, missingval, showindex, disable_numparse, colalign, maxcolwidths, rowalign, maxheadercolwidths)
   2045 if tabular_data is None:
   2046     tabular_data = []
-> 2048 list_of_lists, headers = _normalize_tabular_data(
   2049     tabular_data, headers, showindex=showindex
   2050 )
   2051 list_of_lists, separating_lines = _remove_separating_lines(list_of_lists)
   2053 if maxcolwidths is not None:

File ~/pandas-dev/.311venv/lib/python3.11/site-packages/tabulate/__init__.py:1471, in _normalize_tabular_data(tabular_data, headers, showindex)
   1469 headers = list(map(str, headers))
   1470 #    rows = list(map(list, rows))
-> 1471 rows = list(map(lambda r: r if _is_separating_line(r) else list(r), rows))
   1473 # add or remove an index column
   1474 showindex_is_a_str = type(showindex) in [str, bytes]

File ~/pandas-dev/.311venv/lib/python3.11/site-packages/tabulate/__init__.py:1471, in _normalize_tabular_data.<locals>.<lambda>(r)
   1469 headers = list(map(str, headers))
   1470 #    rows = list(map(list, rows))
-> 1471 rows = list(map(lambda r: r if _is_separating_line(r) else list(r), rows))
   1473 # add or remove an index column
   1474 showindex_is_a_str = type(showindex) in [str, bytes]

File ~/pandas-dev/.311venv/lib/python3.11/site-packages/tabulate/__init__.py:107, in _is_separating_line(row)
    104 def _is_separating_line(row):
    105     row_type = type(row)
    106     is_sl = (row_type == list or row_type == str) and (
--> 107         (len(row) >= 1 and row[0] == SEPARATING_LINE)
    108         or (len(row) >= 2 and row[1] == SEPARATING_LINE)
    109     )
    110     return is_sl

File ~/pandas-dev/pandas/_libs/missing.pyx:413, in pandas._libs.missing.NAType.__bool__()
    411 
    412     def __bool__(self):
--> 413         raise TypeError("boolean value of NA is ambiguous")
    414 
    415     def __hash__(self):

TypeError: boolean value of NA is ambiguous
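
For anyone wondering why the comparison on line 107 of tabulate blows up: the traceback suggests that `row[0] == SEPARATING_LINE` evaluates to `pandas.NA` rather than `True` or `False`, and the surrounding `and` / `or` then needs a boolean from it. A minimal sketch of that mechanism without tabulate follows. `SENTINEL` is a made-up stand-in for tabulate's constant, not its real value, and the expression is only modelled on the one shown in the traceback.

```python
import pandas as pd

# Hypothetical stand-in for tabulate's SEPARATING_LINE; the exact value does not matter here.
SENTINEL = "\001"

row = [pd.NA]

# Comparing pd.NA with a string propagates the missing value: the result is pd.NA itself.
comparison = row[0] == SENTINEL

# `and` returns its right operand when the left one is truthy, so this is still pd.NA.
left = len(row) >= 1 and comparison

# `or` has to decide whether `left` is truthy, which calls bool(pd.NA) and raises.
message = None
try:
    is_sl = left or (len(row) >= 2 and row[1] == SENTINEL)
except TypeError as exc:
    message = str(exc)

print(message)
```

So the error is raised by the boolean operators, not by the `==` itself, which is presumably why any NA in the first cell of a row is enough to trigger it.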
buhtz commented 1 year ago

OK, thanks.

I'm not sure about your Issue policy. IMHO this can be closed because it is not pandas related.

MarcoGorelli commented 1 year ago

We'll probably want to set a minimum version of tabulate when it's fixed, so let's keep it open til then. Thanks!

buhtz commented 1 year ago

There is an open PR at "tabulate". https://github.com/astanin/python-tabulate/pull/232

flying-sheep commented 7 months ago

Tabulate seems unmaintained, see https://github.com/astanin/python-tabulate/issues/281

blaiseli commented 3 months ago

I had a similar bug on Linux. In case this is useful: I managed to get around it by inserting a .astype(str) before the .to_markdown().