mwouts / itables

Pandas DataFrames as Interactive DataTables
https://mwouts.github.io/itables/
MIT License

OverflowError when showing Polars data frames containing unsigned integer types #192

Closed boschmic closed 10 months ago

boschmic commented 10 months ago

Showing the following data-frame inside a Jupyter notebook throws OverflowError (notice the u32 dtype):

shape: (2, 2)
┌─────┬───────┐
│ foo ┆ count │
│ --- ┆ ---   │
│ i64 ┆ u32   │
╞═════╪═══════╡
│ 3   ┆ 1     │
│ 1   ┆ 3     │
└─────┴───────┘

Code:

import polars as pl

df = pl.DataFrame({'foo': [1, 1, 3, 1]}).groupby('foo').count()

from itables import init_notebook_mode, show
init_notebook_mode(all_interactive=True)
show(df)

Error:

---------------------------------------------------------------------------
OverflowError                             Traceback (most recent call last)
Cell In[2], line 3
      1 from itables import init_notebook_mode, show
      2 init_notebook_mode(all_interactive=True)
----> 3 show(df)

File ~/.venv/lib/python3.10/site-packages/itables/javascript.py:462, in show(df, caption, **kwargs)
    460 def show(df=None, caption=None, **kwargs):
    461     """Show a dataframe"""
--> 462     html = to_html_datatable(df, caption=caption, connected=_CONNECTED, **kwargs)
    463     display(HTML(html))

File ~/.venv/lib/python3.10/site-packages/itables/javascript.py:403, in to_html_datatable(df, caption, tableId, connected, import_jquery, **kwargs)
    400 # When the header has an extra column, we add
    401 # an extra empty column in the table data #141
    402 column_count = _column_count_in_header(table_header)
--> 403 dt_data = datatables_rows(
    404     df,
    405     column_count,
    406     warn_on_unexpected_types=warn_on_unexpected_types,
    407     warn_on_int_to_str_conversion=warn_on_int_to_str_conversion,
    408 )
    410 output = replace_value(
    411     output, "const data = [];", "const data = {};".format(dt_data)
    412 )
    414 return output

File ~/.venv/lib/python3.10/site-packages/itables/datatables_format.py:108, in datatables_rows(df, count, warn_on_unexpected_types, warn_on_int_to_str_conversion)
    104 def datatables_rows(
    105     df, count=None, warn_on_unexpected_types=False, warn_on_int_to_str_conversion=False
    106 ):
    107     """Format the values in the table and return the data, row by row, as requested by DataTables"""
--> 108     df = convert_bigints_to_str(df, warn_on_int_to_str_conversion)
    110     # We iterate over columns using an index rather than the column name
    111     # to avoid an issue in case of duplicated column names #89
    112     if count is None or len(df.columns) == count:

File ~/.venv/lib/python3.10/site-packages/itables/datatables_format.py:87, in convert_bigints_to_str(df, warn_on_int_to_str_conversion)
     83 except AttributeError:
     84     x = df[col]
     85     if (
     86         x.dtype in pl.INTEGER_DTYPES
---> 87         and ((x < JS_MIN_SAFE_INTEGER) | (x > JS_MAX_SAFE_INTEGER)).any()
     88     ):
     89         df = df.with_columns(pl.col(col).cast(pl.Utf8))
     90         converted.append(col)

File ~/.venv/lib/python3.10/site-packages/polars/series/series.py:564, in Series.__lt__(self, other)
    562 if isinstance(other, pl.Expr):
    563     return F.lit(self).__lt__(other)
--> 564 return self._comp(other, "lt")

File ~/.venv/lib/python3.10/site-packages/polars/series/series.py:512, in Series._comp(self, other, op)
    509 if f is None:
    510     return NotImplemented
--> 512 return self._from_pyseries(f(other))

OverflowError: out of range integral type conversion attempted

itables version is 1.5.3. Polars version 0.18.15.

Jupyter package versions:

IPython          : 8.14.0
ipykernel        : 6.25.1
ipywidgets       : 8.1.0
jupyter_client   : 8.3.0
jupyter_core     : 5.3.1
jupyter_server   : 2.7.1
jupyterlab       : 4.0.5
nbclient         : 0.8.0
nbconvert        : 7.7.3
nbformat         : 5.9.2
notebook         : not installed
qtconsole        : not installed
traitlets        : 5.9.0

boschmic commented 10 months ago

The root cause is the comparison (x < JS_MIN_SAFE_INTEGER) in itables/datatables_format.py:87.

Polars doesn't allow comparing unsigned integers to negative numbers. For instance, the following code throws OverflowError:

import numpy as np
import polars as pl
pl.Series(np.arange(4).astype('uint32')) < -42
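
One possible way around this, sketched here and not necessarily the fix applied in itables, is to reduce the column to its minimum and maximum first and compare those as plain Python integers, so that no unsigned Series is ever compared to a negative literal. The helper name `exceeds_js_safe_range` is hypothetical, and the two constants are JavaScript's `Number.MIN_SAFE_INTEGER` and `Number.MAX_SAFE_INTEGER`, assumed to match the values itables uses:

```python
# Bounds of JavaScript's safe integer range (Number.MIN_SAFE_INTEGER and
# Number.MAX_SAFE_INTEGER). Assumed here to match the itables constants.
JS_MAX_SAFE_INTEGER = 2**53 - 1
JS_MIN_SAFE_INTEGER = -(2**53 - 1)


def exceeds_js_safe_range(col_min, col_max):
    """Hypothetical helper: True if [col_min, col_max] leaves the safe range.

    Both arguments are plain Python ints (or None for an empty or all-null
    column), so the comparison never depends on the column's dtype.
    """
    if col_min is None or col_max is None:
        return False
    return col_min < JS_MIN_SAFE_INTEGER or col_max > JS_MAX_SAFE_INTEGER


# With a Polars integer Series x, one would pass x.min() and x.max().
print(exceeds_js_safe_range(1, 3))          # small u32 counts: False
print(exceeds_js_safe_range(0, 2**64 - 1))  # a u64 column at its maximum: True
```

Because Python integers have arbitrary precision, the check behaves the same for signed and unsigned columns, including u64 values above the i64 range.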

mwouts commented 10 months ago

Thank you @boschmic for this report and for the detailed, helpful information. Let me look into this; it should be easy to fix!