pola-rs / polars

Dataframes powered by a multithreaded, vectorized query engine, written in Rust
https://docs.pola.rs

Panic when displaying LazyFrame in Jupyter #16252

Closed mdavis-xyz closed 1 month ago

mdavis-xyz commented 1 month ago

Reproducible example

Run this in Jupyter (which automatically tries to print whatever the last line returns). Note that the column I'm selecting does not exist.

import polars as pl
(
    pl.LazyFrame(
        {
            "a": [1, 2],
        }
    )
    .with_columns(pl.col("b"))
    .select("b")
)

Log output

(The verbose flag didn't seem to output anything additional.)


---------------------------------------------------------------------------
PanicException                            Traceback (most recent call last)
Cell In[2], line 2
      1 import polars as pl
----> 2 (
      3     pl.LazyFrame(
      4         {
      5             "a": [1, 2],
      6         }
      7     )
      8     .with_columns(pl.col("b"))
      9     .select("b")
     10 )

File ~\AppData\Local\anaconda3\Lib\site-packages\IPython\core\displayhook.py:268, in DisplayHook.__call__(self, result)
    266 self.start_displayhook()
    267 self.write_output_prompt()
--> 268 format_dict, md_dict = self.compute_format_data(result)
    269 self.update_user_ns(result)
    270 self.fill_exec_result(result)

File ~\AppData\Local\anaconda3\Lib\site-packages\IPython\core\displayhook.py:157, in DisplayHook.compute_format_data(self, result)
    127 def compute_format_data(self, result):
    128     """Compute format data of the object to be displayed.
    129 
    130     The format data is a generalization of the :func:`repr` of an object.
   (...)
    155 
    156     """
--> 157     return self.shell.display_formatter.format(result)

File ~\AppData\Local\anaconda3\Lib\site-packages\IPython\core\formatters.py:179, in DisplayFormatter.format(self, obj, include, exclude)
    177 md = None
    178 try:
--> 179     data = formatter(obj)
    180 except:
    181     # FIXME: log the exception
    182     raise

File ~\AppData\Local\anaconda3\Lib\site-packages\decorator.py:232, in decorate.<locals>.fun(*args, **kw)
    230 if not kwsyntax:
    231     args, kw = fix(args, kw, sig)
--> 232 return caller(func, *(extras + args), **kw)

File ~\AppData\Local\anaconda3\Lib\site-packages\IPython\core\formatters.py:223, in catch_format_error(method, self, *args, **kwargs)
    221 """show traceback on failed format call"""
    222 try:
--> 223     r = method(self, *args, **kwargs)
    224 except NotImplementedError:
    225     # don't warn on NotImplementedErrors
    226     return self._check_return(None, args[0])

File ~\AppData\Local\anaconda3\Lib\site-packages\IPython\core\formatters.py:344, in BaseFormatter.__call__(self, obj)
    342     method = get_real_method(obj, self.print_method)
    343     if method is not None:
--> 344         return method()
    345     return None
    346 else:

File ~\AppData\Local\anaconda3\Lib\site-packages\polars\lazyframe\frame.py:533, in LazyFrame._repr_html_(self)
    531 def _repr_html_(self) -> str:
    532     try:
--> 533         dot = self._ldf.to_dot(optimized=False)
    534         svg = subprocess.check_output(
    535             ["dot", "-Nshape=box", "-Tsvg"], input=f"{dot}".encode()
    536         )
    537         return (
    538             "<h4>NAIVE QUERY PLAN</h4><p>run <b>LazyFrame.show_graph()</b> to see"
    539             f" the optimized version</p>{svg.decode()}"
    540         )

PanicException: io error: Error

Issue description

In Jupyter, if I have made a mistake with my polars operations (e.g. selecting a column which doesn't exist), when Jupyter tries to print the result, I get a Panic.

Note that I'm only able to reproduce the error with both .select() and .with_columns(). On their own each one does not cause the Panic.

Expected behavior

Since a .collect() would result in an error, I expect that a string representation of the lazyframe would either:

naive plan: (run LazyFrame.explain(optimized=True) to see the optimized plan)

SELECT [col("a")] FROM
WITH_COLUMNS:
[col("b")]
DF ["b"]; PROJECT */1 COLUMNS; SELECTION: "None"

My understanding is that a Panic is never the intended behavior, and that all errors should be handled more gracefully.
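
The traceback above passes through a `try:` block in `_repr_html_`, yet the panic still reaches the notebook. One plausible explanation, sketched below, is that the fallback handler only catches `Exception`, while a panic surfaced from Rust derives from `BaseException` (as PyO3's panic exception does, to my understanding). The handler is not visible in the traceback, so `except Exception` is an assumption, and `FakePanic`, `Plan`, and `display` are hypothetical stand-ins, not Polars or IPython APIs.

```python
class FakePanic(BaseException):
    """Hypothetical stand-in for a Rust panic surfaced to Python."""


class Plan:
    """Toy object with a rich repr, loosely modelled on the traceback above."""

    def __init__(self, error=None):
        # Exception that the graph renderer should raise, or None for success.
        self.error = error

    def _render_graph(self):
        if self.error is not None:
            raise self.error
        return "<svg>plan</svg>"

    def _repr_html_(self):
        try:
            return self._render_graph()
        except Exception:
            # Assumed fallback: ordinary errors degrade to a plain-text plan.
            return "<pre>naive plan (text fallback)</pre>"


def display(obj):
    """Mimic a notebook display hook: call _repr_html_ if the object has one."""
    method = getattr(obj, "_repr_html_", None)
    return method() if method is not None else repr(obj)


print(display(Plan()))                                   # graph rendered
print(display(Plan(RuntimeError("dot not installed"))))  # fallback used
try:
    display(Plan(FakePanic("io error: Error")))
except BaseException as exc:
    # The BaseException subclass bypasses `except Exception` entirely.
    print(f"escaped the fallback: {type(exc).__name__}: {exc}")
```

Under these assumptions, an ordinary error such as a missing `dot` binary is handled by the fallback, but the panic is not. This would be consistent with the expectation that the underlying failure should surface as a regular error instead.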

Installed versions

```
--------Version info---------
Polars: 0.20.26
Index type: UInt32
Platform: Windows-10-10.0.19045-SP0
Python: 3.11.5 | packaged by Anaconda, Inc. | (main, Sep 11 2023, 13:26:23) [MSC v.1916 64 bit (AMD64)]

----Optional dependencies----
adbc_driver_manager:
cloudpickle: 2.2.1
connectorx:
deltalake:
fastexcel:
fsspec: 2023.4.0
gevent:
hvplot: 0.8.4
matplotlib: 3.7.2
nest_asyncio: 1.5.6
numpy: 1.24.3
openpyxl: 3.0.10
pandas: 2.0.3
pyarrow: 11.0.0
pydantic: 1.10.8
pyiceberg:
pyxlsb:
sqlalchemy: 1.4.39
torch:
xlsx2csv:
xlsxwriter: 3.2.0
```

cmdlineluser commented 1 month ago

Can be reproduced outside of Jupyter by forcing the .to_dot() call:

import polars as pl

(
    pl.LazyFrame({"a": [1, 2]})
    .with_columns(pl.col("b"))
    .select("b")
    ._ldf
    .to_dot(optimized=False)
)
# could not determine schema
# thread '<unnamed>' panicked at crates/polars-lazy/src/dot.rs:49:14:
# io error: Error
# PanicException: io error: Error
cmdlineluser commented 1 month ago

I think this is actually fixed on main due to https://github.com/pola-rs/polars/pull/16237

On main I get the ColumnNotFoundError as expected:

ColumnNotFoundError: b

This error occurred with the following context stack:
    [1] 'with_columns' failed
    [2] 'select' input failed to resolve
ritchie46 commented 1 month ago

Yes, this is fixed.