pola-rs / polars

Dataframes powered by a multithreaded, vectorized query engine, written in Rust
https://docs.pola.rs

CSE panic #10824

Closed: 0xbe7a closed this issue 1 year ago.

0xbe7a commented 1 year ago


Reproducible example

import polars as pl

pl.show_versions()

v = pl.col("a") / pl.col("b")  # reused in the predicate and in the otherwise branch
magic = pl.when(v > 0).then(pl.lit(float("nan"))).otherwise(v)  # NaN literal in the then branch
df = (
    pl.DataFrame(
        {
            "a": [1.],
            "b": [1.],
        }
    )
    .lazy()
    .select(magic)
    .collect(comm_subexpr_elim=True)
)

Issue description

When executing with `comm_subexpr_elim=True`, Polars panics:

---------------------------------------------------------------------------
PanicException                            Traceback (most recent call last)
Cell In[96], line 16
      5 v = pl.col("a") / pl.col("b")
      6 magic = pl.when(v > 0).then(pl.lit(float("nan"))).otherwise(v)
      7 df = (
      8     pl.DataFrame(
      9         {
     10             "a": [1.],
     11             "b": [1.],
     12         }
     13     )
     14     .lazy()
     15     .select(magic)
---> 16     .collect(comm_subexpr_elim=True)
     17 )

File /opt/conda/envs/kap/lib/python3.11/site-packages/polars/utils/deprecation.py:95, in deprecate_renamed_parameter.<locals>.decorate.<locals>.wrapper(*args, **kwargs)
     90 @wraps(function)
     91 def wrapper(*args: P.args, **kwargs: P.kwargs) -> T:
     92     _rename_keyword_argument(
     93         old_name, new_name, kwargs, function.__name__, version
     94     )
---> 95     return function(*args, **kwargs)

File /opt/conda/envs/kap/lib/python3.11/site-packages/polars/lazyframe/frame.py:1695, in LazyFrame.collect(self, type_coercion, predicate_pushdown, projection_pushdown, simplify_expression, no_optimization, slice_pushdown, comm_subplan_elim, comm_subexpr_elim, streaming)
   1683     comm_subplan_elim = False
   1685 ldf = self._ldf.optimization_toggle(
   1686     type_coercion,
   1687     predicate_pushdown,
   (...)
   1693     streaming,
   1694 )
-> 1695 return wrap_df(ldf.collect())

PanicException: called `Option::unwrap()` on a `None` value

Expected behavior

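The same behavior as with `comm_subexpr_elim=False`: the query returns its result (a single row containing NaN, since 1.0 / 1.0 > 0) instead of panicking.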

Installed versions

```
--------Version info---------
Polars:              0.19.0
Index type:          UInt32
Platform:            Linux-5.4.0-1103-aws-x86_64-with-glibc2.31
Python:              3.11.5 | packaged by conda-forge | (main, Aug 27 2023, 03:34:09) [GCC 12.3.0]

----Optional dependencies----
adbc_driver_sqlite:
cloudpickle:
connectorx:
deltalake:
fsspec:
matplotlib:          3.7.2
numpy:               1.25.2
pandas:              1.5.3
pyarrow:             12.0.0
pydantic:            1.10.12
sqlalchemy:          1.4.49
xlsx2csv:
xlsxwriter:          3.1.2
```
reswqa commented 1 year ago

I guess that because `f32::NaN != f32::NaN`, the hashmap keys (i.e. the expression identifiers) for this expression are considered unequal to each other.
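A minimal Python sketch of the failure mode guessed at above. `LitKey` is a hypothetical stand-in for an expression identifier, not Polars' actual type: it hashes the literal's raw bit pattern but compares values with IEEE 754 `==`, so a key carrying NaN can be inserted into a dict but never found again. Python dicts check object identity before calling `__eq__`, so the sketch looks up a second, freshly built key; Rust's `HashMap` has no such identity shortcut, so there even the very same key would miss, and an `unwrap()` on the lookup would panic as in the traceback.

```python
import struct


class LitKey:
    """Hypothetical identifier for an expression holding a float literal."""

    def __init__(self, value: float) -> None:
        self.value = value

    def __hash__(self) -> int:
        # Hash the raw IEEE 754 bits, so two NaN literals hash identically.
        return hash(struct.pack("<d", self.value))

    def __eq__(self, other: object) -> bool:
        # Value comparison: NaN == NaN is False, so a NaN key never equals another NaN key.
        return isinstance(other, LitKey) and self.value == other.value


# Phase 1: record the expression's identifier, as a CSE pass might.
seen = {LitKey(float("nan")): "cse_0"}  # "cse_0" is a placeholder name

# Phase 2: look the identifier up again; the hashes match but equality fails.
found = seen.get(LitKey(float("nan")))
print(found)  # None (the Rust analogue would be `.unwrap()` on a `None`)

# Ordinary literals are unaffected.
print({LitKey(1.0): "cse_1"}.get(LitKey(1.0)))  # cse_1
```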

ritchie46 commented 1 year ago

Ai... we need to modify the equality of expressions so that floating-point literals are compared by their binary representation (bit pattern).

We want to know whether the expressions are equal, not whether the underlying values are equal according to the floating-point spec.
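A sketch of the approach proposed above, again using a hypothetical `BitsKey` class rather than Polars' real code: equality compares the literals' raw IEEE 754 bit patterns instead of their numeric values, so two identical NaN literals now match. One consequence of comparing bits is that `0.0` and `-0.0` become distinct keys, which is what you want when asking whether two expressions are the same, but differs from numeric equality.

```python
import struct


def float_bits(value: float) -> bytes:
    """Raw IEEE 754 binary64 bit pattern of a Python float."""
    return struct.pack("<d", value)


class BitsKey:
    """Hypothetical expression identifier that compares float literals bitwise."""

    def __init__(self, value: float) -> None:
        self.value = value

    def __hash__(self) -> int:
        return hash(float_bits(self.value))

    def __eq__(self, other: object) -> bool:
        # Structural equality: identical bits mean an identical literal, even for NaN.
        return isinstance(other, BitsKey) and float_bits(self.value) == float_bits(other.value)


seen = {BitsKey(float("nan")): "cse_0"}  # "cse_0" is a placeholder name
print(seen.get(BitsKey(float("nan"))))  # cse_0
print(BitsKey(0.0) == BitsKey(-0.0))  # False: the sign bit differs
```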