pola-rs / polars

Dataframes powered by a multithreaded, vectorized query engine, written in Rust
https://docs.pola.rs

SQL: combining CTE and CROSS JOIN leads to panic/unreachable error #17056

Open l1t1 opened 4 months ago

l1t1 commented 4 months ago

Reproducible example

>>> import polars as pl
>>> pl.__version__
'1.0.0-beta.1'
>>> sql = pl.SQLContext()
>>> sql.execute("with t as (select a from (values(1),(2))t1(a))select * from t cross join t", eager=True)
thread '<unnamed>' panicked at crates\polars-lazy\src\physical_plan\planner\lp.rs:616:20:
internal error: entered unreachable code
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python312\Lib\site-packages\polars\sql\context.py", line 437, in execute
    return res.collect() if (eager or self._eager_execution) else res
           ^^^^^^^^^^^^^
  File "C:\Python312\Lib\site-packages\polars\lazyframe\frame.py", line 1896, in collect
    return wrap_df(ldf.collect(callback))
                   ^^^^^^^^^^^^^^^^^^^^^
pyo3_runtime.PanicException: internal error: entered unreachable code

Log output

No response

Issue description

When I cross join a CTE with itself, the query panics with an "entered unreachable code" error instead of returning a result.

Expected behavior

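The query should return the same result as the following two SQL commands, which first materialize the CTE as a table and then cross join that table with itself: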

>>> sql.execute("create table t as with t as (select a from (values(1),(2))t1(a))select * from t", eager=True)
shape: (1, 1)
┌──────────────┐
│ Response     │
│ ---          │
│ str          │
╞══════════════╡
│ CREATE TABLE │
└──────────────┘
>>> sql.execute("select * from t cross join t", eager=True)
shape: (4, 2)
┌─────┬─────┐
│ a   ┆ a:t │
│ --- ┆ --- │
│ i32 ┆ i32 │
╞═════╪═════╡
│ 1   ┆ 1   │
│ 1   ┆ 2   │
│ 2   ┆ 1   │
│ 2   ┆ 2   │
└─────┴─────┘
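
As an independent reference for the expected rows, the same kind of self cross join can be run in SQLite through Python's standard `sqlite3` module. This is only a sketch for comparison, not Polars code: the `WITH t(a) AS (VALUES ...)` form and the aliases `x` and `y` are assumptions made here so the query is unambiguous in SQLite, and the helper name `cross_join_reference` is hypothetical.

```python
import sqlite3

# Reference check in SQLite (not Polars): cross join a two-row CTE with itself.
# The aliases x and y are added for this sketch; the original query has none.
QUERY = """
WITH t(a) AS (VALUES (1), (2))
SELECT x.a, y.a
FROM t AS x CROSS JOIN t AS y
"""


def cross_join_reference():
    """Return the sorted rows of the self cross join as a list of tuples."""
    conn = sqlite3.connect(":memory:")
    try:
        return sorted(conn.execute(QUERY).fetchall())
    finally:
        conn.close()


rows = cross_join_reference()
print(rows)
```

The four value pairs it yields match the `shape: (4, 2)` frame above, which is what the single-statement CTE query would be expected to produce as well.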

Installed versions

```
>>> pl.show_versions()
--------Version info---------
Polars:               1.0.0-beta.1
Index type:           UInt32
Platform:             Windows-10-10.0.19045-SP0
Python:               3.12.2 (tags/v3.12.2:6abddd9, Feb 6 2024, 21:26:36) [MSC v.1937 64 bit (AMD64)]

----Optional dependencies----
adbc_driver_manager:
cloudpickle:
connectorx:
deltalake:
fastexcel:
fsspec:
gevent:
great_tables:
hvplot:
matplotlib:
nest_asyncio:
numpy:                1.26.4
openpyxl:
pandas:
pyarrow:              16.0.0
pydantic:
pyiceberg:
sqlalchemy:
torch:
xlsx2csv:
xlsxwriter:
```

Darcy-Linde commented 3 months ago

I have a similar issue. It appears to start in version 0.20.31, since my code works up to and including 0.20.30. The 0.20.31 release also included a change to the join function: https://github.com/pola-rs/polars/pull/16507/commits/17ab6662c3c465bfee3456ed668e021efc322cf5