pola-rs / polars

Dataframes powered by a multithreaded, vectorized query engine, written in Rust
https://docs.pola.rs

`pl.scan_ipc(…)` fails when followed by `.head(0)` #17025

Open aersam opened 5 months ago

aersam commented 5 months ago

Reproducible example

import polars as pl

sql_cont = pl.SQLContext()
sql_cont.register("test_fake_arrow", pl.scan_ipc("tests/data/arrow/fake.arrow"))
print(sql_cont.execute("SELECT * FROM test_fake_arrow s LIMIT 0").collect())

Here's the file I use: fake.zip

Log output

executing ipc read sync with row_index = None, n_rows = Some(0), predicate = false for paths ["tests/data/arrow/fake.arrow"]
thread '<unnamed>' panicked at C:\a\polars\polars\crates\polars-core\src\utils\mod.rs:740:34:
called `Option::unwrap()` on a `None` value
stack backtrace:
   0:     0x7ff9c5e732f7 - ffi_select_with_compiled_path
   1:     0x7ff9c2d6c039 - PyInit_polars
   2:     0x7ff9c5e55e27 - ffi_select_with_compiled_path
   3:     0x7ff9c5e75e56 - ffi_select_with_compiled_path
   4:     0x7ff9c5e754e7 - ffi_select_with_compiled_path
   5:     0x7ff9c5e766d1 - ffi_select_with_compiled_path
   6:     0x7ff9c5e76069 - ffi_select_with_compiled_path
   7:     0x7ff9c5e75fef - ffi_select_with_compiled_path
   8:     0x7ff9c5e75fd8 - ffi_select_with_compiled_path
   9:     0x7ff9c5fb35f4 - ffi_select_with_compiled_path
  10:     0x7ff9c5fb378d - ffi_select_with_compiled_path
  11:     0x7ff9c5fb3bde - ffi_select_with_compiled_path
  12:     0x7ff9c44ef747 - ffi_select_with_compiled_path
  13:     0x7ff9c44ecbb9 - ffi_select_with_compiled_path
  14:     0x7ff9c44eb8da - ffi_select_with_compiled_path
  15:     0x7ff9c434725c - ffi_select_with_compiled_path
  16:     0x7ff9c2ba3880 - <unknown>
  17:     0x7ff9c22c7c8c - <unknown>
  18:     0x7ff9c2bc3401 - <unknown>
  19:     0x7ff9d7536282 - PyUnicode_ToDecimalDigit
  20:     0x7ff9d74af52b - PyObject_Vectorcall
  21:     0x7ff9d74b08d4 - PyEval_EvalFrameDefault
  22:     0x7ff9d752bbd3 - PyMapping_Check
  23:     0x7ff9d752b453 - PyEval_EvalCode
  24:     0x7ff9d754f83e - PyArena_Free
  25:     0x7ff9d754f7ba - PyArena_Free
  26:     0x7ff9d7644666 - PyThread_tss_is_created
  27:     0x7ff9d759ac89 - PyRun_SimpleFileObject
  28:     0x7ff9d75d1a18 - PyRun_AnyFileObject
  29:     0x7ff9d75d165b - PySys_GetSizeOf
  30:     0x7ff9d75d1517 - PySys_GetSizeOf
  31:     0x7ff9d756442c - Py_RunMain
  32:     0x7ff9d75642bd - Py_RunMain
  33:     0x7ff9d74ee76d - Py_Main
  34:     0x7ff6f5801230 - <unknown>
  35:     0x7ffa7b517344 - BaseThreadInitThunk
  36:     0x7ffa7b73cc91 - RtlUserThreadStart
Traceback (most recent call last):
  File "C:\Projects\BMS_Github\lakeapi\repo.py", line 5, in <module>
    print(sql_cont.execute("SELECT * FROM test_fake_arrow s LIMIT 0").collect())
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Projects\BMS_Github\lakeapi\.venv\Lib\site-packages\polars\lazyframe\frame.py", line 1848, in collect
    return wrap_df(ldf.collect(callback))
                   ^^^^^^^^^^^^^^^^^^^^^
pyo3_runtime.PanicException: called `Option::unwrap()` on a `None` value

Issue description

Polars SQL panics when the query uses `LIMIT 0`.

Expected behavior

well, should not raise :)

Installed versions

```
--------Version info---------
Polars:               1.0.0-alpha.1
Index type:           UInt32
Platform:             Windows-10-10.0.19045-SP0
Python:               3.11.1 (tags/v3.11.1:a7a450f, Dec 6 2022, 19:58:39) [MSC v.1934 64 bit (AMD64)]

----Optional dependencies----
adbc_driver_manager:  1.0.0
cloudpickle:
connectorx:
deltalake:            0.18.1
fastexcel:
fsspec:               2024.6.0
gevent:
hvplot:
matplotlib:
nest_asyncio:
numpy:                1.26.4
openpyxl:
pandas:               2.2.2
pyarrow:              16.1.0
pydantic:             2.7.4
pyiceberg:
sqlalchemy:           2.0.18
torch:
xlsx2csv:             0.8.2
xlsxwriter:           3.2.0
```

aersam commented 5 months ago

It does not fail with limit=1, interestingly

cmdlineluser commented 5 months ago

Seems like it may be a general issue with scan_ipc

import polars as pl

# Write a one-row IPC file, then scan it lazily and ask for zero rows
pl.DataFrame({"A": [1]}).write_ipc("1.arrow")
pl.scan_ipc("1.arrow").head(0).collect()

# thread '<unnamed>' panicked at ./polars/crates/polars-core/src/utils/mod.rs:740:34:
# called `Option::unwrap()` on a `None` value

alexander-beedie commented 5 months ago

Indeed; not SQL related - I'll update the issue title 👌

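import polars as pl

# LIMIT 0 against an inline VALUES table works, so the SQL layer is not at fault
pl.sql("""
  SELECT * FROM (VALUES(1,2),(3,4)) tbl(a,b)
  LIMIT 0
""").collect()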
# shape: (0, 2)
# ┌─────┬─────┐
# │ a   ┆ b   │
# │ --- ┆ --- │
# │ i32 ┆ i32 │
# ╞═════╪═════╡
# └─────┴─────┘

The error originates from `accumulate_dataframes_vertical` in `crates/polars-core/src/utils/mod.rs`.
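
For anyone following along, here is a rough pure-Python sketch of how this kind of panic can arise. It is a guess based only on the log above (`n_rows = Some(0)` followed by `Option::unwrap()` on `None`), not a reading of the Rust source. The assumption is that the accumulator takes the first frame from an iterator and the IPC reader yields no frames when zero rows are requested. `read_chunks`, `accumulate_vertical_unchecked` and `accumulate_vertical_checked` are hypothetical names, and lists of row dicts stand in for DataFrames.

```python
from typing import Dict, Iterable, Iterator, List, Optional

# Hypothetical stand-in for a DataFrame: a list of row dicts.
Frame = List[Dict[str, int]]


def read_chunks(rows: Frame, n_rows: Optional[int], chunk_size: int = 2) -> Iterator[Frame]:
    """Yield chunks of at most `chunk_size` rows, stopping after `n_rows` rows.

    With n_rows=0 the loop body never runs, so no chunk is yielded at all
    (assumed to mirror what the IPC reader does in the failing case).
    """
    limit = len(rows) if n_rows is None else min(n_rows, len(rows))
    for start in range(0, limit, chunk_size):
        yield rows[start:min(start + chunk_size, limit)]


def accumulate_vertical_unchecked(chunks: Iterable[Frame]) -> Frame:
    """Stack chunks, assuming there is at least one (analogue of `.unwrap()`)."""
    it = iter(chunks)
    acc = list(next(it))  # raises StopIteration when `chunks` is empty
    for chunk in it:
        acc.extend(chunk)
    return acc


def accumulate_vertical_checked(chunks: Iterable[Frame]) -> Frame:
    """Stack chunks, returning an empty frame when there are none."""
    acc: Frame = []
    for chunk in chunks:
        acc.extend(chunk)
    return acc


if __name__ == "__main__":
    rows = [{"A": 1}, {"A": 2}, {"A": 3}]
    print(accumulate_vertical_unchecked(read_chunks(rows, 1)))  # a limit of 1 is fine
    print(accumulate_vertical_checked(read_chunks(rows, 0)))    # a limit of 0 handled
    try:
        accumulate_vertical_unchecked(read_chunks(rows, 0))
    except StopIteration:
        print("unchecked accumulator fails on zero chunks")
```

If that assumption holds, it would also explain why a limit of 1 works: there is always at least one chunk to take. Note that the toy "checked" version loses column information; a real fix would presumably have to return an empty frame that still carries the scan's schema.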