pola-rs / polars

Dataframes powered by a multithreaded, vectorized query engine, written in Rust
https://docs.pola.rs

StackOverflow when using `collect_all` but not `collect` #14079

Open TylerGrantSmith opened 5 months ago

TylerGrantSmith commented 5 months ago


Reproducible example

import polars as pl

# NVARS = 117 # works
NVARS = 118
COLUMNS = list(map(str, range(NVARS)))
FACTOR_COL = "factor"

lookups = {c: pl.LazyFrame({c: ["a"], FACTOR_COL: [1.0]}) for c in COLUMNS}
ldf = pl.LazyFrame({c: ["a"] for c in COLUMNS})
for col, lookup in lookups.items():
    ldf = ldf.join(lookup.rename({FACTOR_COL: col + "_" + FACTOR_COL}), on=col, how="left")

# runs: collect_all on the full (unprojected) plan
print(pl.collect_all([ldf]))

# runs fine: the same projection with collect
print(ldf.select(COLUMNS[0]).collect())

# segfaults (stack overflow): the same projection with collect_all
print(pl.collect_all([ldf.select(COLUMNS[0])]))

Log output

join parallel: true              (repeated 118 times)
LEFT join dataframes finished    (repeated 118 times)
join parallel: true              (repeated 118 times)
LEFT join dataframes finished    (repeated 118 times)

Issue description

This is a reprex distilled from a more complicated internal process. The NVARS value at which the error starts varies by machine; on my current machine it decreased when going from 0.20.4 to 0.20.6.
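Because the crash kills the interpreter, the threshold cannot be found with a try/except inside one process. A minimal sketch, assuming the reprex above is run in a fresh child process for each candidate NVARS and that a crash shows up as a non-zero exit code. The helpers `crashes_in_subprocess` and `find_threshold` are hypothetical (not part of polars), and the bisection assumes crashing is monotonic in NVARS (every value above the threshold also crashes).

```python
import subprocess
import sys
from typing import Callable

# The reprex from this issue, parameterised on NVARS via argv (hypothetical harness).
REPREX = """
import sys
import polars as pl

NVARS = int(sys.argv[1])
COLUMNS = list(map(str, range(NVARS)))
FACTOR_COL = "factor"

lookups = {c: pl.LazyFrame({c: ["a"], FACTOR_COL: [1.0]}) for c in COLUMNS}
ldf = pl.LazyFrame({c: ["a"] for c in COLUMNS})
for col, lookup in lookups.items():
    ldf = ldf.join(lookup.rename({FACTOR_COL: col + "_" + FACTOR_COL}), on=col, how="left")

pl.collect_all([ldf.select(COLUMNS[0])])
"""


def crashes_in_subprocess(nvars: int, timeout: float = 300.0) -> bool:
    """Run the reprex in a fresh interpreter and report whether it failed.

    Any non-zero exit counts as a crash: a SIGSEGV shows up as a negative
    return code on POSIX, but an ImportError (polars not installed) would
    also count, so inspect result.stderr when in doubt. A hang raises
    subprocess.TimeoutExpired instead of returning.
    """
    result = subprocess.run(
        [sys.executable, "-c", REPREX, str(nvars)],  # with -c, argv[1] is nvars
        capture_output=True,
        timeout=timeout,
    )
    return result.returncode != 0


def find_threshold(lo: int, hi: int, crashes: Callable[[int], bool]) -> int:
    """Return the smallest n in (lo, hi] for which crashes(n) is True.

    Preconditions (not checked): crashes(lo) is False, crashes(hi) is True,
    and crashes is monotonic in n.
    """
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if crashes(mid):
            hi = mid  # the threshold is at or below mid
        else:
            lo = mid  # the threshold is above mid
    return hi
```

For example, `find_threshold(1, 1024, crashes_in_subprocess)` reports the smallest failing NVARS on the current machine, provided NVARS=1024 is first confirmed to crash there. Comparing against `projection_pushdown=False` only requires adding that argument to the `collect_all` call in `REPREX`.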

Expected behavior

To run without segfaulting.

Installed versions

```
--------Version info---------
Polars:              0.20.6
Index type:          UInt32
Platform:            Linux-4.14.326-245.539.amzn2.x86_64-x86_64-with-glibc2.26
Python:              3.10.13 (main, Sep 11 2023, 13:44:35) [GCC 11.2.0]

----Optional dependencies----
adbc_driver_manager:
cloudpickle:
connectorx:
deltalake:
fsspec:              2023.12.2
gevent:
hvplot:
matplotlib:          3.8.2
numpy:               1.26.2
openpyxl:            3.1.2
pandas:              2.1.4
pyarrow:             13.0.0
pydantic:            2.5.2
pyiceberg:
pyxlsb:
sqlalchemy:
xlsx2csv:
xlsxwriter:
```
cmdlineluser commented 5 months ago

Can reproduce.

If it helps with debugging: it runs when projection_pushdown is disabled.

>>> print(pl.collect_all([ldf.select(COLUMNS[0])], projection_pushdown=False))
[shape: (1, 1)
┌─────┐
│ 0   │
│ --- │
│ str │
╞═════╡
│ a   │
└─────┘]
MarcNuebel commented 5 months ago

Can reproduce. For me, projection_pushdown=False only seems to raise the NVARS value at which it segfaults.

ritchie46 commented 5 months ago

You segfault because we overflow the stack. This happens above a certain NVARS.
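Why the crash depends on NVARS: each `join` in the reprex wraps the previous lazy plan, so the plan is a chain roughly NVARS nodes deep, and any pass that walks it recursively needs one stack frame per node. The toy below is only an analogy in pure Python, not polars' Rust optimizer; `Node`, `build_chain`, `depth_recursive` and `depth_iterative` are hypothetical, and Python's recursion limit stands in for the native stack size. The iterative walk uses a loop, so its stack usage stays constant however deep the chain is.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """One step of a toy logical plan; `input` is the plan it wraps."""
    name: str
    input: Optional["Node"] = None


def build_chain(n_joins: int) -> Node:
    """Mimic the reprex: start from a scan and wrap it once per join."""
    plan = Node("scan")
    for i in range(n_joins):
        plan = Node(f"join_{i}", plan)
    return plan


def depth_recursive(node: Optional[Node]) -> int:
    """One Python frame per plan node, like a naive recursive plan visitor."""
    if node is None:
        return 0
    return 1 + depth_recursive(node.input)


def depth_iterative(node: Optional[Node]) -> int:
    """Same result with a loop, so stack usage does not grow with the plan."""
    depth = 0
    while node is not None:
        depth += 1
        node = node.input
    return depth
```

With 118 joins both walks return 119 (118 joins plus the scan). With 5,000 joins, `depth_iterative` still returns 5001, while `depth_recursive` raises `RecursionError` under CPython's default recursion limit of 1000: the same "fine below a threshold, overflow above it" behaviour reported in this thread.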

saoudm commented 4 months ago

I’m facing the same issue, and it also seems to happen with collect. I’ll try to put together a reproduction.

TylerGrantSmith commented 2 months ago

Thanks all, it looks like the issue was resolved, along with many others, as of 0.20.17.

TylerGrantSmith commented 2 months ago

Premature... the example now passes, but it still overflows (at ~N=500).