atigbadr closed this 2 days ago
There should be a way to reduce the example down.
It does appear to be projection_pushdown related:
>>> df.collect(projection_pushdown=False)
shape: (1, 73)
┌──────────────────┬──────────────────┬──────────────────┬───────┬───┬──────────────┬─────────────┬───────────┬───────────┐
│ SYSTEME_ELEMENTA ┆ PROCEDURE_END ┆ NUM_EQU_BIG ┆ ANNEE ┆ … ┆ SAFETY_CLASS ┆ PRIO_SAFETY ┆ ARRET_NEW ┆ ANNEE_NEW │
│ IRE ┆ --- ┆ --- ┆ --- ┆ ┆ --- ┆ --- ┆ --- ┆ --- │
│ --- ┆ str ┆ str ┆ i64 ┆ ┆ str ┆ str ┆ str ┆ str │
│ str ┆ ┆ ┆ ┆ ┆ ┆ ┆ ┆ │
╞══════════════════╪══════════════════╪══════════════════╪═══════╪═══╪══════════════╪═════════════╪═══════════╪═══════════╡
│ 1025951899328813 ┆ 3717447046043311 ┆ 1171031955136432 ┆ 30 ┆ … ┆ null ┆ 9 ┆ null ┆ null │
│ 7799 ┆ 022 ┆ 7107 ┆ ┆ ┆ ┆ ┆ ┆ │
└──────────────────┴──────────────────┴──────────────────┴───────┴───┴──────────────┴─────────────┴───────────┴───────────┘
Minimal repro:
import polars as pl
(pl.LazyFrame({'A': [1]})
.with_columns(B = 2)
.drop([], strict=False)
.rename({'A': 'C', 'B': 'A'})
.drop([], strict=False)
.collect()
)
# pyo3_runtime.PanicException:
# called `Result::unwrap()` on an `Err` value: ColumnNotFound(ErrString("C"))
Checks
Reproducible example
Log output
Issue description
In lazy mode, Polars fails to resolve columns against the schema correctly, especially when rename and drop are combined.
The code works in non-lazy (eager) mode. It also works when using
df = df.cast(df.collect_schema())
at some point before the final line of code.
Everything is in the code.
The data is written to Parquet and scanned back, because the original data is scanned from Parquet in our case.
Expected behavior
Lazy mode should be able to resolve columns properly.
Installed versions