pola-rs / polars

Dataframes powered by a multithreaded, vectorized query engine, written in Rust
https://docs.pola.rs

Panic if tuple kwarg passed to `df.with_columns(col_name = accidental_tuple_here)` #17413

Open DeflateAwning opened 4 months ago

DeflateAwning commented 4 months ago


Reproducible example

import polars as pl

# Step 1: Create a sample dataframe
df = pl.DataFrame({
    "col1": ["string1", "string2", "string3", "string4"],
    "col2": ["text1", "text2", "text3", "text4"]
})

# Step 2: Add a new column with a trailing comma after the kwarg column spec, making it a tuple
df = df.with_columns(
    new_col=(
        pl.lit("new_value"),  # <- extra comma here
    ),
)
print(df)
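The trailing comma matters because in Python the comma creates a tuple, not the parentheses. That makes the `new_col=(...,)` argument above a one-element tuple rather than a single expression. The plain-Python sketch below stands a string in for `pl.lit("new_value")` so it runs without Polars.

```python
# Parentheses alone only group an expression; the trailing comma is what
# makes a tuple. The string stands in for pl.lit("new_value") here.
value = "new_value"

grouped = (
    value
)
one_tuple = (
    value,  # <- trailing comma
)
bare_tuple = value,  # a bare trailing comma also builds a tuple

print(type(grouped).__name__)     # str
print(type(one_tuple).__name__)   # tuple
print(len(one_tuple))             # 1
print(type(bare_tuple).__name__)  # tuple
```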

Log output

No response

Issue description

Polars panics if a tuple kwarg is passed to df.with_columns(...).

For example: df.with_columns(col_name = accidental_tuple_here)

If you do the same thing with df.select(...), you get the following output:

shape: (1, 1)
┌─────────────────────┐
│ new_col             │
│ ---                 │
│ list[extension]     │
╞═════════════════════╡
│ [String(new_value)] │
└─────────────────────┘

While I suppose you could technically want that for some insane metaprogramming thing, I think that both should result in an error like "You probably made a typo. Remove the extra comma you added by accident at the end of the {col_name} kwarg spec."

Expected behavior

I think that both df.with_columns and df.select should result in an error like "You probably made a typo. Remove the extra comma you added by accident at the end of the {col_name} kwarg spec." when this happens.

At the very least, it shouldn't panic.
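A minimal sketch of the kind of check described above is a standalone pre-validation step over the keyword column specs. It assumes a hypothetical helper, `check_named_exprs`, which is not part of the Polars API, and it raises an ordinary Python `TypeError` instead of panicking.

```python
from typing import Any, Dict


def check_named_exprs(**named_exprs: Any) -> Dict[str, Any]:
    """Hypothetical pre-check for keyword column specs (not a Polars API).

    Raises a descriptive TypeError when a keyword value is a tuple or list,
    which is usually the result of an accidental trailing comma.
    """
    for name, value in named_exprs.items():
        if isinstance(value, (tuple, list)):
            raise TypeError(
                f"keyword argument {name!r} is a {type(value).__name__} of "
                f"length {len(value)}; did you leave a trailing comma after "
                f"the {name!r} expression?"
            )
    # Nothing suspicious: hand the specs back unchanged.
    return named_exprs


try:
    check_named_exprs(new_col=("new_value",))
except TypeError as exc:
    print(exc)
```

This sketch rejects every tuple or list value, including ones a user might pass on purpose as list literals. A fix inside Polars itself might narrow the check to containers that hold expressions, or simply surface the underlying error as a Python exception.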

Installed versions

Tried with both v0.20.25 and v1.0.0.

```
--------Version info---------
Polars: 1.0.0
Index type: UInt32
Platform: Linux-6.5.0-1025-oem-x86_64-with-glibc2.35
Python: 3.11.9 (main, Apr 6 2024, 17:59:24) [GCC 11.4.0]

----Optional dependencies----
adbc_driver_manager: 0.9.0
cloudpickle:
connectorx: 0.3.2
deltalake:
fastexcel:
fsspec: 2024.6.0
gevent:
great_tables:
hvplot: 0.9.2
matplotlib: 3.8.0rc1
nest_asyncio: 1.5.7
numpy: 1.26.4
openpyxl: 3.1.2
pandas: 2.2.2
pyarrow: 16.1.0
pydantic: 2.7.4
pyiceberg:
sqlalchemy: 2.0.25
torch:
xlsx2csv: 0.8.1
xlsxwriter:
```

cmdlineluser commented 4 months ago

Can reproduce.

It seems to happen with lists also.

>>> pl.DataFrame({"x": [1]}).with_columns(y = [pl.lit(3)])
shape: (1, 2)
┌─────┬─────────────────┐
│ x   ┆ y               │
│ --- ┆ ---             │
│ i64 ┆ list[extension] │
╞═════╪═════════════════╡
│ 1   ┆ [dyn int: 3]    │
└─────┴─────────────────┘
>>> pl.DataFrame({"x": [1, 2]}).with_columns(y = [pl.lit(3)])
# PanicException: called `Result::unwrap()` on an `Err` value: InvalidOperation(ErrString("`list_builder` operation not supported for dtype `object`"))