pola-rs / polars

Dataframes powered by a multithreaded, vectorized query engine, written in Rust
https://docs.pola.rs

Regression: unpivot (melt) with list type of values #17501

Open s-b90 opened 2 months ago

s-b90 commented 2 months ago

Checks

Reproducible example

```python
import polars as pl

data = {
    'idx': [0, 1, 2],
    'x': [1, 2, 3],
    'y': ['a', 'b', 'c'],
    'z': [[1, 1], [2, 2], [3, 3]],
    'w': [['a', 'a'], ['b', 'b'], ['c', 'c']]
}

# `melt` is the deprecated Polars 1.x alias of `unpivot`.
df = pl.DataFrame(data).melt(
    id_vars='idx',
    value_vars=['x', 'y', 'z', 'w'],
    value_name='my_name'
)
```

Log output

```
polars.exceptions.InvalidOperationError: 'unpivot' not supported for dtype: list[i64]
```

Issue description

It looks like there's a supertype issue between list and non-list values in Polars 1.x. This code worked fine in version 0.20.31.

Expected behavior

```
idx   variable   my_name
i64   str        list[str]
---   --------   ----------
0     x          ["1"]
1     x          ["2"]
2     x          ["3"]
0     y          ["a"]
1     y          ["b"]
…     …          …
1     z          ["2", "2"]
2     z          ["3", "3"]
0     w          ["a", "a"]
1     w          ["b", "b"]
2     w          ["c", "c"]
```
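Reading the expected table, each scalar is wrapped in a one-element list and every element is cast to `str`, the supertype of the `i64` and `str` inputs. The plain-Python sketch below restates that reshaping so the intended semantics are explicit. `unpivot_as_str_lists` is a hypothetical helper for illustration, not Polars code, and it assumes rows are ordered by value column and then by `idx`, as in the table.

```python
def unpivot_as_str_lists(data, index, on, variable_name="variable", value_name="value"):
    """Hypothetical helper: long-format reshape where every value becomes a list of str."""
    rows = []
    for col in on:  # one block of rows per value column, as in the expected table
        for idx, value in zip(data[index], data[col]):
            # Wrap scalars so list and non-list columns share one shape.
            items = value if isinstance(value, list) else [value]
            # Cast every element to str, the supertype of i64 and str.
            rows.append({index: idx, variable_name: col, value_name: [str(v) for v in items]})
    return rows


data = {
    'idx': [0, 1, 2],
    'x': [1, 2, 3],
    'y': ['a', 'b', 'c'],
    'z': [[1, 1], [2, 2], [3, 3]],
    'w': [['a', 'a'], ['b', 'b'], ['c', 'c']],
}

rows = unpivot_as_str_lists(data, 'idx', ['x', 'y', 'z', 'w'], value_name='my_name')
for row in rows:
    print(row['idx'], row['variable'], row['my_name'])
```

This yields all 12 rows (3 `idx` values times 4 value columns), including the two that the table display elides.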

Installed versions

```
--------Version info---------
Polars:               1.1.0
Index type:           UInt32
Platform:             Linux-6.8.0-36-generic-x86_64-with-glibc2.39
Python:               3.11.9 (main, Apr 27 2024, 21:16:11) [GCC 13.2.0]

----Optional dependencies----
adbc_driver_manager:
cloudpickle:
connectorx:
deltalake:
fastexcel:
fsspec:
gevent:
great_tables:
hvplot:
matplotlib:
nest_asyncio:         1.6.0
numpy:                1.26.1
openpyxl:
pandas:
pyarrow:              16.1.0
pydantic:
pyiceberg:
sqlalchemy:
torch:
xlsx2csv:
xlsxwriter:
None
```

MarcoGorelli commented 2 months ago

Thanks @s-b90, taking a look.

MarcoGorelli commented 2 months ago

> It looks like there's a supertype issue between list and non-list values in Polars 1.x. This code worked fine in version 0.20.31.

Yup, `git bisect` shows this was introduced by #16918.

Maybe the error message could be improved for this case.

You can retain the old behaviour by wrapping the non-list columns (`x` and `y`) in lists first, so that every value column is a list:

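```python
import polars as pl

# `data` is the dict from the original report.
# Wrap the scalar columns in one-element lists so every value column is a list.
df = (
    pl.DataFrame(data)
    .with_columns(pl.concat_list("x"), pl.concat_list("y"))
    .unpivot(["x", "y", "z", "w"], index="idx", value_name="my_name")
)
```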