Pivot with multiple columns, null column value causes column named 'null' and can cause duplicated columns

pola-rs / polars

Dataframes powered by a multithreaded, vectorized query engine, written in Rust


Checks

[X] I have checked that this issue has not already been reported.
[X] I have confirmed this bug exists on the latest version of Polars.

Reproducible example

Another pivot one :)

import polars as pl

df = pl.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6], 'c': ['a', None, None], 'd': [7, 8, 9]})
piv = df.pivot(index='a', columns=['c', 'd'], values='d')
piv.columns
# ['a', '{"a",7}', 'null', 'null']
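The duplication in that output can be confirmed without Polars. The following is a minimal sketch that only counts the column names reported above; `reported_columns` is an illustrative variable name, not part of any Polars API.

```python
from collections import Counter

# Column names exactly as reported in the issue output.
reported_columns = ['a', '{"a",7}', 'null', 'null']

# Keep only the names that occur more than once.
duplicates = {name: n for name, n in Counter(reported_columns).items() if n > 1}
print(duplicates)  # {'null': 2}
```

Both rows with a null in `c` collapse to the same name, even though their `d` values (8 and 9) differ.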

Log output

No response

Issue description

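When `c` is null, the whole combined column name becomes 'null' and the `d` part is dropped, so distinct (c, d) pairs end up with the same name. Column names should be ['a', '{"a",7}', '{null,8}', '{null,9}'], and duplicate columns should not be allowed.

Expected behavior

Column names should be ['a', '{"a",7}', '{null,8}', '{null,9}'].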
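To make the expected naming concrete, here is a minimal pure-Python sketch of one label per (c, d) combination. It is not how Polars builds these names internally: `struct_label` is a hypothetical helper, and rendering a null as a bare `null` inside the braces is an assumption modeled on the existing '{"a",7}' format.

```python
def struct_label(values):
    """Render pivot keys like '{"a",7}'; None becomes a bare null (assumed format)."""
    parts = []
    for v in values:
        if v is None:
            parts.append("null")      # null key: keep it, but do not drop the other keys
        elif isinstance(v, str):
            parts.append(f'"{v}"')    # strings are quoted, as in '{"a",7}'
        else:
            parts.append(str(v))      # numbers are written as-is
    return "{" + ",".join(parts) + "}"


# Values of columns 'c' and 'd' from the reproducible example.
c = ["a", None, None]
d = [7, 8, 9]

columns = ["a"] + [struct_label(pair) for pair in zip(c, d)]
print(columns)  # ['a', '{"a",7}', '{null,8}', '{null,9}']

# Every (c, d) combination now maps to a distinct column name.
assert len(columns) == len(set(columns))
```

Because the `d` value stays in the label even when `c` is null, the two null rows no longer collide.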

Installed versions

```
--------Version info---------
Polars:               0.20.7
Index type:           UInt32
Platform:             macOS-12.3.1-arm64-arm-64bit
Python:               3.12.0 | packaged by conda-forge | (main, Oct 3 2023, 08:36:57) [Clang 15.0.7 ]

----Optional dependencies----
adbc_driver_manager:
cloudpickle:
connectorx:
deltalake:
fsspec:
gevent:
hvplot:
matplotlib:
numpy:                1.26.3
openpyxl:
pandas:               2.1.4
pyarrow:              15.0.0
pydantic:             2.6.1
pyiceberg:
pyxlsb:
sqlalchemy:
xlsx2csv:
xlsxwriter:
```

In [1]: df = pl.DataFrame({'a': [1,2,3], 'b':[4,5,6], 'c': ['a', None, None], 'd':[7,8,9]})
   ...: piv = df.pivot(index='a', columns=['c', 'd'], values='d')
---------------------------------------------------------------------------
DuplicateError                            Traceback (most recent call last)
<ipython-input-1-67b010500cd5> in ?()
      1 df = pl.DataFrame({'a': [1,2,3], 'b':[4,5,6], 'c': ['a', None, None], 'd':[7,8,9]})
----> 2 piv = df.pivot(index='a', columns=['c', 'd'], values='d')

~/polars-dev/py-polars/polars/dataframe/frame.py in ?(self, values, index, columns, aggregate_function, maintain_order, sort_columns, separator)
   7431         else:
   7432             aggregate_expr = aggregate_function._pyexpr
   7433
   7434         return self._from_pydf(
-> 7435             self._df.pivot_expr(
   7436                 values,
   7437                 index,
   7438                 columns,

DuplicateError: column with name 'null' has more than one occurrences
