pyjanitor-devs / pyjanitor

Clean APIs for data cleaning. Python implementation of R package Janitor
https://pyjanitor-devs.github.io/pyjanitor
MIT License

[BUG] polars' `pivot_longer_spec` empty list #1405

Closed samukweku closed 1 month ago

samukweku commented 1 month ago

Working on a blog post for polars' pivot_longer and noticed a silly bug:

{
    "name": "IndexError",
    "message": "list index out of range",
    "stack": "---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
Cell In[75], line 1
----> 1 pivot_longer_spec(df=treatments, spec=spec)

File ~/mambaforge/envs/blogger/lib/python3.10/site-packages/janitor/polars/pivot_longer.py:154, in pivot_longer_spec(df, spec)
    147 index = [
    148     label for label in df_columns if label not in spec.get_column(\".name\")
    149 ]
    150 others = [
    151     label for label in spec_columns if label not in {\".name\", \".value\"}
    152 ]
--> 154 if (len(others) == 1) & (spec.get_column(others[0]).dtype == pl.String):
    155     # shortcut that avoids the implode/explode approach - and is faster
    156     # if the requirements are met
    157     # inspired by https://github.com/pola-rs/polars/pull/18519#issue-2500860927
    158     return _pivot_longer_dot_value_string(
    159         df=df,
    160         index=index,
    161         spec=spec,
    162         variable_name=others[0],
    163     )
    164 variable_name = \"\".join(df_columns + spec_columns)

IndexError: list index out of range"
}

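The code does not handle the case where `others` is empty, which happens when `spec` contains only the `.name` and `.value` columns. Because `&` does not short-circuit, `others[0]` is still evaluated when `len(others) == 1` is false, and that raises the `IndexError`.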
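A minimal sketch of the failure mode and one possible guard, in plain Python so it runs without polars. The helper names and the `dtypes` dict (a stand-in for looking up `spec.get_column(...).dtype`) are hypothetical and only for illustration; this is not necessarily how the fix was implemented in pyjanitor.

```python
def uses_string_shortcut_buggy(others, dtypes):
    # Mirrors the condition in the traceback: `&` evaluates both operands,
    # so others[0] is indexed even when `others` is empty.
    return (len(others) == 1) & (dtypes[others[0]] == "String")


def uses_string_shortcut(others, dtypes):
    # `and` short-circuits: others[0] is only evaluated once the
    # length check has passed, so an empty list is safe.
    return len(others) == 1 and dtypes[others[0]] == "String"


# Hypothetical inputs: no extra spec columns vs. exactly one string column.
print(uses_string_shortcut([], {}))
print(uses_string_shortcut(["drug"], {"drug": "String"}))
```

With the guarded version, an empty `others` simply skips the string shortcut and falls through to the general code path.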