markfairbanks / tidypolars

Tidy interface to polars
http://tidypolars.readthedocs.io
MIT License

`if_else` won't create string output #239

Open kyleGrealis opened 3 months ago

kyleGrealis commented 3 months ago

I'm a new Python user but a long-time R user. While getting familiar with tidypolars, I ran into a quirk that I can't figure out: when I try to create a new variable with string output, I get a ColumnNotFoundError. I'm hoping it's something simple that I'm overlooking. These two snippets work:

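# Assumed imports (not shown in the original post)
import tidypolars as tp
from tidypolars import col, str_detect

tp.Tibble({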
  'Letter': ['a', 'a', 'a', 'b', 'b'],
}).mutate(
  new_col = tp.if_else(str_detect('Letter', 'a'), 1, 0)
)

and

tp.Tibble({
  'Letter': ['a', 'a', 'a', 'b', 'b'],
}).mutate(
  new_col = tp.if_else(str_detect('Letter', 'a'), 1, 0),
  new_col_2 = tp.if_else(str_detect('Letter', 'a'), col('Letter'), None)
)

Both produce the expected output. However, adding a third column with string values yields the error mentioned above:

tp.Tibble({
  'Letter': ['a', 'a', 'a', 'b', 'b'],
}).mutate(
  new_col = tp.if_else(str_detect('Letter', 'a'), 1, 0),
  new_col_2 = tp.if_else(str_detect('Letter', 'a'), col('Letter'), None),
  new_col_3 = tp.if_else(str_detect('Letter', 'a'), 'yes', 'no')      # PROBLEMATIC line of code
)

Here is the full traceback:

ColumnNotFoundError                       Traceback (most recent call last)
Cell In[21], line 3
      1 # %%
----> 3 tp.Tibble({
      4   'Letter': ['a', 'a', 'a', 'b', 'b'],
      5 }).mutate(
      6   new_col = tp.if_else(str_detect('Letter', 'a'), 1, 0),
      7   new_col_2 = tp.if_else(str_detect('Letter', 'a'), col('Letter'), None),
      8   new_col_3 = tp.if_else(str_detect('Letter', 'a'), 'yes', 'no')
      9 )

File c:\.venv\lib\site-packages\tidypolars\tibble.py:406, in Tibble.mutate(self, by, *args, **kwargs)
    404     out = out.groupby(by).apply(lambda x: _mutate_cols(x, exprs))
    405 else:
--> 406     out = _mutate_cols(out, exprs)
    408 return out.pipe(from_polars)

File c:\.venv\lib\site-packages\tidypolars\utils.py:110, in _mutate_cols(df, exprs)
    108 def _mutate_cols(df, exprs):
    109     for expr in exprs:
--> 110         df = df.with_columns(expr)
    111     return df

File c:\.venv\lib\site-packages\polars\dataframe\frame.py:8634, in DataFrame.with_columns(self, *exprs, **named_exprs)
   8488 def with_columns(
   8489     self,
   8490     *exprs: IntoExpr | Iterable[IntoExpr],
   8491     **named_exprs: IntoExpr,
   8492 ) -> DataFrame:
   8493     """
   8494     Add columns to this DataFrame.
   8495 
   (...)
   8632     └─────┴──────┴─────────────┘
   8633     """
-> 8634     return self.lazy().with_columns(*exprs, **named_exprs).collect(_eager=True)

File c:\.venv\lib\site-packages\polars\lazyframe\frame.py:1967, in LazyFrame.collect(self, type_coercion, predicate_pushdown, projection_pushdown, simplify_expression, slice_pushdown, comm_subplan_elim, comm_subexpr_elim, cluster_with_columns, no_optimization, streaming, background, _eager, **_kwargs)
   1964 # Only for testing purposes atm.
   1965 callback = _kwargs.get("post_opt_callback")
-> 1967 return wrap_df(ldf.collect(callback))

ColumnNotFoundError: yes

Please tell me this is something I'm missing. I haven't been able to find any help on Stack Overflow either.

Thank you, Kyle