Traceback (most recent call last):
  File "/Users/xyz/Library/Application Support/JetBrains/PyCharmCE2024.1/scratches/scratch_31.py", line 5, in <module>
    df2 = df.filter(
          ^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/polars/dataframe/frame.py", line 4092, in filter
    return self.lazy().filter(*predicates, **constraints).collect(_eager=True)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/polars/utils/deprecation.py", line 100, in wrapper
    return function(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/polars/lazyframe/frame.py", line 1788, in collect
    return wrap_df(ldf.collect())
                   ^^^^^^^^^^^^^
polars.exceptions.SchemaError: invalid series dtype: expected `Utf8`, got `cat`
Issue description
I'm not able to filter a Categorical column by its string value. I've tried:
pl.toggle_string_cache(True)
with pl.StringCache():
    df2 = df.filter(
        pl.col('foo').str.contains('a')
    )
pl.Config.set_global_string_cache()
It seems like the API changes every few months. It's a bit humorous that Stack Overflow has comments saying it's X, and then only a couple of months later it's Y. Can we just pick one and stick with it?
Expected behavior
Filter a column based on its string value.
Installed versions