Closed (Filimoa closed 1 month ago)
Looks like a bug in pandera.typing.Series... I think you can try just the bare type and it should work:
class Schema(pa.DataFrameModel):
    city: str
    price: LiteralFloat = pa.Field(coerce=True)
The correct implementation of the custom dtype is also:
import polars as pl
from pandera import dtypes
from pandera.api.polars.types import PolarsData
from pandera.engines import polars_engine


@polars_engine.Engine.register_dtype
@dtypes.immutable
class LiteralFloat(polars_engine.Float64):  # inherit from polars_engine.Float64, not the polars dtype
    def coerce(self, polars_data: PolarsData) -> pl.LazyFrame:  # note the input and output signature
        """Strip commas from string values and cast to float; values that fail to parse become null."""
        return polars_data.lazyframe.with_columns(  # must return a LazyFrame
            pl.col(polars_data.key)
            .str.replace_all(",", "")  # replace_all strips every comma; str.replace only replaces the first match
            .cast(pl.Float64, strict=False)  # strict=False yields null instead of raising
        )
See the polars engine DataType implementation for details on the signatures of these methods: https://github.com/unionai-oss/pandera/blob/main/pandera/engines/polars_engine.py#L91
I'll look into fixing the SchemaInitError: Invalid annotation 'price: pandera.typing.pandas.Series[__main__.LiteralFloat]' issue. If you can, it would be great if the polars docs could be updated with an example of a custom datatype: https://github.com/unionai-oss/pandera/blob/main/docs/source/polars.md
So the whole Series[TYPE] syntax is only supported in the pandas DataFrameModel and will eventually be deprecated in that API. Going forward, new backends (in this case polars) will support the more concise bare type instead. I'll add a more informative error message here.
That worked, I'll open a PR shortly!
Describe the bug
I'm not sure if this is a bug, intentional or just missing documentation.
Code Sample, a copy-pastable example
With the pandas API this was possible: you could write custom dtypes that perform some basic data cleaning. For example, in our case we had a YesNoBool that coerces "yes" / "no" to booleans. This was handy since we deal with hundreds of these columns and it's a pain to write transformation logic for each one. The documentation is pretty vague on this (not sure if this is an anti-pattern), but this was my best attempt at porting the code to polars.
Is this intentional?
Screenshots
None
Additional context
I'll be glad to open a PR to update the docs if this is just a docs issue.