Closed: obiii closed 2 weeks ago
I'm not a pandera user - but this is my understanding of why it is failing:
It seems the failure_case column can be a string or a struct.
In the case of a struct, this fails:
import polars as pl
df = pl.DataFrame({
'failure_case': [{'case_id': 'abc', 'extract_date': None}]
})
df.with_columns(pl.col("failure_case").cast(pl.String))
# ComputeError: conversion from `struct[2]` to `str` failed in column ...
A struct can be "stringified" in Polars via .struct.json_encode():
>>> df.with_columns(pl.col("failure_case").struct.json_encode())
shape: (1, 1)
┌───────────────────────────────────────┐
│ failure_case                          │
│ ---                                   │
│ str                                   │
╞═══════════════════════════════════════╡
│ {"case_id":"abc","extract_date":null} │
└───────────────────────────────────────┘
But I'm not sure if that's what pandera wants to do in this case.
Good catch! #1608 should address this
Hi @cosmicBboy
I was previously using 0.19.0b3, which I installed with pip install --pre 'pandera[polars]'.
I don't see a new tag with your PR. Can you please let me know how to install the updates you made in this PR?
Just cut a new beta release: https://github.com/unionai-oss/pandera/releases/tag/v0.19.0b4
Thanks!
Describe the bug
We are trying a simple validation example using polars. We can't understand the problem or where it originates, but it throws a polars.exceptions.ComputeError exception when any validation check fails and there is a null in the data.
For example, in the code below, the dummy data contains an extract_date column with a None. It runs fine if the case_id values are all int-convertible strings, but throws the exception if any case_id is not int-convertible.
Here is the code:
It gives: 'conversion from `struct[29]` to `str` failed in column 'failure_case' for 1 out of 1 values: [{"abc","f",null}]'
If you uncomment "case_id": ["1", "2", "3"] and comment out "case_id": ["1", "2", "abc"], it runs fine. Not sure why it panics when there are nulls. If there are no nulls in the data it works fine.
The trace we get is:
Expected behavior
It should work with columns that contain nulls and are set to nullable=True.
versions
pandera: 0.19.0b3
polars: 0.20.23
python: 3.11