The SchemaModel.validate arguments sample, head, and tail only work with hashable column types. Sometimes it's desirable to put unhashable types (like numpy.ndarray) into the columns of a DataFrame. The presence of such a column shouldn't break schema validation.
[x] I have checked that this issue has not already been reported.
[x] I have confirmed this bug exists on the latest version of pandera.
[x] (optional) I have confirmed this bug exists on the master branch of pandera.
Code Sample, a copy-pastable example
import numpy as np
import pandas as pd
import pandera as pa
class ExampleSchema(pa.SchemaModel):
unhashable: pa.typing.Series[pa.Object] = pa.Field()
df = pd.DataFrame(dict(unhashable=[np.random.rand(4) for _ in range(10)]))
print("validating without subsampling...")
ExampleSchema.validate(df)
print("validating with 'sample'...")
try:
ExampleSchema.validate(df, sample=1)
raise RuntimeError("unreachable")
except TypeError as e:
print(f" expected exception thrown: {e}")
Output:
validating without subsampling...
validating with 'sample'...
expected exception thrown: unhashable type: 'numpy.ndarray'
Expected behavior
Subsampling should work when unhashable types are present as columns.
Additional Comments
pandera is a great module. Willing to look into this when I can find the time.
The
SchemaModel.validate
argumentssample
,head
, andtail
only work with hashable column types. Sometimes it's desirable to put unhashable types (likenumpy.ndarray
) into the columns of a DataFrame. The presence of such a column shouldn't break schema validation.Code Sample, a copy-pastable example
Output:
Expected behavior
Subsampling should work when unhashable types are present as columns.
Additional Comments
pandera
is a great module. Willing to look into this when I can find the time.