zevisert closed this issue 2 years ago.
Hi @zevisert, it doesn't look like modin can handle `string[python]` dtypes for indexes (it'll just silently convert them to `object`):
```python
import modin.pandas as pd  # assumed import: the thread attributes this behavior to modin

index = pd.Index(
    [
        "should",
        "not",
        "throw",
        "during",
        "schema",
        "validate",
    ],
    dtype="string[python]",
)
series = pd.Series(
    [
        "should",
        "not",
        "throw",
        "during",
        "schema",
        "validate",
    ],
    dtype="string[python]",
)
print("Index:", index)
print("Series:", series)
```
Output:

```
Index: Index(['should', 'not', 'throw', 'during', 'schema', 'validate'], dtype='object')
Series: 0      should
1         not
2       throw
3      during
4      schema
5    validate
dtype: string
```
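For comparison, here is a minimal sketch of the same construction in plain pandas. It assumes pandas 1.4 or newer, where an `Index` can hold extension dtypes such as `StringDtype`; running it next to the modin snippet above helps tell a modin-specific fallback apart from one caused by the installed pandas version.

```python
import pandas as pd  # plain pandas here, not modin.pandas

words = ["should", "not", "throw", "during", "schema", "validate"]

# Same construction as the modin snippet, but with plain pandas
index = pd.Index(words, dtype="string[python]")
series = pd.Series(words, dtype="string[python]")

# On pandas >= 1.4 both are expected to keep a StringDtype with python storage
print("Index dtype:", index.dtype)
print("Series dtype:", series.dtype)
```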
You can specify the `str` or `object` types in your schema, which should work just fine:
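```python
import pandera as pa
import pandera.typing as P  # assumed alias; the thread does not show how P is imported

class Example(pa.SchemaModel):
    index: P.Index[str]

    class Config:
        coerce = True
```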
It would probably make sense to raise an error in cases like these, where the pandas-like implementation (modin, koalas, etc.) silently casts types to some fallback type.
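One way to act on this suggestion, sketched as a hypothetical helper rather than anything pandera currently provides: after building or coercing an object, compare its actual dtype with the requested one and raise instead of accepting a silent fallback. `SilentCastError` and `check_no_silent_cast` are illustrative names, and the sketch assumes the backend's dtypes compare equal to the corresponding pandas dtypes. It is exercised here with plain pandas.

```python
import pandas as pd
from pandas.api.types import pandas_dtype


class SilentCastError(TypeError):
    """Hypothetical error: the backend did not produce the requested dtype."""


def check_no_silent_cast(obj, requested):
    """Return obj unchanged if obj.dtype matches `requested`, else raise.

    `obj` is any pandas-like Index or Series exposing `.dtype`
    (assumption: its dtypes compare equal to pandas dtypes).
    """
    # Resolve the requested dtype, e.g. "string[python]" -> StringDtype("python")
    expected = pandas_dtype(requested)
    # Put `expected` on the left so an ExtensionDtype's own __eq__ is used
    if expected != obj.dtype:
        raise SilentCastError(
            f"requested dtype {expected!r} but got {obj.dtype!r}; "
            "the backend may have fallen back to another dtype"
        )
    return obj


words = ["should", "not", "throw"]
kept = check_no_silent_cast(pd.Index(words, dtype="string[python]"), "string[python]")
print("kept dtype:", kept.dtype)
```

Wiring a check like this into schema coercion would turn the silent `object` fallback seen above into an explicit error.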
Interesting, I didn't notice that this was on modin. Thanks!
Describe the bug
`P.STRING` fails validation / coercion when using `pandera[modin-ray]`.
Code Sample, a copy-pastable example
Expected behavior
When using `pandera[modin-ray]`, we should be able to coerce "object" strings to `pandas.StringDtype` strings, even in an index.
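In plain pandas, the coercion described here roughly amounts to an `astype` call on both the column and the index. The following is a minimal sketch of the expected end state, not pandera's implementation; it assumes pandas 1.4 or newer so that the index can hold a `StringDtype`, and the frame, column, and index labels are made up for illustration.

```python
import pandas as pd

# Hypothetical frame whose column and index both hold "object" strings
df = pd.DataFrame(
    {"word": ["should", "not", "throw"]},
    index=pd.Index(["a", "b", "c"], dtype=object),
)

# Coerce the column and the index to pandas.StringDtype (python storage)
coerced = df.astype({"word": "string[python]"})
coerced.index = coerced.index.astype("string[python]")

print(coerced.dtypes)
print("index dtype:", coerced.index.dtype)
```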
Additional context
Repro repository is available here: