Open TimotejPalus opened 1 month ago
It seems the parser is run twice, but the resulting pd.DataFrame contains only the output of the first run:
code:
import pandas as pd
import pandera as pa

data = pd.DataFrame({
    "a": [2.0, 4.0, 9.0],
    "b": [2.0, 4.0, 9.0],
    "c": [2.0, 4.0, 9.0],
})

class DFModel(pa.DataFrameModel):
    a: float
    b: float
    c: float

    @pa.parser("b")
    def negate(cls, series):
        # Name kept from the docs example; this version adds 1 to each value.
        print('\n -------------',f'\nbefore parsing: {series.tolist()}', f'\nafter parsing: {(series + 1).tolist()}')
        return series + 1

data = DFModel.validate(data)
print('\n -------------',f'\nResulting "b" column in the "data" pd.DataFrame: {data["b"].tolist()}')
console:
-------------
before parsing: [2.0, 4.0, 9.0]
after parsing: [3.0, 5.0, 10.0]
-------------
before parsing: [3.0, 5.0, 10.0]
after parsing: [4.0, 6.0, 11.0]
-------------
Resulting "b" column in the "data" pd.DataFrame: [3.0, 5.0, 10.0]
Also mentioned in #1684
Describe the bug
Hello, it seems the parser function is called twice for the specified column of a given pandas DataFrame. Please check the sample code and sample output.
Code Sample, a copy-pastable example
Slightly modified example from https://pandera.readthedocs.io/en/stable/parsers.html#parsers-in-dataframemodel
Printed to console
Expected behavior
The console output shows that negate is run twice: the second call receives the output of the first. I would expect the parser to run once. I could not find an explanation for this in the documentation. While searching, I found a similar issue: https://github.com/unionai-oss/pandera/issues/1707
Additional context
pandera version: '0.20.4'
Thank you very much :)