Describe the bug
I found a bug in the from_records static method of pandera's DataFrame: it tries to sort columns that are optional in the schema even when they are missing from the data.
[X] I have checked that this issue has not already been reported.
[X] I have confirmed this bug exists on the latest version of pandera.
[X] (optional) I have confirmed this bug exists on the master branch of pandera.
Example
from typing import Optional

from pandera import Field, SchemaModel
from pandera.typing import Series, DataFrame


class Schema(SchemaModel):
    state: Series[str]
    city: Series[str]
    price: Series[float]
    postal_code: Optional[Series[int]] = Field(nullable=True)


raw_data = [
    {
        "state": "NY",
        "city": "New York",
        "price": 8.0,
    },
    {
        "state": "FL",
        "city": "Miami",
        "price": 12.0,
    },
]

DataFrame.from_records(Schema, raw_data)
This raises the following error: KeyError: "['postal_code'] not in index".
Also, as currently implemented, the keys may not be sorted in the order the schema defines.
The expected behaviour is to return a pandera DataFrame containing the required and existing columns, sorted.
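To make the expected behaviour concrete, here is a minimal sketch of the column selection in plain Python. It is not pandera's implementation. The helper name select_present_columns is hypothetical, and it assumes "sorted" means "in the order the schema declares the columns" and that a column counts as existing if it appears in at least one record.

```python
from typing import Any, Iterable, Mapping, Sequence


def select_present_columns(
    schema_columns: Sequence[str],
    records: Iterable[Mapping[str, Any]],
) -> list[str]:
    """Hypothetical helper: keep schema columns found in the records, in schema order."""
    seen: set[str] = set()
    for record in records:
        seen.update(record.keys())
    # Iterate over the schema (not the data) so the schema order wins,
    # and skip optional columns that no record provides.
    return [column for column in schema_columns if column in seen]


# Column names in the order the Schema class above declares them.
schema_columns = ["state", "city", "price", "postal_code"]

# Same data as the example, with the keys of the first record shuffled.
raw_data = [
    {"price": 8.0, "city": "New York", "state": "NY"},
    {"state": "FL", "city": "Miami", "price": 12.0},
]

columns = select_present_columns(schema_columns, raw_data)
print(columns)
```

Selecting with a list built this way, instead of with every schema column, would avoid the KeyError for the missing postal_code column while still following the schema order.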
Environment
python: 3.11
pandera: 0.15.1
I created the following PR to solve this issue. For any additional concerns, let me know :).