unionai-oss / pandera

A light-weight, flexible, and expressive statistical data testing library
https://www.union.ai/pandera
MIT License

Bug while ordering optional keys from schema in static method from_records from pandera df #1241

Closed by manel-ab 1 year ago

manel-ab commented 1 year ago

Describe the bug

The from_records static method of the pandera DataFrame fails when it sorts the columns according to the schema and one of those columns is optional (not required) and missing from the records:

Example

from typing import Optional

from pandera import Field, SchemaModel
from pandera.typing import DataFrame, Series


class Schema(SchemaModel):
    state: Series[str]
    city: Series[str]
    price: Series[float]
    # Optional column: it may be absent from the input records.
    postal_code: Optional[Series[int]] = Field(nullable=True)

raw_data = [
    {
        "state": "NY",
        "city": "New York",
        "price": 8.0,
    },
    {
        "state": "FL",
        "city": "Miami",
        "price": 12.0,
    },
]
DataFrame.from_records(Schema, raw_data)

This raises the following error: KeyError: "['postal_code'] not in index". Also, with the current implementation, the columns may not come out in the order the schema defines. The expected behaviour is to return a pandera DataFrame containing the required columns, plus any optional columns that exist in the records, sorted in schema order.
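To make the expected ordering concrete, here is a minimal sketch in plain Python. It is not pandera's implementation and not the code from the linked PR; the helper name order_present_columns and the plain list used to stand in for the schema's column order are assumptions made for illustration only.

```python
from typing import Iterable, List, Mapping, Sequence


def order_present_columns(
    schema_order: Sequence[str],
    records: Iterable[Mapping[str, object]],
) -> List[str]:
    """Return the schema columns that appear in the records, in schema order.

    Hypothetical helper: schema_order stands in for the column order
    declared by the schema. Columns missing from the records are skipped
    here; whether a missing column is an error is left to validation.
    """
    present = set()
    for record in records:
        present.update(record.keys())
    return [name for name in schema_order if name in present]


schema_order = ["state", "city", "price", "postal_code"]
raw_data = [
    {"price": 8.0, "city": "New York", "state": "NY"},
    {"price": 12.0, "city": "Miami", "state": "FL"},
]

columns = order_present_columns(schema_order, raw_data)
print(columns)  # ['state', 'city', 'price']
```

The absent optional column postal_code is dropped from the ordering instead of being looked up, and the remaining columns follow the schema order rather than the key order of the records.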

Environment

python: 3.11
pandera: 0.15.1

I created the following PR to solve this issue. For any additional concerns, let me know :).

manel-ab commented 1 year ago

Fixed by #1238.