unionai-oss / pandera

A light-weight, flexible, and expressive statistical data testing library
https://www.union.ai/pandera
MIT License

FastAPI OpenAPI spec of an endpoint using Pandera not working with Pydantic v2 #1395

Closed airibarne closed 5 months ago

airibarne commented 1 year ago

Describe the bug

With Pydantic v2, FastAPI cannot render the OpenAPI spec when an endpoint uses a Pandera schema model as its input or output, although this worked with Pydantic v1.

Code Sample, a copy-pastable example

With the following requirements installed:

python = "^3.10"
fastapi = "^0.104.0"
pydantic = "^2.4.2"
pandera = "^0.17.2"

The following code would fail:


import pandas as pd
from fastapi import FastAPI
from pandera import SchemaModel, Field
from pandera.typing import DataFrame, Series

class Transactions(SchemaModel):
    id: Series[int]
    cost: Series[float] = Field(ge=0, le=1000)

    class Config:
        coerce = True
        to_format = "dict"
        to_format_kwargs = {"orient": "records"}

app = FastAPI()

@app.get("/", response_model=DataFrame[Transactions])
async def root():
    return pd.DataFrame.from_records(
        [{"id": 1, "cost": 10.0}, {"id": 2, "cost": 20.0}]
    )

app.openapi()

However, if we downgrade pydantic to v1:

python = "^3.10"
fastapi = "^0.104.0"
pydantic = "^1.10"
pandera = "^0.17.2"
uvicorn = "^0.23.2"

Then everything works fine, and the OpenAPI spec is generated correctly. The same applies when the schema model is used to validate an endpoint parameter.

Expected behavior

I would expect code that works with Pydantic v1 to work the same way with Pydantic v2.


cosmicBboy commented 1 year ago

Hi @airibarne can you report what the failure is exactly?

airibarne commented 1 year ago

Sure @cosmicBboy, thanks for the quick response. Here is a screenshot of the error:

[image: screenshot of the error]

It seems to be a schema generation issue, but I am not familiar enough with how Pandera integrates with Pydantic to expose its JSON schema to assess it fully.

I'd add, for extra context, that this breaks the /docs page of a standard FastAPI application that has an endpoint using a Pandera model as either an input or an output. The only workaround I have found so far is to exclude these endpoints from the final OpenAPI schema using FastAPI's include_in_schema parameter (described in the FastAPI documentation).

cosmicBboy commented 1 year ago

The part of the code base responsible for converting a pandera schema to its JSON schema representation can be found here: https://github.com/unionai-oss/pandera/blob/cf6b5e45dfb0cd593f948b12a2a327bbf3699657/pandera/api/pandas/model.py#L574-L604

For simplicity, it only converts pandera schemas to a very simple JSON schema representation (column names and types).

Feel free to make a PR for this!

ekiim commented 8 months ago

Any news on this particular issue?

I've been running into this issue, and it actually occurs when calling Model.model_json_schema(). If the model contains a DataFrame field, regardless of the DataFrame's structure, as in the following example, we get an error.

from pandera import DataFrameModel
from pandera.typing import DataFrame
from pydantic import BaseModel

class Schema(DataFrameModel):
    ...

class Model(BaseModel):
    field: DataFrame[Schema]

Model.model_json_schema()

This yields the following error:

    raise PydanticInvalidForJsonSchema(f'Cannot generate a JsonSchema for {error_info}')
pydantic.errors.PydanticInvalidForJsonSchema: Cannot generate a JsonSchema for core_schema.PlainValidatorFunctionSchema ({'type': 'no-info', 'function': functools.partial(<bound method DataFrame.pydantic_validate of <class 'pandera.typing.pandas.DataFrame'>>, schema_model=Schema)})

For further information visit https://errors.pydantic.dev/2.6/u/invalid-for-json-schema

This is with pydantic 2.6 and pandera 0.18.0 on Python 3.11.

Are there any plans to address this?

cosmicBboy commented 8 months ago

I currently don't have the capacity to look into this issue, but contributions toward a fix would be very welcome!

ekiim commented 5 months ago

We can close this: we found that it was a local issue caused by overlapping names in scope.

Thanks.

TCoeffic commented 5 months ago

Testing with both the example given by @airibarne and the one from #1398 updated with pydantic 2.7.1 and pandera 0.18.3, there appears to be an issue with the CoreSchema generation here:

https://github.com/unionai-oss/pandera/blob/c24dda9ad64ec904b1bab8e9eca6e3607f92832d/pandera/typing/pandas.py#L199-L211

which returns a plain function validator, whereas pydantic expects a before, after or wrap function validator, as seen in these snippets:

https://github.com/pydantic/pydantic/blob/90d60bd9102e5ca985100bea95867d9a6aae275c/pydantic/json_schema.py#L974-L986

https://github.com/pydantic/pydantic/blob/90d60bd9102e5ca985100bea95867d9a6aae275c/pydantic/_internal/_core_utils.py#L65-L68

https://github.com/pydantic/pydantic/blob/90d60bd9102e5ca985100bea95867d9a6aae275c/pydantic/_internal/_core_utils.py#L40

Pydantic provides functions to generate the expected CoreSchema types (core_schema.no_info_*_validator_function), but they require an additional parameter, and I don't know how to build it from the pandera models.

Considering this, we should reopen this issue.
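
To make the difference between the validator kinds concrete, here is a minimal sketch that does not use pandera at all. PlainFrame and AfterFrame are hypothetical stand-in types, and _validate is a placeholder for real validation logic. The additional parameter taken by core_schema.no_info_after_validator_function is an inner CoreSchema; this sketch assumes core_schema.any_schema() is acceptable there, which yields an empty, unconstrained JSON schema rather than one describing columns. It is an illustration of the mechanism, not a proposed pandera fix.

```python
from typing import Any

from pydantic import BaseModel
from pydantic.errors import PydanticInvalidForJsonSchema
from pydantic_core import core_schema


def _validate(value: Any) -> Any:
    # Placeholder for real validation logic (hypothetical).
    return value


class PlainFrame:
    """Stand-in type exposing a plain validator function, as in the report."""

    @classmethod
    def __get_pydantic_core_schema__(cls, source_type: Any, handler: Any):
        # A plain validator has no inner schema for the JSON schema
        # generator to fall back on.
        return core_schema.no_info_plain_validator_function(_validate)


class AfterFrame:
    """Stand-in type exposing an after validator over an inner schema."""

    @classmethod
    def __get_pydantic_core_schema__(cls, source_type: Any, handler: Any):
        # The second argument is the inner CoreSchema; any_schema() accepts
        # any input and is what the JSON schema is generated from.
        return core_schema.no_info_after_validator_function(
            _validate, core_schema.any_schema()
        )


class PlainModel(BaseModel):
    frame: PlainFrame


class AfterModel(BaseModel):
    frame: AfterFrame


def schema_or_error(model: type) -> Any:
    """Return the model's JSON schema, or the exception class name on failure."""
    try:
        return model.model_json_schema()
    except PydanticInvalidForJsonSchema as exc:
        return type(exc).__name__


plain_result = schema_or_error(PlainModel)
after_result = schema_or_error(AfterModel)
print(plain_result)
print(after_result)
```

The trade-off of using any_schema() as the inner schema is that the generated OpenAPI entry carries no information about the DataFrame's columns; a richer inner schema would have to be derived from the pandera model itself.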