unionai-oss / pandera

A light-weight, flexible, and expressive statistical data testing library
https://www.union.ai/pandera
MIT License

OpenAPI is wrong and Input / Output schemas are missing #1398

Open tkaraouzene opened 9 months ago

tkaraouzene commented 9 months ago

Describe the bug

The OpenAPI schema generated for pandera models is wrong, and the input and output schemas are missing.

Code Sample, a copy-pastable example

from typing import List

import pandas as pd
import pandera as pa
from fastapi import FastAPI, status
from pandera.typing import Series
from pydantic import BaseModel

TIMESTAMP_COL = "timestamp"

app = FastAPI()

class InputPydantic(BaseModel):
    x: List[float]

    def data_frame(self):
        """Convert input into pandas.DataFrame."""
        return pd.DataFrame(self.dict())

class OutputPydantic(BaseModel):
    y: List[float]

class InputPandera(pa.DataFrameModel):
    x: Series[float]

class OutputPandera(pa.DataFrameModel):
    y: Series[float]

    class Config:
        to_format = "dict"

@app.post(
    "/predict_pydantic",
    status_code=status.HTTP_200_OK,
    response_model=OutputPydantic,
)
def predict_pydantic(dataset: InputPydantic):
    df = dataset.data_frame()
    # Do some stuff
    # ...
    return {"y": [1.0, 2.0, 3.0]}

@app.post(
    "/predict_pandera",
    status_code=status.HTTP_200_OK,
    response_model=pa.typing.DataFrame[OutputPandera],
)
def predict_pandera(df: pa.typing.DataFrame[InputPandera]):
    # Do some stuff
    # ...
    return pd.DataFrame({"y": [1.0, 2.0, 3.0]})

Expected behavior

When I run my app, I would like to see InputPandera and OutputPandera in the generated OpenAPI schema, as is done for the InputPydantic and OutputPydantic objects:

image

Moreover, the provided examples for both the request and the response are wrong:

Request

image

x should be a float, not an int, so running the example with "Try it out" leads to an execution failure: image
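A likely reading of this failure, sketched with pandas only: integer JSON values produce an int64 column, while `Series[float]` describes a float64 column. The sample values below are assumptions, since the generated example is only visible in the screenshot, and the sketch does not run pandera or FastAPI.

```python
import pandas as pd

# Integer values (as in an integer-valued request example) become an int64 column.
df_int = pd.DataFrame({"x": [0, 1, 2]})
print(df_int["x"].dtype)

# Float values give the float64 column that Series[float] describes.
df_float = pd.DataFrame({"x": [0.0, 1.0, 2.0]})
print(df_float["x"].dtype)

# Casting up front is one way to accept integer payloads.
df_cast = df_int.astype({"x": "float64"})
print(df_cast["x"].dtype)
```

pandera's `coerce = True` model Config option is another candidate for accepting integer input, though the thread does not confirm whether it resolves this particular FastAPI case.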

Response

The provided response example is not in the right format:

image

Instead of: image

Additional context

Both the [io] and [fastapi] extras have been installed.

cosmicBboy commented 9 months ago

This looks like a duplicate of https://github.com/unionai-oss/pandera/issues/1395, correct?

tkaraouzene commented 9 months ago

My bad, I forgot to specify that I was using pydantic v1 (1.10.2). So no, it is not a duplicate.

cosmicBboy commented 9 months ago

Gotcha. Feel free to try debugging this and open a PR with a fix!

The part of the codebase that creates the json schema representation is here

tkaraouzene commented 9 months ago

Thanks, I'll have a look at it.

eharkins commented 2 months ago

@tkaraouzene did you have any success getting pandera to generate an accurate OpenAPI schema with pydantic v1 and fastapi? I am having a similar issue: I am orienting the dataframe as records (a list), but OpenAPI is generating a schema where each column is an array, like:

    "year_id": {
        "type": "array",
        "items": {
            "type": "integer"
        }
    },

My actual response is returned as a list of record objects that each have a year_id, etc., which doesn't match the schema being generated.

cosmicBboy commented 2 months ago

The state of the json_schema support is still the same as before, see this issue: https://github.com/unionai-oss/pandera/issues/1395

Would welcome any PRs to actually fix the to_json_schema function here: https://github.com/unionai-oss/pandera/blob/cf6b5e45dfb0cd593f948b12a2a327bbf3699657/pandera/api/pandas/model.py#L651-L661

eharkins commented 2 months ago

OK, thanks. In the meantime, is there a recommended way to override the to_json_schema function to specify the schema explicitly? Alternatively, would returning a dataframe model with

    class Config:
        to_format = "dict"
        to_format_kwargs = {"orient": "list"}

match the way that the schema is currently generated with columns as arrays?

Edit: using the above kwargs worked for me; my responses now match the way pandera generates the OpenAPI schema.
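The difference between the two orientations discussed above can be seen with plain pandas. This is a minimal sketch that calls `DataFrame.to_dict` directly rather than going through pandera; the `value` column and the sample data are hypothetical.

```python
import pandas as pd

# Hypothetical data; "year_id" mirrors the column in the schema fragment above.
df = pd.DataFrame({"year_id": [2020, 2021], "value": [1.5, 2.5]})

# orient="list": one array per column, the shape the generated schema describes.
as_columns = df.to_dict(orient="list")
print(as_columns)

# orient="records": one object per row, which does not fit a schema
# that declares each column as an array.
as_records = df.to_dict(orient="records")
print(as_records)
```

With `orient="list"` the payload has one key per column holding an array, which is why the `to_format_kwargs` setting above lines up with the generated schema.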