pydantic / pydantic

Data validation using Python type hints
https://docs.pydantic.dev
MIT License

How to associate text labels with choice values in schemas #1401

Closed: dodumosu closed this issue 1 year ago

dodumosu commented 4 years ago

Output of python -c "import pydantic.utils; print(pydantic.utils.version_info())":

            pydantic version: 1.4
            pydantic compiled: True
            python version: 3.8.2 (default, Apr 16 2020, 05:44:45)  [GCC 7.5.0]
            platform: Linux-4.4.0-18362-Microsoft-x86_64-with-glibc2.27
            optional deps. installed: ['email-validator']

From the documentation, pydantic uses enums for choices. I haven't been able to determine how to link a value with a label, which might be useful for an application consuming a pydantic-powered API.

For example:

import enum
import pydantic

class Apple(enum.IntEnum):
    RED_DELICIOUS = 1
    GOLDEN_DELICIOUS = 2
    MCINTOSH = 3
    FUJI = 4 

class AppleModel(pydantic.BaseModel):
    variety: Apple

AppleModel.schema()
# {'title': 'AppleModel', 'type': 'object', 'properties': {'variety': {'title': 'Variety', 'enum': [1, 2, 3, 4], 'type': 'integer'}}, 'required': ['variety']}

From the documentation, it is possible to use enums that subclass str, but that seems less than ideal to me.

For contrast, something like Django or WTForms allows you to specify choices as (value, label) pairs:

APPLE_CHOICES = (
    (1, 'Red Delicious'),
    (2, 'Golden Delicious'),
    (3, 'McIntosh'),
    (4, 'Fuji'),
)

I'm not saying this is how it should be done, just trying to find out if there's a way it can be done.
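As a minimal stdlib-only sketch of the Django-style approach, the (value, label) pairs can stay the single source of truth, with an enum derived from them for validation and a plain dict kept for label lookup. The choices_enum helper and its "member name from label" rule are hypothetical; neither pydantic nor Django provides them.

```python
import enum

APPLE_CHOICES = (
    (1, "Red Delicious"),
    (2, "Golden Delicious"),
    (3, "McIntosh"),
    (4, "Fuji"),
)

def choices_enum(name, choices):
    """Hypothetical helper: build an IntEnum from (value, label) pairs.

    Member names are derived from the labels, e.g. "Red Delicious" -> RED_DELICIOUS.
    """
    members = {label.upper().replace(" ", "_"): value for value, label in choices}
    return enum.IntEnum(name, members)

Apple = choices_enum("Apple", APPLE_CHOICES)
APPLE_LABELS = dict(APPLE_CHOICES)  # value -> label; IntEnum members hash like their int values
```

Because IntEnum members compare and hash like ints, APPLE_LABELS[Apple.FUJI] returns "Fuji" without a separate member-keyed table.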

dodumosu commented 4 years ago

From this issue, it looks like oneOf is the accepted way of doing this in JSON Schema land, but pydantic doesn't generate that particular JSON Schema keyword.
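To make the target shape concrete, here is a minimal stdlib-only sketch that builds a oneOf fragment pairing each enum value (as const) with a human-readable title. The APPLE_LABELS table and the one_of_schema helper are hypothetical illustrations, not pydantic API.

```python
import enum

class Apple(enum.IntEnum):
    RED_DELICIOUS = 1
    GOLDEN_DELICIOUS = 2
    MCINTOSH = 3
    FUJI = 4

# Hypothetical label table kept next to the enum.
APPLE_LABELS = {
    Apple.RED_DELICIOUS: "Red Delicious",
    Apple.GOLDEN_DELICIOUS: "Golden Delicious",
    Apple.MCINTOSH: "McIntosh",
    Apple.FUJI: "Fuji",
}

def one_of_schema(enum_cls, labels):
    """Return a JSON Schema fragment listing each member as a titled const."""
    return {
        "oneOf": [
            {"const": member.value, "title": labels[member]}
            for member in enum_cls
        ]
    }

schema = one_of_schema(Apple, APPLE_LABELS)
```

Since every const is distinct, exactly one branch can match a given value, which is what makes oneOf (rather than anyOf) a natural fit here.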

Atheuz commented 4 years ago

Here's something I tried, though I was unable to tie a secondary value to it:

from pydantic import BaseModel
from typing_extensions import Literal
from typing import Union

Variety = Union[
    Literal["Red Delicious"],
    Literal["Golden Delicious"],
    Literal["McIntosh"],
    Literal["Fuji"],
]

class AppleModel(BaseModel):
    variety: Variety

def main():
    print(AppleModel.schema())

if __name__ == "__main__":
    main()

Produces =>

{
    "title": "AppleModel",
    "type": "object",
    "properties":
    {
        "variety":
        {
            "title": "Variety",
            "anyOf": [
                {
                    "const": "Red Delicious",
                    "type": "string"
                },
                {
                    "const": "Golden Delicious",
                    "type": "string"
                },
                {
                    "const": "McIntosh",
                    "type": "string"
                },
                {
                    "const": "Fuji",
                    "type": "string"
                }
            ]
        }
    },
    "required": ["variety"]
}

dodumosu commented 4 years ago

That's interesting. The (untested) concept I came up with is using the aenum package to create enumerations that can have a label field, then creating a class method __modify_schema__ to customise the schema generation.

Cheers

samuelcolvin commented 4 years ago

@dodumosu can you explain more how aenum worked? I too really want something like this.

I think we should add some standard way to achieve this in pydantic.

dodumosu commented 4 years ago

So here's what I cooked up:

# -*- coding: utf-8 -*-
from typing import Optional

import aenum
import pydantic

class AppleVariety(aenum.Enum):
    _init_ = "value label"

    FUJI = 1, "Fuji"
    GOLDEN_DELICIOUS = 2, "Golden Delicious"
    MCINTOSH = 3, "McIntosh"
    RED_DELICIOUS = 4, "Red Delicious"

    @classmethod
    def __get_validators__(cls):
        yield cls.validate

    @classmethod
    def __modify_schema__(cls, field_schema):
        # see notes below
        field_schema.pop("enum")
        field_schema.update(
            {
                "oneOf": [
                    {"const": choice.value, "title": choice.label}
                    for choice in cls
                ]
            }
        )

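    @classmethod
    def validate(cls, v):
        # Pass existing members through unchanged; otherwise coerce the raw
        # input to int and look the member up by value. pydantic reports the
        # resulting TypeError/ValueError as a validation error.
        if isinstance(v, cls):
            return v
        return cls(int(v))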

class Apple(pydantic.BaseModel):
    variety: Optional[AppleVariety] = None

Apple.schema()
# {'title': 'Apple', 'type': 'object', 'properties': {'variety': {'title': 'Variety', 'oneOf': [{'const': 1, 'title': 'Fuji'}, {'const': 2, 'title': 'Golden Delicious'}, {'const': 3, 'title': 'McIntosh'}, {'const': 4, 'title': 'Red Delicious'}]}}}

Notes:

3ynm commented 4 years ago

Just connected to say I've been looking for something like this integrated into pydantic for several months... I'll be waiting for it :D

selimb commented 4 years ago

For contrast, something like Django or WTForms allows you to specify choices as (value, label) pairs

This can also be done using the new django.db.models.enums.Choices class, which actually uses enum.Enum under the hood. Turns out this just works with pydantic, too:

from django.db.models import TextChoices
import pydantic

class RunnerType(TextChoices):
    LOCALFS = "localfs", "Local FS"
    SGE = "sge", "SGE"

class Job(pydantic.BaseModel):
    runner_type: RunnerType

print("schema", Job.schema())
job = Job(runner_type="localfs")
print("job", job)
print("json", job.json())

# OUTPUT:
schema {'title': 'Job', 'type': 'object', 'properties': {'runner_type': {'title': 'Runner Type', 'enum': ['localfs', 'sge'], 'type': 'string'}}, 'required': ['runner_type']}
job runner_type=<RunnerType.LOCALFS: 'localfs'>
json {"runner_type": "localfs"}

If oneOf is desired over enum in the schema, @dodumosu's __modify_schema__ method can be simply copy-pasted into a subclass of TextChoices:

from django.db.models import TextChoices
import pydantic

class Choices(TextChoices):  # new
    @classmethod
    def __modify_schema__(cls, field_schema):
        # replace the flat "enum" list with titled oneOf entries
        field_schema.pop("enum")
        field_schema.update({"oneOf": [{"const": choice.value, "title": choice.label} for choice in cls]})

class RunnerType(Choices):  # modified
    LOCALFS = "localfs", "Local FS"
    SGE = "sge", "SGE"

class Job(pydantic.BaseModel):
    runner_type: RunnerType

print("schema", Job.schema())
job = Job(runner_type="localfs")
print("job", job)
print("json", job.json())

# OUTPUT:
schema {'title': 'Job', 'type': 'object', 'properties': {'runner_type': {'title': 'Runner Type', 'type': 'string', 'oneOf': [{'const': 'localfs', 'title': 'Local FS'}, {'const': 'sge', 'title': 'SGE'}]}}, 'required': ['runner_type']}
job runner_type=<RunnerType.LOCALFS: 'localfs'>
json {"runner_type": "localfs"}

I mention this because the code for django.db.models.enums.Choices is pretty lightweight (especially compared to aenum). Would it make sense to simply vendor (some of) that code (plus __modify_schema__) in pydantic?

EDIT: Hmm, looks like it's difficult to get this, or any variants, to play well with mypy, at least without .pyi files (see django-stubs)

selimb commented 4 years ago

Actually, this minimal code does the trick for my (limited) use case, and it correctly type-checks .value and .label:

# labelled_enum.py
"""
A special Enum that plays well with ``pydantic`` and ``mypy``, while allowing human-readable
labels similarly to ``django.db.models.enums.Choices``.
"""
from typing import TypeVar, Type
import enum

T = TypeVar("T")

class LabelledEnum(enum.Enum):
    """Enum with labels. Assumes both the value and label are strings."""

    def __new__(cls: Type[T], value: str, label: str) -> T:
        obj = object.__new__(cls)
        obj._value_ = value
        obj.label = label
        return obj

# labelled_enum.pyi
import enum

class LabelledEnum(enum.Enum):
    @property
    def label(self) -> str: ...
    @property
    def value(self) -> str: ...

# example usage
from labelled_enum import LabelledEnum

class RunnerType(LabelledEnum):
    LOCALFS = "localfs", "Local FS"
    SGE = "sge", "SGE"

Again, __modify_schema__ can easily be defined in LabelledEnum if desired -- I don't personally need it.
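For anyone who does want it, here is a minimal sketch of LabelledEnum with the hook folded in. It assumes pydantic v1's __modify_schema__ classmethod convention used elsewhere in this thread (pydantic v2 replaced that hook with __get_pydantic_json_schema__). The hook is called directly on a dict below, so the sketch runs without pydantic installed.

```python
import enum

class LabelledEnum(enum.Enum):
    """Enum whose members carry a human-readable label (value and label are strings)."""

    def __new__(cls, value, label):
        obj = object.__new__(cls)
        obj._value_ = value
        obj.label = label
        return obj

    @classmethod
    def __modify_schema__(cls, field_schema):
        # pydantic v1 hook: swap the flat "enum" list for oneOf entries
        # that pair each member's value with its label.
        field_schema.pop("enum", None)
        field_schema["oneOf"] = [
            {"const": member.value, "title": member.label} for member in cls
        ]

class RunnerType(LabelledEnum):
    LOCALFS = "localfs", "Local FS"
    SGE = "sge", "SGE"

# Exercise the hook on a dict shaped like pydantic v1's enum field schema.
field_schema = {"title": "Runner Type", "enum": ["localfs", "sge"]}
RunnerType.__modify_schema__(field_schema)
```

Using pop("enum", None) rather than pop("enum") keeps the hook from raising if the incoming schema has no enum key.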

Hope this helps.

jokull commented 3 years ago

I adapted the Django code, combined it with pieces from above, and it works quite well.

elonzh commented 1 year ago

From this issue, it looks like oneOf is the accepted way of doing this in JSON Schema land, but pydantic doesn't generate that particular JSON Schema type.

Considering that the role of text labels is just to describe values, and given OpenAPI 3.0's limited compatibility with JSON Schema, I think we could instead generate a description for an enum field, like https://github.com/tfranzel/drf-spectacular/blob/0.26.2/drf_spectacular/hooks.py#L123-L128.

Actually, displaying Enum.name should be OK for most cases. There seems to be no need to introduce text labels in Pydantic to describe enum values if Pydantic generates a detailed description with enum names and values.

Users can override the default description or write a custom data type like the one below.

Proof of concept

import enum

from django.db.models import enums as django_enums

__all__ = [
    "Enum",
    "IntEnum",
    "IntegerChoices",
    "TextChoices",
]

class PydanticEnumSchema:
    __enum_description_field__ = "name"

    @classmethod
    def __modify_schema__(cls, field_schema):
        description = field_schema.get("description", "")
        if description and description != "An enumeration.":
            # We assume description has not been overridden when it is the default value.
            # This behavior should be more accurate if schema is generated by pydantic.
            return
        enum_list = "\n".join(
            [
                f"* `{choice.value}` - {getattr(choice, cls.__enum_description_field__)}"
                for choice in cls
            ]
        )
        if not enum_list:
            return
        if description:
            description += "\n\n" + enum_list
        else:
            description = enum_list
        field_schema["description"] = description

class Enum(PydanticEnumSchema, enum.Enum):
    ...

class IntEnum(PydanticEnumSchema, enum.IntEnum):
    ...

class PydanticDjangoChoicesSchema(PydanticEnumSchema):
    __enum_description_field__ = "label"

class IntegerChoices(PydanticDjangoChoicesSchema, django_enums.IntegerChoices):
    ...

class TextChoices(PydanticDjangoChoicesSchema, django_enums.TextChoices):
    ...

Swagger UI display comparison

[Screenshot: Swagger UI rendering when combining oneOf and const for django.db.models.enums.Choices]

[Screenshot: Swagger UI rendering with the drf-spectacular approach]

adriangb commented 1 year ago

Would something like this work?

[Screenshot]

I would like to be able to generate that with:

from typing import Annotated, Literal, Union

from pydantic import BaseModel, Field

Apple = Union[
    Annotated[
        Literal[1],
        Field(title='Red Delicious'),
    ],
    Annotated[
        Literal[2],
        Field(title='Cosmic Crisp'),
    ]
]

class Model(BaseModel):
    apple: Apple

print(Model.model_json_schema())
"""
{"type": "object", "properties": {"apple": {"anyOf": [{"const": 1}, {"const": 2}], "title": "Apple"}}, "required": ["apple"], "title": "Model"}
"""

Unfortunately, the title currently gets dropped, which I would call a bug. Before trying to fix it (if it is a bug at all), though, I'd like to know whether this is even what all of you want.
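Until the dropped title is addressed, one stopgap is to post-process the generated schema. This is a minimal stdlib-only sketch that assumes the schema shape printed above; add_const_titles and APPLE_TITLES are hypothetical names, and the title table duplicates the Field(title=...) values by hand.

```python
import copy

# Output of Model.model_json_schema() as printed in the comment above.
schema = {
    "type": "object",
    "properties": {"apple": {"anyOf": [{"const": 1}, {"const": 2}], "title": "Apple"}},
    "required": ["apple"],
    "title": "Model",
}

# Hypothetical value -> label table.
APPLE_TITLES = {1: "Red Delicious", 2: "Cosmic Crisp"}

def add_const_titles(schema, prop, titles):
    """Return a copy of schema with a title attached to each const branch of prop."""
    result = copy.deepcopy(schema)  # leave the caller's schema untouched
    for branch in result["properties"][prop].get("anyOf", []):
        if "const" in branch and branch["const"] in titles:
            branch["title"] = titles[branch["const"]]
    return result

patched = add_const_titles(schema, "apple", APPLE_TITLES)
```

The obvious downside is that labels live in two places, which is exactly what keeping the title on the Annotated Literal would avoid.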

dodumosu commented 1 year ago

It's been a while since I opened this issue :) Just wanted to say that the Python standard library uses something like this for http.HTTPStatus: it's an enum.IntEnum subclass with extra data (a phrase and a description) associated with each member.
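A minimal sketch of that pattern applied to the apple example: like http.HTTPStatus, it defines a custom __new__ that builds the int, sets _value_, and stores extra data on the member. The label attribute name is an assumption carried over from earlier comments, and pydantic integration is not shown.

```python
import enum
from http import HTTPStatus

class Apple(enum.IntEnum):
    """IntEnum whose members also carry a label, in the style of http.HTTPStatus."""

    def __new__(cls, value, label):
        obj = int.__new__(cls, value)  # members are real ints
        obj._value_ = value            # lookup by value still works: Apple(4)
        obj.label = label              # extra per-member data
        return obj

    RED_DELICIOUS = 1, "Red Delicious"
    GOLDEN_DELICIOUS = 2, "Golden Delicious"
    MCINTOSH = 3, "McIntosh"
    FUJI = 4, "Fuji"

fuji = Apple(4)
status_phrase = HTTPStatus(404).phrase  # the stdlib's own per-member extra data
```

Because members remain ints, existing code that compares against raw integers keeps working while the label rides along on each member.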

akvadrako commented 9 months ago

How was this solved? I cannot find documentation on how to use it.