pydantic / pydantic-settings

Settings management using pydantic
https://docs.pydantic.dev/latest/usage/pydantic_settings/
MIT License
498 stars 47 forks source link

Field validators broken for nested models since v2.3.2 #331

Open LachlanMarnham opened 3 days ago

LachlanMarnham commented 3 days ago

pydantic version: 2.8.0 OS: confirmed broken on Windows and Linux, haven't tested others pydantic-settings version: 2.3.2

Hello, I noticed that there is a breaking change in pydantic-settings==2.3.2, which I think was introduced by PR: https://github.com/pydantic/pydantic-settings/pull/309. Here is a minimal working example:

from pydantic import BaseModel, field_validator
import os
from pydantic_settings import BaseSettings, SettingsConfigDict

class Child(BaseModel):
    ATTRIBUTE: list[str]

    @field_validator("ATTRIBUTE", mode="before")
    def convert_fields(cls, input: str) -> list[str]:
        return input.split("|")

class Parent(BaseSettings):
    model_config = SettingsConfigDict(
        env_prefix="PREFIX_",
        env_nested_delimiter="__",
        case_sensitive=False,
        env_file_encoding="utf-8",
    )

    child: Child

os.environ["PREFIX_CHILD__ATTRIBUTE"] = "a|b|c"
settings = Parent()
assert settings.child.ATTRIBUTE == ["a", "b", "c"]

This test passes for pydantic-settings up to and including 2.3.1, but has been broken since 2.3.2. The test doesn't simply fail: the instantiation of Parent raises with the following traceback:

Traceback (most recent call last):
  File "C:\proj\pypoetry\virtualenvs\pydantic-test-fL1284vF-py3.11\Lib\site-packages\pydantic_settings\sources.py", line 375, in __call__  
    field_value = self.prepare_field_value(field_name, field, field_value, value_is_complex)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\proj\pypoetry\virtualenvs\pydantic-test-fL1284vF-py3.11\Lib\site-packages\pydantic_settings\sources.py", line 568, in prepare_field_value
    env_val_built = self.explode_env_vars(field_name, field, self.env_vars)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\proj\pypoetry\virtualenvs\pydantic-test-fL1284vF-py3.11\Lib\site-packages\pydantic_settings\sources.py", line 703, in explode_env_vars
    raise e
  File "C:\proj\pypoetry\virtualenvs\pydantic-test-fL1284vF-py3.11\Lib\site-packages\pydantic_settings\sources.py", line 700, in explode_env_vars
    env_val = self.decode_complex_value(last_key, target_field, env_val)  # type: ignore
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\proj\pypoetry\virtualenvs\pydantic-test-fL1284vF-py3.11\Lib\site-packages\pydantic_settings\sources.py", line 187, in decode_complex_value
    return json.loads(value)
           ^^^^^^^^^^^^^^^^^
  File "C:\Python\Python311\Lib\json\__init__.py", line 346, in loads
    return _default_decoder.decode(s)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python\Python311\Lib\json\decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python\Python311\Lib\json\decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\software\pydantic-test\test.py", line 26, in <module>
    settings = Parent()
               ^^^^^^^^
  File "C:\proj\pypoetry\virtualenvs\pydantic-test-fL1284vF-py3.11\Lib\site-packages\pydantic_settings\main.py", line 141, in __init__     
    **__pydantic_self__._settings_build_values(
      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\proj\pypoetry\virtualenvs\pydantic-test-fL1284vF-py3.11\Lib\site-packages\pydantic_settings\main.py", line 311, in _settings_build_values
    return deep_update(*reversed([source() for source in sources]))
                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\proj\pypoetry\virtualenvs\pydantic-test-fL1284vF-py3.11\Lib\site-packages\pydantic_settings\main.py", line 311, in <listcomp>   
    return deep_update(*reversed([source() for source in sources]))
                                  ^^^^^^^^
  File "C:\proj\pypoetry\virtualenvs\pydantic-test-fL1284vF-py3.11\Lib\site-packages\pydantic_settings\sources.py", line 377, in __call__  
    raise SettingsError(
pydantic_settings.sources.SettingsError: error parsing value for field "child" from source "EnvSettingsSource"

So it's trying to JSON-decode the string "a|b|c" and failing. But my field_validator is running in "before" mode so this shouldn't happen. Indeed, if I simply comment-out the field_validator I see exactly the same error. So I believe that decorator is either being ignored altogether, or "before" mode has been broken.

hramezani commented 3 days ago

Thanks @LachlanMarnham for reporting this.

Actually, the example code shouldn't have worked before because ATTRIBUTE field's type is list[str], and pydantic-settings considers it as a complex field. pydantic-settings parses the values of the complex field and this is the reason that you get the error.

By fixing the bug that we had before, you don't need a field_validator anymore. you can pass a list as value and pydantic-settings parses it.

from pydantic import BaseModel
import os
from pydantic_settings import BaseSettings, SettingsConfigDict

class Child(BaseModel):
    ATTRIBUTE: list[str]

class Parent(BaseSettings):
    model_config = SettingsConfigDict(
        env_prefix="PREFIX_",
        env_nested_delimiter="__",
        case_sensitive=False,
        env_file_encoding="utf-8",
    )

    child: Child

os.environ["PREFIX_CHILD__ATTRIBUTE"] = '["a", "b", "c"]'
settings = Parent()
assert settings.child.ATTRIBUTE == ["a", "b", "c"]
LachlanMarnham commented 3 days ago

Ok thanks @hramezani. I think this is maybe a problem of documentation, with a conflict between pydantic and pydantic-settings. To your point, the pydantic-settings documentation does say:

Complex types like list, set, dict, and sub-models are populated from the environment by treating the environment variable's value as a JSON-encoded string.

Which is fair enough. However, pydantic guarantees that I can avoid this behaviour if I want to:

Before validators run before Pydantic's internal parsing and validation (e.g. coercion of a str to an int). These are more flexible than After validators since they can modify the raw input, but they also have to deal with the raw input, which in theory could be any arbitrary object.

It does feel a bit as if pydantic-settings is breaking a contract with pydantic here, given that internal parsing and validation is in fact taking place before the before validator.

hramezani commented 3 days ago

Good point @LachlanMarnham.

The json parsing behavior was in pydanatic-settings v1(actually pydantic. because pydantic-settings was in pydantic in V1). That's why we keep the behavior in V2. I think we can have a config flag to disable parsing and let the people parse the value by before field_validator

LachlanMarnham commented 3 days ago

I think we can have a config flag...

Do you mean a flag in the SettingsConfigDict? I guess such a flag can't really be added to the field_validator itself since that's owned by pydantic. Personally I don't rely on any settings being JSON, but I can imagine that some people might think it is a bit too much for the two options to be "all fields must be JSON" and "JSON de-serlialisation will be switched off anywhere" (unless I am mis-understanding you).

Maybe the way I'm using pydantic-settings is a bit of an edge case, seeing as 2.3.2 was released a few weeks ago and nobody else has raised a ticket yet. If you think the current behaviour is correct, just a bit out of sync with the documentation, I can open a PR to update the docs.

Thanks

hramezani commented 2 days ago

Yes, I meant a flag in pydantic-settings SettingsConfigDict like enable_json_parsing which has two values:

Maybe the way I'm using pydantic-settings is a bit of an edge case, seeing as 2.3.2 was released a few weeks ago and nobody else has raised a ticket yet. If you think the current behaviour is correct, just a bit out of sync with the documentation, I can open a PR to update the docs.

Yes, I think the current behaviour is correct. what do you think? which part is wrong?