langchain-ai / langchain

🦜🔗 Build context-aware reasoning applications
https://python.langchain.com
MIT License

Prepare for pydantic 2 migration by refactoring vanilla @root_validator() usage #22819

Closed eyurtsev closed 3 months ago

eyurtsev commented 3 months ago

Privileged issue

Issue Content

Context

Currently, LangChain supports Pydantic 2 only through the v1 namespace.

The plan is to transition to Pydantic 2 with release 0.3.0 of LangChain, and drop support for Pydantic 1.

LangChain has ~1000 pydantic objects across different packages. While LangChain uses a number of deprecated features, one of the harder things to update is the usage of a vanilla @root_validator() (which is used ~250 times across the code base).

The goal of this issue is to do as much preliminary work as possible to help prepare for the migration from pydantic v1 to pydantic 2.

To help prepare for the migration, we'll need to refactor each occurrence of a vanilla root_validator() to one of the following 3 variants (depending on what makes sense in the context of the model):

  1. root_validator(pre=True) -- pre initialization validator
  2. root_validator(pre=False, skip_on_failure=True) -- post initialization validator
  3. root_validator(pre=True) AND root_validator(pre=False, skip_on_failure=True) to include both pre initialization and post initialization validation.
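One way to locate the occurrences that still need refactoring is a small `ast`-based scan along the lines below. This is a sketch, not the tooling used for the actual migration: the function name `find_vanilla_root_validators` and the `SAMPLE` snippet are made up for illustration, and "vanilla" is assumed to mean a `root_validator` decorator that sets neither `pre=True` nor `skip_on_failure=True`.

```python
import ast
from typing import List, Tuple


def find_vanilla_root_validators(source: str) -> List[Tuple[int, str]]:
    """Return (decorator line, function name) for each vanilla root_validator.

    Assumption: "vanilla" means the decorator sets neither pre=True nor
    skip_on_failure=True, so it matches none of the three target variants.
    """
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        for dec in node.decorator_list:
            # Handle both `@root_validator` and `@root_validator(...)`.
            call = dec if isinstance(dec, ast.Call) else None
            target = call.func if call else dec
            # Covers `root_validator` and `pydantic.root_validator`.
            name = getattr(target, "id", None) or getattr(target, "attr", None)
            if name != "root_validator":
                continue
            kwargs = {}
            if call:
                for kw in call.keywords:
                    if kw.arg is not None and isinstance(kw.value, ast.Constant):
                        kwargs[kw.arg] = kw.value.value
            is_pre = kwargs.get("pre") is True
            is_safe_post = kwargs.get("skip_on_failure") is True
            if not (is_pre or is_safe_post):
                hits.append((dec.lineno, node.name))
    return sorted(hits)


# Hypothetical input: one vanilla usage and two already-refactored ones.
SAMPLE = '''
class Foo(BaseModel):
    @root_validator()
    def validate_environment(cls, values):
        return values

    @root_validator(pre=True)
    def pre_init(cls, values):
        return values

    @root_validator(pre=False, skip_on_failure=True)
    def post_init(cls, values):
        return values
'''

print(find_vanilla_root_validators(SAMPLE))
```

Only `validate_environment` is reported for the sample. Parsing with `ast` instead of grepping avoids false hits in comments and strings, though it will miss decorators imported under an alias.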

Guidelines

What not to do

Simple Example

class Foo(BaseModel):
    @root_validator()
    def validate_environment(cls, values: Dict) -> Dict:
        values["api_key"] = get_from_dict_or_env(
            values, "some_api_key", "SOME_API_KEY", default=""
        )

        if values["temperature"] is not None and not 0 <= values["temperature"] <= 1:
            raise ValueError("temperature must be in the range [0.0, 1.0]")
        return values

After refactor

class Foo(BaseModel):
    @root_validator(pre=True)
    def pre_init(cls, values):
        # Logic for setting defaults goes in the pre_init validator.
        # While in some cases, the logic could be pulled into the `Field` definition
        # directly, it's perfectly fine for this refactor to keep the changes minimal
        # and just move the logic into the pre_init validator.
        values["api_key"] = get_from_dict_or_env(
            values, "some_api_key", "SOME_API_KEY", default=""
        )
        return values

    @root_validator(pre=False, skip_on_failure=True)
    def post_init(cls, values):
        # Post init validation runs after the fields have been validated and
        # defaults applied, so it can access the fields and their values
        # (e.g., temperature). If this logic were part of the pre_init
        # validator, it would raise a KeyError whenever `temperature` is not
        # passed explicitly, since the key is missing from the values
        # dictionary at that point.
        if values["temperature"] is not None and not 0 <= values["temperature"] <= 1:
            raise ValueError("temperature must be in the range [0.0, 1.0]")
        return values
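To see the split in action, here is a self-contained sketch that can be run without LangChain. It is an illustration under assumptions, not code from the repository: the field declarations, the default `"hypothetical-key"` (standing in for the `get_from_dict_or_env` lookup), and the helper `error_count` are all invented for the example. The import falls back to plain `pydantic` for Pydantic 1 installs that do not ship the `pydantic.v1` namespace.

```python
from typing import Dict, Optional

try:  # Pydantic 2 installs expose the v1 API under `pydantic.v1`.
    from pydantic.v1 import BaseModel, ValidationError, root_validator
except ImportError:  # Pydantic 1 install without the `v1` namespace.
    from pydantic import BaseModel, ValidationError, root_validator


class Foo(BaseModel):
    api_key: str = ""
    temperature: Optional[float] = 0.5

    @root_validator(pre=True)
    def pre_init(cls, values: Dict) -> Dict:
        # Runs on the raw input: omitted fields are absent and defaults
        # have not been applied yet, so only set defaults here.
        values.setdefault("api_key", "hypothetical-key")
        return values

    @root_validator(pre=False, skip_on_failure=True)
    def post_init(cls, values: Dict) -> Dict:
        # Runs after field validation, so `temperature` is always present.
        temperature = values["temperature"]
        if temperature is not None and not 0 <= temperature <= 1:
            raise ValueError("temperature must be in the range [0.0, 1.0]")
        return values


def error_count(**kwargs) -> int:
    """Hypothetical helper: number of validation errors raised for kwargs."""
    try:
        Foo(**kwargs)
    except ValidationError as exc:
        return len(exc.errors())
    return 0


foo = Foo()  # No arguments: pre_init fills api_key, the field default fills temperature.
print(foo.api_key, foo.temperature)
print(error_count(temperature=2))      # Rejected by post_init (out of range).
print(error_count(temperature="hot"))  # Rejected by field validation; post_init is skipped.
```

With `skip_on_failure=True`, the post-init validator does not run when a field has already failed validation, so `temperature="hot"` produces only the field-level error rather than a second failure inside `post_init`.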

Example Refactors

Here are some actual examples of the refactors: https://gist.github.com/eyurtsev/be30ddbc54dcdc02f98868eacb24b2a1

If you're feeling especially creative, you could try to use the example refactors and an LLM chain built with an appropriate prompt to attempt to fix this code automatically using LLMs!

Vanilla `root_validator`

eyurtsev commented 3 months ago

Maybe superseded by: https://github.com/langchain-ai/langchain/pull/23841