Context

Currently, LangChain supports Pydantic 2 only through the v1 namespace.

The plan is to transition to Pydantic 2 with release 0.3.0 of LangChain, and to drop support for Pydantic 1.

LangChain has roughly 1000 pydantic objects across different packages. While LangChain uses a number of deprecated features, one of the harder things to update is the usage of a vanilla @root_validator() (which is used ~250 times across the code base).

The goal of this issue is to do as much preliminary work as possible to help prepare for the migration from pydantic v1 to pydantic 2.

To help prepare for the migration, we'll need to refactor each occurrence of a vanilla root_validator() to one of the following 3 variants (depending on what makes sense in the context of the model):

root_validator(pre=True) -- pre initialization validator
root_validator(pre=False, skip_on_failure=True) -- post initialization validator
root_validator(pre=True) AND root_validator(pre=False, skip_on_failure=True) to include both pre initialization and post initialization validation.
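
To locate the occurrences that still need one of these variants, something like the following could help. This is only a sketch: `find_vanilla_root_validators` is a hypothetical helper (not part of LangChain), and it assumes "vanilla" means the decorator is written as `@root_validator` or `@root_validator()` with no arguments at all.

```python
import ast
from typing import List, Optional, Tuple


def _decorator_name(node: ast.AST) -> Optional[str]:
    """Return the trailing name of a decorator expression, if it has one."""
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Attribute):  # e.g. pydantic.root_validator
        return node.attr
    return None


def find_vanilla_root_validators(source: str) -> List[Tuple[int, str]]:
    """Return (decorator line, function name) for each argument-less root_validator."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        for dec in node.decorator_list:
            # `@root_validator` written without parentheses.
            bare = _decorator_name(dec) == "root_validator"
            # `@root_validator()` called with no positional or keyword arguments.
            empty_call = (
                isinstance(dec, ast.Call)
                and _decorator_name(dec.func) == "root_validator"
                and not dec.args
                and not dec.keywords
            )
            if bare or empty_call:
                hits.append((dec.lineno, node.name))
    return sorted(hits)


SAMPLE = '''
class Foo(BaseModel):
    @root_validator()
    def validate_environment(cls, values):
        return values

    @root_validator(pre=True)
    def pre_init(cls, values):
        return values
'''

print(find_vanilla_root_validators(SAMPLE))  # [(3, 'validate_environment')]
```

The source is only parsed, never imported, so this can be run over files whose dependencies are not installed. It matches on the decorator's name alone, so an aliased import (e.g. `from pydantic import root_validator as rv`) would be missed.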

Guidelines

Pre-initialization is most useful for creating defaults for values, especially when the defaults cannot be supplied per field individually.
Post-initialization is most useful for doing more complex validation, especially validation that involves multiple fields.

What not to do

Do NOT upgrade to model_validator. We're trying to break the work into small chunks that can be done while we're still using Pydantic v1 functionality!
Do NOT create field_validators when doing the refactor.

Simple Example

class Foo(BaseModel):
    @root_validator()
    def validate_environment(cls, values: Dict) -> Dict:
        values["api_key"] = get_from_dict_or_env(
            values, "some_api_key", "SOME_API_KEY", default=""
        )

        if values["temperature"] is not None and not 0 <= values["temperature"] <= 1:
            raise ValueError("temperature must be in the range [0.0, 1.0]")
        return values

After refactor

class Foo(BaseModel):
    @root_validator(pre=True)
    def pre_init(cls, values):
        # Logic for setting defaults goes in the pre_init validator.
        # While in some cases, the logic could be pulled into the `Field` definition
        # directly, it's perfectly fine for this refactor to keep the changes minimal
        # and just move the logic into the pre_init validator.
        values["api_key"] = get_from_dict_or_env(
            values, "some_api_key", "SOME_API_KEY", default=""
        )
        return values

    @root_validator(pre=False, skip_on_failure=True)
    def post_init(cls, values):
        # Post init validation runs after the fields have been validated and
        # defaults applied, so it can access every field's value (e.g., temperature).
        # If this logic were part of the pre_init validator, it could raise
        # a KeyError since `temperature` is missing from the values dictionary
        # whenever the caller did not pass it explicitly.
        if values["temperature"] is not None and not 0 <= values["temperature"] <= 1:
            raise ValueError("temperature must be in the range [0.0, 1.0]")
        return values

Example Refactors

Here are some actual examples of the refactors: https://gist.github.com/eyurtsev/be30ddbc54dcdc02f98868eacb24b2a1

If you're feeling especially creative, you could try using the example refactors and an LLM chain built with an appropriate prompt to fix this code automatically!
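
As a starting point for that idea, here is a rough sketch of how a few-shot prompt might be assembled from before/after pairs. Everything here is an assumption for illustration: `INSTRUCTIONS` and `build_refactor_prompt` are hypothetical names, the wording of the instructions is only one possibility, and the call to an actual model is deliberately left out.

```python
from typing import List, Tuple

# Hypothetical instruction text, paraphrasing the rules in this issue.
INSTRUCTIONS = (
    "Refactor each vanilla @root_validator() into root_validator(pre=True), "
    "root_validator(pre=False, skip_on_failure=True), or both. "
    "Do NOT use model_validator or field_validator."
)


def build_refactor_prompt(examples: List[Tuple[str, str]], target: str) -> str:
    """Assemble a few-shot prompt from (before, after) pairs and the code to fix."""
    parts = [INSTRUCTIONS]
    for i, (before, after) in enumerate(examples, start=1):
        parts.append(
            f"Example {i} (before):\n{before}\n\nExample {i} (after):\n{after}"
        )
    # End with the code to fix so the model continues with the refactored version.
    parts.append(f"Code to refactor:\n{target}\n\nRefactored code:")
    return "\n\n".join(parts)
```

The resulting string could then be passed to whichever model or chain you prefer. Any output would still need to be reviewed by hand and checked against the existing unit tests.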

langchain-ai / langchain

Prepare for pydantic 2 migration by refactoring vanilla @root_validator() usage #22819
