truefoundry / cognita

RAG (Retrieval Augmented Generation) Framework for building modular, open source applications for production by TrueFoundry
https://cognita.truefoundry.com
Apache License 2.0

Port pydantic v1 models to pydantic v2 #219

Closed: chiragjn closed this issue 1 month ago

abhinavk454 commented 3 months ago

I would like to work on this issue.

chiragjn commented 3 months ago

Awesome! I suppose the first order of business is to update the requirements to install pydantic v2, change all imports to pydantic.v1, and ensure things are still working fine.

Fortunately we don't have many files that reference pydantic: https://github.com/search?q=repo%3Atruefoundry%2Fcognita+%22from+pydantic%22&type=code. In the second part we start porting the models themselves to v2, changing validators, model serialization, and deserialization.
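
As a rough sketch of those two phases (the DataSourceCompat / DataSourceV2 classes and their uri field below are purely illustrative, not actual Cognita models):

# Phase 1: install pydantic v2 but keep the models on the v1 compatibility layer
from pydantic.v1 import BaseModel, validator

class DataSourceCompat(BaseModel):
    uri: str

    @validator("uri")
    def check_uri(cls, value):
        if not value:
            raise ValueError("uri must not be empty")
        return value

# Phase 2: port the same model to the native v2 APIs
from pydantic import BaseModel as BaseModelV2, field_validator

class DataSourceV2(BaseModelV2):
    uri: str

    @field_validator("uri")
    @classmethod
    def check_uri(cls, value: str) -> str:
        if not value:
            raise ValueError("uri must not be empty")
        return value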

We would appreciate pull requests :)

cocobeach commented 3 months ago

Maybe I can help a bit. Here are the Pydantic v1 to v2 errors I ran into:

  1. pydantic-settings

/python3.11/site-packages/pydantic/_migration.py", line 296, in wrapper raise PydanticImportError( pydantic.errors.PydanticImportError: BaseSettings has been moved to the pydantic-settings package. See https://docs.pydantic.dev/2.7/migration/#basesettings-has-moved-to-pydantic-settings for more details.

Resolution: pip install pydantic-settings, and modify your settings.py to import BaseSettings from the pydantic-settings package instead of pydantic.
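
For illustration, a minimal before/after of that import change (ExampleSettings and its field are placeholders, not the real Cognita settings class):

# Before (pydantic v1):
# from pydantic import BaseSettings

# After (pydantic v2): BaseSettings now lives in the separate pydantic-settings package
from pydantic_settings import BaseSettings

class ExampleSettings(BaseSettings):
    LOG_LEVEL: str = "info"

settings = ExampleSettings()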

  2. JOB_FQN

/python3.11/site-packages/pydantic/_internal/_model_construction.py", line 406, in inspect_namespace raise PydanticUserError( pydantic.errors.PydanticUserError: A non-annotated attribute was detected: JOB_FQN = ''. All model fields require a type annotation; if JOB_FQN is not meant to be a field, you may be able to resolve this error by annotating it as a ClassVar or updating model_config['ignored_types'].

In Pydantic v2, all attributes of a model class need to be either type-annotated fields or explicitly marked as ClassVar if they are not meant to be fields.

Here’s how you can resolve this issue:

Annotate JOB_FQN as a ClassVar if it's not meant to be a field of the model. Update model_config['ignored_types'] if necessary to include the type of JOB_FQN.
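
A small sketch of the ClassVar route, assuming a simplified model (JobSettings and its second field are hypothetical, for illustration only):

import os
from typing import ClassVar

from pydantic import BaseModel

class JobSettings(BaseModel):
    # Annotated as ClassVar, so pydantic v2 treats it as a plain class attribute rather than a model field
    JOB_FQN: ClassVar[str] = os.getenv("JOB_FQN", "")

    # A regular model field keeps a normal type annotation (and optionally a default)
    job_timeout_seconds: int = 300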

  3. validate_fqn

python3.11/site-packages/pydantic/main.py", line 176, in __init__ self.__pydantic_validator__.validate_python(data, self_instance=self) pydantic_core._pydantic_core.ValidationError: 1 validation error for AssociatedDataSources

The fqn property in your BaseDataSource class is being used in a way that Pydantic doesn't expect: Pydantic expects a string but encounters the property object instead.

To resolve this, ensure that fqn is computed and stored as a regular field during model validation rather than using the property method.

Remove the property decorator: instead of using @property, compute fqn within a model_validator.

Set fqn in the model_validator: ensure fqn is set after validation so that it becomes a regular field accessible as a plain string.

On number 3, I kept having problems with BaseDataSource and LocalMetadataSource: we need to ensure that fqn is correctly initialized during the creation of the DataSource object.
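
A minimal sketch of that approach, assuming a simplified BaseDataSource (the fields and the fqn format are placeholders, not Cognita's actual schema):

from pydantic import BaseModel, model_validator

class BaseDataSource(BaseModel):
    type: str
    uri: str
    # Stored as a regular string field instead of a @property
    fqn: str = ""

    @model_validator(mode="after")
    def set_fqn(self):
        # Compute fqn after validation so models that embed this one see a plain string
        if not self.fqn:
            self.fqn = f"{self.type}::{self.uri}"
        return self

ds = BaseDataSource(type="localdir", uri="./data")
print(ds.fqn)  # localdir::./data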

==> That's all I got

abhinavk454 commented 3 months ago

After upgrading to v2, I'm getting this:

[screenshot of the error]

backend/settings.py

import os
from typing import ClassVar, Optional

import orjson

from backend.types import EmbeddingCacheConfig, MetadataStoreConfig, VectorDBConfig
from pydantic_settings import BaseSettings

class Settings(BaseSettings):
    """
    Settings class to hold all the environment variables
    """

    LOG_LEVEL: str = "info"
    METADATA_STORE_CONFIG: MetadataStoreConfig
    VECTOR_DB_CONFIG: VectorDBConfig
    TFY_SERVICE_ROOT_PATH: Optional[str] = "/"
    TFY_API_KEY: str
    OPENAI_API_KEY: Optional[str]
    TFY_HOST: Optional[str]
    TFY_LLM_GATEWAY_URL: str
    EMBEDDING_CACHE_CONFIG: Optional[EmbeddingCacheConfig] = None

    LOG_LEVEL = os.getenv("LOG_LEVEL", "info")
    VECTOR_DB_CONFIG = os.getenv("VECTOR_DB_CONFIG", "")
    METADATA_STORE_CONFIG = os.getenv("METADATA_STORE_CONFIG", "")
    TFY_SERVICE_ROOT_PATH = os.getenv("TFY_SERVICE_ROOT_PATH", "")
    JOB_FQN: ClassVar[str] = os.getenv("JOB_FQN", "")
    JOB_COMPONENT_NAME: ClassVar[str] = os.getenv("JOB_COMPONENT_NAME", "")
    TFY_API_KEY = os.getenv("TFY_API_KEY", "")
    OPENAI_API_KEY = os.getenv("OPENAI_API_KEY", "")
    TFY_HOST = os.getenv("TFY_HOST", "")
    TFY_LLM_GATEWAY_URL = os.getenv("TFY_LLM_GATEWAY_URL", "")
    EMBEDDING_CACHE_CONFIG = (
        EmbeddingCacheConfig.model_validate(
            orjson.loads(os.getenv("EMBEDDING_CACHE_CONFIG"))
        )
        if os.getenv("EMBEDDING_CACHE_CONFIG", None)
        else None
    )

    LOCAL: bool = os.getenv("LOCAL", False)
    OLLAMA_URL: str = os.getenv("OLLAMA_URL", "http://localhost:11434")
    EMBEDDING_SVC_URL: str = os.getenv("EMBEDDING_SVC_URL", "")
    RERANKER_SVC_URL: str = os.getenv("RERANKER_SVC_URL", "")

    if not VECTOR_DB_CONFIG:
        raise ValueError("VECTOR_DB_CONFIG is not set")

    if not METADATA_STORE_CONFIG:
        raise ValueError("METADATA_STORE_CONFIG is not set")

    if not TFY_LLM_GATEWAY_URL:
        TFY_LLM_GATEWAY_URL = f"{TFY_HOST}/api/llm"

    try:
        VECTOR_DB_CONFIG = VectorDBConfig.model_validate(orjson.loads(VECTOR_DB_CONFIG))
    except Exception as e:
        raise ValueError(f"VECTOR_DB_CONFIG is invalid: {e}")
    try:
        METADATA_STORE_CONFIG = MetadataStoreConfig.model_validate(
            orjson.loads(METADATA_STORE_CONFIG)
        )
    except Exception as e:
        raise ValueError(f"METADATA_STORE_CONFIG is invalid: {e}")

settings = Settings()
# Check the env vars set
print("Settings: ", settings)

chiragjn commented 3 months ago

Can we look at your changes somewhere? It would be good to have them on a branch that we can check out to help resolve this. On the surface it looks like some dependencies are missing.

cocobeach commented 3 months ago

I tried, but like I mentioned, I went in over my head. Those are the 3 things I could recover from my ChatGPT account, but the whole repo was wiped clean and pulled again, so I'm sorry, it's gone. It wasn't good code anyway, trust me.

chiragjn commented 3 months ago

Sorry for the confusion, my previous message was addressed to @abhinavk454 as he is working on the code changes for this.

abhinavk454 commented 3 months ago

Solved, it was just a Docker Compose issue.