run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License
35.26k stars 4.96k forks

[Bug]: MarvinMetadataExtractor not working properly, as well as the ai_model decorator #14987

Open DioTeamGit opened 1 month ago

DioTeamGit commented 1 month ago

Bug Description

I came across the example notebook on using Marvin (through llama-index) to extract relevant metadata from chunks of text. The first issue is related to the ai_model decorator: following the instructions in the notebook, it seems it is not possible to decorate a class that inherits from the BaseModel implementation in "llama_index.core.bridge.pydantic".

I tried working around this by using BaseModel directly from pydantic, which solved it but surfaced a new problem: in the class MarvinMetadataExtractor, the following code:

from marvin import ai_model

if not issubclass(marvin_model, ai_model): raise ValueError("marvin_model must be a subclass of ai_model")

which starts at line 63, seems to be broken by definition, since ai_model is not a class but a function.
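The mismatch can be reproduced in isolation: issubclass requires a class (or a tuple/union of classes) as its second argument, so passing a plain function raises a TypeError. A minimal sketch, with a stand-in function in place of marvin's real decorator:

```python
def ai_model(cls):
    # Stand-in for marvin's ai_model, which is a function, not a class.
    return cls

class RiskExtractor:
    pass

try:
    issubclass(RiskExtractor, ai_model)
except TypeError as exc:
    # issubclass() arg 2 must be a class, a tuple of classes, or a union
    print(type(exc).__name__)  # TypeError
```

So the guard in MarvinMetadataExtractor can never succeed as written; any marvin_model triggers this TypeError before the ValueError branch is ever reached.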

Version

llama-index-extractors-marvin="0.1.3"

Steps to Reproduce

import os

from dotenv import find_dotenv, load_dotenv
from pydantic import Field
from pydantic.main import BaseModel
import marvin
from marvin import ai_model
from llama_index.extractors.marvin import MarvinMetadataExtractor

_ = load_dotenv(find_dotenv())
openai_key = os.environ.get("OPENAI_KEY")

marvin.settings.openai.api_key = openai_key
llm_model = "gpt-4o"

@ai_model
class RiskExtractor(BaseModel):
    # Config.RISKS is defined elsewhere in the project
    risk: str = Field(
        ...,
        description=(
            "a description of all the chunk parts related to risks "
            f"that can be one of the following: {Config.RISKS}."
        )
    )

metadata_extractor = MarvinMetadataExtractor(
    marvin_model=RiskExtractor, llm_model_string=llm_model
)

Relevant Logs/Tracebacks

TypeError                                 Traceback (most recent call last)
Cell In[9], line 3
      1 # from pydantic.main import BaseModel
----> 3 @ai_model
      4 class RiskExtractor(BaseModel):
      5     risk: str = Field(
      6         ...,
      7         description=(
   (...)
     10         )
     11     )

File c:\Users\TobiaTommasini\python_projects\riskmanai\.venv\Lib\site-packages\marvin\ai\text.py:645, in model(type_, model_kwargs, client)
    642     return WrappedModel
    644 if type_ is not None:
--> 645     return decorator(type_)
    646 return decorator

File c:\Users\TobiaTommasini\python_projects\riskmanai\.venv\Lib\site-packages\marvin\ai\text.py:633, in model.<locals>.decorator(cls)
    632 def decorator(cls: Type[M]) -> Type[M]:
--> 633     class WrappedModel(Model, cls):
    634         @wraps(cls.__init__)
    635         def __init__(self, *args, **kwargs):
    636             super().__init__(
    637                 *args, model_kwargs=model_kwargs, client=client, **kwargs
    638             )

TypeError: metaclass conflict: the metaclass of a derived class must be a (non-strict) subclass of the metaclasses of all its bases
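The metaclass conflict itself is not specific to marvin: Python raises it whenever a class inherits from two bases whose metaclasses are unrelated, which is what happens when marvin's WrappedModel mixes its own Model base with a pydantic BaseModel subclass. A dependency-free sketch of the same failure (MetaA/MetaB stand in for the two libraries' metaclasses):

```python
class MetaA(type):
    pass

class MetaB(type):
    pass

class Model(metaclass=MetaA):      # plays the role of marvin's Model base
    pass

class BaseModel(metaclass=MetaB):  # plays the role of pydantic's BaseModel
    pass

try:
    # Mixing bases with unrelated metaclasses triggers the conflict.
    class WrappedModel(Model, BaseModel):
        pass
except TypeError as exc:
    print(exc)  # metaclass conflict: the metaclass of a derived class ...
```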


TypeError                                 Traceback (most recent call last)
Cell In[48], line 5
      1 node_parser = TokenTextSplitter(
      2     separator=" ", chunk_size=512, chunk_overlap=100
      3 )
----> 5 metadata_extractor = MarvinMetadataExtractor(
      6     marvin_model=RiskExtractor, llm_model_string=llm_model
      7 )

File c:\Users\TobiaTommasini\python_projects\riskmanai\.venv\Lib\site-packages\llama_index\extractors\marvin\base.py:65, in MarvinMetadataExtractor.__init__(self, marvin_model, llm_model_string, **kwargs)
     62 import marvin
     63 from marvin import ai_model
---> 65 if not issubclass(marvin_model, ai_model):
     66     raise ValueError("marvin_model must be a subclass of ai_model")
     68 if llm_model_string:

TypeError: issubclass() arg 2 must be a class, a tuple of classes, or a union


dosubot[bot] commented 1 month ago

Hey @DioTeamGit! I'm here to help you with any issues you're facing. I'm a bot that can assist you in solving bugs, answering questions, and becoming a contributor. Let me know how I can assist you!

To address the issues you're encountering with the MarvinMetadataExtractor and the ai_model decorator, follow these steps:

  1. Ensure ai_model is correctly applied: The ai_model decorator should be applied to a class that inherits from Pydantic's BaseModel. Ensure that the ai_model decorator is correctly imported and applied.

  2. Check the ai_model decorator: The ai_model decorator is a function that modifies the class it decorates. Because it is a function rather than a class, the check issubclass(marvin_model, ai_model) raises a TypeError. You need a different way to verify that marvin_model has been decorated.

  3. Modify the MarvinMetadataExtractor class: Update the MarvinMetadataExtractor class to correctly handle the ai_model decorator. Here is an example of how you can modify the class:

from typing import Type, Optional, Any, Sequence, List, Dict, Iterable
from pydantic import BaseModel, Field
from llama_index.legacy.extractors.interface import BaseExtractor
from llama_index.legacy.schema import BaseNode, TextNode
from llama_index.legacy.utils import get_tqdm_iterable

class MarvinMetadataExtractor(BaseExtractor):
    marvin_model: Type[BaseModel] = Field(
        description="The Marvin model to use for extracting custom metadata"
    )
    llm_model_string: Optional[str] = Field(
        description="The LLM model string to use for extracting custom metadata"
    )

    def __init__(
        self,
        marvin_model: Type[BaseModel],
        llm_model_string: Optional[str] = None,
        **kwargs: Any,
    ) -> None:
        import marvin

        if not hasattr(marvin_model, '__ai_model__'):
            raise ValueError("marvin_model must be decorated with @ai_model")

        if llm_model_string:
            marvin.settings.llm_model = llm_model_string

        super().__init__(
            marvin_model=marvin_model, llm_model_string=llm_model_string, **kwargs
        )

    @classmethod
    def class_name(cls) -> str:
        return "MarvinEntityExtractor"

    async def aextract(self, nodes: Sequence[BaseNode]) -> List[Dict]:
        ai_model_instance = self.marvin_model()
        metadata_list: List[Dict] = []

        nodes_queue: Iterable[BaseNode] = get_tqdm_iterable(
            nodes, self.show_progress, "Extracting marvin metadata"
        )
        for node in nodes_queue:
            if self.is_text_node_only and not isinstance(node, TextNode):
                metadata_list.append({})
                continue

            metadata = ai_model_instance(node.get_content())
            metadata_list.append({"marvin_metadata": metadata.dict()})
        return metadata_list
  4. Example usage: Ensure your usage of the MarvinMetadataExtractor and ai_model decorator is correct:
from pydantic import BaseModel, Field
import marvin
from marvin import ai_model

@ai_model
class RiskExtractor(BaseModel):
    risk: str = Field(
        ..., 
        description=(
            "a description of all the chunk parts related to risks "
            f"that can be one of the following: {Config.RISKS}."
        )
    )

metadata_extractor = MarvinMetadataExtractor(
    marvin_model=RiskExtractor, llm_model_string="gpt-4o"
)
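Note that the `__ai_model__` attribute checked above is a hypothetical marker: marvin's decorator does not necessarily set it. If you adopt that hasattr-based guard, the decoration step (or a thin wrapper around it) has to set the attribute itself. A sketch, with the real marvin call stubbed out:

```python
def marked_ai_model(cls):
    # Hypothetical wrapper: apply marvin's real ai_model here, then tag the
    # result so the hasattr-based guard in MarvinMetadataExtractor passes.
    # wrapped = ai_model(cls)  # real decoration, omitted in this sketch
    wrapped = cls
    wrapped.__ai_model__ = True
    return wrapped

@marked_ai_model
class RiskExtractor:
    risk: str = ""

print(hasattr(RiskExtractor, "__ai_model__"))  # True
```

Whatever marker you choose, the guard and the decorator must agree on it; otherwise valid models will be rejected just as before.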

By ensuring that the ai_model decorator is correctly applied and modifying the MarvinMetadataExtractor class to check for the decorator, you should be able to resolve the issues you're facing [1][2].

To continue talking to Dosu, mention @dosu.