run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License

[Bug]: guideline evaluation is throwing error #17012

Open Rohith-Scalers opened 2 days ago

Rohith-Scalers commented 2 days ago

Bug Description

Guideline evaluation is throwing an error saying a required model argument is missing.

Version

latest

Steps to Reproduce

run the guideline evaluation example

Relevant Logs/Tracebacks

Traceback (most recent call last):
  File "/home/rohit/.local/lib/python3.10/site-packages/llama_index/core/async_utils.py", line 33, in asyncio_run
    return loop.run_until_complete(coro)
  File "/home/rohit/.local/lib/python3.10/site-packages/nest_asyncio.py", line 98, in run_until_complete
    return f.result()
  File "/usr/lib/python3.10/asyncio/futures.py", line 201, in result
    raise self._exception.with_traceback(self._exception_tb)
  File "/usr/lib/python3.10/asyncio/tasks.py", line 232, in __step
    result = coro.send(None)
  File "/home/rohit/.local/lib/python3.10/site-packages/llama_index/core/evaluation/guideline.py", line 118, in aevaluate
    eval_data = self._output_parser.parse(eval_response)
  File "/home/rohit/.local/lib/python3.10/site-packages/llama_index/core/instrumentation/dispatcher.py", line 230, in wrapper
    result = func(*args, **kwargs)
  File "/home/rohit/.local/lib/python3.10/site-packages/llama_index/core/output_parsers/pydantic.py", line 62, in parse
    return self._output_cls.parse_raw(json_str)
  File "/home/rohit/.local/lib/python3.10/site-packages/pydantic/v1/main.py", line 549, in parse_raw
    return cls.parse_obj(obj)
  File "/home/rohit/.local/lib/python3.10/site-packages/pydantic/v1/main.py", line 526, in parse_obj
    return cls(**obj)
  File "/home/rohit/.local/lib/python3.10/site-packages/pydantic/v1/main.py", line 341, in __init__
    raise validation_error
pydantic.v1.error_wrappers.ValidationError: 2 validation errors for EvaluationData
passing
  field required (type=value_error.missing)
feedback
  field required (type=value_error.missing)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/rohit/RagEval/src/main.py", line 86, in <module>
    GuidelineData.get_score(query,reference_answer,reference_contexts,[guideline_index])
  File "/home/rohit/RagEval/src/evaluation_metrics/guideline_score.py", line 25, in get_score
    eval_result = evaluator.evaluate(
  File "/home/rohit/.local/lib/python3.10/site-packages/llama_index/core/evaluation/base.py", line 62, in evaluate
    return asyncio_run(
  File "/home/rohit/.local/lib/python3.10/site-packages/llama_index/core/async_utils.py", line 47, in asyncio_run
    raise type(e)(f"Error running coroutine: {e!s}") from e
TypeError: ValidationError.__init__() missing 1 required positional argument: 'model'
dosubot[bot] commented 2 days ago

The "missing model" error you're encountering during the guideline evaluation in LlamaIndex is likely due to a problem with the EvaluationData model not being properly instantiated. The error message indicates that the EvaluationData model is missing required fields (passing and feedback) during its initialization.

Here are some steps to troubleshoot and resolve this issue:

  1. Check the Evaluation Response: Ensure that the response from the language model (LLM) contains the necessary data to populate the EvaluationData model. The response should include both passing and feedback fields.

  2. Verify the Output Parser: The output_parser is responsible for parsing the LLM's response into an EvaluationData object. Make sure that the response format matches what the PydanticOutputParser expects. The parser should be correctly configured to handle the expected output structure.

  3. Debug the LLM Response: Add logging or print statements to inspect the raw response from the LLM before it is parsed; this can help identify whether the response is missing the required fields (see the sketch below).

  4. Review the aevaluate Method: Ensure that the aevaluate method in the GuidelineEvaluator class is correctly implemented and that it properly handles the LLM's response. The method should call the apredict function and pass the response to the output parser.

By addressing these areas, you should be able to resolve the validation errors related to the EvaluationData model. If the issue persists, further investigation into the LLM's response and the parsing logic may be necessary.
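For item 3, a minimal debugging sketch is shown below. It assumes that EvaluationData can be imported from llama_index.core.evaluation.guideline, that GuidelineEvaluator accepts an output_parser keyword argument, and that llm is the LLM instance you already use; adjust as needed.

from llama_index.core.evaluation import GuidelineEvaluator
from llama_index.core.evaluation.guideline import EvaluationData  # assumed import path
from llama_index.core.output_parsers import PydanticOutputParser

class LoggingOutputParser(PydanticOutputParser):
    """Prints the raw LLM output before handing it to the normal Pydantic parsing."""
    def parse(self, text: str):
        print("Raw evaluation response:", repr(text))  # inspect what the LLM actually returned
        return super().parse(text)

evaluator = GuidelineEvaluator(
    llm=llm,  # the LLM instance you already pass to GuidelineEvaluator
    guidelines="The response should directly answer the query.",
    output_parser=LoggingOutputParser(output_cls=EvaluationData),  # assumes this kwarg exists
)

If the printed response is not valid JSON with both fields, the problem is in the LLM's output rather than in your evaluator setup.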

To continue talking to Dosu, mention @dosu.

Rohith-Scalers commented 2 days ago

@dosu can you give me an example output parser?

dosubot[bot] commented 2 days ago

Certainly! Here's an example of an output parser using Pydantic that you can adapt for your use case:

from pydantic import BaseModel

# Define the EvaluationData model with the required fields
class EvaluationData(BaseModel):
    passing: bool
    feedback: str

# Example output parser function
def parse_evaluation_response(response: str) -> EvaluationData:
    try:
        # Assuming the response is a JSON string
        eval_data = EvaluationData.parse_raw(response)
        return eval_data
    except Exception as e:
        print(f"Error parsing evaluation response: {e}")
        raise

# Example usage
response = '{"passing": true, "feedback": "The answer is correct and well-explained."}'
parsed_data = parse_evaluation_response(response)
print(parsed_data)

This example assumes that the response from the language model is a JSON string containing the passing and feedback fields. Adjust the fields and types in the EvaluationData model as needed to match your specific requirements.

To continue talking to Dosu, mention @dosu.

Rohith-Scalers commented 2 days ago

@dosu can you identify the error?


from llama_index.core.evaluation import GuidelineEvaluator
import nest_asyncio
from pydantic import BaseModel
nest_asyncio.apply()
# Define the EvaluationData model with the required fields
# class EvaluationData(BaseModel):
#     passing: bool
#     feedback: str

# # Example output parser function
# def parse_evaluation_response(response: str) -> EvaluationData:
#     try:
#         # Assuming the response is a JSON string
#         eval_data = EvaluationData.model_validate_json(response)
#         return eval_data
#     except Exception as e:
#         print(f"Error parsing evaluation response: {e}")
#         raise

class GuidelineScore:
    def __init__(self, llm, guidelines):
        self.llm = llm
        self.guidelines = guidelines
        self.evaluators = [GuidelineEvaluator(llm=llm, guidelines=guide) for guide in self.guidelines]
        self.counterList = [0 for _ in range(len(guidelines))]
        self.running_total_List = [0 for _ in range(len(guidelines))]

    def get_score(self, query, response, contexts, indexs):
        for idx, (guideline, evaluator) in enumerate(zip(self.guidelines, self.evaluators)):
            if idx in indexs:
                eval_result = evaluator.evaluate(
                    query=query,
                    contexts=contexts,
                    response=response,
                )
                result = 1 if eval_result.passing else 0
                self.counterList[idx] = self.counterList[idx] + 1
                self.running_total_List[idx] = self.running_total_List[idx] + result
dosubot[bot] commented 2 days ago

The error in your code seems to be related to the commented-out EvaluationData model and the parse_evaluation_response function. The GuidelineEvaluator class expects an EvaluationData model to parse the response from the language model (LLM). Since the EvaluationData model and the parsing function are commented out, the evaluator cannot parse the response correctly, leading to the validation error.

To fix this, you need to ensure that the EvaluationData model is defined and used correctly. Here's how you can adjust your code:

  1. Uncomment and Define the EvaluationData Model: Make sure the EvaluationData model is defined with the required fields (passing and feedback).

  2. Implement the Output Parser: Ensure that the GuidelineEvaluator uses an output parser that can parse the LLM's response into an EvaluationData object.

Here's an example of how you might define the EvaluationData model and the parser function:

from pydantic import BaseModel

# Define the EvaluationData model with the required fields
class EvaluationData(BaseModel):
    passing: bool
    feedback: str

# Example output parser function
def parse_evaluation_response(response: str) -> EvaluationData:
    try:
        # Assuming the response is a JSON string
        eval_data = EvaluationData.parse_raw(response)
        return eval_data
    except Exception as e:
        print(f"Error parsing evaluation response: {e}")
        raise

Ensure that the GuidelineEvaluator is configured to use this parser to handle the LLM's response. This should resolve the validation errors related to missing fields in the EvaluationData model.
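As a quick, purely illustrative sanity check, feeding the parser above a response that omits those fields reproduces the same two "field required" validation errors seen in the traceback (the bad payload here is hypothetical):

good = '{"passing": true, "feedback": "Covers the question fully."}'
bad = '{"verdict": "pass"}'  # hypothetical malformed response missing "passing" and "feedback"

print(parse_evaluation_response(good))  # EvaluationData(passing=True, feedback='...')
try:
    parse_evaluation_response(bad)
except Exception as err:
    print(err)  # 2 validation errors for EvaluationData: passing and feedback are required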

To continue talking to Dosu, mention @dosu.

Rohith-Scalers commented 2 days ago

@dosu it's not the problem. Sometimes the code works and sometimes it throws this error; I think it's because the LLM is unable to respond in the format passing: bool, feedback: str.

dosubot[bot] commented 2 days ago

To ensure that the language model (LLM) consistently responds with the required format, including the fields passing: bool and feedback: str, you should use the PydanticOutputParser with the EvaluationData model. This parser enforces the structure of the output to match the specified fields. Here's how it is set up in the code:

from pydantic import BaseModel, Field
from llama_index.core.evaluation import BaseEvaluator
from llama_index.core.output_parsers import PydanticOutputParser

class EvaluationData(BaseModel):
    passing: bool = Field(description="Whether the response passes the guidelines.")
    feedback: str = Field(description="The feedback for the response based on the guidelines.")

class GuidelineEvaluator(BaseEvaluator):
    def __init__(self, llm, guidelines, eval_template=None, output_parser=None):
        ...
        self._output_parser = output_parser or PydanticOutputParser(output_cls=EvaluationData)
        self._eval_template.output_parser = self._output_parser

By using PydanticOutputParser with EvaluationData, the LLM's output is parsed and validated against the EvaluationData model, ensuring that the response includes the passing and feedback fields [1].
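If the failures are intermittent because the LLM occasionally returns malformed JSON, one pragmatic workaround (a sketch only, not a llama_index API) is to retry the evaluation a few times before giving up:

def evaluate_with_retry(evaluator, query, response, contexts, max_attempts=3):
    # Retry count and exception handling are arbitrary choices for illustration.
    last_error = None
    for _ in range(max_attempts):
        try:
            return evaluator.evaluate(query=query, response=response, contexts=contexts)
        except Exception as err:  # malformed LLM output surfaces here as a parsing/validation error
            last_error = err
    raise RuntimeError(f"Guideline evaluation failed after {max_attempts} attempts") from last_error

Using a stronger or more instruction-following LLM for evaluation also tends to reduce how often the response misses the required fields.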

To continue talking to Dosu, mention @dosu.