truera / trulens

Evaluation and Tracking for LLM Experiments
https://www.trulens.org/
MIT License

Error: ValueError: Some selectors do not exist in the app or record. #1106

Closed: sadaf0714 closed this issue 3 months ago

sadaf0714 commented 5 months ago

I would like to report an issue I am encountering while fetching the context relevance and groundedness feedback metrics using TruLens. I am evaluating Hugging Face's "meta-llama/Llama-2-7b-chat-hf" model (4-bit quantized) for RAG, with LiteLLM (model_engine="huggingface/mistralai/Mistral-7B-Instruct-v0.1") as the TruLens feedback provider.

Below is a snapshot of the error I get when running this piece of code:

    tru_recorder = TruChain(
        qa,
        app_id="App_1",
        feedbacks=[f_qa_relevance, f_context_relevance, f_groundedness],
    )

Error:

    ValueError: Some selectors do not exist in the app or record.

(screenshots of the error attached)

dosubot[bot] commented 5 months ago

To resolve the ValueError and "Selector check failed" issues with TruLens and models from Hugging Face, follow these targeted steps:

  1. Ensure Selector Targets Exist: Confirm that the data or method you're trying to select with a selector is present and accessible in the app or record.

  2. Use Correct Selector Syntax: Follow the TruLens documentation for the correct syntax. Utilize predefined selectors like on_input_output() for simplification.

  3. Handle Empty Selectors Gracefully: Use the if_exists parameter in your feedback definition to prevent errors when a selector fails to find its target (see the sketch after this list).

  4. Leverage Selector Shortcuts: Simplify feedback definitions with shortcuts such as on_input(), on_output(), ensuring precise selection.

  5. Adopt Model-Specific Selectors: For specific frameworks, use the provided selectors that accurately extract parts of the model output.

  6. Specify Behavior for Missing Parameters: Use the if_missing parameter in feedback definitions to control behavior when a selector does not find its target.

  7. Debug with Logging: Increase logging level for detailed output on selector operations and potential failures.

  8. Utilize Huggingface Provider: For feedback functions calling Huggingface APIs, use the Huggingface class for methods like context_relevance.

  9. Validate Inputs: Ensure inputs match expected types and constraints to avoid ValueError.
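
A minimal sketch of points 3 and 4, assuming a `provider` object such as the LiteLLM one described in the question, and that your `trulens_eval` release supports the `if_exists` argument (newer versions only); treat this as a starting point rather than a drop-in fix:

    import numpy as np
    from trulens_eval import Feedback, Select

    # Point 4: shortcut selectors for the app's main input and main output.
    f_qa_relevance = Feedback(
        provider.relevance_with_cot_reasons, name="Answer Relevance"
    ).on_input_output()

    # Point 3: explicit selectors guarded by if_exists, so the feedback is
    # skipped (instead of raising) when the selected path is absent.
    f_context_relevance = (
        Feedback(
            provider.qs_relevance_with_cot_reasons,
            name="Context Relevance",
            if_exists=Select.RecordCalls.retrieve.rets,
        )
        .on(Select.RecordCalls.retrieve.args.query)
        .on(Select.RecordCalls.retrieve.rets.collect())
        .aggregate(np.mean)
    )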

For detailed guidance, refer to TruLens documentation on feedback schema, selector shortcuts, and the Huggingface provider. These resources provide comprehensive examples and best practices for effectively managing selectors and feedback functions with TruLens and Hugging Face models.

#### Details

**Sources**

- [trulens_eval/trulens_eval/schema/feedback.py](https://github.com/truera/trulens/blob/main/trulens_eval/trulens_eval/schema/feedback.py)
- [docs/trulens_eval/evaluation/feedback_selectors/selector_shortcuts.md](https://github.com/truera/trulens/blob/main/docs/trulens_eval/evaluation/feedback_selectors/selector_shortcuts.md)
- [trulens_eval/trulens_eval/feedback/provider/hugs.py](https://github.com/truera/trulens/blob/main/trulens_eval/trulens_eval/feedback/provider/hugs.py)
- [trulens_eval/generated_files/all_tools.py](https://github.com/truera/trulens/blob/main/trulens_eval/generated_files/all_tools.py)
- [docs/trulens_eval/evaluation/feedback_selectors/selecting_components.md](https://github.com/truera/trulens/blob/main/docs/trulens_eval/evaluation/feedback_selectors/selecting_components.md)
- [trulens_eval/trulens_eval/tru_chain.py](https://github.com/truera/trulens/blob/main/trulens_eval/trulens_eval/tru_chain.py)
- [trulens_eval/trulens_eval/schema/record.py](https://github.com/truera/trulens/blob/main/trulens_eval/trulens_eval/schema/record.py)
- [trulens_eval/trulens_eval/feedback/feedback.py](https://github.com/truera/trulens/blob/main/trulens_eval/trulens_eval/feedback/feedback.py)
**About Dosu**

This response is meant to be useful and save you time. It is **not meant to be a precise solution**, but rather a starting point for your own research. Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

To continue the conversation, mention @dosu.

sadaf0714 commented 5 months ago

I have checked and all of this is fine. Can you please suggest why it is not extracting the context relevance and groundedness metrics, while answer relevance comes through fine?

sadaf0714 commented 5 months ago

Any updates on this?

joshreini1 commented 5 months ago

Hey @sadaf0714 - the issue here is that the context in your application call structure is not in the place you're defining it. Can you share more of your code so I can reproduce and help?
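
In other words, a selector path only resolves if it matches a call that TruLens actually recorded for your app. A rough illustration of the contrast (the first selector is the custom-app style used below; the second is the kind of retriever-based path suggested concretely later in this thread for a LangChain app):

    # Only resolves if the wrapped app has an instrumented method literally named `retrieve`:
    Select.RecordCalls.retrieve.args.query

    # For a LangChain chain, the retrieval call is recorded on the retriever component instead:
    Select.Record.app.retriever._get_relevant_documents.args.query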

sadaf0714 commented 5 months ago

Sure, here's the code:

    tru = Tru()
    tru.reset_database()
    os.environ["HUGGINGFACE_API_KEY"] = hf_token

    provider = LiteLLM(model_engine="huggingface/mistralai/Mistral-7B-Instruct-v0.2")

    f_qa_relevance = Feedback(
        provider.relevance_with_cot_reasons, name="Answer Relevance"
    ).on_input_output()

    f_context_relevance = (
        Feedback(
            provider.qs_relevance_with_cot_reasons,
            name="Context Relevance",
        )
        .on(Select.RecordCalls.retrieve.args.query)
        .on(Select.RecordCalls.retrieve.rets.collect())
        .aggregate(np.mean)
    )

    grounded = Groundedness(groundedness_provider=provider)

    f_groundedness = (
        Feedback(
            grounded.groundedness_measure_with_cot_reasons,
            name="Groundedness",
        )
        .on(Select.RecordCalls.retrieve.rets.collect())
        .on_output()
        .aggregate(grounded.grounded_statements_aggregator)
    )

    tru_recorder = TruChain(
        qa,
        app_id="App_1",
        feedbacks=[f_qa_relevance, f_context_relevance, f_groundedness],
    )

    eval_ques = evaluationQuestions(ques_path)
    for question in eval_ques:
        with tru_recorder as recording:
            qa.run(question)

    records, feedback = tru.get_records_and_feedback(app_ids=[])

    metrices = records[["input", "output"] + feedback]

Please have a look. @joshreini1

joshreini1 commented 5 months ago

@sadaf0714 please share the RAG setup as well so I can reproduce.

sadaf0714 commented 5 months ago

please find the full notebook attached. https://github.com/sadaf0714/trulens/blob/main/TruLens_langchain.ipynb

joshreini1 commented 5 months ago

@sadaf0714 can you give me access to your repo?

sadaf0714 commented 5 months ago

https://github.com/sadaf0714/trulens @joshreini1 please try with this

joshreini1 commented 4 months ago

Hi @sadaf0714 - thanks for sharing your notebook.

Please try updating your selectors as follows:

  query = Select.Record.app.retriever._get_relevant_documents.args.query  
  context = Select.Record.app.retriever.get_relevant_documents.rets[:].page_content

  f_context_relevance = (
      Feedback(
          provider.qs_relevance_with_cot_reasons,
          name="Context Relevance",
      )
      .on(query)
      .on(context)
      .aggregate(np.mean)
  )
  grounded = Groundedness(groundedness_provider=provider)

  # Define a groundedness feedback function
  f_groundedness = (
      Feedback(
          grounded.groundedness_measure_with_cot_reasons,
          name="Groundedness",
      )
      .on(context.collect())
      .on_output()
      .aggregate(grounded.grounded_statements_aggregator)
  )

You can see where these selectors come from in the TruLens UI - see the screenshot below and let me know if you have questions on this. Thanks!

[Screenshot of the TruLens UI, 2024-05-09 at 6:49:41 PM]
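
If it's easier to check this without the UI, the instrumented call structure can also be inspected from code. A rough sketch, assuming a recent `trulens_eval` release (the `print_instrumented` helper and the record fields used here may differ in older versions):

    # Print the components and methods TruLens has instrumented on the wrapped app;
    # feedback selector paths must point at one of these.
    tru_recorder.print_instrumented()

    # Or capture a record and print where each call's args/rets ended up.
    with tru_recorder as recording:
        qa.run("example question")
    record = recording.get()
    for call in record.calls:
        print(call.stack[-1].path)
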
saisescapades commented 3 months ago

I am facing the same error. This is my code:

    from trulens_eval import Feedback, Select
    from trulens_eval.feedback.provider.openai import AzureOpenAI

    import numpy as np

    provider = AzureOpenAI(
        deployment_name=azure_openai_chatgpt_deployment,
        api_version=azure_openai_api_version,
        azure_endpoint=azure_openai_endpoint,
        api_key=azure_openai_key,
    )

    # Define a groundedness feedback function
    f_groundedness = (
        Feedback(provider.groundedness_measure_with_cot_reasons, name="Groundedness")
        .on(Select.RecordCalls.retrieve.rets.collect())
        .on_output()
    )

    # Question/answer relevance between overall question and answer.
    f_answer_relevance = (
        Feedback(provider.relevance_with_cot_reasons, name="Answer Relevance")
        .on_input()
        .on_output()
    )

    # Context relevance between question and each context chunk.
    f_context_relevance = (
        Feedback(provider.context_relevance_with_cot_reasons, name="Context Relevance")
        .on_input()
        .on(Select.RecordCalls.retrieve.rets[:])
        .aggregate(np.mean)  # choose a different aggregation method if you wish
    )

    from trulens_eval import TruCustomApp
    tru_rag = TruCustomApp(
        query_engine,
        app_id="RAG v1",
        feedbacks=[f_groundedness, f_answer_relevance, f_context_relevance],
    )

I am using the official documentation for reference: https://www.trulens.org/trulens_eval/getting_started/quickstarts/quickstart/#set-up-feedback-functions
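
For context, the quickstart linked above builds a small custom RAG app with an `@instrument`-decorated `retrieve` method, which is what `Select.RecordCalls.retrieve.rets` points at. A rough sketch of that shape (from memory of the guide; see the linked page for the exact code, and note these selectors will not resolve for an app, such as a LlamaIndex query engine, that has no instrumented method named `retrieve`):

    from trulens_eval.tru_custom_app import instrument

    class RAG_from_scratch:
        @instrument
        def retrieve(self, query: str) -> list:
            # return the context chunks retrieved for the query
            ...

        @instrument
        def query(self, query: str) -> str:
            context = self.retrieve(query)
            # generate and return an answer grounded in the retrieved context
            ...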