truera / trulens

Evaluation and Tracking for LLM Experiments
https://www.trulens.org/
MIT License

Way to get a fixed selector call even if the RAG structure is changing. #1300

Closed poojatambe closed 3 weeks ago

poojatambe commented 3 months ago

Bug Description: I implemented a hybrid-retriever RAG and evaluated it with feedback functions. I then improved the RAG to use multiple indexes, and the selector for the context changed. Is there any way to keep a fixed selector call for retrieving the context, so that context relevance and groundedness can still be computed?


dosubot[bot] commented 3 months ago

Hello @poojatambe! I'm here to help you with any bugs, questions, or contributions you have. Let's squash those bugs together!

To keep a fixed selector call for the context, so that context relevance and groundedness are evaluated consistently after improving a hybrid-retriever RAG with multiple indexes, you can use the select_context method provided by trulens_eval. This method works across frameworks such as LangChain, LlamaIndex, and NeMo.

Here is an example of how you can use it:

from trulens_eval.app import App

# Assuming `rag_chain` is your hybrid retriever RAG with multiple indexes
context = App.select_context(rag_chain)
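
You can print the returned lens to see the path it resolves to; since select_context derives this path from the app's structure, it is worth re-checking whenever the app changes (a minimal sketch, assuming the snippet above ran):

# Print the resolved lens path; it is derived from the app structure,
# so it may differ between versions of your RAG.
print(context)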

Additionally, you can set up feedback functions to evaluate the relevance and groundedness of the retrieved context. The RAG triad includes three key evaluations: context relevance, groundedness, and answer relevance. These evaluations help ensure that your LLM application is free from hallucinations by verifying the quality of the retrieved context, the groundedness of the generated answers, and the relevance of the final response to the user input.

Here is how you can define and use these feedback functions:

from trulens_eval import Feedback, Select, Tru, TruChain, feedback
import numpy as np

tru = Tru()
provider = feedback.OpenAI()

# Define groundedness
f_groundedness = (
    Feedback(provider.groundedness_measure_with_cot_reasons, name="Groundedness")
    .on(TruChain.select_context(rag_chain).collect())  # context
    .on_output()
)

# Question/answer relevance between overall question and answer.
f_answer_relevance = (
    Feedback(provider.relevance_with_cot_reasons, name="Answer Relevance").on_input_output()
)

# Question/statement relevance between question and each context chunk.
f_context_relevance = (
    Feedback(provider.context_relevance_with_cot_reasons, name="Context Relevance")
    .on_input()
    .on(TruChain.select_context(rag_chain))
    .aggregate(np.mean)
)

feedback_functions = [f_answer_relevance, f_context_relevance, f_groundedness]

# Wrap with TruLens
tru_chain_recorder = TruChain(
    rag_chain, app_id='RAG_Application', feedbacks=feedback_functions
)

# Example of submitting queries and tracking them
prompts = [
    "Name some famous dental floss brands?",
    "Which year did Cincinnati become the Capital of Ohio?",
    "Which year was Hawaii's state song written?",
    "How many countries are there in the world?",
    "How many total major trophies has Manchester United won?"
]

with tru_chain_recorder as recording:
    for prompt in prompts:
        rag_chain.invoke(prompt)  # use .invoke() for LCEL chains; legacy Chain objects were directly callable

By defining the feedback functions on the selected context this way, the evaluations of context relevance and groundedness are applied consistently, regardless of the underlying framework used in your RAG implementation [1][2][3].
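
If you need a selector that stays fixed even when the app structure changes, trulens_eval's Select also lets you address the record by the instrumented method call rather than by a path derived from the app. A hedged sketch; the component and method names here (retriever, _get_relevant_documents) are assumptions about what your app records, so verify them against an actual record in the dashboard:

from trulens_eval import Select

# Hypothetical fixed selector: address the retriever's return values by the
# recorded method call instead of deriving the path from the app structure.
# `retriever` and `_get_relevant_documents` are assumed names; check a real
# record to confirm what your app actually logs.
fixed_context = Select.RecordCalls.retriever._get_relevant_documents.rets

f_context_relevance_fixed = (
    Feedback(provider.context_relevance_with_cot_reasons, name="Context Relevance (fixed)")
    .on_input()
    .on(fixed_context[:].page_content)
    .aggregate(np.mean)
)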

To continue talking to Dosu, mention @dosu.

poojatambe commented 3 months ago

Hi, I'm getting an error with context = App.select_context(rag_chain): "Cannot find any BaseRetriever in app."

dosubot[bot] commented 3 months ago

To resolve the error "Cannot find any BaseRetriever in app" when using the select_context method from trulens_eval, ensure that your LangChain application includes at least one BaseRetriever. The select_context method looks for instances of BaseRetriever in the provided app. If none are found, it raises this error.

Here's an example of how to define a LangChain RAG application with a retriever:

from langchain import hub
from langchain.chat_models import ChatOpenAI
from langchain.schema import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

# Define your retriever (assumes `vectorstore` is an existing vector store,
# e.g. built with FAISS or Chroma over your documents)
retriever = vectorstore.as_retriever()

# Helper to join the retrieved documents into a single prompt string
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

# Define your prompt and LLM
prompt = hub.pull("rlm/rag-prompt")
llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)

# Define your RAG chain
rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

# Wrap your application in a TruChain recorder
from trulens_eval import TruChain
tru_recorder = TruChain(
    rag_chain,
    app_id='Chain1_ChatApplication',
    feedbacks=[f_context_relevance]  # as defined in the previous comment
)

# Use the select_context method
from trulens_eval.app import App
context = App.select_context(rag_chain)

Ensure that the retriever is correctly defined and included in your application chain. This should resolve the error and allow the select_context method to function correctly [1][2].
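
If the error persists, it can also help to check which components TruLens actually instrumented; a retriever must show up there for select_context to find it. A minimal sketch using the recorder created above:

# List the components and methods TruLens instrumented in the wrapped app.
# If no BaseRetriever appears, select_context will raise the same error.
tru_recorder.print_instrumented()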

To continue talking to Dosu, mention @dosu.

sfc-gh-pdharmana commented 3 weeks ago

@poojatambe closing this for now. Please reopen if you still need help.