kaoru-will opened this issue 6 days ago (status: Open)
please help @joshreini1 thank you
Hey @kaoru-will! 👋 I'm here to help you with any bugs, questions, or contributions while you wait for a human maintainer. I'm Dosu, a bot designed to assist you with all things related to the repository.
To resolve the issue with the `Context Relevance` feedback function providing inconsistent answers and sometimes failing with a `ParseError`, you should ensure that the `PromptTemplate` instances are correctly defined and used. Here are some steps and best practices to follow:

**1. Ensure correct `PromptTemplate` usage:** Make sure the `PromptTemplate` is correctly instantiated and used in your code. Here is the relevant part of the code for `ContextRelevance`:
```python
class ContextRelevance(Relevance, WithPrompt):
    system_prompt: ClassVar[PromptTemplate] = PromptTemplate.from_template(
        """You are a RELEVANCE grader; providing the relevance of the given CONTEXT to the given QUESTION.
Respond only as a number from 0 to 10 where 0 is the least relevant and 10 is the most relevant.
A few additional scoring guidelines:
- Long CONTEXTS should score equally well as short CONTEXTS.
- RELEVANCE score should increase as the CONTEXTS provides more RELEVANT context to the QUESTION.
- RELEVANCE score should increase as the CONTEXTS provides RELEVANT context to more parts of the QUESTION.
- CONTEXT that is RELEVANT to some of the QUESTION should score of 2, 3 or 4. Higher score indicates more RELEVANCE.
- CONTEXT that is RELEVANT to most of the QUESTION should get a score of 5, 6, 7 or 8. Higher score indicates more RELEVANCE.
- CONTEXT that is RELEVANT to the entire QUESTION should get a score of 9 or 10. Higher score indicates more RELEVANCE.
- CONTEXT must be relevant and helpful for answering the entire QUESTION to get a score of 10.
- Never elaborate."""
    )

    user_prompt: ClassVar[PromptTemplate] = PromptTemplate.from_template(
        """QUESTION: {question}
CONTEXT: {context}
RELEVANCE: """
    )
```
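As a quick way to verify step 1 locally, you can render the user prompt with sample values before handing anything to the feedback function; a malformed placeholder fails immediately here instead of intermittently at evaluation time. This is a minimal sketch using plain `str.format`; the template text mirrors the `user_prompt` above, and the sample question/context strings are made up for illustration:

```python
# Render the user prompt template with sample values. The template text
# mirrors ContextRelevance.user_prompt above; the sample strings are
# illustrative only.
user_template = """QUESTION: {question}
CONTEXT: {context}
RELEVANCE: """

rendered = user_template.format(
    question="What is Task Decomposition?",
    context="Task decomposition breaks a complex task into smaller steps.",
)
print(rendered)
```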
**2. Check for `ParseError`:** The `ParseError` might be due to incorrect formatting or missing placeholders in the template strings. Double-check the template strings for any syntax issues or missing placeholders; a programmatic check is sketched below.
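One way to run this check programmatically is to enumerate the named placeholders in a template string with the standard library's `string.Formatter` and confirm they match what the feedback function passes in (`question` and `context`, per the `user_prompt` shown above). A minimal sketch:

```python
import string

def template_placeholders(template: str) -> set:
    """Return the set of named placeholders in a format string."""
    return {
        field
        for _, field, _, _ in string.Formatter().parse(template)
        if field
    }

# The user prompt should contain exactly 'question' and 'context'. A
# stray '{' or '}' in a customized prompt raises ValueError here, and a
# misspelled name shows up as an unexpected placeholder.
user_template = "QUESTION: {question}\nCONTEXT: {context}\nRELEVANCE: "
assert template_placeholders(user_template) == {"question", "context"}
```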
**3. Use the `context_relevance` method:** You can use the `context_relevance` method from the `Provider` class to generate the relevance score. Here is its definition:
```python
def context_relevance(
    self, question: str, context: str, temperature: float = 0.0
) -> float:
    """
    Uses chat completion model. A function that completes a template to
    check the relevance of the context to the question.

    Args:
        question (str): A question being asked.
        context (str): Context related to the question.

    Returns:
        float: A value between 0.0 (not relevant) and 1.0 (relevant).
    """
    return self.generate_score(
        system_prompt=prompts.CONTEXT_RELEVANCE_SYSTEM,
        user_prompt=str.format(
            prompts.CONTEXT_RELEVANCE_USER,
            question=question,
            context=context
        ),
        temperature=temperature
    )
```
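To separate provider-side flakiness from instrumentation problems, you can also call this method directly on the provider, outside any recorder. If the bare call is already inconsistent, the issue lies in the LLM's response or in parsing its score, not in how the context is selected. A short sketch, assuming your OpenAI credentials are already configured in the environment:

```python
from trulens_eval.feedback.provider import OpenAI

provider = OpenAI()

# Call the feedback function directly, outside of TruChain/TruLlama,
# to check whether the raw score itself is stable across calls.
for _ in range(3):
    score = provider.context_relevance(
        question="What is Task Decomposition?",
        context="Task decomposition breaks a complex task into smaller, "
                "manageable sub-steps.",
    )
    print(score)  # expected: a float between 0.0 and 1.0
```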
**4. Example usage:** Here is an example of how to initialize and use feedback functions, including context relevance:
```python
import numpy as np

from trulens_eval import Feedback, TruChain
from trulens_eval.app import App
from trulens_eval.feedback.provider import OpenAI

# Initialize provider class
provider = OpenAI()

# Select the context to be used in feedback; the location of context is app specific.
context = App.select_context(rag_chain)

# Define a groundedness feedback function
f_groundedness = (
    Feedback(provider.groundedness_measure_with_cot_reasons)
    .on(context.collect())  # collect context chunks into a list
    .on_output()
)

# Question/answer relevance between overall question and answer.
f_answer_relevance = Feedback(provider.relevance).on_input_output()

# Question/statement relevance between question and each context chunk.
f_context_relevance = (
    Feedback(provider.context_relevance_with_cot_reasons)
    .on_input()
    .on(context)
    .aggregate(np.mean)
)

# Instrument the chain for logging with TruLens
tru_recorder = TruChain(
    rag_chain,
    app_id='Chain1_ChatApplication',
    feedbacks=[f_answer_relevance, f_context_relevance, f_groundedness]
)

# with_record returns the app's response along with the Record itself,
# so no extra .get() call is needed on the returned record.
response, rec = tru_recorder.with_record(
    rag_chain.invoke, "What is Task Decomposition?"
)

# Retrieve feedback results, blocking until they complete
for feedback, feedback_result in rec.wait_for_feedback_results().items():
    print(feedback.name, feedback_result.result)
```
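Since the reported symptom is feedback that only sometimes succeeds, it is also worth printing each result's status and error instead of just the score. The attribute names below (`status`, `result`, `error`) are assumed from the `FeedbackResult` schema in recent `trulens_eval` versions; verify them against your installed version:

```python
# Inspect status and error for each feedback result, not just the score.
# Attribute names (status, result, error) are assumptions based on the
# FeedbackResult schema in recent trulens_eval versions.
for feedback, feedback_result in rec.wait_for_feedback_results().items():
    print(feedback.name, feedback_result.status)
    if feedback_result.result is None:
        # Failed runs usually carry the underlying exception text, e.g.
        # the ParseError raised while extracting the 0-10 score.
        print("  error:", feedback_result.error)
    else:
        print("  score:", feedback_result.result)
```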
By following these steps, you should be able to resolve the inconsistencies and errors with the `Context Relevance` feedback function [1][2][3].
@joshreini1, it seems like this issue requires a deeper look and your expertise would be highly valuable here.
To continue talking to Dosu, mention @dosu.
```python
from trulens_eval.feedback.provider.openai import OpenAI
from trulens_eval.app import App
from trulens_eval import Feedback, Select, TruLlama, FeedbackMode, feedback
import numpy as np

provider = OpenAI()
```
These are the modules and the provider that I used.
Hi, any updates on this?
**Bug Description**
The `Context Relevance` feedback function I am trying to use provides inconsistent answers. Sometimes it gives a successful feedback, but most times it gives a failed feedback.

**To Reproduce**
My code:

**Expected behavior**
I have taken note that feedbacks don't immediately provide their respective results, which is why I used `wait_for_feedback_results` so I can iterate per feedback result and get my expected results.

These are the logs that I get if it passes. It only passes sometimes, when I freshly run my code.

**Relevant Logs/Tracebacks**

**Environment:**

**Additional context**
Is there a way we can check whether we're getting the context properly? Is this an issue where the context is not yet set, but the feedback runs anyway?

These are the calls that I'm getting if the feedback passes.