Closed: mlevtov closed this issue 4 months ago
To resolve the issue with context selection in a llama_index RAG application when chat_mode is set to "context", you should use the select_context method provided by the TruLlama class. This method is designed to get the path to the context in the query output.
Here is an example of how you can use the select_context method:
import numpy as np
from llama_index.core import VectorStoreIndex
from llama_index.readers.web import SimpleWebPageReader
from trulens_eval import TruLlama, Feedback, Tru, feedback
from trulens_eval.feedback import Groundedness
tru = Tru()
tru.reset_database()
def evaluate(app_id, chat_engine, tru):
    openai = feedback.OpenAI()

    # Correct context selection using select_context
    context_selection = TruLlama.select_context(chat_engine)

    # Question/answer relevance between overall question and answer.
    f_qa_relevance = Feedback(openai.relevance, name="QA Relevance").on_input_output()

    grounded = Groundedness(groundedness_provider=openai)
    f_groundedness = (
        Feedback(grounded.groundedness_measure_with_cot_reasons, name="Groundedness")
        .on(context_selection)
        .on_output()
    )

    f_context_relevance = (
        Feedback(
            openai.qs_relevance_with_cot_reasons,
            np.max,
            name="Context relevance",
        )
        .on_input()
        .on(context_selection)
    )

    tru_recorder = TruLlama(
        chat_engine,
        app_id=app_id,
        tru=tru,
        feedbacks=[f_context_relevance, f_groundedness, f_qa_relevance],
    )

    question = "What did the author do growing up?"
    with tru_recorder as recording:
        response = chat_engine.chat(question)
    print(response)


app_id = "test_chat_engine"
documents = SimpleWebPageReader(html_to_text=True).load_data(
    ["http://paulgraham.com/worked.html"]
)
index = VectorStoreIndex.from_documents(documents)
chat_engine = index.as_chat_engine(chat_mode="context")
evaluate(app_id, chat_engine, tru)
tru.run_dashboard()
This code snippet demonstrates how to select the context using the select_context
method and use it in a feedback function [1][2].
Hi, can you print out the calls that got recorded in your example with this snippet (add it after you print the response):
for call in recording.get().calls:
    print(call.stack[-1].path)
The snippet you shared prints the following:
app._memory
app._retriever
app._retriever
app._llm
app._memory
app
If it's helpful, here's the entire output with the full stack trace:
🦑 Tru initialized with db url sqlite:///default.sqlite .
🛑 Secret keys may be written to the database. See the `database_redact_keys` option of `Tru` to prevent this.
✅ In QA Relevance, input prompt will be set to __record__.main_input or `Select.RecordInput` .
✅ In QA Relevance, input response will be set to __record__.main_output or `Select.RecordOutput` .
✅ In Groundedness, input source will be set to __record__.app.query.rets.source_nodes[:].node.text .
✅ In Groundedness, input statement will be set to __record__.main_output or `Select.RecordOutput` .
✅ In Context relevance, input question will be set to __record__.main_input or `Select.RecordInput` .
✅ In Context relevance, input statement will be set to __record__.app.query.rets.source_nodes[:].node.text .
Could not locate app.query.rets.source_nodes[:].node.text in app/record.
Run of run in <Thread(TP.submit with debug timeout_0, started 10822430720)> failed with: Could not locate app.query.rets.source_nodes[:].node.text in app/record.
Could not locate app.query.rets.source_nodes[:].node.text in app/record.
Exception in thread Thread-1 (_future_target_wrapper):
Traceback (most recent call last):
File "/Users/mlevtov/bin/miniconda3/envs/trulens_test/lib/python3.10/site-packages/trulens_eval/feedback/feedback.py", line 708, in extract_selection
exception calling callback for <Future at 0x16ffa1840 state=finished raised RuntimeError>
Traceback (most recent call last):
File "/Users/mlevtov/bin/miniconda3/envs/trulens_test/lib/python3.10/site-packages/trulens_eval/feedback/feedback.py", line 708, in extract_selection
arg_vals[k] = list(q_within_o.get(o))
File "/Users/mlevtov/bin/miniconda3/envs/trulens_test/lib/python3.10/site-packages/trulens_eval/utils/serial.py", line 943, in get
for start_selection in start_items:
File "/Users/mlevtov/bin/miniconda3/envs/trulens_test/lib/python3.10/site-packages/trulens_eval/utils/serial.py", line 943, in get
for start_selection in start_items:
File "/Users/mlevtov/bin/miniconda3/envs/trulens_test/lib/python3.10/site-packages/trulens_eval/utils/serial.py", line 943, in get
for start_selection in start_items:
[Previous line repeated 2 more times]
File "/Users/mlevtov/bin/miniconda3/envs/trulens_test/lib/python3.10/site-packages/trulens_eval/utils/serial.py", line 944, in get
for last_selection in last_step.get(start_selection):
File "/Users/mlevtov/bin/miniconda3/envs/trulens_test/lib/python3.10/site-packages/trulens_eval/utils/serial.py", line 402, in get
raise KeyError(
KeyError: 'Key not in dictionary: query'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/mlevtov/bin/miniconda3/envs/trulens_test/lib/python3.10/concurrent/futures/_base.py", line 342, in _invoke_callbacks
callback(self)
File "/Users/mlevtov/bin/miniconda3/envs/trulens_test/lib/python3.10/site-packages/trulens_eval/app.py", line 1173, in _add_future_feedback
res = future_result.result()
File "/Users/mlevtov/bin/miniconda3/envs/trulens_test/lib/python3.10/concurrent/futures/_base.py", line 451, in result
return self.__get_result()
File "/Users/mlevtov/bin/miniconda3/envs/trulens_test/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
raise self._exception
File "/Users/mlevtov/bin/miniconda3/envs/trulens_test/lib/python3.10/concurrent/futures/thread.py", line 58, in run
result = self.fn(*self.args, **self.kwargs)
File "/Users/mlevtov/bin/miniconda3/envs/trulens_test/lib/python3.10/site-packages/trulens_eval/utils/python.py", line 374, in _future_target_wrapper
return func(*args, **kwargs)
File "/Users/mlevtov/bin/miniconda3/envs/trulens_test/lib/python3.10/site-packages/trulens_eval/utils/threading.py", line 174, in _run_with_timeout
raise e
File "/Users/mlevtov/bin/miniconda3/envs/trulens_test/lib/python3.10/site-packages/trulens_eval/utils/threading.py", line 159, in _run_with_timeout
res: T = fut.result(timeout=timeout)
File "/Users/mlevtov/bin/miniconda3/envs/trulens_test/lib/python3.10/concurrent/futures/_base.py", line 458, in result
return self.__get_result()
File "/Users/mlevtov/bin/miniconda3/envs/trulens_test/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
raise self._exception
File "/Users/mlevtov/bin/miniconda3/envs/trulens_test/lib/python3.10/concurrent/futures/thread.py", line 58, in run
result = self.fn(*self.args, **self.kwargs)
File "/Users/mlevtov/bin/miniconda3/envs/trulens_test/lib/python3.10/site-packages/trulens_eval/feedback/feedback.py", line 504, in run
raise e
File "/Users/mlevtov/bin/miniconda3/envs/trulens_test/lib/python3.10/site-packages/trulens_eval/feedback/feedback.py", line 499, in run
input_combinations = list(
File "/Users/mlevtov/bin/miniconda3/envs/trulens_test/lib/python3.10/site-packages/trulens_eval/feedback/feedback.py", line 710, in extract_selection
raise RuntimeError(
RuntimeError: Could not locate app.query.rets.source_nodes[:].node.text in app/record.
Run of run in <Thread(TP.submit with debug timeout_1, started 10856083456)> failed with: Could not locate app.query.rets.source_nodes[:].node.text in app/record.
File "/Users/mlevtov/bin/miniconda3/envs/trulens_test/lib/python3.10/site-packages/trulens_eval/utils/serial.py", line 943, in get
for start_selection in start_items:
[Previous line repeated 2 more times]
File "/Users/mlevtov/bin/miniconda3/envs/trulens_test/lib/python3.10/site-packages/trulens_eval/utils/serial.py", line 944, in get
for last_selection in last_step.get(start_selection):
File "/Users/mlevtov/bin/miniconda3/envs/trulens_test/lib/python3.10/site-packages/trulens_eval/utils/serial.py", line 402, in get
raise KeyError(
KeyError: 'Key not in dictionary: query'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/mlevtov/bin/miniconda3/envs/trulens_test/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
self.run()
File "/Users/mlevtov/bin/miniconda3/envs/trulens_test/lib/python3.10/threading.py", line 953, in run
self._target(*self._args, **self._kwargs)
File "/Users/mlevtov/bin/miniconda3/envs/trulens_test/lib/python3.10/site-packages/trulens_eval/utils/python.py", line 374, in _future_target_wrapper
return func(*args, **kwargs)
File "/Users/mlevtov/bin/miniconda3/envs/trulens_test/lib/python3.10/site-packages/trulens_eval/app.py", line 559, in _manage_pending_feedback_results
record.wait_for_feedback_results()
File "/Users/mlevtov/bin/miniconda3/envs/trulens_test/lib/python3.10/site-packages/trulens_eval/schema.py", line 277, in wait_for_feedback_results
feedback_result = future_result.result()
File "/Users/mlevtov/bin/miniconda3/envs/trulens_test/lib/python3.10/concurrent/futures/_base.py", line 451, in result
return self.__get_result()
File "/Users/mlevtov/bin/miniconda3/envs/trulens_test/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
raise self._exception
File "/Users/mlevtov/bin/miniconda3/envs/trulens_test/lib/python3.10/concurrent/futures/_base.py", line 342, in _invoke_callbacks
callback(self)
File "/Users/mlevtov/bin/miniconda3/envs/trulens_test/lib/python3.10/site-packages/trulens_eval/app.py", line 1173, in _add_future_feedback
res = future_result.result()
File "/Users/mlevtov/bin/miniconda3/envs/trulens_test/lib/python3.10/concurrent/futures/_base.py", line 451, in result
return self.__get_result()
File "/Users/mlevtov/bin/miniconda3/envs/trulens_test/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
raise self._exception
File "/Users/mlevtov/bin/miniconda3/envs/trulens_test/lib/python3.10/concurrent/futures/thread.py", line 58, in run
result = self.fn(*self.args, **self.kwargs)
File "/Users/mlevtov/bin/miniconda3/envs/trulens_test/lib/python3.10/site-packages/trulens_eval/utils/python.py", line 374, in _future_target_wrapper
return func(*args, **kwargs)
File "/Users/mlevtov/bin/miniconda3/envs/trulens_test/lib/python3.10/site-packages/trulens_eval/utils/threading.py", line 174, in _run_with_timeout
raise e
File "/Users/mlevtov/bin/miniconda3/envs/trulens_test/lib/python3.10/site-packages/trulens_eval/utils/threading.py", line 159, in _run_with_timeout
res: T = fut.result(timeout=timeout)
File "/Users/mlevtov/bin/miniconda3/envs/trulens_test/lib/python3.10/concurrent/futures/_base.py", line 458, in result
return self.__get_result()
File "/Users/mlevtov/bin/miniconda3/envs/trulens_test/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
raise self._exception
File "/Users/mlevtov/bin/miniconda3/envs/trulens_test/lib/python3.10/concurrent/futures/thread.py", line 58, in run
result = self.fn(*self.args, **self.kwargs)
File "/Users/mlevtov/bin/miniconda3/envs/trulens_test/lib/python3.10/site-packages/trulens_eval/feedback/feedback.py", line 504, in run
raise e
File "/Users/mlevtov/bin/miniconda3/envs/trulens_test/lib/python3.10/site-packages/trulens_eval/feedback/feedback.py", line 499, in run
input_combinations = list(
File "/Users/mlevtov/bin/miniconda3/envs/trulens_test/lib/python3.10/site-packages/trulens_eval/feedback/feedback.py", line 710, in extract_selection
raise RuntimeError(
RuntimeError: Could not locate app.query.rets.source_nodes[:].node.text in app/record.
Process finished with exit code 0
Hi,
Checking in on this.
Is there anything I should be doing differently to get this to work?
Apologies, I forgot an important bit to print:
for call in recording.get().calls:
    print(call.stack[-1].path, call.method())
That returns the following:
app._memory obj=Obj(cls=llama_index.core.memory.chat_memory_buffer.ChatMemoryBuffer, id=6337690640, init_bindings=None) name='put'
app._retriever obj=Obj(cls=llama_index.core.indices.vector_store.retrievers.retriever.VectorIndexRetriever, id=6338512688, init_bindings=None) name='_retrieve'
app._retriever obj=Obj(cls=llama_index.core.indices.vector_store.retrievers.retriever.VectorIndexRetriever, id=6338512688, init_bindings=None) name='retrieve'
app._llm obj=Obj(cls=llama_index.llms.openai.base.OpenAI, id=6343690256, init_bindings=None) name='wrapped_llm_chat'
app._memory obj=Obj(cls=llama_index.core.memory.chat_memory_buffer.ChatMemoryBuffer, id=6337690640, init_bindings=None) name='put'
app obj=Obj(cls=llama_index.core.chat_engine.context.ContextChatEngine, id=6337691456, init_bindings=None) name='chat'
Checking in on this again to make sure it's not lost in the shuffle.
I have upgraded to llama-index==0.10.44, trulens_eval==0.31.0. Now I'm getting a different error.
Here is the code:
import numpy as np
from openai import OpenAI
from trulens_eval import TruLlama, Tru, Feedback
from llama_index.core import VectorStoreIndex
from llama_index.readers.web import SimpleWebPageReader
from trulens_eval.feedback.provider.openai import OpenAI as fOpenAI
tru = Tru()
documents = SimpleWebPageReader(html_to_text=True).load_data(
    ["http://paulgraham.com/worked.html"]
)
index = VectorStoreIndex.from_documents(documents)
openai_client = OpenAI()
provider = fOpenAI(client=openai_client)
chat_engine = index.as_chat_engine(chat_mode="context")

f_qa_relevance = (
    Feedback(provider.relevance_with_cot_reasons, name="Answer Relevance")
    .on_input_output()
)

f_groundedness = (
    Feedback(provider.groundedness_measure_with_cot_reasons, name="Groundedness")
    .on(TruLlama.select_source_nodes().node.text.collect())
    .on_output()
    # .aggregate(grounded.grounded_statements_aggregator)
)

f_context_relevance = (
    Feedback(provider.context_relevance_with_cot_reasons, name="Context Relevance")
    .on_input()
    .on(TruLlama.select_source_nodes().node.text)
    .aggregate(np.mean)
)

tru_chat_recorder = TruLlama(
    chat_engine,
    app_id="test_chat_engine_2",
    feedbacks=[f_qa_relevance, f_groundedness, f_context_relevance],
)

with tru_chat_recorder as recording:
    llm_response = chat_engine.chat("What did the author do growing up?")
tru.run_dashboard()
I get this error: ValueError: Some selectors do not exist in the app or record.
With this output:
✅ In Answer Relevance, input prompt will be set to __record__.main_input or `Select.RecordInput` .
✅ In Answer Relevance, input response will be set to __record__.main_output or `Select.RecordOutput` .
✅ In Groundedness, input source will be set to __record__.app.query.rets.source_nodes[:].node.text.collect() .
✅ In Groundedness, input statement will be set to __record__.main_output or `Select.RecordOutput` .
✅ In Context Relevance, input question will be set to __record__.main_input or `Select.RecordInput` .
✅ In Context Relevance, input context will be set to __record__.app.query.rets.source_nodes[:].node.text .
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Selector check failed ┃
┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛
Source of argument source to Groundedness does not exist in app or expected
record:
__record__.app.query.rets.source_nodes[:].node.text.collect()
# or equivalently
Select.RecordCalls.query.rets.source_nodes[:].node.text.collect()
The data used to make this check may be incomplete. If you expect records
produced by your app to contain the selected content, you can ignore this error
by setting selectors_nocheck in the TruLlama constructor. Alternatively, setting
selectors_check_warning will print out this message but will not raise an error.
Additional information:
Feedback function signature:
(source: str, statement: str) -> Tuple[float, dict]
The prefix __record__.app selects this data that exists in your app or typical
records:
• Object of type dict starting with:
{
    '_retriever': {'retrieve': [...], '_retrieve': [...], '_aretrieve': [...]},
    '_llm': {
        'complete': [...],
        'stream_complete': [...],
        'acomplete': [...],
        'astream_complete': [...],
        'chat': [...],
        'achat': [...],
        'stream_chat': [...]
    },
    '_memory': {'put': [...]},
    'chat': [RecordAppCall(...), RecordAppCall(...)],
    'achat': [RecordAppCall(...), RecordAppCall(...)],
    'stream_chat': [RecordAppCall(...), RecordAppCall(...)]
}
python-BaseException
You can reproduce this by creating a new conda environment, running `pip install llama-index==0.10.44 trulens-eval llama-index-readers-web`, then running the code above.
Thanks for the printouts. It looks like the app you configured has a component under the _retriever attribute, and that component might have called a method named retrieve. Let's hope that this method has the same functionality as the query method did in the app configured without the chat mode.
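As an aside, the printed call paths map directly onto selector prefixes: a call recorded at app._retriever via method retrieve corresponds to Select.RecordCalls._retriever.retrieve. A tiny plain-Python sketch of that mapping (the selector_for helper is made up here for illustration; it is not part of trulens_eval):

```python
# Illustrative only: build a selector-style string from a recorded call's
# component path ("app._retriever") and method name ("retrieve"),
# mimicking the printout shared above.
def selector_for(call_path: str, method_name: str) -> str:
    # Drop the leading "app" and graft the rest onto Select.RecordCalls.
    component = call_path.removeprefix("app")
    return f"Select.RecordCalls{component}.{method_name}"

print(selector_for("app._retriever", "retrieve"))
# Select.RecordCalls._retriever.retrieve
```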
Can you try replacing the selector TruLlama.select_source_nodes().node.text.collect()
with the following:
# Add to imports: from trulens_eval import Select
Select.RecordCalls._retriever.retrieve.rets.source_nodes[:].node.text.collect()
You might have to adjust the parts after rets in case the retrieve method produces a different structure than query did.
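If the shape after rets is unclear, one way to find the right path is to dump the record and list its leaf paths. A minimal plain-Python sketch of such a path walker (the walk helper and the record fragment are made up for illustration; a real record is much richer):

```python
def walk(obj, paths, prefix=""):
    """Recursively collect every leaf path in a nested dict/list structure."""
    if isinstance(obj, dict):
        for key, val in obj.items():
            walk(val, paths, f"{prefix}.{key}")
    elif isinstance(obj, list):
        for i, val in enumerate(obj):
            walk(val, paths, f"{prefix}[{i}]")
    else:
        # Leaf value: record the full dotted/indexed path to it.
        paths.append(f"{prefix} = {obj!r}")

# Made-up fragment shaped like the component printout in this thread.
record = {"_retriever": {"retrieve": [{"rets": {"source_nodes": [
    {"node": {"text": "some chunk"}}]}}]}}

paths = []
walk(record, paths)
print("\n".join(paths))
# ._retriever.retrieve[0].rets.source_nodes[0].node.text = 'some chunk'
```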
@sfc-gh-pmardziel Thanks. That got me further, but still not quite there.
I changed the functions to be the following:
f_qa_relevance = Feedback(
    provider.relevance_with_cot_reasons, name="Answer Relevance"
).on_input_output()

f_groundedness = (
    Feedback(provider.groundedness_measure_with_cot_reasons, name="Groundedness")
    .on(Select.RecordCalls._retriever.retrieve.rets.source_nodes[:].node.text.collect())
    .on_output()
    # .aggregate(grounded.grounded_statements_aggregator)
)

f_context_relevance = (
    Feedback(provider.context_relevance_with_cot_reasons, name="Context Relevance")
    .on_input()
    .on(Select.RecordCalls._retriever.retrieve.rets.source_nodes[:].node.text.collect())
    .aggregate(np.mean)
)
Now it runs, but I get the following warnings:
Object (of type list is a sequence containing more than one dictionary. Lookup by item or attribute `source_nodes` is ambiguous. Use a lookup by index(es) or slice first to disambiguate.
Object (of type list is a sequence containing more than one dictionary. Lookup by item or attribute `source_nodes` is ambiguous. Use a lookup by index(es) or slice first to disambiguate.
In the dashboard the context is empty and context relevance and groundedness are both set to 0.0.
Do I have to get source_nodes
differently?
This message suggests that the retriever was called more than once, and I'm unsure which call is the one you are looking for. You can adjust the selector to target a specific call; for example, here are the selectors that pick the first and second calls, respectively:
Select.RecordCalls._retriever.retrieve[0].rets.source_nodes[:].node.text.collect()
Select.RecordCalls._retriever.retrieve[1].rets.source_nodes[:].node.text.collect()
You can also try looking at the contexts from all of the calls at once using this selector:
Select.RecordCalls._retriever.retrieve[:].rets.source_nodes[:].node.text.collect()
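To see what these selectors do differently, here is a plain-Python sketch of the path semantics (the record fragment is made up for illustration; trulens_eval performs this lookup internally):

```python
# Hypothetical record fragment: two retrieve calls, each returning source nodes.
# This mirrors the shape addressed by the selector
# _retriever.retrieve[...].rets.source_nodes[:].node.text.collect()
record = {
    "_retriever": {
        "retrieve": [
            {"rets": {"source_nodes": [{"node": {"text": "chunk A"}},
                                       {"node": {"text": "chunk B"}}]}},
            {"rets": {"source_nodes": [{"node": {"text": "chunk C"}}]}},
        ]
    }
}

calls = record["_retriever"]["retrieve"]

# retrieve[0] ... .collect(): contexts from the first call only.
first_call = [sn["node"]["text"] for sn in calls[0]["rets"]["source_nodes"]]

# retrieve[:] ... .collect(): contexts from every call, flattened into one list.
all_calls = [sn["node"]["text"]
             for call in calls
             for sn in call["rets"]["source_nodes"]]

print(first_call)  # ['chunk A', 'chunk B']
print(all_calls)   # ['chunk A', 'chunk B', 'chunk C']
```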
This seems to be stale, so closing for now. If your issues persist, @mlevtov, please reopen or open a new issue.
Bug Description
When evaluating a llama_index RAG application, context selection does not work properly when chat_mode="context".

To Reproduce

Expected behavior
This should run and provide context relevance, answer relevance, and groundedness. This is what happens if you remove chat_mode="context".

Relevant Logs/Tracebacks

Environment:

Additional context
Should I be setting context_selection to something else?