truera / trulens

Evaluation and Tracking for LLM Experiments
https://www.trulens.org/
MIT License

context_selection doesn't work correctly when chat_mode = "context" #1175

Closed mlevtov closed 4 months ago

mlevtov commented 5 months ago

Bug Description: When evaluating a llama_index RAG application, context selection does not work properly when chat_mode="context".

To Reproduce

import numpy as np
from llama_index.core import VectorStoreIndex
from llama_index.readers.web import SimpleWebPageReader
from trulens_eval import TruLlama, Feedback, Tru, feedback
from trulens_eval.feedback import Groundedness

tru = Tru()
tru.reset_database()

def evaluate(app_id, chat_engine, tru):
    openai = feedback.OpenAI()

    context_selection = TruLlama.select_source_nodes().node.text

    # Question/answer relevance between overall question and answer.
    f_qa_relevance = Feedback(openai.relevance, name="QA Relevance").on_input_output()

    grounded = Groundedness(groundedness_provider=openai)
    f_groundedness = (
        Feedback(grounded.groundedness_measure_with_cot_reasons, name="Groundedness")
        .on(context_selection)
        .on_output()
        # .aggregate(grounded.grounded_statements_aggregator)
    )

    f_context_relevance = (
        Feedback(
            openai.qs_relevance_with_cot_reasons,
            np.max,
            name="Context relevance",
        )
        .on_input()
        .on(context_selection)
    )

    tru_recorder = TruLlama(
        chat_engine,
        app_id=app_id,
        tru=tru,
        feedbacks=[f_context_relevance, f_groundedness, f_qa_relevance],
    )

    question = "What did the author do growing up?"
    with tru_recorder as recording:
        response = chat_engine.chat(question)
    print(response)

app_id = "test_chat_engine"

documents = SimpleWebPageReader(html_to_text=True).load_data(
    ["http://paulgraham.com/worked.html"]
)
index = VectorStoreIndex.from_documents(documents)

chat_engine = index.as_chat_engine(chat_mode="context")

evaluate(app_id, chat_engine, tru)

tru.run_dashboard()

Expected behavior: This should run and provide context relevance, answer relevance, and groundedness, which is what happens if chat_mode="context" is removed.

Relevant Logs/Tracebacks

RuntimeError: Could not locate app.query.rets.source_nodes[:].node.text in app/record.

Environment:

Additional context: Should I be setting context_selection to something else?

dosubot[bot] commented 5 months ago

To resolve the issue with context selection in a llama_index RAG application when chat_mode is set to "context", you should use the select_context method provided by the TruLlama class. This method is designed to get the path to the context in the query output.

Here is an example of how you can use the select_context method:

import numpy as np
from llama_index.core import VectorStoreIndex
from llama_index.readers.web import SimpleWebPageReader
from trulens_eval import TruLlama, Feedback, Tru, feedback
from trulens_eval.feedback import Groundedness

tru = Tru()
tru.reset_database()

def evaluate(app_id, chat_engine, tru):
    openai = feedback.OpenAI()

    # Correct context selection using select_context
    context_selection = TruLlama.select_context(chat_engine)

    # Question/answer relevance between overall question and answer.
    f_qa_relevance = Feedback(openai.relevance, name="QA Relevance").on_input_output()

    grounded = Groundedness(groundedness_provider=openai)
    f_groundedness = (
        Feedback(grounded.groundedness_measure_with_cot_reasons, name="Groundedness")
        .on(context_selection)
        .on_output()
    )

    f_context_relevance = (
        Feedback(
            openai.qs_relevance_with_cot_reasons,
            np.max,
            name="Context relevance",
        )
        .on_input()
        .on(context_selection)
    )

    tru_recorder = TruLlama(
        chat_engine,
        app_id=app_id,
        tru=tru,
        feedbacks=[f_context_relevance, f_groundedness, f_qa_relevance],
    )

    question = "What did the author do growing up?"
    with tru_recorder as recording:
        response = chat_engine.chat(question)
    print(response)

app_id = "test_chat_engine"

documents = SimpleWebPageReader(html_to_text=True).load_data(
    ["http://paulgraham.com/worked.html"]
)
index = VectorStoreIndex.from_documents(documents)

chat_engine = index.as_chat_engine(chat_mode="context")

evaluate(app_id, chat_engine, tru)

tru.run_dashboard()

This code snippet demonstrates how to select the context using the select_context method and use it in a feedback function [1][2].

To continue talking to Dosu, mention @dosu.

piotrm0 commented 5 months ago

Hi; can you print out the calls that got recorded in your example with this snippet (add it after you print the response):

# Print the component path of each call recorded during the run.
for call in recording.get().calls:
    print(call.stack[-1].path)
mlevtov commented 5 months ago

The snippet you shared prints the following:

app._memory
app._retriever
app._retriever
app._llm
app._memory
app

If it's helpful, here's the entire output with the full stack trace:

🦑 Tru initialized with db url sqlite:///default.sqlite .
🛑 Secret keys may be written to the database. See the `database_redact_keys` option of `Tru` to prevent this.
✅ In QA Relevance, input prompt will be set to __record__.main_input or `Select.RecordInput` .
✅ In QA Relevance, input response will be set to __record__.main_output or `Select.RecordOutput` .
✅ In Groundedness, input source will be set to __record__.app.query.rets.source_nodes[:].node.text .
✅ In Groundedness, input statement will be set to __record__.main_output or `Select.RecordOutput` .
✅ In Context relevance, input question will be set to __record__.main_input or `Select.RecordInput` .
✅ In Context relevance, input statement will be set to __record__.app.query.rets.source_nodes[:].node.text .
Could not locate app.query.rets.source_nodes[:].node.text in app/record.
Run of run in <Thread(TP.submit with debug timeout_0, started 10822430720)> failed with: Could not locate app.query.rets.source_nodes[:].node.text in app/record.
Could not locate app.query.rets.source_nodes[:].node.text in app/record.
Exception in thread Thread-1 (_future_target_wrapper):
Traceback (most recent call last):
  File "/Users/mlevtov/bin/miniconda3/envs/trulens_test/lib/python3.10/site-packages/trulens_eval/feedback/feedback.py", line 708, in extract_selection
exception calling callback for <Future at 0x16ffa1840 state=finished raised RuntimeError>
Traceback (most recent call last):
  File "/Users/mlevtov/bin/miniconda3/envs/trulens_test/lib/python3.10/site-packages/trulens_eval/feedback/feedback.py", line 708, in extract_selection
    arg_vals[k] = list(q_within_o.get(o))
  File "/Users/mlevtov/bin/miniconda3/envs/trulens_test/lib/python3.10/site-packages/trulens_eval/utils/serial.py", line 943, in get
    for start_selection in start_items:
  File "/Users/mlevtov/bin/miniconda3/envs/trulens_test/lib/python3.10/site-packages/trulens_eval/utils/serial.py", line 943, in get
    for start_selection in start_items:
  File "/Users/mlevtov/bin/miniconda3/envs/trulens_test/lib/python3.10/site-packages/trulens_eval/utils/serial.py", line 943, in get
    for start_selection in start_items:
  [Previous line repeated 2 more times]
  File "/Users/mlevtov/bin/miniconda3/envs/trulens_test/lib/python3.10/site-packages/trulens_eval/utils/serial.py", line 944, in get
    for last_selection in last_step.get(start_selection):
  File "/Users/mlevtov/bin/miniconda3/envs/trulens_test/lib/python3.10/site-packages/trulens_eval/utils/serial.py", line 402, in get
    raise KeyError(
KeyError: 'Key not in dictionary: query'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/mlevtov/bin/miniconda3/envs/trulens_test/lib/python3.10/concurrent/futures/_base.py", line 342, in _invoke_callbacks
    callback(self)
  File "/Users/mlevtov/bin/miniconda3/envs/trulens_test/lib/python3.10/site-packages/trulens_eval/app.py", line 1173, in _add_future_feedback
    res = future_result.result()
  File "/Users/mlevtov/bin/miniconda3/envs/trulens_test/lib/python3.10/concurrent/futures/_base.py", line 451, in result
    return self.__get_result()
  File "/Users/mlevtov/bin/miniconda3/envs/trulens_test/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "/Users/mlevtov/bin/miniconda3/envs/trulens_test/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/Users/mlevtov/bin/miniconda3/envs/trulens_test/lib/python3.10/site-packages/trulens_eval/utils/python.py", line 374, in _future_target_wrapper
    return func(*args, **kwargs)
  File "/Users/mlevtov/bin/miniconda3/envs/trulens_test/lib/python3.10/site-packages/trulens_eval/utils/threading.py", line 174, in _run_with_timeout
    raise e
  File "/Users/mlevtov/bin/miniconda3/envs/trulens_test/lib/python3.10/site-packages/trulens_eval/utils/threading.py", line 159, in _run_with_timeout
    res: T = fut.result(timeout=timeout)
  File "/Users/mlevtov/bin/miniconda3/envs/trulens_test/lib/python3.10/concurrent/futures/_base.py", line 458, in result
    return self.__get_result()
  File "/Users/mlevtov/bin/miniconda3/envs/trulens_test/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "/Users/mlevtov/bin/miniconda3/envs/trulens_test/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/Users/mlevtov/bin/miniconda3/envs/trulens_test/lib/python3.10/site-packages/trulens_eval/feedback/feedback.py", line 504, in run
    raise e
  File "/Users/mlevtov/bin/miniconda3/envs/trulens_test/lib/python3.10/site-packages/trulens_eval/feedback/feedback.py", line 499, in run
    input_combinations = list(
  File "/Users/mlevtov/bin/miniconda3/envs/trulens_test/lib/python3.10/site-packages/trulens_eval/feedback/feedback.py", line 710, in extract_selection
    raise RuntimeError(
RuntimeError: Could not locate app.query.rets.source_nodes[:].node.text in app/record.
    arg_vals[k] = list(q_within_o.get(o))
  File "/Users/mlevtov/bin/miniconda3/envs/trulens_test/lib/python3.10/site-packages/trulens_eval/utils/serial.py", line 943, in get
Run of run in <Thread(TP.submit with debug timeout_1, started 10856083456)> failed with: Could not locate app.query.rets.source_nodes[:].node.text in app/record.
    for start_selection in start_items:
  File "/Users/mlevtov/bin/miniconda3/envs/trulens_test/lib/python3.10/site-packages/trulens_eval/utils/serial.py", line 943, in get
    for start_selection in start_items:
exception calling callback for <Future at 0x16ff4db10 state=finished raised RuntimeError>
Traceback (most recent call last):
  File "/Users/mlevtov/bin/miniconda3/envs/trulens_test/lib/python3.10/site-packages/trulens_eval/feedback/feedback.py", line 708, in extract_selection
    arg_vals[k] = list(q_within_o.get(o))
  File "/Users/mlevtov/bin/miniconda3/envs/trulens_test/lib/python3.10/site-packages/trulens_eval/utils/serial.py", line 943, in get
    for start_selection in start_items:
  File "/Users/mlevtov/bin/miniconda3/envs/trulens_test/lib/python3.10/site-packages/trulens_eval/utils/serial.py", line 943, in get
    for start_selection in start_items:
  File "/Users/mlevtov/bin/miniconda3/envs/trulens_test/lib/python3.10/site-packages/trulens_eval/utils/serial.py", line 943, in get
    for start_selection in start_items:
  [Previous line repeated 2 more times]
  File "/Users/mlevtov/bin/miniconda3/envs/trulens_test/lib/python3.10/site-packages/trulens_eval/utils/serial.py", line 944, in get
    for last_selection in last_step.get(start_selection):
  File "/Users/mlevtov/bin/miniconda3/envs/trulens_test/lib/python3.10/site-packages/trulens_eval/utils/serial.py", line 402, in get
    raise KeyError(
KeyError: 'Key not in dictionary: query'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/mlevtov/bin/miniconda3/envs/trulens_test/lib/python3.10/concurrent/futures/_base.py", line 342, in _invoke_callbacks
    callback(self)
  File "/Users/mlevtov/bin/miniconda3/envs/trulens_test/lib/python3.10/site-packages/trulens_eval/app.py", line 1173, in _add_future_feedback
    res = future_result.result()
  File "/Users/mlevtov/bin/miniconda3/envs/trulens_test/lib/python3.10/concurrent/futures/_base.py", line 451, in result
    return self.__get_result()
  File "/Users/mlevtov/bin/miniconda3/envs/trulens_test/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "/Users/mlevtov/bin/miniconda3/envs/trulens_test/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/Users/mlevtov/bin/miniconda3/envs/trulens_test/lib/python3.10/site-packages/trulens_eval/utils/python.py", line 374, in _future_target_wrapper
    return func(*args, **kwargs)
  File "/Users/mlevtov/bin/miniconda3/envs/trulens_test/lib/python3.10/site-packages/trulens_eval/utils/threading.py", line 174, in _run_with_timeout
    raise e
  File "/Users/mlevtov/bin/miniconda3/envs/trulens_test/lib/python3.10/site-packages/trulens_eval/utils/threading.py", line 159, in _run_with_timeout
    res: T = fut.result(timeout=timeout)
  File "/Users/mlevtov/bin/miniconda3/envs/trulens_test/lib/python3.10/concurrent/futures/_base.py", line 458, in result
    return self.__get_result()
  File "/Users/mlevtov/bin/miniconda3/envs/trulens_test/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "/Users/mlevtov/bin/miniconda3/envs/trulens_test/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/Users/mlevtov/bin/miniconda3/envs/trulens_test/lib/python3.10/site-packages/trulens_eval/feedback/feedback.py", line 504, in run
    raise e
  File "/Users/mlevtov/bin/miniconda3/envs/trulens_test/lib/python3.10/site-packages/trulens_eval/feedback/feedback.py", line 499, in run
    input_combinations = list(
  File "/Users/mlevtov/bin/miniconda3/envs/trulens_test/lib/python3.10/site-packages/trulens_eval/feedback/feedback.py", line 710, in extract_selection
    raise RuntimeError(
RuntimeError: Could not locate app.query.rets.source_nodes[:].node.text in app/record.
  File "/Users/mlevtov/bin/miniconda3/envs/trulens_test/lib/python3.10/site-packages/trulens_eval/utils/serial.py", line 943, in get
    for start_selection in start_items:
  [Previous line repeated 2 more times]
  File "/Users/mlevtov/bin/miniconda3/envs/trulens_test/lib/python3.10/site-packages/trulens_eval/utils/serial.py", line 944, in get
    for last_selection in last_step.get(start_selection):
  File "/Users/mlevtov/bin/miniconda3/envs/trulens_test/lib/python3.10/site-packages/trulens_eval/utils/serial.py", line 402, in get
    raise KeyError(
KeyError: 'Key not in dictionary: query'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/mlevtov/bin/miniconda3/envs/trulens_test/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/Users/mlevtov/bin/miniconda3/envs/trulens_test/lib/python3.10/threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "/Users/mlevtov/bin/miniconda3/envs/trulens_test/lib/python3.10/site-packages/trulens_eval/utils/python.py", line 374, in _future_target_wrapper
    return func(*args, **kwargs)
  File "/Users/mlevtov/bin/miniconda3/envs/trulens_test/lib/python3.10/site-packages/trulens_eval/app.py", line 559, in _manage_pending_feedback_results
    record.wait_for_feedback_results()
  File "/Users/mlevtov/bin/miniconda3/envs/trulens_test/lib/python3.10/site-packages/trulens_eval/schema.py", line 277, in wait_for_feedback_results
    feedback_result = future_result.result()
  File "/Users/mlevtov/bin/miniconda3/envs/trulens_test/lib/python3.10/concurrent/futures/_base.py", line 451, in result
    return self.__get_result()
  File "/Users/mlevtov/bin/miniconda3/envs/trulens_test/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "/Users/mlevtov/bin/miniconda3/envs/trulens_test/lib/python3.10/concurrent/futures/_base.py", line 342, in _invoke_callbacks
    callback(self)
  File "/Users/mlevtov/bin/miniconda3/envs/trulens_test/lib/python3.10/site-packages/trulens_eval/app.py", line 1173, in _add_future_feedback
    res = future_result.result()
  File "/Users/mlevtov/bin/miniconda3/envs/trulens_test/lib/python3.10/concurrent/futures/_base.py", line 451, in result
    return self.__get_result()
  File "/Users/mlevtov/bin/miniconda3/envs/trulens_test/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "/Users/mlevtov/bin/miniconda3/envs/trulens_test/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/Users/mlevtov/bin/miniconda3/envs/trulens_test/lib/python3.10/site-packages/trulens_eval/utils/python.py", line 374, in _future_target_wrapper
    return func(*args, **kwargs)
  File "/Users/mlevtov/bin/miniconda3/envs/trulens_test/lib/python3.10/site-packages/trulens_eval/utils/threading.py", line 174, in _run_with_timeout
    raise e
  File "/Users/mlevtov/bin/miniconda3/envs/trulens_test/lib/python3.10/site-packages/trulens_eval/utils/threading.py", line 159, in _run_with_timeout
    res: T = fut.result(timeout=timeout)
  File "/Users/mlevtov/bin/miniconda3/envs/trulens_test/lib/python3.10/concurrent/futures/_base.py", line 458, in result
    return self.__get_result()
  File "/Users/mlevtov/bin/miniconda3/envs/trulens_test/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "/Users/mlevtov/bin/miniconda3/envs/trulens_test/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/Users/mlevtov/bin/miniconda3/envs/trulens_test/lib/python3.10/site-packages/trulens_eval/feedback/feedback.py", line 504, in run
    raise e
  File "/Users/mlevtov/bin/miniconda3/envs/trulens_test/lib/python3.10/site-packages/trulens_eval/feedback/feedback.py", line 499, in run
    input_combinations = list(
  File "/Users/mlevtov/bin/miniconda3/envs/trulens_test/lib/python3.10/site-packages/trulens_eval/feedback/feedback.py", line 710, in extract_selection
    raise RuntimeError(
RuntimeError: Could not locate app.query.rets.source_nodes[:].node.text in app/record.

Process finished with exit code 0
mlevtov commented 5 months ago

Hi,

Checking in on this.

Is there anything I should be doing differently to get this to work?

piotrm0 commented 5 months ago

Apologies, I forgot an important bit to print:

# Print both the component path and the method invoked for each recorded call.
for call in recording.get().calls:
    print(call.stack[-1].path, call.method())
mlevtov commented 5 months ago

That returns the following:

app._memory obj=Obj(cls=llama_index.core.memory.chat_memory_buffer.ChatMemoryBuffer, id=6337690640, init_bindings=None) name='put'
app._retriever obj=Obj(cls=llama_index.core.indices.vector_store.retrievers.retriever.VectorIndexRetriever, id=6338512688, init_bindings=None) name='_retrieve'
app._retriever obj=Obj(cls=llama_index.core.indices.vector_store.retrievers.retriever.VectorIndexRetriever, id=6338512688, init_bindings=None) name='retrieve'
app._llm obj=Obj(cls=llama_index.llms.openai.base.OpenAI, id=6343690256, init_bindings=None) name='wrapped_llm_chat'
app._memory obj=Obj(cls=llama_index.core.memory.chat_memory_buffer.ChatMemoryBuffer, id=6337690640, init_bindings=None) name='put'
app obj=Obj(cls=llama_index.core.chat_engine.context.ContextChatEngine, id=6337691456, init_bindings=None) name='chat'
mlevtov commented 5 months ago

Checking in on this again to make sure it's not lost in the shuffle.

I have upgraded to llama-index==0.10.44, trulens_eval==0.31.0. Now I'm getting a different error.

Here is the code:

import numpy as np
from openai import OpenAI
from trulens_eval import TruLlama, Tru, Feedback
from llama_index.core import VectorStoreIndex
from llama_index.readers.web import SimpleWebPageReader
from trulens_eval.feedback.provider.openai import OpenAI as fOpenAI

tru = Tru()

documents = SimpleWebPageReader(html_to_text=True).load_data(
    ["http://paulgraham.com/worked.html"]
)
index = VectorStoreIndex.from_documents(documents)

openai_client = OpenAI()
provider = fOpenAI(client=openai_client)

chat_engine = index.as_chat_engine(chat_mode="context")
f_qa_relevance = (
    Feedback(provider.relevance_with_cot_reasons, name="Answer Relevance")
    .on_input_output()
)

f_groundedness = (
    Feedback(provider.groundedness_measure_with_cot_reasons, name = "Groundedness")
    .on(TruLlama.select_source_nodes().node.text.collect())
    .on_output()
    # .aggregate(grounded.grounded_statements_aggregator)
)

f_context_relevance = (
    Feedback(provider.context_relevance_with_cot_reasons, name = "Context Relevance")
    .on_input()
    .on(TruLlama.select_source_nodes().node.text)
    .aggregate(np.mean)
)
tru_chat_recorder = TruLlama(chat_engine, app_id="test_chat_engine_2", feedbacks=[f_qa_relevance, f_groundedness, f_context_relevance])

with tru_chat_recorder as recording:
    llm_response = chat_engine.chat("What did the author do growing up?")

tru.run_dashboard()

I get this error: ValueError: Some selectors do not exist in the app or record.

With this output:

✅ In Answer Relevance, input prompt will be set to __record__.main_input or `Select.RecordInput` .
✅ In Answer Relevance, input response will be set to __record__.main_output or `Select.RecordOutput` .
✅ In Groundedness, input source will be set to __record__.app.query.rets.source_nodes[:].node.text.collect() .
✅ In Groundedness, input statement will be set to __record__.main_output or `Select.RecordOutput` .
✅ In Context Relevance, input question will be set to __record__.main_input or `Select.RecordInput` .
✅ In Context Relevance, input context will be set to __record__.app.query.rets.source_nodes[:].node.text .
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃                            Selector check failed                             ┃
┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛

Source of argument source to Groundedness does not exist in app or expected     
record:                                                                         

 __record__.app.query.rets.source_nodes[:].node.text.collect()                  
 # or equivalently                                                              
 Select.RecordCalls.query.rets.source_nodes[:].node.text.collect()              

The data used to make this check may be incomplete. If you expect records       
produced by your app to contain the selected content, you can ignore this error 
by setting selectors_nocheck in the TruLlama constructor. Alternatively, setting
selectors_check_warning will print out this message but will not raise an error.

                            Additional information:                             

Feedback function signature:                                                    

 (source: str, statement: str) -> Tuple[float, dict]                            

The prefix __record__.app selects this data that exists in your app or typical  
records:                                                                        

 • Object of type dict starting with:                                           

       {                                                                        
         '_retriever': {'retrieve': [...], '_retrieve': [...], '_aretrieve':    
 [...]},                                                                        
         '_llm': {                                                              
           'complete': [...],                                                   
           'stream_complete': [...],                                            
           'acomplete': [...],                                                  
           'astream_complete': [...],                                           
           'chat': [...],                                                       
           'achat': [...],                                                      
           'stream_chat': [...]                                                 
         },                                                                     
         '_memory': {'put': [...]},                                             
         'chat': [RecordAppCall(...), RecordAppCall(...)],                      
         'achat': [RecordAppCall(...), RecordAppCall(...)],                     
         'stream_chat': [RecordAppCall(...), RecordAppCall(...)]                
       }                                                                        

python-BaseException

You can reproduce this by creating a new conda environment, running `pip install llama-index==0.10.44 trulens-eval llama-index-readers-web`, then running the code above.
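As an aside, the error text above suggests selectors_nocheck as an escape hatch. A minimal sketch of passing it, assuming the keyword is accepted by the TruLlama constructor in this version (it would only silence the check, not fix the selector):

# Sketch only: selectors_nocheck is taken from the error text above; whether
# it is a valid keyword in this exact version is an assumption.
tru_chat_recorder = TruLlama(
    chat_engine,
    app_id="test_chat_engine_2",
    feedbacks=[f_qa_relevance, f_groundedness, f_context_relevance],
    selectors_nocheck=True,  # or selectors_check_warning=True to warn instead of raise
)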

sfc-gh-pmardziel commented 5 months ago

Thanks for the printouts. It looks like the app you configured has a component under the _retriever attribute, and a method called retrieve appears to have been called on it. Let's hope that this method has the same functionality as the query method did in the app configured without the chat mode.

Can you try replacing the selector TruLlama.select_source_nodes().node.text.collect() with the following:

# Add to imports: from trulens_eval import Select
Select.RecordCalls._retriever.retrieve.rets.source_nodes[:].node.text.collect()

You might have to adjust the parts after rets in case the retrieve method produces a different structure than query did.
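For example, to see what retrieve actually returned, you could extend the earlier call-printing snippet. This is a sketch that assumes each recorded call exposes its serialized return value as rets (as the selector paths above imply):

# Inspect the recorded return values so the selector path after `rets` can be
# matched against the actual structure. Assumes `recording` comes from a
# `with tru_recorder as recording:` block, as in the snippets above.
record = recording.get()
for call in record.calls:
    print(call.stack[-1].path, call.method())
    print(call.rets)  # look for a source_nodes entry in the retrieve call's output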

mlevtov commented 5 months ago

@sfc-gh-pmardziel Thanks. That got me further, but still not quite there.

I changed the functions to be the following:

f_qa_relevance = Feedback(
    provider.relevance_with_cot_reasons, name="Answer Relevance"
).on_input_output()

f_groundedness = (
    Feedback(provider.groundedness_measure_with_cot_reasons, name="Groundedness")
    .on(Select.RecordCalls._retriever.retrieve.rets.source_nodes[:].node.text.collect())
    .on_output()
    # .aggregate(grounded.grounded_statements_aggregator)
)

f_context_relevance = (
    Feedback(provider.context_relevance_with_cot_reasons, name="Context Relevance")
    .on_input()
    .on(Select.RecordCalls._retriever.retrieve.rets.source_nodes[:].node.text.collect())
    .aggregate(np.mean)
)

Now it runs, but I get the following warnings:

Object (of type list is a sequence containing more than one dictionary. Lookup by item or attribute `source_nodes` is ambiguous. Use a lookup by index(es) or slice first to disambiguate.
Object (of type list is a sequence containing more than one dictionary. Lookup by item or attribute `source_nodes` is ambiguous. Use a lookup by index(es) or slice first to disambiguate.

In the dashboard the context is empty and context relevance and groundedness are both set to 0.0.

Do I have to get source_nodes differently?

sfc-gh-pmardziel commented 4 months ago

This message suggests that the retriever was called more than once, and I'm unsure which call is the one you are looking for. You can adjust the selector for each call; for example, here are the selectors that pick the first and second calls, respectively:

Select.RecordCalls._retriever.retrieve[0].rets.source_nodes[:].node.text.collect()
Select.RecordCalls._retriever.retrieve[1].rets.source_nodes[:].node.text.collect()

You can also try looking at the contexts from all of the calls at once using this selector:

Select.RecordCalls._retriever.retrieve[:].rets.source_nodes[:].node.text.collect()
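Wired into the feedback definitions from your last snippet, the all-calls variant would look like the following sketch (assuming the rest of your setup is unchanged):

from trulens_eval import Select

# Collect the context text across every recorded retrieve call.
all_contexts = (
    Select.RecordCalls._retriever.retrieve[:].rets.source_nodes[:].node.text.collect()
)

f_groundedness = (
    Feedback(provider.groundedness_measure_with_cot_reasons, name="Groundedness")
    .on(all_contexts)
    .on_output()
)

f_context_relevance = (
    Feedback(provider.context_relevance_with_cot_reasons, name="Context Relevance")
    .on_input()
    .on(all_contexts)
    .aggregate(np.mean)
)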
sfc-gh-jreini commented 4 months ago

This seems to be stale - closing for now. If your issues persist @mlevtov - please reopen or open a new issue.