Closed: tomdgr closed this issue 1 month ago.
When using chain.astream, the structure of the data and the way you access it can differ from chain.invoke. Here's how you can correctly define your feedback functions:

Understanding the Data Structure: with chain.invoke, you used Select.Record.app.middle[0].invoke.rets to get the context; with chain.astream, you need to adapt the selectors to match the streaming data structure.

Defining Feedback Functions: use on_input() to specify the main app input, on_output() to specify the main app output, and on() to specify other parts of the app record.

Here's an example of how you can define your feedback functions for chain.astream:
from trulens_eval import Feedback, Select
import numpy as np

# openai_provider and feedback_cot_function are assumed to be defined elsewhere in the notebook.

# Define feedback functions
def feedback_groundedness_function(context, response):
    groundedness = openai_provider.groundedness_measure_with_cot_reasons(context, response)
    return groundedness

f_groundedness = (
    Feedback(feedback_groundedness_function, name="Groundedness")
    .on(Select.Record.app.middle[0].rets)  # Adjust selector for astream
    .on_output()
    .aggregate(np.mean)
)

f_context_relevance = (
    Feedback(openai_provider.context_relevance_with_cot_reasons, name="Context Relevance")
    .on_input()
    .on(Select.Record.app.middle[0].rets)  # Adjust selector for astream
    .aggregate(np.mean)
)

f_feedback_cot = Feedback(feedback_cot_function, name="Answer Relevance COT").on_input().on_output()
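If it is unclear which selector paths exist for a given chain, one general way to find out (a sketch of an approach, not part of the original answer; probe_recorder and its app_name are illustrative names, and chain is the object built later in this thread) is to wrap the chain once and print the instrumented components. print_instrumented() lists the methods TruLens wrapped together with the lenses that point at them:

# Sketch: discovering selector paths for a wrapped chain.
from trulens_eval import TruChain

probe_recorder = TruChain(app=chain, app_name="selector-probe")
probe_recorder.print_instrumented()  # prints instrumented components/methods and their lenses

The printed lenses can then be plugged into the .on(...) calls above via Select.Record.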
astream Output

The astream method can be slower due to several potential bottlenecks. To mitigate these issues, consider the following:
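For example, one mitigation TruLens supports (a hedged sketch, not necessarily what was originally listed here; it reuses the chain and feedback objects from above, and the app_name is illustrative) is deferred feedback evaluation, so that the feedback LLM calls do not run in the same pass as the streamed response:

# Sketch: run feedback in deferred mode so streaming is not blocked by evaluation.
from trulens_eval import TruChain, TruSession

tru = TruSession()

tru_recorder_deferred = TruChain(
    app=chain,
    app_name="testapp-deferred",
    feedbacks=[f_groundedness, f_context_relevance, f_feedback_cot],
    feedback_mode="deferred",  # evaluate feedback later instead of with the app call
)

# A background evaluator picks up deferred feedback rows from the database.
tru.start_evaluator()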
Using TruChain

Here's an example of how to use TruChain with the defined feedback functions and chain.astream:
from trulens_eval import TruChain, TruSession
from app.utils.langchain_funcs import get_chain
from app.datastructures import Conversation, Message
from langchain.memory import ConversationBufferMemory
from langchain_core.messages.ai import AIMessageChunk

# Initialize TruSession
tru = TruSession()
tru.reset_database()
tru.start_dashboard()

# Define conversation
conversation = Conversation(
    title="Ny Samtale",
    id="conversation-1234",
    filename="sample_file.txt",
    indexName="livsvitenskap",
    promptType="juridisk ekspert",
    messages=[Message(role="user", content="Hei, hva er de juridiske implikasjonene for saken?")]
)

# Get chain
chain = await get_chain(conversation=conversation, prompt_type="RESPONSE")

# Initialize memory
memory = ConversationBufferMemory(memory_key="history", return_messages=False)
for message in conversation.messages:
    if message.role == "user":
        memory.chat_memory.add_user_message(message.content)
    elif message.role == "assistant":
        memory.chat_memory.add_ai_message(message.content)
history = memory.load_memory_variables({})["history"]

# Initialize TruChain with feedback functions
tru_recorder = TruChain(
    app_name="testapp",
    app=chain,
    provider=openai_provider,
    feedbacks=[f_feedback_cot, f_groundedness, f_context_relevance]
)

# Stream responses
result = ""
async with tru_recorder as recording:
    async for chunk in chain.astream({"conversation": conversation, "history": history}):
        if "response" in chunk:
            content = chunk["response"]
            if isinstance(content, AIMessageChunk):
                result += content.content
        else:
            raise Exception("Something went wrong in yield chunk in chain response, try opening a new chat")
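After the streaming loop finishes, the captured record and the feedback results can be inspected. A small sketch (not part of the original answer): recording.get() returns the Record captured by the context manager above, and get_leaderboard() shows aggregate scores once the feedback functions have run.

# Sketch: inspecting what was recorded for the streamed call.
record = recording.get()      # the Record captured by the context manager above
print(record.record_id)

# Aggregate feedback scores per app, once evaluation has completed.
print(tru.get_leaderboard())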
This setup should help you correctly define and use feedback functions with chain.astream and address the performance issues.
To continue talking to Dosu, mention @dosu.
Still need this to be looked at.
@tomdgr see you marked this as completed. Can you share what the resolution was? Thanks!
Hi @sfc-gh-jreini. There is no solution yet, but I have come a lot further. Here is a general update on what has happened:
STDERR
2024-09-24 15:01:10.262 Uncaught app exception
Traceback (most recent call last):
File "C:\Users\tomg\AppData\Local\Programs\Python\Python311\Lib\site-packages\streamlit\runtime\scriptrunner\exec_code.py", line 88, in exec_func_with_error_handling
result = func()
^^^^^^
File "C:\Users\tomg\AppData\Local\Programs\Python\Python311\Lib\site-packages\streamlit\runtime\scriptrunner\script_runner.py", line 590, in code_to_exec
exec(code, module.__dict__)
File "C:\Users\tomg\AppData\Local\Programs\Python\Python311\Lib\site-packages\trulens\dashboard\Leaderboard.py", line 839, in <module>
render_leaderboard(app_name)
File "C:\Users\tomg\AppData\Local\Programs\Python\Python311\Lib\site-packages\trulens\dashboard\Leaderboard.py", line 814, in render_leaderboard
_render_grid_tab(
File "C:\Users\tomg\AppData\Local\Programs\Python\Python311\Lib\site-packages\trulens\dashboard\Leaderboard.py", line 494, in _render_grid_tab
grid_data.event_data
AttributeError: 'AgGridReturn' object has no attribute 'event_data'
Here is the new chain:
from operator import itemgetter

from langchain_core.output_parsers.string import StrOutputParser
from langchain_core.runnables import RunnableLambda, RunnableParallel
from langchain_openai import AzureChatOpenAI

from app.config import (AZURE_OPENAI_API_KEY,
                        AZURE_OPENAI_ENDPOINT,
                        AZURE_OPENAI_VERSION,
                        AZURE_SEARCH_ENDPOINT,
                        AZURE_SEARCH_KEY
                        )
from app.datastructures import Conversation

# initialize_hyperparameters, initialize_retriever, construct_prompt and create_query
# are helper functions defined elsewhere in the app.


async def get_chain(conversation: Conversation, prompt_type: str = "RESPONSE"):
    """
    Asynchronously creates a language model chain for streaming responses.

    Args:
        conversation (Conversation): The current conversation context containing messages and other details.
        prompt_type (str): The type of prompt to use, either "RESPONSE" or another supported prompt type. Defaults to "RESPONSE".

    Returns:
        RunnableSequence: The final chain object that can be used for streaming responses.
    """
    hyper_params = initialize_hyperparameters()

    retriever = initialize_retriever(
        hyper_params=hyper_params,
        conversation=conversation
    )

    prompt = construct_prompt(
        conversation=conversation,
        prompt_type=prompt_type,
        hyper_params=hyper_params
    )

    llm = AzureChatOpenAI(
        deployment_name=hyper_params.get("deployment_name"),
        openai_api_key=AZURE_OPENAI_API_KEY,
        azure_endpoint=AZURE_OPENAI_ENDPOINT,
        api_version=hyper_params.get("azure_openai_version"),
        temperature=hyper_params.get("temperature"),
        streaming=hyper_params.get("streaming")
    )

    # context chain
    context_chain = RunnableLambda(lambda x: retriever.get_relevant_documents(x["conversation"]))

    # query chain
    query_chain = RunnableLambda(lambda x: create_query(x["conversation"]))

    # Create an instance of StrOutputParser
    str_parser = StrOutputParser()

    # main chain
    parallel_chain = RunnableParallel(
        {
            "context": context_chain,
            "query": query_chain,
            "history": itemgetter("history"),
        }
    )

    final_chain = (
        parallel_chain
        | prompt
        | llm
        | str_parser
    )

    return final_chain
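Since this chain ends with StrOutputParser, astream yields plain string chunks rather than AIMessageChunk objects, so the consuming loop from the earlier example can be simplified. A sketch (not from the original thread), assuming the same conversation/history input mapping used above:

# Sketch: consuming the new chain; chunks are already strings after StrOutputParser.
result = ""
async for chunk in final_chain.astream({"conversation": conversation, "history": history}):
    result += chunk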
Hi @tomdgr, regarding the aggrid issue: can you verify you're using streamlit-aggrid>=1.0.5, which should resolve that issue? It's possible that while reinstalling trulens there was a dependency version mismatch.
Hi @sfc-gh-chu. I managed to solve that by updating the package. The output below is what remains:
c:\Users\tomg\AppData\Local\Programs\Python\Python311\Lib\site-packages\trulens\feedback\llm_provider.py:289: UserWarning: No supporting evidence provided. Returning score only.
warnings.warn(
Error removing trivial statements: unterminated string literal (detected at line 21) (<string>, line 21). Proceeding with all statements.
Multiple valid rating values found in the string: I'm sorry, I can only provide a score from 0 to 3 based on the overlap of information. Please provide a specific statement for evaluation.
Multiple valid rating values found in the string: I'm sorry, I can only provide a score from 0 to 3. Please provide a specific statement for evaluation.
Multiple valid rating values found in the string: I'm sorry, I can only provide a score from 0 to 3 based on the criteria.
Multiple valid rating values found in the string: I'm sorry, I can only provide a score from 0 to 3 based on the criteria.
Multiple valid rating values found in the string: I'm sorry, I can only provide a score from 0 to 3. Please provide a specific statement for evaluation.
Multiple valid rating values found in the string: I'm sorry, I can only provide a score from 0 to 3.
Multiple valid rating values found in the string: I'm sorry, I can only provide a score from 0 to 3 based on the overlap of information. Please provide a specific statement for evaluation.
Multiple valid rating values found in the string: I'm sorry, I can only provide a score from 0 to 3.
Feedback Function exception caught: Traceback (most recent call last):
File "c:\Users\tomg\AppData\Local\Programs\Python\Python311\Lib\site-packages\trulens\core\feedback\feedback.py", line 895, in run
mod_base_endpoint.Endpoint.track_all_costs_tally(
File "c:\Users\tomg\AppData\Local\Programs\Python\Python311\Lib\site-packages\trulens\core\feedback\endpoint.py", line 516, in track_all_costs_tally
result, cbs = Endpoint.track_all_costs(
^^^^^^^^^^^^^^^^^^^^^^^^^
File "c:\Users\tomg\AppData\Local\Programs\Python\Python311\Lib\site-packages\trulens\core\feedback\endpoint.py", line 489, in track_all_costs
return Endpoint._track_costs(
^^^^^^^^^^^^^^^^^^^^^^
File "c:\Users\tomg\AppData\Local\Programs\Python\Python311\Lib\site-packages\trulens\core\feedback\endpoint.py", line 581, in _track_costs
result: T = __func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\tomg\AppData\Local\Temp\ipykernel_24300\1291823463.py", line 40, in feedback_groundedness_function
groundedness = openai_provider.groundedness_measure_with_cot_reasons(context, response)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "c:\Users\tomg\AppData\Local\Programs\Python\Python311\Lib\site-packages\trulens\feedback\llm_provider.py", line 1584, in groundedness_measure_with_cot_reasons
results.append(future.result())
^^^^^^^^^^^^^^^
File "c:\Users\tomg\AppData\Local\Programs\Python\Python311\Lib\concurrent\futures\_base.py", line 449, in result
return self.__get_result()
^^^^^^^^^^^^^^^^^^^
File "c:\Users\tomg\AppData\Local\Programs\Python\Python311\Lib\concurrent\futures\_base.py", line 401, in __get_result
raise self._exception
File "c:\Users\tomg\AppData\Local\Programs\Python\Python311\Lib\concurrent\futures\thread.py", line 58, in run
result = self.fn(*self.args, **self.kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "c:\Users\tomg\AppData\Local\Programs\Python\Python311\Lib\site-packages\trulens\core\utils\python.py", line 633, in _future_target_wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "c:\Users\tomg\AppData\Local\Programs\Python\Python311\Lib\site-packages\trulens\feedback\llm_provider.py", line 1566, in evaluate_hypothesis
score, reason = self.generate_score_and_reasons(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "c:\Users\tomg\AppData\Local\Programs\Python\Python311\Lib\site-packages\trulens\feedback\llm_provider.py", line 282, in generate_score_and_reasons
re_configured_rating(
File "c:\Users\tomg\AppData\Local\Programs\Python\Python311\Lib\site-packages\trulens\feedback\generated.py", line 73, in re_configured_rating
raise ParseError("int or float number", s, pattern=PATTERN_NUMBER)
trulens.feedback.generated.ParseError: Tried to find int or float number using pattern ([+-]?[0-9]+\.[0-9]*|[1-9][0-9]*|0) in
I'm sorry, I can't assist with that.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "c:\Users\tomg\AppData\Local\Programs\Python\Python311\Lib\site-packages\trulens\core\feedback\feedback.py", line 910, in run
raise RuntimeError(
RuntimeError: Evaluation of Groundedness failed on inputs:
{'context': 'some string'
This is the only thing I have not been able to fix yet. The feedback function in question is:
def feedback_groundedness_function(context, response):
    groundedness = openai_provider.groundedness_measure_with_cot_reasons(context, response)
    return groundedness

f_groundedness = (
    Feedback(feedback_groundedness_function, name="Groundedness")
    .on(Select.Record.app.middle[0].astream.args.input.context)
    .on_output()
)
I am currently using:

retriever = AzureAISearchRetriever(
    service_name=AZURE_SEARCH_ENDPOINT,
    api_key=AZURE_SEARCH_KEY,
    index_name="livsvitenskap",
    top_k=top_k,
    content_key="content"
)
context = TruChain.select_context(chain)
context
# gives output:
# Lens().__record__.app.first.steps__.context.first.invoke.rets[:].page_content
# however, this is not in Records
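For reference, when the lens returned by TruChain.select_context does appear in the record, the usual quickstart-style wiring (a sketch, not a confirmed fix for this particular chain; openai_provider and chain are the objects from earlier in the thread) collects the selected page_content pieces and scores them directly:

from trulens_eval import Feedback
import numpy as np

context = TruChain.select_context(chain)

# Sketch: groundedness over all retrieved chunks collected into one source,
# and context relevance scored per chunk, then averaged.
f_groundedness = (
    Feedback(openai_provider.groundedness_measure_with_cot_reasons, name="Groundedness")
    .on(context.collect())
    .on_output()
)

f_context_relevance = (
    Feedback(openai_provider.context_relevance_with_cot_reasons, name="Context Relevance")
    .on_input()
    .on(context)
    .aggregate(np.mean)
)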
@tomdgr Hi Tom,

Could you try either updating the groundedness feedback definition by changing one param, use_sent_tokenize, to True:

def feedback_groundedness_function(context, response):
    groundedness = openai_provider.groundedness_measure_with_cot_reasons(context, response, use_sent_tokenize=True)
    return groundedness

Alternatively, the latest TruLens source on the main branch should work without the argument use_sent_tokenize set explicitly to True.
My guess based on the stack trace is that our groundedness feedback function was not able to get the context as source and generate proper feedback scores, because the previously released version of TruLens was using an LLM to perform sentence splitting, which led to hallucination (this happens as an intermediate step in groundedness_measure_with_cot_reasons).

Let me know if this works, and I'm happy to dive deeper if not. Thanks!
@sfc-gh-dhuang Thank you for the response, I will take a look at this and update you asap
Hey @tomdgr - any success?
Closing this for now @tomdgr, please reopen if you need more help.
Bug Description

Issue number 1: Is there a way to figure out how to correctly pass on_input(), on() and on_output() to the feedback functions in general? For .invoke it was Select.Record.app.middle[0].invoke.rets to get the context, on_input() got the answer, and on_output() got the response. However, this is not the same for chain.astream.

This may differ between LangChain chains, so here is mine:

which results in the following output from chain = await get_chain(conversation=conversation, prompt_type="RESPONSE"):

So the main question is: how do I figure out how to properly define my feedback functions based on a LangChain chain? The examples provided in examples/notebooks/ were too simple, and I also could not find any in the documentation.
Issue number 2:
To Reproduce
Expected behavior: Groundedness, Context Relevance and Answer Relevance, all with COT, in the dashboard, comparably fast to chain.invoke, when calling chain.astream instead.
Relevant Logs/Tracebacks
Environment:
Python 3.11.8
trulens 1.0.1
trulens-apps-langchain 1.0.1
trulens-core: main, on PR #1487
trulens-dashboard 1.0.1
trulens_eval 1.0.1
trulens-feedback 1.0.1
trulens-providers-openai 1.0.1
langchain 0.2.16
langchain-cli 0.0.31
langchain-community 0.2.17
langchain-core 0.2.40
langchain-openai 0.1.25
langchain-text-splitters 0.2.4
Additional context: The data structures used can be found in my previous issue #1487.