microsoft / TaskWeaver

A code-first agent framework for seamlessly planning and executing data analytics tasks.
https://microsoft.github.io/TaskWeaver/
MIT License
5.38k stars 691 forks source link

AgentOps integration issues with the limited info exposed in the `SessionEventHandler` class #445

Open the-praxs opened 19 hours ago

the-praxs commented 19 hours ago

Is your feature request related to a problem? Please describe.

I am using AgentOps as my agent observability platform, and I'm trying to build a TaskWeaver integration. Currently there are some blockers that prevent TaskWeaver from exporting the required information to AgentOps.

What is AgentOps?

AgentOps is a platform for tracking and analyzing the interactions between users and AI agents. It provides a python SDK for tracking the analytics of AI agents. and a dashboard for visualization of the collected data.
The docs are available here and the Github repo is available here.

Challenges integrating TaskWeaver with AgentOps

This PR is using the SessionEventHandler class to track the analytics of the TaskWeaver app. We want to track the information about different events in the TaskWeaver app i.e. the Session, the Round and the Post by mapping them to the AgentOps events - ActionEvent, LLMEvent, ToolEvent, and an additional ErrorEvent for reporing errors during tracking of any aforementioned events.

However, the following caveats are observed:

  1. When using the SessionEventHandler class, we are able to track most of the information except those of the LLM calls and the associated Tool calls. Since the send_message function is not handled by the SessionEventHandler class, the user query is not available for tracking. Similarly, the taskweaver_config.json file is not being tracked as the app_dir variable is not available in the SessionEventHandler class.
  2. AgentOps tracks LLM calls and events through monkey patching. It currently integrates with the official LLM provider libraries (e.g. openai, anthropic etc) but TaskWeaver uses a custom wrapper around the LLM provider libraries using a CompletionService class. I made an attempt to patch the CompletionService class for each LLM provider in TaskWeaver to track the LLM calls, but no information was captured.
  3. The TaskWeaver documentation mentions the use of OpenTelemetry for tracing, using the Tracing module which is a wrapper around the opentelemetry library. However, I see no detailed documentation on modifying this class to use a different tracing backend.

Describe the solution you'd like

From my observations, the TaskWeaver app is using the Injector library to manage the dependency injections in the codebase. The following proposed solutions are based on this observation:

  1. Modify the Injector class to inject the AgentOps python SDK so that the SessionEventHandler class can track the analytics of the TaskWeaver app.
  2. Modify the CompletionService class to track the LLM calls and the associated Tool calls.
  3. Modify the Tracing class to use a different tracing backend.

Alternatively, I would like to know if we can expose the information ranging from the user query to the LLM provider used in the SessionEventHandler class so that we can track the analytics using the AgentOps python SDK.

Describe alternatives you've considered

One of the ways is to pass the Session object to the AgentOps handler directly and extract the required information. This is not a clean solution as changes to the TaskWeaver codebase would break the integration and thus affect scalability.

Additional context

Here is the code I used to track the analytics of the TaskWeaver app:

from taskweaver.app.app import TaskWeaverApp
from agentops.partners.taskweaver_event_handler import TaskWeaverEventHandler

# This is the folder that contains the taskweaver_config.json file and not the repo root. Defaults to "./project/"
app_dir = "TaskWeaver/project/."
app = TaskWeaverApp(app_dir=app_dir)
session = app.get_session()
handler = TaskWeaverEventHandler()

session.event_emitter.register(handler)

user_query = "hello, what can you do?"
# response_round = session.send_message(user_query, event_handler=handler)
response_round = session.send_message(user_query)
print(response_round.to_dict())

This video will demonstrate the tracking on the AgentOps dashboard.

the-praxs commented 19 hours ago

@liqul here you will find the documentation of the AgentOps platform and the challenges in integrating TaskWeaver with it.

liqul commented 18 hours ago

To explain some context:

For now, Tracing has tracked all the input/output to the LLM endpoint. It is standard Opentelemetry tracing, and each trace is a single round of a conversation (or session), started with the user's request and ended with the response to the user. So, you should be able to find all information for ActionEvent and LLMEvent from the traces. However, I'm not an expert of Opentelemetry and not sure if it is easy to customize a Opentelemtry data collector.

The hard part is the ToolEvent. Currently, we only track the code snippet execution as a whole, and you may know that each tool is a python function call in the code. The dynamically generated code is also in the trace, but we do not support tracking individual function call during the execution. This part could be tricky to support as we do not assume that the code execution will always happen in the local enviroment. It could be inside a container or even on a remote server (though we haven't supported this feature yet), which means, to track the fine-grained execution time of each plugin, there is still quite a lot of effort on the infrastructure.

So, in conclusion, I believe you can find most information from the traces, but I haven't any experiences in handling Opentelemetry traces. Since you have already get most of ActionEvents from SessionEventHandler, having a minimum modification to CompletionService is sufficient for LLMEvents. But tracking fine-grained tool invocations is not supported today.

the-praxs commented 9 hours ago

I can experiment with the Tracing module for what I need in the LLMEvent attributes.

What kind of modifications are you thinking for the CompletionService? I think that's easier than collecting the traces so it will get the ball rolling.