run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License

[Bug]: Output from tool calls is not very observable #9770

Closed: tslmy closed this issue 2 days ago

tslmy commented 6 months ago

Bug Description

What seems wrong

I can't see responses from function calls in Arize Phoenix...

[screenshot: Arize Phoenix trace with no visible function call response]

... or in ChainLit:

[screenshot: ChainLit UI with no visible tool output]

What I was expecting

I can usually see results from other types of spans, such as embedding spans in Arize Phoenix...

[screenshot: embedding span details shown in Arize Phoenix]

... and query engine tool outputs in ChainLit:

[screenshot: query engine tool output shown in ChainLit]

so I was expecting the same level of observability for FunctionTools.

My guess

I think this is because "function_call_response", the string LlamaIndex uses to identify responses from generic function calls in callback handlers, is not part of the OpenInference Tracing conventions.

The "function_call_response" is defined here: https://github.com/run-llama/llama_index/blob/3b522159eeade961618e2de7411fff87356bd6cb/llama_index/callbacks/schema.py#L61 .

It is sent to callback managers here: https://github.com/run-llama/llama_index/blob/3b522159eeade961618e2de7411fff87356bd6cb/llama_index/agent/react/step.py#L230 .
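Schematically, that code wraps the tool call in a FUNCTION_CALL event and attaches the output under that key. Here is a minimal, runnable sketch of the pattern (placeholder payloads, not the actual step.py code):

from llama_index.callbacks import CallbackManager
from llama_index.callbacks.schema import CBEventType, EventPayload

callback_manager = CallbackManager([])  # handlers omitted for brevity

# A placeholder standing in for whatever the tool actually returned.
tool_output = "Cheesecake is a dessert..."

with callback_manager.event(
    CBEventType.FUNCTION_CALL,
    payload={EventPayload.FUNCTION_CALL: '{"query": "cheesecake"}'},
) as event:
    # The tool output is reported under EventPayload.FUNCTION_CALL_RESPONSE,
    # i.e. the literal string "function_call_response".
    event.on_end(payload={EventPayload.FUNCTION_CALL_RESPONSE: str(tool_output)})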

The conventions themselves are defined in the OpenInference spec. Indeed, Arize Phoenix doesn't recognize "function_call_response", as shown here:

https://github.com/Arize-ai/phoenix/blob/363e8913a3a0f7dafca5dc6bba6bf0e9776c1158/app/src/openInference/tracing/semanticConventions.ts#L51-L58

Version

0.9.23

Steps to Reproduce

from llama_hub.tools.wikipedia import WikipediaToolSpec
from llama_index.agent import ReActAgent
from llama_index.callbacks.base import CallbackManager
from llama_index.llms import OpenAILike
import chainlit as cl
import phoenix as px

# "Phoenix can display in real time the traces automatically collected from your LlamaIndex application."
# https://docs.llamaindex.ai/en/stable/module_guides/observability/observability.html
px.launch_app()

@cl.on_chat_start
async def factory():
    callback_manager = CallbackManager([cl.LlamaIndexCallbackHandler()])

    local_llm = OpenAILike(
        api_base="http://localhost:1234/v1",
        timeout=600,  # secs
        api_key="loremIpsum",
        # I'm using this: https://huggingface.co/HuggingFaceH4/zephyr-7b-beta .
        model="zephyr beta 7B q5_k_m gguf",
        is_chat_model=True,
        is_function_calling_model=True,
        context_window=32768,
        callback_manager=callback_manager,
    )

    tool_spec = WikipediaToolSpec()
    agent = ReActAgent.from_tools(
        tools=tool_spec.to_tool_list(),
        llm=local_llm,
        callback_manager=callback_manager,
    )
    cl.user_session.set("agent", agent)

@cl.on_message
async def main(message: cl.Message):
    """ChainLit provides a web GUI for this application."""
    agent: ReActAgent = cl.user_session.get("agent")
    response = await cl.make_async(agent.chat)(message.content)
    response_message = cl.Message(content="")
    response_message.content = response.response
    await response_message.send()

Save the script above as main.py. Run in terminal:

chainlit run main.py -w

Ask the Agent "Search Wikipedia and summarize what is a cheesecake", and observe.
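Note: for traces to reach the locally running Phoenix instance, the OpenInference handler also needs to be registered. A minimal sketch, assuming the 0.9.x global-handler API from the observability docs linked in the script above:

import llama_index

# Route LlamaIndex callback events to Phoenix (OpenInference tracing).
llama_index.set_global_handler("arize_phoenix")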

Relevant Logs/Tracebacks

No response

dosubot[bot] commented 6 months ago

🤖

It's great that you've identified the issue with the function_call_response not adhering to OpenInference Tracing conventions. Your detailed analysis and references to the specific code locations and conventions are very helpful.

If you have a proposed solution or fix in mind, I would encourage you to go ahead and create a pull request with your changes. Your contribution will be valuable in improving LlamaIndex and ensuring it aligns with the OpenInference Tracing conventions.

Thank you for your dedication to improving LlamaIndex!


This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.

logan-markewich commented 6 months ago

@tslmy do you have any idea on what the fix here would be? Looking at arize's spec, I actually have no idea 😆 We should be populating a tool_calls attribute?

tslmy commented 6 months ago

@logan-markewich , tool_calls is closer, but it also doesn't seem to have a conventional way to specify tool outputs.

I opened an issue in the Open Inference Specification repo (linked above), asking the maintainers there to come up with a conventional way to specify tool outputs. Let's see what they say.
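In the meantime, tool outputs can be surfaced by a custom callback handler that watches for that payload key. A minimal sketch, assuming the 0.9.x BaseCallbackHandler interface (the ToolOutputLogger name is just illustrative):

from typing import Any, Dict, Optional

from llama_index.callbacks.base_handler import BaseCallbackHandler
from llama_index.callbacks.schema import CBEventType, EventPayload


class ToolOutputLogger(BaseCallbackHandler):
    """Print tool outputs reported under EventPayload.FUNCTION_CALL_RESPONSE."""

    def __init__(self) -> None:
        super().__init__(event_starts_to_ignore=[], event_ends_to_ignore=[])

    def on_event_start(
        self,
        event_type: CBEventType,
        payload: Optional[Dict[str, Any]] = None,
        event_id: str = "",
        parent_id: str = "",
        **kwargs: Any,
    ) -> str:
        return event_id

    def on_event_end(
        self,
        event_type: CBEventType,
        payload: Optional[Dict[str, Any]] = None,
        event_id: str = "",
        **kwargs: Any,
    ) -> None:
        # Only log the tool-output payload the built-in handlers currently drop.
        if payload and EventPayload.FUNCTION_CALL_RESPONSE in payload:
            print("tool output:", payload[EventPayload.FUNCTION_CALL_RESPONSE])

    def start_trace(self, trace_id: Optional[str] = None) -> None:
        pass

    def end_trace(
        self,
        trace_id: Optional[str] = None,
        trace_map: Optional[Dict[str, Any]] = None,
    ) -> None:
        pass

It would then be registered alongside the ChainLit handler, e.g. CallbackManager([cl.LlamaIndexCallbackHandler(), ToolOutputLogger()]).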

dosubot[bot] commented 3 months ago

Hi, @tslmy,

I'm helping the LlamaIndex team manage their backlog and am marking this issue as stale. From what I understand, the issue highlighted a bug where the output from function calls is not observable in Arize Phoenix and ChainLit due to a non-standard string used to identify responses from generic function calls with callback handlers. The resolution involved encouraging the author to create a pull request with their proposed changes, and there is ongoing discussion about the need for a conventional way to specify tool outputs, with an issue opened in the Open Inference Specifications repo for further discussion.

Could you please confirm if this issue is still relevant to the latest version of the LlamaIndex repository? If it is, please let the LlamaIndex team know by commenting on the issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days.

Thank you!

tslmy commented 3 months ago

@dosubot shall we wait till https://github.com/Arize-ai/open-inference-spec/issues/37 is closed before closing this?

teis-e commented 3 months ago

Hello, I am also trying to get a non-OpenAI model working. Is OpenAILike the only way to do this, like you did? Or can I just use HuggingFaceLLM?

dosubot[bot] commented 3 months ago

Based on the context provided, OpenAILike is not the only option for integrating non-OpenAI models into your project. You can also use HuggingFaceLLM for this purpose. Both OpenAILike and HuggingFaceLLM are designed to provide interfaces for generating text completions and handling chat interactions, but they are tailored to work with different ecosystems.

OpenAILike provides an interface similar to OpenAI's API. It supports customization through various parameters such as model, api_key, context_window, max_tokens, and an optional tokenizer. It allows for generating text completions with the complete method and handling chat interactions with the chat method. You can find more details in the test_openai_like.py file.

On the other hand, HuggingFaceLLM is designed to integrate with Hugging Face's language models. It allows specifying a model and tokenizer from Hugging Face, with default options set to "StabilityAI/stablelm-tuned-alpha-3b". It supports custom configurations for the context window size, maximum number of new tokens to generate, system prompts, and query wrapper prompts. It also offers methods for both one-off text completions (complete) and streaming completions (stream_complete), as well as methods for handling chat interactions (chat and stream_chat). More details can be found in the huggingface.py file.

So, depending on the specific requirements of your project and the ecosystem you are working with, you can choose either OpenAILike or HuggingFaceLLM to integrate non-OpenAI models into your project.
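For example, a minimal local setup with HuggingFaceLLM might look like the following (a sketch assuming the 0.9.x import path; the model_name/tokenizer_name choices are illustrative, not requirements):

from llama_index.llms import HuggingFaceLLM

# Runs the model in-process; no separate inference server is needed.
local_llm = HuggingFaceLLM(
    model_name="HuggingFaceH4/zephyr-7b-beta",
    tokenizer_name="HuggingFaceH4/zephyr-7b-beta",
    context_window=32768,
    max_new_tokens=512,
    device_map="auto",
)
print(local_llm.complete("What is a cheesecake?"))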

logan-markewich commented 3 months ago

@teis-e openailike is for models hosted on a server that has the same API as openai

HuggingfaceLLM is specifically for running a model locally (no server, just in the same process/program as llamaindex)

teis-e commented 3 months ago

> @teis-e openailike is for models hosted on a server that has the same API as openai
>
> HuggingfaceLLM is specifically for running a model locally (no server, just in the same process/program as llamaindex)

Right! Thanks, but OpenAILike gives the opportunity to run agents, function calling, etc. without OpenAI. So just a local model, like the Zephyr from above?

logan-markewich commented 3 months ago

Using openailike does not magically give access to function calling (I wish haha)

The server running the API actually has to implement this, i.e. handle the tools kwarg when running the model.
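Concretely, "handling the tools kwarg" means the server must accept OpenAI-style requests like the following sketch (using the openai Python client against a placeholder local endpoint; the tool definition is illustrative):

from openai import OpenAI

# Point the official client at a local OpenAI-compatible server.
client = OpenAI(base_url="http://localhost:1234/v1", api_key="loremIpsum")

response = client.chat.completions.create(
    model="zephyr beta 7B q5_k_m gguf",
    messages=[{"role": "user", "content": "Search Wikipedia for cheesecake."}],
    # The server must understand this parameter and return a tool_calls
    # message; many local servers simply ignore it.
    tools=[
        {
            "type": "function",
            "function": {
                "name": "search_wikipedia",
                "description": "Search Wikipedia for a query.",
                "parameters": {
                    "type": "object",
                    "properties": {"query": {"type": "string"}},
                    "required": ["query"],
                },
            },
        }
    ],
)
print(response.choices[0].message)

If the server ignores the parameter, the model just answers in plain text and no tool_calls ever come back.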