run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License

[Bug]: FunctionTool not passing referenced objects/enums in json schema #10401

Closed leotronolone closed 2 months ago

leotronolone commented 9 months ago

Bug Description

I am no longer able to use referenced objects/enums within function call inputs on the latest version of LlamaIndex (0.9.40). This is caused by the referenced objects no longer being added to the JSON schema sent to the LLM when defining the function calls. Previously, these JSON schemas were generated from the fn_schema argument (a BaseModel subclass) passed to FunctionTool.from_defaults().

EX: LlamaIndex Latest version: '{"properties": {"ask": {"description": "natural language query string", "title": "Ask", "type": "string"}, "entity_types": {"description": "The types of entities to search for answer", "items": {"$ref": "#/$defs/EntityType"}, "title": "Entity Types", "type": "array"}, "required": ["ask", "entity_types"], "type": "object"}'

LlamaIndex Old Version: "{'$defs': {'EntityType': {'enum': ['file', 'response', 'member'], 'title': 'EntityType', 'type': 'string'}}, 'properties': {'ask': {'description': 'natural language query string', 'title': 'Ask', 'type': 'string'}, 'entity_types': {'description': 'The types of entities to search for answer', 'items': {'$ref': '#/$defs/EntityType'}, 'title': 'Entity Types', 'type': 'array'}}, 'required': ['ask', 'entity_types'], 'title': 'AskQuestionGPTFunctionModel', 'type': 'object'}"

This is causing the LLM to return function call inputs that don't respect the desired JSON schema, which raises an exception when it tries to call the function with a bad input value.

EX: it will try calling the AskQuestion function with entity_types = ['document'] instead of one of the valid values of ["file", "response", "member"]

This was working for me on 0.9.14. I tried updating pydantic and openai to the latest versions as well, but had no luck.

Version

0.9.15+

Steps to Reproduce

Run the following example code:

from enum import Enum

from pydantic import BaseModel

from llama_index.llms.openai import OpenAI
from llama_index.tools import FunctionTool
from llama_index.agent import OpenAIAgent


class MATH_OPERATOR(str, Enum):
    ADD = "add"
    SUBTRACT = "subtract"


def calculate(math_operator: MATH_OPERATOR, valueA: int, valueB: int):
    return "answer"


class CalculateFunctionModel(BaseModel):
    math_operator: MATH_OPERATOR
    valueA: int
    valueB: int


def run():
    llm = OpenAI(model="gpt-4-1106-preview", temperature=0)
    calculator_tool = FunctionTool.from_defaults(
        fn=calculate,
        fn_schema=CalculateFunctionModel,
    )
    agent = OpenAIAgent.from_tools(llm=llm, tools=[calculator_tool], verbose=True)
    return agent.chat("what is 2 * 2, make sure to use a tool")

Relevant Logs/Tracebacks

Added user message to memory: what is 212 * 2909, make sure to use a tool
** Messages: **
user: what is 212 * 2909, make sure to use a tool
**************************************************
** Response: **
assistant: None
**************************************************

=== Calling Function ===
Calling function: calculate with args: {"math_operator":"MULTIPLY","valueA":212,"valueB":2909}

dosubot[bot] commented 9 months ago

šŸ¤–

Thank you for the detailed bug report, leotronolone. It seems like the issue is related to the get_parameters_dict method in the ToolMetadata class. This method is responsible for generating the JSON schema for the function parameters. In the latest version of LlamaIndex, the method is filtering out keys that are not in the list ["type", "properties", "required", "definitions"].

Here is the relevant code snippet:

parameters = self.fn_schema.schema()
parameters = {
    k: v
    for k, v in parameters.items()
    if k in ["type", "properties", "required", "definitions"]
}

This means that any additional keys in the schema, such as those representing referenced objects and enums, are being removed. This is likely the cause of the LLM returning function call inputs that don't respect the desired JSON schema.
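The effect of that filter can be demonstrated directly. This is a minimal sketch assuming Pydantic v2, where referenced enums land under the "$defs" key rather than v1's "definitions" (the key the filter keeps):

```python
# Demonstrates how the key filter quoted above drops referenced enum
# definitions. Assumes Pydantic v2, which emits them under "$defs"
# (Pydantic v1 used "definitions").
from enum import Enum

from pydantic import BaseModel


class EntityType(str, Enum):
    FILE = "file"
    RESPONSE = "response"
    MEMBER = "member"


class AskQuestionModel(BaseModel):
    entity_types: list[EntityType]


schema = AskQuestionModel.model_json_schema()
print("$defs" in schema)  # True: the enum definition is present

# The filter from ToolMetadata.get_parameters_dict, as quoted above:
filtered = {
    k: v
    for k, v in schema.items()
    if k in ["type", "properties", "required", "definitions"]
}
print("$defs" in filtered)  # False: the enum definition has been dropped
```

The "$ref" entries under "properties" survive, but they now point at definitions that no longer exist in the schema the LLM sees, which matches the truncated schema shown in the bug description.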

To fix this issue, you could modify the get_parameters_dict method to include these additional keys in the schema. However, without knowing the exact structure of your schemas and how they are being used, it's hard to provide a specific solution.
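One possible fix along those lines is simply letting both definition keys through the filter. This is a hypothetical sketch, not the actual LlamaIndex patch; the standalone get_parameters_dict function below stands in for the real ToolMetadata method:

```python
# Hypothetical fix sketch: keep Pydantic v2's "$defs" key (alongside v1's
# "definitions") so referenced enums/objects survive the key filter.
from enum import Enum

from pydantic import BaseModel


def get_parameters_dict(fn_schema: type[BaseModel]) -> dict:
    parameters = fn_schema.model_json_schema()
    return {
        k: v
        for k, v in parameters.items()
        if k in ["type", "properties", "required", "definitions", "$defs"]
    }


class EntityType(str, Enum):
    FILE = "file"
    RESPONSE = "response"
    MEMBER = "member"


class AskQuestionModel(BaseModel):
    entity_types: list[EntityType]


params = get_parameters_dict(AskQuestionModel)
print("$defs" in params)  # True: the referenced enum definition is preserved
```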

Unfortunately, I wasn't able to find the exact changes made to the get_parameters_dict method between version 0.9.14 and 0.9.40 of LlamaIndex. It would be helpful if you could provide more information about the changes you've noticed in the behavior of this method between these versions.

You can find the source code for the ToolMetadata class in the llama_index/tools/types.py file.

I hope this helps! If you have any more questions or need further clarification, feel free to ask.

This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a šŸ‘ if this is helpful and šŸ‘Ž if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.

leotronolone commented 6 months ago

FYI, I was able to make a workaround by defining the enum/object's schema in the description for the field.


from pydantic import BaseModel, Field


class CalculateFunctionModel(BaseModel):
    math_operator: MATH_OPERATOR = Field(
        description=f"Possible MATH_OPERATOR values: {[a.value for a in MATH_OPERATOR]}"
    )
    valueA: int
    valueB: int

This workaround only works for the first layer of referenced object schemas though (which is enough for my use case). Fixing this would still be greatly appreciated, as I'm sure my use case will need it eventually šŸ™
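The workaround holds up because the enum values are embedded in a field description, which lives under "properties" and therefore survives the key filter. A minimal check, assuming Pydantic v2:

```python
# Verifies (assuming Pydantic v2) that the workaround surfaces the enum
# values inside the field description under "properties", which the
# ["type", "properties", "required", "definitions"] key filter keeps.
from enum import Enum

from pydantic import BaseModel, Field


class MATH_OPERATOR(str, Enum):
    ADD = "add"
    SUBTRACT = "subtract"


class CalculateFunctionModel(BaseModel):
    math_operator: MATH_OPERATOR = Field(
        description=f"Possible MATH_OPERATOR values: {[a.value for a in MATH_OPERATOR]}"
    )
    valueA: int
    valueB: int


schema = CalculateFunctionModel.model_json_schema()
desc = schema["properties"]["math_operator"]["description"]
print(desc)  # Possible MATH_OPERATOR values: ['add', 'subtract']
```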

dosubot[bot] commented 6 months ago

Thank you for sharing your workaround, leotronolone! We appreciate your feedback and understanding of the issue. We'll go ahead and close this issue now. If you have any more questions or need further assistance in the future, feel free to reach out.

To continue the conversation, mention @dosu.

leotronolone commented 6 months ago

@dosu Can we reopen this? The underlying issue has still not been resolved

dosubot[bot] commented 6 months ago

@logan-markewich Can you please assist with reopening this issue? The user has indicated that the underlying issue still needs attention. Thank you!

To continue the conversation, mention @dosu.

logan-markewich commented 6 months ago

@leotronolone pretty sure this is solved in latest versions

leotronolone commented 6 months ago

@logan-markewich This workaround was required for 0.10.33 (latest version as of today)

logan-markewich commented 6 months ago

@leotronolone looks fine to me, but I think you are defining the function tool in an unexpected way

https://colab.research.google.com/drive/1GEt_dkEXnnVkygBqtwYnFfMOAvr78OKi?usp=sharing