run-llama / LlamaIndexTS

LlamaIndex in TypeScript
https://ts.llamaindex.ai

gpt-4o-mini agent throwing error when using tools #1077

Open kaitakami opened 1 month ago

kaitakami commented 1 month ago

Every time I run the agent I get the following error:

TypeError: undefined is not an object (evaluating 'JSON.stringify(value, null, 2).replace')
      at stringifyJSONToMessageContent (/.../node_modules/llamaindex/dist/internal/utils.js:19:68)
      at /.../node_modules/llamaindex/dist/agent/utils.js:101:30
[23:05:11.529] ERROR (24460): Error handling incoming message:
    err: {
      "type": "TypeError",
      "message": "undefined is not an object (evaluating 'JSON.stringify(value, null, 2).replace')",
      "stack":
          TypeError: undefined is not an object (evaluating 'JSON.stringify(value, null, 2).replace')
              at stringifyJSONToMessageContent (/.../node_modules/llamaindex/dist/internal/utils.js:19:68)
              at <anonymous> (/.../node_modules/llamaindex/dist/agent/utils.js:80:49)
              at processTicksAndRejections (:12:39)
      "originalLine": 8,
      "originalColumn": 40
    }

Code to reproduce this bug (a normal OpenAIAgent):

import { OpenAI, OpenAIAgent } from "llamaindex";

const openaiLLM = new OpenAI({ model: "gpt-4o-mini", temperature: 0.7 });

const onboardingAgent = new OpenAIAgent({
  chatHistory: conversation,
  tools: [updateUser, updateHealthMetrics, updateDietaryHabits, updatePreferredMessageTime, addAllergy, addInjury],
  llm: openaiLLM,
});
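
The tools are plain function tools; here is a hypothetical sketch of one of them (the name, schema, and body below are illustrative only, not my exact code):

```ts
import { FunctionTool } from "llamaindex";

// Illustrative only: the real tools follow the same pattern with different schemas.
const updateUser = FunctionTool.from(
  async ({ name, age }: { name: string; age: number }) => {
    // ...persist the update somewhere...
    return JSON.stringify({ ok: true, name, age });
  },
  {
    name: "updateUser",
    description: "Update basic profile fields for the current user.",
    parameters: {
      type: "object",
      properties: {
        name: { type: "string", description: "The user's full name" },
        age: { type: "number", description: "The user's age in years" },
      },
      required: ["name", "age"],
    },
  },
);
```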

The agent only throws the error when it makes use of the tools.

From my understanding, gpt-4o-mini supports function calling, so it should work with LlamaIndex agents. Is this a known issue, or am I doing something wrong?

This works perfectly fine when using gpt-4o.

erik-balfe commented 1 month ago

It appears that the model is not as proficient at returning valid JSON objects, which leads to JSON parsing failures on its output. You might consider testing with simpler requests where it's easier for the model to generate valid JSON. Generally, smaller, cheaper models are less reliable at producing accurate JSON, and therefore at function calling, except for those specifically fine-tuned for it.

Alternatively, this issue could potentially be mitigated with additional parsing techniques within the library to handle invalid JSON, or by prompting the model to provide correct JSON if it initially fails. Unfortunately, smaller models often struggle with this aspect, and there don't seem to be immediate plans to address this with extra parsing in the library.

kaitakami commented 1 month ago

I think invalid JSON should be handled by the library, but I could be wrong. I will try prompt engineering to get correct JSON and share my updates here. It would be a bummer if we can't use agents with gpt-4o-mini; the tools I'm passing are not complex.

erik-balfe commented 1 month ago

We can add some simple fixes to address this issue. For example, cases with JSON like:

 "arguments": "```json:{\"location\": \"San Francisco, USA\", \"format\": \"celsius\"}" ```

can be corrected using a regex.
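
Something along these lines (a rough sketch, not the exact fix that would land in the library):

```ts
// Rough sketch: strip markdown code fences (and an optional "json:" tag)
// from a tool-call arguments string before it is passed to JSON.parse.
function stripJsonFences(raw: string): string {
  return raw
    .replace(/^```(?:json)?:?\s*/i, "") // leading ```json: / ```json / ```
    .replace(/\s*```$/, "") // trailing ```
    .trim();
}

// stripJsonFences('```json:{"location": "San Francisco, USA", "format": "celsius"}')
// -> '{"location": "San Francisco, USA", "format": "celsius"}'
```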

However, if a general-purpose model is consistently bad at generating correct JSON, it might be more effective to use a more proficient model rather than trying to fix its output. Llama 70B is the most affordable general-purpose model that can handle this well. Low-cost conversational models will continue to struggle with this until the LM provider fine-tunes the model.

@marcusschiesser Can I add some simple JSON fixing in agents, and maybe optional retry logic for when a tool call can't be parsed? For example, integrating this one: https://github.com/josdejong/jsonrepair
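
A rough sketch of what that could look like (the retry wrapper is just an illustration of the idea, not the agent's actual internals; jsonrepair exposes a single `jsonrepair(text)` function):

```ts
import { jsonrepair } from "jsonrepair";

// Sketch: parse strictly first, then fall back to jsonrepair for common
// issues such as trailing commas, single quotes, or missing quotes.
function parseToolArguments(raw: string): Record<string, unknown> {
  try {
    return JSON.parse(raw);
  } catch {
    return JSON.parse(jsonrepair(raw));
  }
}

// Sketch of optional retry logic: re-run a step (e.g. re-ask the model for a
// tool call) a bounded number of times if its output still can't be parsed.
async function withRetries<T>(step: () => Promise<T>, maxRetries = 2): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await step();
    } catch (err) {
      lastError = err;
    }
  }
  throw lastError;
}
```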

kaitakami commented 1 month ago

I was able to make gpt-4o-mini return valid JSON by being overly specific about the expected output: I specify which function to use and include an example of valid JSON.
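
Something along these lines (illustrative, not my exact prompt):

```ts
// Illustrative: a system message prepended to the chat history that names the
// expected tool and shows the exact JSON shape to return.
conversation.unshift({
  role: "system",
  content: [
    "When the user shares profile details, call the updateUser tool.",
    "The arguments must be plain JSON with no markdown fences, for example:",
    '{"name": "Jane Doe", "age": 29}',
  ].join("\n"),
});
```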

By doing this, I can say that in 99% of cases it won't throw an error. Still, I think a retry param would be quite useful for the cases where agents fail because of malformed model output.

Thank you for your help @erik-balfe! You saved me from migrating to something else :pray:

marcusschiesser commented 1 month ago

@erik-balfe using something like jsonrepair and an optional retry sounds like a good idea. What do you think, @himself65?

himself65 commented 1 month ago

yeah, I think retries would be acceptable, but I want to check what the behavior is on the Python side first